tsseg.algorithms.hidalgo package
Hidalgo — Heterogeneous Intrinsic Dimensionality Algorithm.
Description
Hidalgo performs Bayesian clustering by estimating the local intrinsic
dimensionality of data manifolds. It assigns each observation to one of
K_states manifolds using Gibbs sampling, a Potts-model spatial prior and
nearest-neighbour distance statistics.
The algorithm is designed for high-dimensional data and is particularly well suited to cases where different states occupy manifolds of different dimensionality.
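To make "nearest-neighbour distance statistics" concrete, here is a minimal sketch. It assumes the statistic is the ratio of second to first nearest-neighbour distance, as in the TWO-NN estimator on which the cited reference builds. It estimates a single global dimension by maximum likelihood and is not the Hidalgo mixture model. The function names `two_nn_ratios` and `mle_dimension` are illustrative and not part of tsseg.

```python
import numpy as np


def two_nn_ratios(X):
    """Return mu_i = r2_i / r1_i, the ratio of 2nd to 1st neighbour distance."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Squared Euclidean distances via the Gram-matrix identity.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    # After partitioning at k=1, columns 0 and 1 hold the two smallest values in order.
    part = np.partition(d2, 1, axis=1)
    r1 = np.sqrt(part[:, 0])
    r2 = np.sqrt(part[:, 1])
    return r2 / r1


def mle_dimension(mu):
    """Maximum-likelihood dimension if mu follows a Pareto law with exponent d."""
    mu = np.asarray(mu, dtype=float)
    return len(mu) / np.sum(np.log(mu))


# A 2-D sheet embedded in a 5-D ambient space.
rng = np.random.default_rng(0)
sheet = np.zeros((1000, 5))
sheet[:, :2] = rng.random((1000, 2))
d_hat = mle_dimension(two_nn_ratios(sheet))
print(round(float(d_hat), 2))  # close to 2, the intrinsic dimension of the sheet
```

The estimate depends only on distance ratios, so it is insensitive to the ambient dimension. That is what lets states be told apart by the dimensionality of the manifold they occupy.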
(K_states required)
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| metric | str / callable | 'euclidean' | Distance metric passed to sklearn's nearest-neighbour search. |
| K_states | int | 1 | Number of manifolds / states. |
| zeta | float | 0.8 | Local homogeneity level, in \((0, 1)\). |
| q | int | 3 | Number of neighbours for local Z interaction. |
| n_iter | int | 1000 | Number of Gibbs sampling iterations. |
| n_replicas | int | 1 | Number of random restarts. |
| burn_in | float | 0.9 | Fraction of iterations discarded as burn-in. |
| fixed_Z | bool | False | Estimate parameters with fixed allocation Z. |
| use_Potts | bool | True | Enable local Potts interaction between assignments. |
| estimate_zeta | bool | False | Update zeta during sampling. |
| sampling_rate | int | 10 | Save samples every k iterations. |
| seed | int | 0 | Random seed. |
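The burn-in fraction and the sampling rate together determine how many Gibbs samples survive. The sketch below is an assumption about the bookkeeping, not the library's code: it keeps every `sampling_rate`-th iteration once the first `burn_in * n_iter` iterations have been discarded. The helper name `retained_iterations` is hypothetical.

```python
def retained_iterations(n_iter, burn_in, sampling_rate):
    """List the iteration indices kept after burn-in and thinning (assumed scheme)."""
    first_kept = int(burn_in * n_iter)  # iterations before this index are discarded
    return [i for i in range(n_iter) if i >= first_kept and i % sampling_rate == 0]


kept = retained_iterations(n_iter=1000, burn_in=0.9, sampling_rate=10)
print(len(kept), kept[0], kept[-1])  # 10 900 990
```

Under this assumption the default settings keep only ten samples per replica. Lowering `burn_in` or `sampling_rate` increases the number of retained samples.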
Usage
from tsseg.algorithms import HidalgoDetector
detector = HidalgoDetector(K_states=3, n_iter=500)
states = detector.fit_predict(X)
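The snippet above leaves `X` undefined. One way to build a small test input is sketched below. It assumes `X` is a 2-D array with one observation per row, and the helper `make_two_regime_data` is hypothetical. The detector call is left as a comment so that the block runs without tsseg installed.

```python
import numpy as np


def make_two_regime_data(n_per_regime=200, seed=0):
    """Stack a 1-D regime and a 4-D regime, both embedded in 5 ambient dimensions."""
    rng = np.random.default_rng(seed)
    low = np.zeros((n_per_regime, 5))
    low[:, :1] = rng.normal(size=(n_per_regime, 1))    # varies along one axis only
    high = np.zeros((n_per_regime, 5))
    high[:, :4] = rng.normal(size=(n_per_regime, 4))   # varies along four axes
    return np.vstack([low, high])


X = make_two_regime_data()
print(X.shape)  # (400, 5)

# With tsseg installed, the call from the Usage section would then be:
# from tsseg.algorithms import HidalgoDetector
# states = HidalgoDetector(K_states=2, n_iter=500).fit_predict(X)
```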
Implementation: Adapted from aeon, with a numerical stability fix (log-domain
sample_p). Licence: BSD 3-Clause.
Reference: Allegra, Facco, Denti, Laio & Mira (2020), Data segmentation based on the local intrinsic dimension, Scientific Reports.
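The body of `sample_p` is not shown on this page, so the following is only a generic illustration of why a log-domain evaluation helps. It is not the actual fix. It compares a ratio of Beta-like weights computed directly with the same ratio computed from logarithms. Both function names are hypothetical.

```python
import math


def accept_ratio_naive(x, x_ref, a, b):
    """Direct evaluation: both powers can underflow to 0.0 for large a and b."""
    return (x**a * (1.0 - x) ** b) / (x_ref**a * (1.0 - x_ref) ** b)


def accept_ratio_log(x, x_ref, a, b):
    """Same ratio, assembled from logarithms and exponentiated once at the end."""
    log_ratio = a * (math.log(x) - math.log(x_ref)) + b * (
        math.log1p(-x) - math.log1p(-x_ref)
    )
    return math.exp(log_ratio)


# Small exponents: both routes agree.
print(math.isclose(accept_ratio_naive(0.3, 0.4, 2, 3), accept_ratio_log(0.3, 0.4, 2, 3)))

# Large exponents: the direct route divides 0.0 by 0.0, the log route stays finite.
try:
    accept_ratio_naive(0.30, 0.31, 3000, 3000)
except ZeroDivisionError:
    print("naive evaluation underflowed")
print(accept_ratio_log(0.30, 0.31, 3000, 3000) > 0.0)
```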
Submodules
tsseg.algorithms.hidalgo.detector module
Hidalgo (Heterogeneous Intrinsic Dimensionality Algorithm) Segmentation.
- class tsseg.algorithms.hidalgo.detector.HidalgoDetector(metric='euclidean', K_states=1, zeta=0.8, q=3, n_iter=1000, n_replicas=1, burn_in=0.9, fixed_Z=False, use_Potts=True, estimate_zeta=False, sampling_rate=10, a=None, b=None, c=None, f=None, seed=0)
Bases: BaseSegmenter
Heterogeneous Intrinsic Dimensionality Algorithm (Hidalgo) model.
Hidalgo discriminates between regions of differing local intrinsic dimensionality, a topological feature that measures complexity. It provides unsupervised segmentation of high-dimensional data.
- Parameters:
metric (str or callable, optional, default="euclidean") – distance used in the nearest-neighbours part of the algorithm; passed directly to sklearn's nearest-neighbours estimator, so it must be a str or callable that the estimator accepts
K_states (int, optional, default=1) – number of manifolds used in the algorithm
zeta (float, optional, default=0.8) – “local homogeneity level” used in the algorithm, see equation (4)
q (int, optional, default=3) – number of points for local Z interaction, “local homogeneity range” see equation (4)
n_iter (int, optional, default=1000) – number of Gibbs sampling iterations
n_replicas (int, optional, default=1) – number of random starts to run Gibbs sampling
burn_in (float, optional, default=0.9) – fraction of Gibbs sampling iterations discarded, “burn-in fraction”
fixed_Z (bool, optional, default=False) – estimate parameters with fixed z (joint posterior approximation via Gibbs); z = (z_1, …, z_N) is a latent variable with one entry per point, where z_i = k indicates that point i belongs to manifold k
use_Potts (bool, optional, default=True) – if using local interaction between z, see equation (4)
estimate_zeta (bool, optional, default=False) – update zeta in the sampling
sampling_rate (int, optional, default=10) – interval, in iterations, at which samples are saved
a (np.ArrayLike, optional, default=None) – prior parameters of d, the dimensionality of manifold k
b (np.ArrayLike, optional, default=None) – prior parameters of d, the dimensionality of manifold k
c (np.ArrayLike, optional, default=None) – prior parameters of p, the probability that point belongs to manifold k
f (np.ArrayLike, optional, default=None) – parameters of zeta
seed (int, optional, default=0) – seed for the random number generator
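Equation (4) of the reference is not reproduced on this page. As a rough picture of what `zeta` and `q` control, the sketch below assumes a prior in which each of a point's `q` neighbours contributes a factor `zeta` when it shares the point's label and `1 - zeta` otherwise. The normalising constant is omitted, and the function `potts_log_weight` is hypothetical.

```python
import numpy as np


def potts_log_weight(z, neighbours, zeta):
    """Unnormalised log-weight n_in * log(zeta) + n_out * log(1 - zeta)."""
    z = np.asarray(z)
    neighbours = np.asarray(neighbours)      # shape (N, q): neighbour indices per point
    same = z[neighbours] == z[:, None]       # True where a neighbour shares the label
    n_in = int(same.sum())
    n_out = same.size - n_in
    return n_in * np.log(zeta) + n_out * np.log1p(-zeta)


# Six points on a chain, q = 2 neighbours each.
nbrs = [[1, 2], [0, 2], [1, 3], [2, 4], [3, 5], [3, 4]]
blocky = potts_log_weight([0, 0, 0, 1, 1, 1], nbrs, zeta=0.8)
alternating = potts_log_weight([0, 1, 0, 1, 0, 1], nbrs, zeta=0.8)
print(blocky > alternating)  # True
```

With `zeta` above 0.5, labellings in which neighbours agree receive more weight. This is the sense in which the prior encourages locally homogeneous segments.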
References
Allegra, Michele, et al. “Data segmentation based on the local intrinsic dimension.” Scientific reports 10.1 (2020): 1-12. https://www.nature.com/articles/s41598-020-72222-0
Examples
>>> from aeon.segmentation import HidalgoSegmenter
>>> import numpy as np
>>> np.random.seed(123)
>>> X = np.random.rand(10,3)
>>> X[:6, 1:] += 10
>>> X[6:, 1:] = 0
>>> model = HidalgoSegmenter(K_states=2, burn_in=0.8, n_iter=100, seed=10)
>>> seg = model.fit_predict(X, axis=0)
>>> seg.tolist()
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
- set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') -> HidalgoDetector
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') -> HidalgoDetector
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.