tsseg.algorithms.hidalgo package

Hidalgo — Heterogeneous Intrinsic Dimensionality Algorithm.

Description

Hidalgo performs Bayesian clustering by estimating the local intrinsic dimensionality of data manifolds. It assigns each observation to one of K_states manifolds using Gibbs sampling, a Potts-model spatial prior and nearest-neighbour distance statistics.

The algorithm is designed for high-dimensional data and is particularly well suited to cases where different states occupy manifolds of different intrinsic dimensionality.
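To make "intrinsic dimensionality from nearest-neighbour distance statistics" concrete, here is a minimal sketch of the single-manifold case. It assumes the TWO-NN style estimator, in which the ratio of second to first nearest-neighbour distance, mu = r2 / r1, yields the maximum-likelihood estimate d = N / sum(log mu). The function name `two_nn_dimension` and the toy data are hypothetical, and this is not the tsseg implementation, which additionally handles several manifolds and the Potts prior.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from 2nd/1st nearest-neighbour distance ratios."""
    log_mu_sum = 0.0
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_mu_sum += math.log(r2 / r1)
    # Maximum-likelihood estimate under the assumed Pareto law for mu.
    return len(points) / log_mu_sum


rng = random.Random(0)
# 400 points on a 2-D plane, linearly embedded in 5-D space.
plane = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    plane.append((u, v, u + v, u - v, 0.5 * u))

d_hat = two_nn_dimension(plane)
print(round(d_hat, 2))  # expected to land near 2 (the plane), not 5 (the ambient space)
```

The estimate reflects the dimension of the surface the points lie on rather than the number of coordinates, which is the quantity Hidalgo uses to tell states apart.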

Type: state detection
Supervision: semi-supervised (K_states required)
Scope: multivariate (uses nearest-neighbour distances)

Parameters

Name           Type            Default      Description
metric         str / callable  "euclidean"  Distance metric for sklearn NearestNeighbors.
K_states       int             1            Number of manifolds / states.
zeta           float           0.8          Local homogeneity level, in \((0, 1)\).
q              int             3            Number of neighbours for local Z interaction.
n_iter         int             1000         Number of Gibbs sampling iterations.
n_replicas     int             1            Number of random restarts.
burn_in        float           0.9          Fraction of iterations discarded as burn-in.
fixed_Z        bool            False        Estimate parameters with fixed allocation Z.
use_Potts      bool            True         Enable local Potts interaction between assignments.
estimate_zeta  bool            False        Update zeta during sampling.
sampling_rate  int             10           Save samples every k iterations.
seed           int             0            Random seed.
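The interplay of n_iter, burn_in and sampling_rate determines how many posterior samples remain for estimation. The sketch below works through the arithmetic for the default values under one assumed scheme: the chain is thinned first (every sampling_rate-th iteration is saved) and the leading burn_in fraction of the saved samples is then dropped. The helper `retained_iterations` is hypothetical, and the actual bookkeeping in the library may differ in detail.

```python
def retained_iterations(n_iter=1000, burn_in=0.9, sampling_rate=10):
    """Iterations kept under the assumed scheme: thin first, then drop burn-in."""
    saved = list(range(0, n_iter, sampling_rate))  # every sampling_rate-th iteration
    first_kept = int(burn_in * len(saved))         # assumed: discard the leading fraction
    return saved[first_kept:]


kept = retained_iterations()
print(len(kept), kept[0], kept[-1])  # 10 900 990
```

Under this assumption the defaults leave only about ten samples, so lowering burn_in or sampling_rate, or raising n_iter, may be worth considering when smoother posterior estimates are needed.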

Usage

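import numpy as np
from tsseg.algorithms import HidalgoDetector

# Placeholder data; replace with your own array of observations.
X = np.random.default_rng(0).random((200, 5))

detector = HidalgoDetector(K_states=3, n_iter=500)
states = detector.fit_predict(X)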
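Because the detector returns one state label per observation, a segmentation is often easier to read as contiguous runs. The following sketch converts a label sequence into (start, end, state) segments; `labels_to_segments` is a hypothetical helper, not part of tsseg, and the input mirrors the label list shown in the class docstring example.

```python
def labels_to_segments(labels):
    """Return (start, end_exclusive, state) for each run of identical labels."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


states = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(labels_to_segments(states))  # [(0, 6, 1), (6, 10, 0)]
```

The end index is exclusive, so the change point between the two states sits at index 6.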

Implementation: Adapted from aeon with numerical stability fix (log-domain sample_p). BSD 3-Clause.
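As background for the log-domain fix, the sketch below illustrates the general technique rather than the library's actual sample_p code. It assumes the quantity being evaluated has the form p**a * (1 - p)**b with large counts a and b, which underflows to zero in double precision; working with logarithms and subtracting the maximum before exponentiating keeps the relative weights intact. Function names and the grid are hypothetical.

```python
import math


def grid_weights_naive(a, b, grid):
    """Unnormalised p**a * (1 - p)**b evaluated directly; underflows for large a, b."""
    return [p**a * (1.0 - p) ** b for p in grid]


def grid_weights_log(a, b, grid):
    """Same quantity via logs, shifted so that the largest weight is exactly 1."""
    logs = [a * math.log(p) + b * math.log1p(-p) for p in grid]
    top = max(logs)
    return [math.exp(v - top) for v in logs]


grid = [i / 100 for i in range(1, 100)]  # p in (0, 1), endpoints excluded
naive = grid_weights_naive(5000, 5000, grid)
stable = grid_weights_log(5000, 5000, grid)

print(max(naive))   # 0.0: every term underflowed
print(max(stable))  # 1.0: the mode at p = 0.5 is preserved
```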

Reference: Allegra, Facco, Denti, Laio & Mira (2020), Data segmentation based on the local intrinsic dimension, Scientific Reports.

Submodules

tsseg.algorithms.hidalgo.detector module

Hidalgo (Heterogeneous Intrinsic Dimensionality Algorithm) Segmentation.

class tsseg.algorithms.hidalgo.detector.HidalgoDetector(metric='euclidean', K_states=1, zeta=0.8, q=3, n_iter=1000, n_replicas=1, burn_in=0.9, fixed_Z=False, use_Potts=True, estimate_zeta=False, sampling_rate=10, a=None, b=None, c=None, f=None, seed=0)

Bases: BaseSegmenter

Heterogeneous Intrinsic Dimensionality Algorithm (Hidalgo) model.

Hidalgo is a robust approach for discriminating between regions with different local intrinsic dimensionality (a topological feature that measures complexity). It offers unsupervised segmentation of high-dimensional data.

Parameters:
  • metric (str or callable, optional, default="euclidean") – distance metric used in the nearest-neighbour step of the algorithm; passed directly to sklearn NearestNeighbors, so it must be a str or callable that NearestNeighbors accepts

  • K_states (int, optional, default=1) – number of manifolds used in the algorithm

  • zeta (float, optional, default=0.8) – “local homogeneity level” used in the algorithm, see equation (4)

  • q (int, optional, default=3) – number of points for local Z interaction, “local homogeneity range” see equation (4)

  • n_iter (int, optional, default=1000) – number of Gibbs sampling iterations

  • n_replicas (int, optional, default=1) – number of random starts to run Gibbs sampling

  • burn_in (float, optional, default=0.9) – fraction of Gibbs sampling iterations discarded as burn-in (“burn-in fraction”)

  • fixed_Z (bool, optional, default=False) – estimate parameters with fixed z (joint posterior approximation via Gibbs sampling). z = (z_1, …, z_N) is a latent variable with one entry per point, where z_i = k indicates that point i belongs to manifold k

  • use_Potts (bool, optional, default=True) – if using local interaction between z, see equation (4)

  • estimate_zeta (bool, optional, default=False) – update zeta in the sampling

  • sampling_rate (int, optional, default=10) – interval, in iterations, at which samples are saved

  • a (np.ArrayLike, optional, default=None) – prior parameters of d, the dimensionality of manifold k

  • b (np.ArrayLike, optional, default=None) – prior parameters of d, the dimensionality of manifold k

  • c (np.ArrayLike, optional, default=None) – prior parameters of p, the probability that point belongs to manifold k

  • f (np.ArrayLike, optional, default=None) – parameters of zeta

  • seed (int, optional, default=0) – seed for the random number generator
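The roles of zeta and q in the local interaction can be illustrated with a small sketch. It assumes the interaction term takes the form referenced as equation (4): each point contributes zeta for every one of its q neighbours that shares its state and (1 - zeta) for every neighbour that does not. The normalising constant is omitted, and the function name and neighbour lists are hypothetical.

```python
import math


def potts_log_term(z, neighbours, zeta):
    """Sum over points of n_in*log(zeta) + n_out*log(1 - zeta), unnormalised."""
    total = 0.0
    for i, nbrs in enumerate(neighbours):
        n_in = sum(1 for j in nbrs if z[j] == z[i])  # neighbours sharing i's state
        n_out = len(nbrs) - n_in                     # neighbours in another state
        total += n_in * math.log(zeta) + n_out * math.log(1.0 - zeta)
    return total


# Six points on a line; each lists its q = 2 nearest neighbours (hypothetical data).
neighbours = [[1, 2], [0, 2], [1, 3], [2, 4], [3, 5], [4, 3]]
coherent = [0, 0, 0, 1, 1, 1]   # neighbouring points mostly share a state
scattered = [0, 1, 0, 1, 0, 1]  # neighbouring points mostly disagree

score_coherent = potts_log_term(coherent, neighbours, zeta=0.8)
score_scattered = potts_log_term(scattered, neighbours, zeta=0.8)
print(score_coherent > score_scattered)  # True
```

With zeta above 0.5, assignments in which neighbouring points agree score higher, which is what makes the resulting segmentation locally homogeneous. At zeta = 0.5 the term no longer distinguishes between assignments.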

References

Allegra, Michele, et al. “Data segmentation based on the local intrinsic dimension.” Scientific reports 10.1 (2020): 1-12. https://www.nature.com/articles/s41598-020-72222-0

Examples

>>> from tsseg.algorithms import HidalgoDetector
>>> import numpy as np
>>> np.random.seed(123)
>>> X = np.random.rand(10,3)
>>> X[:6, 1:] += 10
>>> X[6:, 1:] = 0
>>> model = HidalgoDetector(K_states=2, burn_in=0.8, n_iter=100, seed=10)
>>> seg = model.fit_predict(X, axis=0)
>>> seg.tolist()
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → HidalgoDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → HidalgoDetector

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in predict.

Returns:

self – The updated object.

Return type:

object

Module contents