tsseg.algorithms.eagglo package
E-Agglo — energy-based agglomerative change point detection.
Description
E-Agglo is a non-parametric, hierarchical agglomerative algorithm for detecting multiple change points in multivariate time series. Neighbouring segments are sequentially merged when the merge maximises a goodness-of-fit statistic based on energy distances. Unlike classical agglomerative clustering, this procedure preserves the temporal ordering.
A divergence parameter \(\alpha\in(0,2]\) controls the distance exponent. An optional penalty function can regularise against over-segmentation.
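To make the role of \(\alpha\) concrete, the following is a minimal sketch of a sample energy divergence between two groups of observations, written from the general definition of energy distance (twice the mean between-sample distance minus the two mean within-sample distances, each distance raised to the power \(\alpha\)). The function names `pairwise_pow` and `energy_divergence` are hypothetical and not part of tsseg; the package's internal computation may differ in weighting and implementation.

```python
import numpy as np


def pairwise_pow(a, b, alpha):
    """Matrix of Euclidean distances between rows of a and b, raised to alpha."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=2) ** alpha


def energy_divergence(x, y, alpha=1.0):
    """Sample energy divergence between two 2D samples (rows are observations).

    Hypothetical helper for illustration; requires at least two rows per sample.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Mean distance over all cross-sample pairs.
    between = pairwise_pow(x, y, alpha).mean()
    # Mean distance over distinct within-sample pairs: the full matrix counts
    # each pair twice and has a zero diagonal, hence the n * (n - 1) divisor.
    within_x = pairwise_pow(x, x, alpha).sum() / (n * (n - 1))
    within_y = pairwise_pow(y, y, alpha).sum() / (m * (m - 1))
    return 2.0 * between - within_x - within_y


x = np.array([[0.0], [2.0]])
y = np.array([[10.0], [12.0]])
print(energy_divergence(x, y, alpha=1.0))
```

Larger values indicate samples that are further apart in distribution. Smaller \(\alpha\) down-weights large distances and so reduces the influence of outliers.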
Parameters
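| Name | Type | Default | Description |
|---|---|---|---|
| member | array-like / None | None | Initial cluster membership. |
| alpha | float | 1.0 | Divergence exponent in \((0, 2]\). |
| penalty | str / callable / None | None | Penalty function (None for no penalty, the name len_penalty or mean_diff_penalty, or a callable). |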
Usage
from tsseg.algorithms import EAggloDetector
detector = EAggloDetector(alpha=1.0)
labels = detector.fit_predict(X)
Implementation: Adapted from aeon. BSD 3-Clause.
Reference: Matteson & James (2014), A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data, JASA.
Submodules
tsseg.algorithms.eagglo.detector module
E-Agglo: agglomerative clustering algorithm that preserves observation order.
- class tsseg.algorithms.eagglo.detector.EAggloDetector(member=None, alpha=1.0, penalty=None)
Bases: BaseSegmenter
Hierarchical agglomerative estimation of multiple change points.
E-Agglo is a non-parametric clustering approach for multivariate time series [1], in which neighboring segments are sequentially merged to maximize a goodness-of-fit statistic. Unlike most general-purpose agglomerative clustering algorithms, this procedure preserves the time ordering of the observations.
This method can detect distributional change within an independent sequence, and does not make any distributional assumptions (beyond the existence of an alpha-th moment). Estimation is performed in a manner that simultaneously identifies both the number and locations of change points.
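The order-preserving idea can be illustrated with a deliberately simplified sketch. It is not the E-Agglo procedure itself: instead of tracking a goodness-of-fit statistic to choose the number of change points, it takes a fixed target number of segments and greedily merges the pair of neighboring segments with the smallest divergence. All names (`mean_pairwise`, `divergence`, `greedy_ordered_merge`) are hypothetical, and the divergence uses a simple form in which within-segment means include zero self-distances so that single-point segments are allowed.

```python
import numpy as np


def mean_pairwise(a, b, alpha=1.0):
    """Mean of Euclidean distances**alpha over all pairs of rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return float((np.linalg.norm(diff, axis=2) ** alpha).mean())


def divergence(a, b, alpha=1.0):
    """Simplified energy-style divergence between two segments."""
    return (2.0 * mean_pairwise(a, b, alpha)
            - mean_pairwise(a, a, alpha)
            - mean_pairwise(b, b, alpha))


def greedy_ordered_merge(X, n_segments, alpha=1.0):
    """Merge neighboring segments of a 2D array until n_segments remain.

    Only adjacent segments are ever merged, so the time ordering is kept.
    Returns the interior boundaries (indices where a new segment starts).
    """
    X = np.asarray(X, dtype=float)
    # Segment i covers X[bounds[i]:bounds[i + 1]]; start from single points.
    bounds = list(range(len(X) + 1))
    while len(bounds) - 1 > n_segments:
        costs = [
            divergence(X[bounds[i]:bounds[i + 1]],
                       X[bounds[i + 1]:bounds[i + 2]], alpha)
            for i in range(len(bounds) - 2)
        ]
        # Remove the boundary between the two most similar neighbors.
        del bounds[int(np.argmin(costs)) + 1]
    return bounds[1:-1]


# Ten observations at 0 followed by ten observations at 5.
X = np.concatenate([np.zeros((10, 1)), np.full((10, 1), 5.0)])
print(greedy_ordered_merge(X, n_segments=2))
```

In this noise-free toy input the only surviving boundary is the level shift at index 10. The estimator described here additionally selects the number of change points from the data, which this sketch does not attempt.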
This implementation is based on the aeon package.
- Parameters:
member (array_like (default=None)) – Assigns points to the initial cluster membership; its first dimension should therefore match that of the data. If None, it is initialized to a dummy vector in which each point is assigned to a separate cluster.
alpha (float (default=1.0)) – Fixed constant alpha in (0, 2] used in the divergence measure as the alpha-th absolute moment; see equation (4) in [1].
penalty (str or callable or None (default=None)) – Function that penalizes the sequence of goodness-of-fit statistics when overfitting is a concern. If None, no penalty is applied. May also be the name of an existing penalty, either len_penalty or mean_diff_penalty.
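As an illustration of what a penalty callable might look like, the sketch below mirrors the two example penalties documented for the R ecp package, which the Notes cite as the inspiration for this code. It assumes that a penalty receives the array of candidate change point locations and returns a scalar that is added to the goodness-of-fit value; the exact signature expected by EAggloDetector is an assumption here and should be checked against the source. The function names are hypothetical.

```python
import numpy as np


def len_penalty_sketch(change_points):
    """Penalize by the number of change points (more points, lower score)."""
    return -len(change_points)


def mean_diff_penalty_sketch(change_points):
    """Reward segmentations whose change points are spread far apart."""
    return float(np.mean(np.diff(np.sort(change_points))))


candidate = np.array([0, 5, 12, 20])
print(len_penalty_sketch(candidate))
print(mean_diff_penalty_sketch(candidate))
```

Under the stated assumption, a custom rule of the same shape could be supplied via the penalty argument in place of a penalty name.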
- merged_
2D array_like outlining which clusters were merged at each step.
- Type: array_like
- cluster_
1D array_like specifying which cluster each row of input data X belongs to.
- Type: array_like
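Because the procedure preserves time ordering, a per-row label vector such as cluster_ can be turned into change point positions by locating the indices where the label changes. The helper below is a hypothetical sketch using only NumPy; it assumes the labels are constant within each segment and is not part of tsseg.

```python
import numpy as np


def change_points_from_labels(labels):
    """Return the indices at which a new segment starts (excluding index 0)."""
    labels = np.asarray(labels)
    # Compare each label with its predecessor; +1 maps back to the later index.
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1


labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
print(change_points_from_labels(labels))
```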
Notes
Based on the work from [1]. Requires numpy, pandas and numba.
Source code inspired by: https://github.com/cran/ecp/blob/master/R/e_agglomerative.R
Paper available at: https://www.tandfonline.com/doi/full/10.1080/01621459.2013.849605
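References
[1] Matteson & James (2014), A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data, JASA.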
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tsseg.algorithms.eagglo.detector import EAggloDetector
>>> rng = np.random.default_rng(0)
>>> X = pd.DataFrame(rng.standard_normal((20, 2)))
>>> model = EAggloDetector().fit(X)
>>> model.cluster_.shape == (20,)
True
- set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → EAggloDetector
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → EAggloDetector
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.