tsseg.algorithms.eagglo package

E-Agglo — energy-based agglomerative change point detection.

Description

E-Agglo is a non-parametric, hierarchical agglomerative algorithm for detecting multiple change points in multivariate time series. At each step, the pair of neighbouring segments whose merge yields the largest goodness-of-fit statistic, based on energy distances, is merged. Because only temporally adjacent segments can be merged, the procedure preserves the temporal ordering, unlike classical agglomerative clustering.

A divergence parameter $\alpha\in(0,2]$ controls the distance exponent. An optional penalty function can regularise against over-segmentation.
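For reference, the energy divergence behind the goodness-of-fit statistic can be sketched in NumPy. This follows the sample definitions in Matteson & James (2014) as we read them: $\hat{\mathcal{E}}(X_n,Y_m;\alpha)=\frac{2}{mn}\sum_{i,j}|X_i-Y_j|^\alpha-\binom{n}{2}^{-1}\sum_{i<k}|X_i-X_k|^\alpha-\binom{m}{2}^{-1}\sum_{j<k}|Y_j-Y_k|^\alpha$, scaled as $\hat{Q}=\frac{mn}{m+n}\hat{\mathcal{E}}$. The function names are hypothetical (not part of tsseg), inputs are assumed to be 2D arrays with one observation per row, and treating the within-sample term of a single point as 0 is an assumption of this sketch.

```python
import numpy as np


def _dist_alpha(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Matrix of Euclidean distances |a_i - b_j| raised to the power alpha."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1) ** alpha


def _within(a: np.ndarray, alpha: float) -> float:
    """Average of |a_i - a_k|**alpha over pairs i < k (0 for one point, by assumption)."""
    if len(a) < 2:
        return 0.0
    d = _dist_alpha(a, a, alpha)
    return float(d[np.triu_indices(len(a), k=1)].mean())


def energy_divergence(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> float:
    """Empirical divergence E-hat(X_n, Y_m; alpha); rows are observations."""
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    between = 2.0 * float(_dist_alpha(x, y, alpha).mean())
    return between - _within(x, alpha) - _within(y, alpha)


def q_statistic(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> float:
    """Scaled statistic Q-hat = m * n / (m + n) * E-hat."""
    n, m = len(x), len(y)
    return n * m / (n + m) * energy_divergence(x, y, alpha)


rng = np.random.default_rng(0)
a = rng.standard_normal((50, 2))
b = rng.standard_normal((50, 2))          # same distribution as a
c = rng.standard_normal((50, 2)) + 3.0    # mean shifted by 3 in each channel
print(q_statistic(a, b), q_statistic(a, c))
```

A shift in distribution inflates the between-sample term relative to the within-sample terms, so $\hat{Q}$ for `(a, c)` should be much larger than for `(a, b)`.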

Type: change point detection
Supervision: fully unsupervised
Scope: univariate and multivariate
Complexity: \(O(n^{2})\)
Requires: numba

Parameters

Name	Type	Default	Description
`member`	array-like / None	`None`	Initial cluster membership. `None` = one cluster per point.
`alpha`	float	`1.0`	Divergence exponent in $(0, 2]$.
`penalty`	str / callable / None	`None`	Penalty function (`"len_penalty"`, `"mean_diff_penalty"` or callable).

Usage

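import numpy as np
import pandas as pd
from tsseg.algorithms import EAggloDetector

X = pd.DataFrame(np.random.default_rng(0).standard_normal((100, 2)))  # rows = time points
detector = EAggloDetector(alpha=1.0)
labels = detector.fit_predict(X)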
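If change point indices rather than per-observation labels are needed, they can be read off `labels` wherever consecutive entries differ. This minimal sketch assumes `labels` is a 1D array with one segment id per time point (as `cluster_` is described below); `labels_to_change_points` is a hypothetical helper, not part of tsseg.

```python
import numpy as np


def labels_to_change_points(labels) -> np.ndarray:
    """Indices i where labels[i] differs from labels[i - 1]."""
    labels = np.asarray(labels)
    # Compare each label with its predecessor; a mismatch starts a new segment.
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1


print(labels_to_change_points([0, 0, 0, 1, 1, 2, 2, 2]))  # [3 5]
```

Comparing neighbours directly, rather than using `np.diff`, also works for non-numeric segment ids.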

Implementation: Adapted from aeon. BSD 3-Clause.

Reference: Matteson & James (2014), A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data, JASA.

Submodules

tsseg.algorithms.eagglo.detector module

E-Agglo: agglomerative clustering algorithm that preserves observation order.

class tsseg.algorithms.eagglo.detector.EAggloDetector(member=None, alpha=1.0, penalty=None)

Bases: BaseSegmenter

Hierarchical agglomerative estimation of multiple change points.

E-Agglo is a non-parametric clustering approach for multivariate time series [1] in which neighboring segments are sequentially merged to maximize a goodness-of-fit statistic. Unlike most general-purpose agglomerative clustering algorithms, this procedure preserves the time ordering of the observations.

This method can detect changes in distribution within a sequence of independent observations and makes no distributional assumptions beyond the existence of an alpha-th absolute moment. Estimation simultaneously identifies both the number and the locations of the change points.
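The merge procedure can be illustrated with a deliberately naive sketch. It assumes, following our reading of [1], that the goodness-of-fit statistic of a segmentation is the sum of the scaled energy statistic $\hat{Q}$ over adjacent segments, that each step merges the adjacent pair giving the largest statistic, and that the reported segmentation is the one with the largest statistic seen along the way. All names are hypothetical, no penalty is applied, a singleton's within-segment term is taken as 0, and every statistic is recomputed from scratch, so this is much slower than the complexity quoted above and only suited to tiny inputs.

```python
import numpy as np


def _dist_alpha(a, b, alpha):
    """Euclidean distances |a_i - b_j| raised to the power alpha."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1) ** alpha


def _q_stat(x, y, alpha):
    """Scaled energy statistic Q-hat between two segments (rows = observations)."""
    n, m = len(x), len(y)
    between = 2.0 * _dist_alpha(x, y, alpha).mean()
    within = 0.0
    for seg in (x, y):
        if len(seg) > 1:  # assumption: within term is 0 for a singleton
            within += _dist_alpha(seg, seg, alpha)[np.triu_indices(len(seg), k=1)].mean()
    return n * m / (n + m) * (between - within)


def _gof(X, bounds, alpha):
    """Sum of Q-hat over adjacent segments; bounds = segment starts + [len(X)]."""
    segs = [X[s:e] for s, e in zip(bounds[:-1], bounds[1:])]
    return sum(_q_stat(a, b, alpha) for a, b in zip(segs[:-1], segs[1:]))


def eagglo_sketch(X, alpha=1.0):
    """Greedy merging of adjacent segments; returns (change points, GOF progression)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    bounds = list(range(len(X) + 1))  # start with one segment per observation
    history = [(bounds[1:-1], _gof(X, bounds, alpha))]
    while len(bounds) > 2:
        # Dropping interior boundary k merges the two segments that meet there.
        candidates = [bounds[:k] + bounds[k + 1:] for k in range(1, len(bounds) - 1)]
        scores = [_gof(X, c, alpha) for c in candidates]
        best = int(np.argmax(scores))
        bounds = candidates[best]
        history.append((bounds[1:-1], scores[best]))
    cps, _ = max(history, key=lambda h: h[1])
    return cps, [g for _, g in history]


X = np.r_[np.zeros(10), np.full(10, 10.0)]  # toy series with one mean shift
cps, gofs = eagglo_sketch(X)
print(cps)
```

On this toy series the sketch should place its single change point at index 10, where the two-segment statistic is largest.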

This implementation is based on the aeon package.

Parameters:

member (array_like (default=None)) – Initial cluster membership of each observation, so its first dimension must match that of the data. If None, it is initialized to a vector that assigns each point to its own cluster.
alpha (float (default=1.0)) – Fixed constant alpha in (0, 2] used in the divergence measure, where distances enter through their alpha-th absolute moment; see equation (4) in [1].
penalty (str or callable or None (default=None)) – Function that penalizes the sequence of goodness-of-fit statistics when overfitting is a concern. If None, no penalty is applied. It can also be the name of a built-in penalty, either "len_penalty" or "mean_diff_penalty".
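To illustrate how a penalty might act, here is a sketch that assumes, as in the R ecp code the Notes credit, that the penalty receives the change point locations of each candidate segmentation in the merge sequence and that its value is added to that segmentation's goodness-of-fit statistic before the maximum is taken. The bodies of `len_penalty` and `mean_diff_penalty` below are assumptions modelled on the names only, not the tsseg definitions; `select_segmentation` is hypothetical, and the progression and statistics in the usage lines are made-up numbers.

```python
import numpy as np


def len_penalty(cps):
    """Hypothetical: subtract one unit per change point."""
    return -len(cps)


def mean_diff_penalty(cps):
    """Hypothetical: reward change points that are, on average, far apart."""
    cps = np.sort(np.asarray(cps, dtype=float))
    # Assumption: fewer than two change points contribute no spacing reward.
    return float(np.mean(np.diff(cps))) if len(cps) > 1 else 0.0


def select_segmentation(progression, gofs, penalty=None):
    """Return the change point set with the largest (penalized) statistic.

    progression[i] lists the change points after merge step i and gofs[i] is
    the goodness-of-fit statistic of that segmentation.
    """
    scores = np.asarray(gofs, dtype=float)
    if penalty is not None:
        scores = scores + np.array([penalty(cps) for cps in progression], dtype=float)
    best = int(np.argmax(scores))
    return progression[best], float(scores[best])


# Made-up merge sequence and statistics, for illustration only.
progression = [[3, 5, 8], [5, 8], [5], []]
gofs = [10.0, 10.5, 9.8, 0.0]
print(select_segmentation(progression, gofs))                     # no penalty
print(select_segmentation(progression, gofs, len_penalty))        # len_penalty
print(select_segmentation(progression, gofs, mean_diff_penalty))  # mean_diff_penalty
```

Here `len_penalty` shifts the choice from two change points to one, since each extra change point must raise the statistic by more than one unit to pay for itself.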

merged_

2D array_like outlining which clusters were merged at each step.

Type: array_like

gof_

Goodness-of-fit statistic for the current clustering.

Type: float

cluster_

1D array_like specifying which cluster each row of input data X belongs to.

Type: array_like

Notes

Based on the work in [1]. Requires numpy, pandas and numba.

Source code inspired by: https://github.com/cran/ecp/blob/master/R/e_agglomerative.R
Paper available at: https://www.tandfonline.com/doi/full/10.1080/01621459.2013.849605

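References

[1] Matteson, D. S. and James, N. A. (2014). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data. Journal of the American Statistical Association. doi:10.1080/01621459.2013.849605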

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tsseg.algorithms.eagglo.detector import EAggloDetector
>>> rng = np.random.default_rng(0)
>>> X = pd.DataFrame(rng.standard_normal((20, 2)))
>>> model = EAggloDetector().fit(X)
>>> model.cluster_.shape == (20,)
True

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → EAggloDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.
Returns: self – The updated object.
Return type: object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → EAggloDetector

Configure whether metadata should be requested to be passed to the predict method.