tsseg.algorithms.ggs package

GGS — Greedy Gaussian Segmentation.

Description

GGS segments a multivariate time series by modelling each segment as i.i.d. samples from a multivariate Gaussian distribution. The algorithm maximises the total log-likelihood minus a covariance-regularisation term \(\lambda\). A greedy dynamic-programming search finds an approximate solution in linear time.

The maximum number of change points k_max is an upper bound; the algorithm may return fewer if additional splits do not improve the regularised likelihood.

Type: state detection
Supervision: unsupervised or semi-supervised
Scope: univariate and multivariate

Parameters

Name

Type

Default

Description

k_max

int

10

Maximum number of change points.

lamb

float

1.0

Regularisation \(\lambda\) (>= 0). Controls covariance shrinkage.

max_shuffles

int

250

Maximum number of shuffles during the greedy search.

verbose

bool

False

Print progress information.

random_state

int / None

None

Random seed.

Usage

from tsseg.algorithms import GreedyGaussianDetector

detector = GreedyGaussianDetector(k_max=8, lamb=1.0)
labels = detector.fit_predict(X)

Implementation: Adapted from aeon. BSD 3-Clause.

Reference: Hallac, Nystrup & Boyd (2019), Greedy Gaussian Segmentation of Multivariate Time Series, Advances in Data Analysis and Classification.

Submodules

tsseg.algorithms.ggs.detector module

Greedy Gaussian Segmentation (_GGS).

The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.

This module is structured with the _GGS that implements the actual segmentation algorithm and a GreedyGaussianSegmentation that interfaces the algorithm with the sklearn/aeon api. The benefit behind that design is looser coupling between the logic and the interface introduced to allow for easier changes of either part since segmentation still has an experimental nature. When making algorithm changes you probably want to look into _GGS when evolving the aeon/sklearn interface look into GreedyGaussianSegmentation. This design also allows adapting _GGS to other interfaces.

Notes

Based on the work from Hallac et al. (2019).

class tsseg.algorithms.ggs.detector.GreedyGaussianDetector(k_max=10, lamb=1.0, max_shuffles=250, verbose=False, random_state=None)[source]

Bases: BaseSegmenter

Greedy Gaussian Segmentation Estimator.

The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.

Greedy Gaussian Segmentation (GGS) fits a segmented gaussian model (SGM) to the data by computing the approximate solution to the combinatorial problem of finding the approximate covariance-regularized maximum log-likelihood for fixed number of change points and a reagularization strength. It follows an iterative procedure where a new breakpoint is added and then adjusting all breakpoints to (approximately) maximize the objective. It is similar to the top-down search used in other change point detection problems.

Parameters:
  • k_max (int) – Maximum number of change points to find. The number of segments is thus k+1.

  • lamb (float) – Regularization parameter lambda (>= 0), which controls the amount of (inverse) covariance regularization, see Eq (1) in [1]. Regularization is introduced to reduce issues for high-dimensional problems. Setting lamb to zero will ignore regularization, whereas large values of lambda will favour simpler models.

  • max_shuffles (int) – Maximum number of shuffles.

  • verbose (bool) – If True verbose output is enabled.

  • random_state (int | None) – Either random seed or an instance of np.random.RandomState.

change_points_

Locations of change points as integer indexes. By convention change points include the identity segmentation, i.e. first and last index + 1 values.

Type:

array_like, default=[]

_intermediate_change_points

Intermediate values of change points for each value of k = 1…k_max

Type:

List[List[int]], default=[]

_intermediate_ll

Intermediate values for log-likelihood for each value of k = 1…k_max

Type:

List[float], default=[]

Notes

Based on the work from [1].

References

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> from tsseg.algorithms import GreedyGaussianDetector
>>> X = np.random.default_rng(10).standard_normal((100, 2))
>>> X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
>>> ggs = GreedyGaussianDetector(k_max=3, max_shuffles=5)
>>> y = ggs.fit_predict(X_scaled, axis=0)
set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') GreedyGaussianDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') GreedyGaussianDetector

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in predict.

Returns:

self – The updated object.

Return type:

object

Module contents