tsseg.algorithms.ggs package

GGS — Greedy Gaussian Segmentation.

Description

GGS segments a multivariate time series by modelling each segment as i.i.d. samples from a multivariate Gaussian distribution. The algorithm maximises the total log-likelihood minus a covariance-regularisation term $\lambda$. A greedy dynamic-programming search finds an approximate solution in linear time.

The maximum number of change points k_max is an upper bound; the algorithm may return fewer if additional splits do not improve the regularised likelihood.

Type: state detection
Supervision: unsupervised or semi-supervised
Scope: univariate and multivariate

Parameters

Name	Type	Default	Description
`k_max`	int	`10`	Maximum number of change points.
`lamb`	float	`1.0`	Regularisation $\lambda$ (>= 0). Controls covariance shrinkage.
`max_shuffles`	int	`250`	Maximum number of shuffles during the greedy search.
`verbose`	bool	`False`	Print progress information.
`random_state`	int / None	`None`	Random seed.

Usage

from tsseg.algorithms import GreedyGaussianDetector

detector = GreedyGaussianDetector(k_max=8, lamb=1.0)
labels = detector.fit_predict(X)

Implementation: Adapted from aeon. BSD 3-Clause.

Reference: Hallac, Nystrup & Boyd (2019), Greedy Gaussian Segmentation of Multivariate Time Series, Advances in Data Analysis and Classification.

Submodules

tsseg.algorithms.ggs.detector module

Greedy Gaussian Segmentation (_GGS).

The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.

This module is structured with the _GGS that implements the actual segmentation algorithm and a GreedyGaussianSegmentation that interfaces the algorithm with the sklearn/aeon api. The benefit behind that design is looser coupling between the logic and the interface introduced to allow for easier changes of either part since segmentation still has an experimental nature. When making algorithm changes you probably want to look into _GGS when evolving the aeon/sklearn interface look into GreedyGaussianSegmentation. This design also allows adapting _GGS to other interfaces.

Notes

Based on the work from Hallac et al. (2019).

source code adapted based on: https://github.com/cvxgrp/GGS Copyright (c) 2018, Stanford University Convex Optimization Group, BSD-2
paper available at: https://stanford.edu/~boyd/papers/pdf/ggs.pdf

class tsseg.algorithms.ggs.detector.GreedyGaussianDetector(k_max=10, lamb=1.0, max_shuffles=250, verbose=False, random_state=None)[source]

Bases: BaseSegmenter

Greedy Gaussian Segmentation Estimator.

The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.

Greedy Gaussian Segmentation (GGS) fits a segmented gaussian model (SGM) to the data by computing the approximate solution to the combinatorial problem of finding the approximate covariance-regularized maximum log-likelihood for fixed number of change points and a reagularization strength. It follows an iterative procedure where a new breakpoint is added and then adjusting all breakpoints to (approximately) maximize the objective. It is similar to the top-down search used in other change point detection problems.

Parameters:

k_max (int) – Maximum number of change points to find. The number of segments is thus k+1.
lamb (float) – Regularization parameter lambda (>= 0), which controls the amount of (inverse) covariance regularization, see Eq (1) in [1]. Regularization is introduced to reduce issues for high-dimensional problems. Setting lamb to zero will ignore regularization, whereas large values of lambda will favour simpler models.
max_shuffles (int) – Maximum number of shuffles.
verbose (bool) – If True verbose output is enabled.
random_state (int | None) – Either random seed or an instance of np.random.RandomState.

change_points_

Locations of change points as integer indexes. By convention change points include the identity segmentation, i.e. first and last index + 1 values.

Type:: array_like, default=[]

_intermediate_change_points

Intermediate values of change points for each value of k = 1…k_max

Type:: List[List[int]], default=[]

_intermediate_ll

Intermediate values for log-likelihood for each value of k = 1…k_max

Type:: List[float], default=[]

Notes

Based on the work from [1].

source code adapted based on: https://github.com/cvxgrp/GGS
paper available at: https://stanford.edu/~boyd/papers/pdf/ggs.pdf

References

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> from tsseg.algorithms import GreedyGaussianDetector
>>> X = np.random.default_rng(10).standard_normal((100, 2))
>>> X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
>>> ggs = GreedyGaussianDetector(k_max=3, max_shuffles=5)
>>> y = ggs.fit_predict(X_scaled, axis=0)

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → GreedyGaussianDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → GreedyGaussianDetector

Configure whether metadata should be requested to be passed to the predict method.