tsseg.algorithms.ggs package
GGS — Greedy Gaussian Segmentation.
Description
GGS segments a multivariate time series by modelling each segment as i.i.d. samples from a multivariate Gaussian distribution. The algorithm maximises the total log-likelihood minus a covariance-regularisation term \(\lambda\). A greedy dynamic-programming search finds an approximate solution in linear time.
The maximum number of change points k_max is an upper bound; the algorithm
may return fewer if additional splits do not improve the regularised likelihood.
Parameters
Name |
Type |
Default |
Description |
|---|---|---|---|
|
int |
|
Maximum number of change points. |
|
float |
|
Regularisation \(\lambda\) (>= 0). Controls covariance shrinkage. |
|
int |
|
Maximum number of shuffles during the greedy search. |
|
bool |
|
Print progress information. |
|
int / None |
|
Random seed. |
Usage
from tsseg.algorithms import GreedyGaussianDetector
detector = GreedyGaussianDetector(k_max=8, lamb=1.0)
labels = detector.fit_predict(X)
Implementation: Adapted from aeon. BSD 3-Clause.
Reference: Hallac, Nystrup & Boyd (2019), Greedy Gaussian Segmentation of Multivariate Time Series, Advances in Data Analysis and Classification.
Submodules
tsseg.algorithms.ggs.detector module
Greedy Gaussian Segmentation (_GGS).
The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.
This module is structured with the _GGS that implements the actual
segmentation algorithm and a GreedyGaussianSegmentation that
interfaces the algorithm with the sklearn/aeon api. The benefit
behind that design is looser coupling between the logic and the
interface introduced to allow for easier changes of either part
since segmentation still has an experimental nature. When making
algorithm changes you probably want to look into _GGS when
evolving the aeon/sklearn interface look into GreedyGaussianSegmentation.
This design also allows adapting _GGS to other interfaces.
Notes
Based on the work from Hallac et al. (2019).
source code adapted based on: https://github.com/cvxgrp/GGS Copyright (c) 2018, Stanford University Convex Optimization Group, BSD-2
paper available at: https://stanford.edu/~boyd/papers/pdf/ggs.pdf
- class tsseg.algorithms.ggs.detector.GreedyGaussianDetector(k_max=10, lamb=1.0, max_shuffles=250, verbose=False, random_state=None)[source]
Bases:
BaseSegmenterGreedy Gaussian Segmentation Estimator.
The method approximates solutions for the problem of breaking a multivariate time series into segments, where the data in each segment could be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that allows finding approximate solution in linear time with respect to the data length and always yields locally optimal choice.
Greedy Gaussian Segmentation (GGS) fits a segmented gaussian model (SGM) to the data by computing the approximate solution to the combinatorial problem of finding the approximate covariance-regularized maximum log-likelihood for fixed number of change points and a reagularization strength. It follows an iterative procedure where a new breakpoint is added and then adjusting all breakpoints to (approximately) maximize the objective. It is similar to the top-down search used in other change point detection problems.
- Parameters:
k_max (
int) – Maximum number of change points to find. The number of segments is thus k+1.lamb (
float) – Regularization parameter lambda (>= 0), which controls the amount of (inverse) covariance regularization, see Eq (1) in [1]. Regularization is introduced to reduce issues for high-dimensional problems. Settinglambto zero will ignore regularization, whereas large values of lambda will favour simpler models.max_shuffles (
int) – Maximum number of shuffles.verbose (
bool) – IfTrueverbose output is enabled.random_state (
int|None) – Either random seed or an instance ofnp.random.RandomState.
- change_points_
Locations of change points as integer indexes. By convention change points include the identity segmentation, i.e. first and last index + 1 values.
- Type:
array_like, default=[]
- _intermediate_change_points
Intermediate values of change points for each value of k = 1…k_max
- Type:
List[List[int]], default=[]
- _intermediate_ll
Intermediate values for log-likelihood for each value of k = 1…k_max
- Type:
List[float], default=[]
Notes
Based on the work from [1].
source code adapted based on: https://github.com/cvxgrp/GGS
paper available at: https://stanford.edu/~boyd/papers/pdf/ggs.pdf
References
Examples
>>> import numpy as np >>> from sklearn.preprocessing import MinMaxScaler >>> from tsseg.algorithms import GreedyGaussianDetector >>> X = np.random.default_rng(10).standard_normal((100, 2)) >>> X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X) >>> ggs = GreedyGaussianDetector(k_max=3, max_shuffles=5) >>> y = ggs.fit_predict(X_scaled, axis=0)
- set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') GreedyGaussianDetector
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
- set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') GreedyGaussianDetector
Configure whether metadata should be requested to be passed to the
predictmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed topredictif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it topredict.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.