tsseg.algorithms.vsax package

VSAX — Variable-length SAX state detection.

Description

VSAX converts each channel to Symbolic Aggregate approXimation (SAX) symbols, then finds the variable-length segmentation that minimises PAA reconstruction error plus an additive penalty per segment. The pipeline:

  1. Z-normalisation — optionally standardise each channel.

  2. PAA — reduce each candidate segment to paa_segments frames.

  3. SAX — discretise PAA values into alphabet_size symbols using Gaussian breakpoints (or adaptive empirical quantiles).

  4. DP segmentation — dynamic programming over num_lengths candidate segment lengths minimises reconstruction error + penalty per segment.

  5. Symbol merging — per-channel SAX symbol tuples are clustered via agglomerative clustering on Hamming distance (threshold symbol_merge_threshold).
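Steps 1 to 3 can be illustrated for a single channel with a minimal standard-library sketch. The helper names (znorm, paa, gaussian_breakpoints, empirical_breakpoints, sax) are hypothetical and are not part of tsseg; the quantile method used for adaptive breakpoints is an assumption, since the package's own choice is not documented here.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev, quantiles


def znorm(values):
    """Standardise a sequence to zero mean and unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - mu) / sigma for v in values]


def paa(values, n_frames):
    """Piecewise Aggregate Approximation: means of n_frames near-equal chunks."""
    n = len(values)
    n_frames = min(n_frames, n)  # every frame needs at least one sample
    bounds = [i * n // n_frames for i in range(n_frames + 1)]
    return [fmean(values[a:b]) for a, b in zip(bounds, bounds[1:])]


def gaussian_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equiprobable bins."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def empirical_breakpoints(values, alphabet_size):
    """Cut points taken from the empirical quantiles of the data (assumed method)."""
    return quantiles(values, n=alphabet_size)


def sax(values, n_frames, breakpoints):
    """Map a segment to a tuple of integer symbols in [0, len(breakpoints)]."""
    return tuple(bisect_right(breakpoints, v) for v in paa(values, n_frames))


x = znorm([0, 0, 0, 0, 10, 10, 10, 10])
print(sax(x, 2, gaussian_breakpoints(4)))  # -> (0, 3)
```

With alphabet_size symbols there are alphabet_size - 1 breakpoints, so each PAA value maps to an index between 0 and alphabet_size - 1.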

Type: state detection
Supervision: semi-supervised or unsupervised
Scope: univariate

Parameters

Name                    Type   Default  Description
----------------------  -----  -------  ------------------------------------------------------------
alphabet_size           int    6        Number of SAX symbols per channel.
paa_segments            int    8        Number of PAA frames per segment.
min_segment_length      int    20       Minimum admissible segment length.
max_segment_length      int    180      Maximum admissible segment length.
num_lengths             int    6        Number of candidate lengths (linearly spaced min..max).
penalty                 float  0.8      Cost per new segment. Larger values produce longer segments.
symbol_merge_threshold  float  0.2      Normalised Hamming distance threshold for merging symbols.
                                        0 = exact match only, 1 = single global state.
zscore                  bool   True     Apply per-channel z-normalisation.
adaptive_breakpoints    bool   True     Learn SAX breakpoints from empirical quantiles.
axis                    int    0        Time axis.
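As an illustration of how num_lengths interacts with min_segment_length and max_segment_length, the following sketch builds a linearly spaced grid of candidate lengths. The function name candidate_lengths is hypothetical, and the rounding to integers and removal of duplicates are assumptions rather than documented behaviour.

```python
def candidate_lengths(min_len, max_len, num_lengths):
    """Linearly spaced integer segment lengths from min_len to max_len."""
    if num_lengths <= 1 or max_len <= min_len:
        return [min_len]
    step = (max_len - min_len) / (num_lengths - 1)
    # Round to whole samples and drop duplicates (assumed handling).
    return sorted({round(min_len + k * step) for k in range(num_lengths)})


print(candidate_lengths(20, 180, 6))  # -> [20, 52, 84, 116, 148, 180]
```

Under these assumptions the defaults give six candidate lengths spaced 32 samples apart.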

Usage

from tsseg.algorithms import VSAXDetector

# X: array of shape (n_timepoints, n_channels) when axis=0
detector = VSAXDetector(alphabet_size=8, penalty=1.0, min_segment_length=30)
states = detector.fit_predict(X)

Implementation: Origin: new code.

Submodules

tsseg.algorithms.vsax.detector module

Variable-length SAX baseline detector.

Segmentation via dynamic programming over per-channel SAX symbols with agglomerative symbol clustering. Reconstruction error is computed in O(1) per candidate via prefix sums.
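The prefix-sum idea can be sketched as follows. This is a simplified illustration, not the module's code: it assumes the error of a candidate is the sum of squared deviations from a single mean, whereas a PAA-based error would apply the same formula once per frame. The names build_prefix_sums and segment_sse are hypothetical.

```python
from itertools import accumulate


def build_prefix_sums(values):
    """Prefix sums of the values and of their squares, each with a leading 0."""
    s1 = [0.0, *accumulate(values)]
    s2 = [0.0, *accumulate(v * v for v in values)]
    return s1, s2


def segment_sse(s1, s2, start, end):
    """Sum of squared deviations from the mean over values[start:end], in O(1)."""
    n = end - start
    total = s1[end] - s1[start]
    total_sq = s2[end] - s2[start]
    # sum((v - mean)^2) == sum(v^2) - (sum v)^2 / n; clamp rounding noise at 0.
    return max(total_sq - total * total / n, 0.0)


s1, s2 = build_prefix_sums([1, 2, 3, 4])
print(segment_sse(s1, s2, 0, 4))  # -> 5.0
print(segment_sse(s1, s2, 1, 3))  # -> 0.5
```

Building the two prefix arrays costs O(n) once; every candidate segment evaluated afterwards needs only four array lookups.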

class tsseg.algorithms.vsax.detector.VSAXDetector(*, axis=0, alphabet_size=6, paa_segments=8, min_segment_length=20, max_segment_length=180, num_lengths=6, penalty=0.8, symbol_merge_threshold=0.2, zscore=True, adaptive_breakpoints=True, random_state=0)[source]

Bases: BaseSegmenter

Baseline for state detection using variable-length SAX symbols.

The detector uses dynamic programming over variable-length Symbolic Aggregate approXimation (SAX) representations to find the segmentation that minimises PAA reconstruction error with an additive penalty controlling fragmentation.
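A minimal sketch of this kind of penalised dynamic programme, for one channel, is given below. It is an illustration under stated assumptions, not the detector's implementation: the function name segment is hypothetical, the error is the squared deviation from a single segment mean, and a series that cannot be tiled exactly by the candidate lengths raises an error (how the detector handles a leftover tail is not documented here).

```python
from itertools import accumulate


def segment(values, lengths, penalty):
    """Return boundaries minimising total squared error + penalty per segment."""
    n = len(values)
    s1 = [0.0, *accumulate(values)]
    s2 = [0.0, *accumulate(v * v for v in values)]

    def sse(a, b):
        # Squared deviation from the mean of values[a:b] via prefix sums.
        total = s1[b] - s1[a]
        return max(s2[b] - s2[a] - total * total / (b - a), 0.0)

    inf = float("inf")
    best = [inf] * (n + 1)  # best[t]: minimal cost of segmenting values[:t]
    prev = [-1] * (n + 1)   # prev[t]: start index of the last segment at t
    best[0] = 0.0
    for t in range(1, n + 1):
        for length in lengths:
            a = t - length
            if a < 0 or best[a] == inf:
                continue  # segment would start before 0 or at an unreachable point
            cost = best[a] + sse(a, t) + penalty
            if cost < best[t]:
                best[t], prev[t] = cost, a
    if best[n] == inf:
        raise ValueError("series length cannot be tiled by the candidate lengths")

    # Walk the back-pointers from the end to recover the boundaries.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(prev[bounds[-1]])
    return bounds[::-1]


series = [0] * 4 + [5] * 6
print(segment(series, lengths=[4, 6, 10], penalty=0.5))  # -> [0, 4, 10]
```

In this example a single segment of length 10 costs 60.5, while splitting at the level shift costs only the two penalties (1.0), so the split is chosen. Raising the penalty far enough would eventually make the single segment cheaper.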

SAX symbols are computed per channel, preserving multivariate structure. Similar symbols are merged into the same state via agglomerative clustering on Hamming distance, avoiding the brittleness of exact symbol matching.
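The merging step can be illustrated with a small sketch that groups symbol tuples by normalised Hamming distance. Two assumptions are made that the documentation does not confirm: single linkage is used (cutting a single-linkage tree at a threshold is equivalent to linking every pair within that threshold), and the comparison is inclusive. The names hamming and merge_symbols are hypothetical.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length symbol tuples differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def merge_symbols(symbols, threshold):
    """Assign a state label to each symbol via single-linkage clustering."""
    unique = sorted(set(symbols))
    parent = list(range(len(unique)))  # union-find forest over unique symbols

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose distance is within the threshold (assumed inclusive).
    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if hamming(unique[i], unique[j]) <= threshold:
                parent[find(j)] = find(i)

    # Number the clusters in order of first appearance among the sorted symbols.
    roots, labels = {}, {}
    for i, sym in enumerate(unique):
        labels[sym] = roots.setdefault(find(i), len(roots))
    return [labels[s] for s in symbols]


symbols = [(0, 0, 1, 1), (0, 0, 1, 2), (3, 3, 3, 3), (0, 0, 1, 1)]
print(merge_symbols(symbols, threshold=0.25))  # -> [0, 0, 1, 0]
print(merge_symbols(symbols, threshold=0.0))   # -> [0, 1, 2, 0]
```

With a threshold of 0 only identical tuples share a state, which corresponds to exact symbol matching.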

Parameters:
  • axis (int) – Time axis. axis=0 assumes (n_timepoints, n_channels) input.

  • alphabet_size (int) – Number of SAX symbols per channel. Values >= 1 are supported.

  • paa_segments (int) – Number of PAA frames per segment. Short segments automatically use fewer frames so that every frame contains at least one sample; the resulting symbol is then padded (by repeating the last frame) to a fixed length of paa_segments * n_channels.

  • min_segment_length (int) – Minimum admissible segment length (in samples).

  • max_segment_length (int) – Maximum admissible segment length.

  • num_lengths (int) – Number of candidate lengths linearly spaced between min and max. Increasing this value improves flexibility at the cost of runtime.

  • penalty (float) – Cost added for every new segment. Use larger values to favour longer segments; reduce to obtain more change points.

  • symbol_merge_threshold (float) – Normalised distance threshold below which two SAX symbols are merged into the same state. Distance is measured as mean absolute difference of symbol indices divided by alphabet_size (so it lies in [0, 1]). 0 gives exact matching (original behaviour), 1 collapses everything into a single state.

  • zscore (bool) – Apply per-channel z-normalisation before computing scores.

  • adaptive_breakpoints (bool) – When True, learn SAX breakpoints from empirical quantiles of the training data instead of using Gaussian breakpoints.

  • random_state (int | None) – Accepted for API compatibility but unused (deterministic).
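Two of the parameter descriptions above lend themselves to a short sketch: the padding of short-segment symbols described under paa_segments, and the distance described under symbol_merge_threshold. Both helpers (pad_symbol, symbol_distance) are hypothetical names written from those descriptions only, for a single channel; they are not taken from the detector's source.

```python
def pad_symbol(frames, paa_segments):
    """Extend a short tuple of frame symbols by repeating the last frame."""
    frames = tuple(frames)
    return frames + (frames[-1],) * (paa_segments - len(frames))


def symbol_distance(a, b, alphabet_size):
    """Mean absolute difference of symbol indices, divided by alphabet_size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (len(a) * alphabet_size)


print(pad_symbol((2, 4, 5), 5))  # -> (2, 4, 5, 5, 5)
print(symbol_distance((0, 1, 2, 3), (0, 1, 2, 3), 6))  # -> 0.0
```

Because indices range from 0 to alphabet_size - 1, this distance never exceeds (alphabet_size - 1) / alphabet_size, so it stays within [0, 1] as stated.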

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') VSAXDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') VSAXDetector

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in predict.

Returns:

self – The updated object.

Return type:

object

Module contents

Variable-length SAX baseline detector.