tsseg.metrics package

Submodules

tsseg.metrics.base module

class tsseg.metrics.base.BaseMetric(**kwargs)[source]

Bases: ABC

Base class for all metrics.

abstractmethod compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.
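
A minimal sketch of the subclassing contract, using a stand-in ABC rather than the real tsseg class (the MeanAbsoluteOffset metric below is purely hypothetical and exists only to illustrate the interface):

```python
# Stand-in for the documented ABC (illustrative, not the tsseg source).
from abc import ABC, abstractmethod
from typing import Dict

class BaseMetric(ABC):
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @abstractmethod
    def compute(self, y_true, y_pred, **kwargs) -> Dict[str, float]:
        """Return a dict mapping metric names to values."""

class MeanAbsoluteOffset(BaseMetric):
    """Hypothetical toy metric: mean absolute offset of paired change points."""

    def compute(self, y_true, y_pred, **kwargs) -> Dict[str, float]:
        offsets = [abs(t - p) for t, p in zip(y_true, y_pred)]
        return {"mean_abs_offset": sum(offsets) / len(offsets)}

print(MeanAbsoluteOffset().compute([10, 50, 90], [12, 48, 95]))  # → {'mean_abs_offset': 3.0}
```

Returning a dict rather than a bare float lets a single compute() call report several related values (as F1Score does with precision and recall).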

tsseg.metrics.change_point_detection module

class tsseg.metrics.change_point_detection.Covering(convert_labels_to_segments=False, **kwargs)[source]

Bases: BaseMetric

Computes the Covering score for a segmentation.

The Covering metric evaluates how well the predicted segments cover the ground truth segments. It is the length-weighted average, over all ground truth segments, of the maximum Intersection over Union (IoU) with any predicted segment, so each ground truth segment contributes in proportion to its length. This follows the covering definition commonly used in segmentation evaluation studies.

compute(y_true, y_pred)[source]

Computes the Covering score.

Parameters:
  • y_true (ndarray) – Array of true change points. The last element should be the total number of time steps.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with the Covering score.
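
The description above can be sketched as a standalone re-implementation (illustrative only, not the library code; it assumes change-point arrays end with the series length n and that segments implicitly start at 0, as the docstring suggests):

```python
def _segments(cps):
    # Change points [c1, ..., n] become half-open segments starting at 0.
    bounds = [0] + list(cps)
    return list(zip(bounds, bounds[1:]))

def covering(y_true, y_pred):
    n = y_true[-1]  # last element is the total number of time steps
    score = 0.0
    for a, b in _segments(y_true):
        # Best IoU of this ground-truth segment with any predicted segment.
        best = max(
            (min(b, d) - max(a, c)) / (max(b, d) - min(a, c))
            for c, d in _segments(y_pred)
        )
        # Weight by the segment's share of the series length.
        score += (b - a) / n * max(best, 0.0)
    return {"covering": score}

print(covering([50, 100], [50, 100]))  # perfect match → {'covering': 1.0}
print(covering([50, 100], [100]))      # one predicted segment → {'covering': 0.5}
```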

class tsseg.metrics.change_point_detection.F1Score(margin=0.01, convert_labels_to_segments=False, **kwargs)[source]

Bases: BaseMetric

Computes the F1-score for change point detection.

compute(y_true, y_pred)[source]

Computes the F1-score, precision, and recall.

Parameters:
  • y_true (ndarray) – Array of true change points. The last element should be the total number of time steps.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with F1-score, precision, and recall.
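
A sketch of margin-based matching (not the library implementation; it assumes margin is a fraction of the series length, which the default of 0.01 suggests but the signature does not state, and that each prediction can match at most one true change point):

```python
def cp_f1(y_true, y_pred, margin=0.01):
    n = y_true[-1]  # last element is the total number of time steps
    true_cps = list(y_true[:-1])
    pred_cps = [p for p in y_pred if p != n]
    tol = margin * n  # assumption: margin is a fraction of the length

    matched, tp = set(), 0
    for t in true_cps:
        # Greedily pair each true change point with an unused prediction.
        for i, p in enumerate(pred_cps):
            if i not in matched and abs(t - p) <= tol:
                matched.add(i)
                tp += 1
                break

    precision = tp / len(pred_cps) if pred_cps else 0.0
    recall = tp / len(true_cps) if true_cps else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"f1": f1, "precision": precision, "recall": recall}

print(cp_f1([50, 100], [51, 100]))  # within tolerance → all scores 1.0
```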

class tsseg.metrics.change_point_detection.HausdorffDistance(**kwargs)[source]

Bases: BaseMetric

Computes the Hausdorff distance between two sets of change points.

compute(y_true, y_pred)[source]

Computes the Hausdorff distance.

Parameters:
  • y_true (ndarray) – Array of true change points.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with the Hausdorff distance.
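
For 1-D change-point sets the Hausdorff distance reduces to nearest-neighbour distances; an illustrative sketch (the library may instead delegate to an existing implementation):

```python
def hausdorff(y_true, y_pred):
    # Largest distance from any point in one set to its nearest
    # neighbour in the other set, symmetrised over both directions.
    d_tp = max(min(abs(t - p) for p in y_pred) for t in y_true)
    d_pt = max(min(abs(p - t) for t in y_true) for p in y_pred)
    return {"hausdorff": float(max(d_tp, d_pt))}

print(hausdorff([10, 50, 90], [12, 55]))  # → {'hausdorff': 35.0}
```

The metric is sensitive to the single worst miss: here the unmatched true change point at 90 dominates.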

tsseg.metrics.change_point_detection.labels_to_change_points(labels)[source]

Convert label sequence into change points (CPs).

Parameters:

labels (list or np.array) – Label sequence.

Returns:

Change points (CPs) including start (0) and end (n).

Return type:

list
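
A sketch consistent with the docstring (indices where the label changes, plus the boundary markers 0 and n; illustrative, not the library source):

```python
def labels_to_change_points(labels):
    n = len(labels)
    # A change point is any index whose label differs from its predecessor.
    cps = [i for i in range(1, n) if labels[i] != labels[i - 1]]
    return [0] + cps + [n]

print(labels_to_change_points([0, 0, 1, 1, 1, 2]))  # → [0, 2, 5, 6]
```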

tsseg.metrics.gaussian_f1 module

Experimental fuzzy F1 metric for change point detection.

This module introduces a differentiable alternative to the classic F1-score. Instead of relying on a hard margin around each change point, it evaluates predictions with a Gaussian reward that decays smoothly as the predicted change point drifts away from the ground truth. The default configuration uses the same Gaussian width for every change point, derived from a single fraction of the series length so that no event is implicitly favoured.

class tsseg.metrics.gaussian_f1.GaussianF1Score(*, sigma_fraction=0.01, min_sigma=1.0, adaptive_sigma=False, convert_labels_to_segments=False)[source]

Bases: BaseMetric

Gaussian-weighted alternative to the classic F1 score.

The metric operates in three conceptual steps:

  1. Preparation – convert optional label sequences into change point lists, remove boundary markers, and infer the series length.

  2. Gaussian matching – every true change point is associated with a Gaussian of width sigma_fraction * n (clamped below by min_sigma). Predictions are rewarded according to that shared kernel and a greedy assignment keeps the best non-overlapping pairs.

  3. Soft precision & recall – derive precision and recall from the sum of Gaussian rewards, yielding a fuzzy F1 in [0, 1].

Special cases are handled explicitly:

  • No ground-truth change point – if the data really is stationary and no change points are predicted either, we return the perfect score 1.0. Conversely, predicting spurious changes yields a zero score.

  • Single change point – the Gaussian spread still follows the global fraction, ensuring a consistent reward scale across all events.

compute(y_true, y_pred)[source]

Return the Gaussian-weighted precision, recall, and F1 score.

Return type:

dict[str, float]
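
The shared-kernel reward from step 2 can be sketched as follows (illustrative only; the actual greedy assignment and normalisation in tsseg may differ):

```python
import math

def gaussian_reward(t, p, n, sigma_fraction=0.01, min_sigma=1.0):
    # One shared Gaussian width for the whole series, clamped from below.
    sigma = max(sigma_fraction * n, min_sigma)
    # Reward decays smoothly as the prediction p drifts from the truth t.
    return math.exp(-((p - t) ** 2) / (2 * sigma ** 2))

print(gaussian_reward(50, 50, n=1000))             # exact hit → 1.0
print(round(gaussian_reward(50, 60, n=1000), 3))   # 10 steps off, sigma=10 → 0.607
```

Because sigma is derived from a single fraction of n, a 10-step miss is penalised identically wherever it occurs in the series.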

class tsseg.metrics.gaussian_f1.GaussianMatchResult(matched_weight, used_true, used_pred)[source]

Bases: object

Container storing intermediate results of the fuzzy matching.

matched_weight: float
used_pred: List[int]
used_true: List[int]
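
The container appears to be a plain record of the matched weight and the consumed index lists; an equivalent standalone sketch:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GaussianMatchResult:
    """Intermediate result of the fuzzy matching (illustrative stand-in)."""
    matched_weight: float  # sum of Gaussian rewards over accepted pairs
    used_true: List[int]   # indices of true change points already matched
    used_pred: List[int]   # indices of predictions already matched

r = GaussianMatchResult(matched_weight=1.5, used_true=[0], used_pred=[1])
print(r.matched_weight)  # → 1.5
```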

tsseg.metrics.bidirectional_covering module

Bidirectional Covering metric for change-point segmentation.

class tsseg.metrics.bidirectional_covering.BidirectionalCovering(*, convert_labels_to_segments=False, aggregation='harmonic', **kwargs)[source]

Bases: BaseMetric

Bidirectional extension of the classical Covering metric.

The classical Covering score only evaluates how well predicted segments cover the ground-truth segmentation. However, this directionality means that long predicted segments that cover the truth sparsely may still obtain a high score, even when the prediction introduces substantial over-segmentation.

The bidirectional variant evaluates coverage in both directions:

  • ground_truth_covering mirrors the traditional definition where each ground-truth interval is weighted by its duration and matched to the best overlapping predicted interval via Intersection over Union (IoU).

  • prediction_covering swaps the roles. Each predicted segment is weighted by its duration and matched to the best ground-truth overlap.

The two directional scores are then aggregated using an F1-style harmonic mean by default. Alternative aggregation strategies (geometric, arithmetic or min) can be selected via the aggregation argument. The resulting metric rewards segmentations that both cover the truth and avoid excessive over-segmentation.
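
The aggregation step can be sketched as follows (assumed semantics, matching the four documented strategy names; the library's argument handling may differ):

```python
import math

def aggregate(gt_cov, pred_cov, how="harmonic"):
    # Combine the two directional covering scores into one value.
    if how == "harmonic":  # F1-style default
        s = gt_cov + pred_cov
        return 2 * gt_cov * pred_cov / s if s else 0.0
    if how == "geometric":
        return math.sqrt(gt_cov * pred_cov)
    if how == "arithmetic":
        return (gt_cov + pred_cov) / 2
    if how == "min":
        return min(gt_cov, pred_cov)
    raise ValueError(f"unknown aggregation: {how!r}")

print(round(aggregate(0.9, 0.6), 4))   # harmonic (default) → 0.72
print(aggregate(0.9, 0.6, how="min"))  # → 0.6
```

The harmonic mean, like F1, drags the combined score toward the weaker direction, so over-segmentation that only hurts prediction_covering still lowers the final value.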

compute(y_true, y_pred)[source]

Computes the value of the metric.

Parameters:
  • y_true (Union[Sequence[int], ndarray]) – Ground truth labels or change points.

  • y_pred (Union[Sequence[int], ndarray]) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

tsseg.metrics.state_detection module

class tsseg.metrics.state_detection.AdjustedMutualInformation(**kwargs)[source]

Bases: BaseMetric

Computes the Adjusted Mutual Information (AMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.state_detection.AdjustedRandIndex(**kwargs)[source]

Bases: BaseMetric

Computes the Adjusted Rand Index (ARI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.state_detection.NormalizedMutualInformation(**kwargs)[source]

Bases: BaseMetric

Computes the Normalized Mutual Information (NMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.
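
To show what these label-based metrics consume, here is a standalone pure-Python NMI sketch with arithmetic averaging (the library most likely delegates to scikit-learn; this is illustrative only):

```python
import math
from collections import Counter

def nmi(y_true, y_pred):
    n = len(y_true)
    pt, pp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))

    # Mutual information from the joint and marginal label frequencies.
    mi = sum(
        c / n * math.log((c / n) / ((pt[a] / n) * (pp[b] / n)))
        for (a, b), c in joint.items()
    )

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    # Arithmetic normalisation: average of the two label entropies.
    denom = (entropy(pt) + entropy(pp)) / 2
    return mi / denom if denom else 0.0

# Labels are compared up to renaming: a consistent permutation scores 1.
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # → 1.0
```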

class tsseg.metrics.state_detection.StateMatchingScore(weights=None, **kwargs)[source]

Bases: BaseMetric

Computes the State Matching Score (SMS).

DEFAULT_WEIGHTS = {'delay': 0.1, 'isolation': 0.8, 'missing': 0.5, 'transition': 0.3}

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, Any]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.state_detection.WeightedAdjustedRandIndex(distance_func='linear', alpha=0.1, **kwargs)[source]

Bases: BaseMetric

Computes the Weighted Adjusted Rand Index (WARI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.state_detection.WeightedNormalizedMutualInformation(distance_func='linear', alpha=0.1, average_method='arithmetic', **kwargs)[source]

Bases: BaseMetric

Computes the Weighted Normalized Mutual Information (WNMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

tsseg.metrics.state_detection.weighted_adjusted_rand_score(labels_true, labels_pred, weights)[source]

Compute the Weighted Adjusted Rand Index (WARI).

tsseg.metrics.state_detection.weighted_contingency_matrix(labels_true, labels_pred, weights, *, eps=None, sparse=False, dtype=numpy.float64)[source]

Build a weighted contingency matrix.

tsseg.metrics.state_detection.weighted_entropy(labels, weights)[source]

Compute the weighted entropy of a labeling.

tsseg.metrics.state_detection.weighted_mutual_info_score(labels_true, labels_pred, weights)[source]

Compute the Weighted Mutual Information (WMI).

tsseg.metrics.state_detection.weighted_normalized_mutual_info_score(labels_true, labels_pred, weights, *, average_method='arithmetic')[source]

Compute the Weighted Normalized Mutual Information (WNMI).

tsseg.metrics.state_detection.weighted_pair_confusion_matrix(labels_true, labels_pred, weights)[source]

Compute the weighted pair confusion matrix.
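
A sketch of the weighted contingency idea underlying these helpers (assumed semantics: each sample contributes its weight rather than a unit count; the real function builds a dense or sparse matrix, this toy returns a dict keyed by label pairs):

```python
from collections import defaultdict

def weighted_contingency(labels_true, labels_pred, weights):
    # Cell (t, p) accumulates the weights of samples labelled t in the
    # ground truth and p in the prediction.
    mat = defaultdict(float)
    for t, p, w in zip(labels_true, labels_pred, weights):
        mat[(t, p)] += w
    return dict(mat)

print(weighted_contingency([0, 0, 1], [0, 1, 1], [1.0, 0.5, 2.0]))
# → {(0, 0): 1.0, (0, 1): 0.5, (1, 1): 2.0}
```

With all weights equal to 1 this reduces to the ordinary contingency matrix, so the weighted ARI/NMI variants above reduce to their classical counterparts.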

Module contents

class tsseg.metrics.AdjustedMutualInformation(**kwargs)[source]

Bases: BaseMetric

Computes the Adjusted Mutual Information (AMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.AdjustedRandIndex(**kwargs)[source]

Bases: BaseMetric

Computes the Adjusted Rand Index (ARI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.BaseMetric(**kwargs)[source]

Bases: ABC

Base class for all metrics.

abstractmethod compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.BidirectionalCovering(*, convert_labels_to_segments=False, aggregation='harmonic', **kwargs)[source]

Bases: BaseMetric

Bidirectional extension of the classical Covering metric.

The classical Covering score only evaluates how well predicted segments cover the ground-truth segmentation. However, this directionality means that long predicted segments that cover the truth sparsely may still obtain a high score, even when the prediction introduces substantial over-segmentation.

The bidirectional variant evaluates coverage in both directions:

  • ground_truth_covering mirrors the traditional definition where each ground-truth interval is weighted by its duration and matched to the best overlapping predicted interval via Intersection over Union (IoU).

  • prediction_covering swaps the roles. Each predicted segment is weighted by its duration and matched to the best ground-truth overlap.

The two directional scores are then aggregated using an F1-style harmonic mean by default. Alternative aggregation strategies (geometric, arithmetic or min) can be selected via the aggregation argument. The resulting metric rewards segmentations that both cover the truth and avoid excessive over-segmentation.

compute(y_true, y_pred)[source]

Computes the value of the metric.

Parameters:
  • y_true (Union[Sequence[int], ndarray]) – Ground truth labels or change points.

  • y_pred (Union[Sequence[int], ndarray]) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.Covering(convert_labels_to_segments=False, **kwargs)[source]

Bases: BaseMetric

Computes the Covering score for a segmentation.

The Covering metric evaluates how well the predicted segments cover the ground truth segments. It is the length-weighted average, over all ground truth segments, of the maximum Intersection over Union (IoU) with any predicted segment, so each ground truth segment contributes in proportion to its length. This follows the covering definition commonly used in segmentation evaluation studies.

compute(y_true, y_pred)[source]

Computes the Covering score.

Parameters:
  • y_true (ndarray) – Array of true change points. The last element should be the total number of time steps.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with the Covering score.

class tsseg.metrics.F1Score(margin=0.01, convert_labels_to_segments=False, **kwargs)[source]

Bases: BaseMetric

Computes the F1-score for change point detection.

compute(y_true, y_pred)[source]

Computes the F1-score, precision, and recall.

Parameters:
  • y_true (ndarray) – Array of true change points. The last element should be the total number of time steps.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with F1-score, precision, and recall.

class tsseg.metrics.GaussianF1Score(*, sigma_fraction=0.01, min_sigma=1.0, adaptive_sigma=False, convert_labels_to_segments=False)[source]

Bases: BaseMetric

Gaussian-weighted alternative to the classic F1 score.

The metric operates in three conceptual steps:

  1. Preparation – convert optional label sequences into change point lists, remove boundary markers, and infer the series length.

  2. Gaussian matching – every true change point is associated with a Gaussian of width sigma_fraction * n (clamped below by min_sigma). Predictions are rewarded according to that shared kernel and a greedy assignment keeps the best non-overlapping pairs.

  3. Soft precision & recall – derive precision and recall from the sum of Gaussian rewards, yielding a fuzzy F1 in [0, 1].

Special cases are handled explicitly:

  • No ground-truth change point – if the data really is stationary and no change points are predicted either, we return the perfect score 1.0. Conversely, predicting spurious changes yields a zero score.

  • Single change point – the Gaussian spread still follows the global fraction, ensuring a consistent reward scale across all events.

compute(y_true, y_pred)[source]

Return the Gaussian-weighted precision, recall, and F1 score.

Return type:

dict[str, float]

class tsseg.metrics.HausdorffDistance(**kwargs)[source]

Bases: BaseMetric

Computes the Hausdorff distance between two sets of change points.

compute(y_true, y_pred)[source]

Computes the Hausdorff distance.

Parameters:
  • y_true (ndarray) – Array of true change points.

  • y_pred (ndarray) – Array of predicted change points.

Return type:

Dict[str, float]

Returns:

A dictionary with the Hausdorff distance.

class tsseg.metrics.NormalizedMutualInformation(**kwargs)[source]

Bases: BaseMetric

Computes the Normalized Mutual Information (NMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.StateMatchingScore(weights=None, **kwargs)[source]

Bases: BaseMetric

Computes the State Matching Score (SMS).

DEFAULT_WEIGHTS = {'delay': 0.1, 'isolation': 0.8, 'missing': 0.5, 'transition': 0.3}

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, Any]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.WeightedAdjustedRandIndex(distance_func='linear', alpha=0.1, **kwargs)[source]

Bases: BaseMetric

Computes the Weighted Adjusted Rand Index (WARI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.

class tsseg.metrics.WeightedNormalizedMutualInformation(distance_func='linear', alpha=0.1, average_method='arithmetic', **kwargs)[source]

Bases: BaseMetric

Computes the Weighted Normalized Mutual Information (WNMI).

compute(y_true, y_pred, **kwargs)[source]

Computes the value of the metric.

Parameters:
  • y_true (ndarray) – Ground truth labels or change points.

  • y_pred (ndarray) – Predicted labels or change points.

  • **kwargs – Additional arguments for metric computation.

Return type:

Dict[str, float]

Returns:

A dictionary containing metric names and their values.