tsseg.algorithms.hmm package

HMM — Hidden Markov Model state annotation via Viterbi decoding.

Description

Annotates a univariate time series with hidden-state labels using the Viterbi algorithm. The emission distributions, transition matrix and initial probabilities must be provided by the user (no EM learning). This makes the detector suitable as a baseline or when prior distributions are known.

Type: state detection
Supervision: requires known distributions (no learning)
Scope: univariate only

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| emission_funcs | list / None | None | List of callables (PDFs) for each hidden state. Default: two-state Gaussian N(0,1) / N(1,1). |
| transition_prob_mat | ndarray / None | None | Row-stochastic transition matrix. Default: [[0.9,0.1],[0.1,0.9]]. |
| initial_probs | ndarray / None | None | Initial state probabilities. Default: uniform. |

Usage

from tsseg.algorithms import HMMDetector

detector = HMMDetector()   # default two-state Gaussian
labels = detector.fit_predict(X)

Implementation: Adapted from aeon. BSD 3-Clause.

Reference: Rabiner (1989), A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE.

Submodules

tsseg.algorithms.hmm.detector module

HMM annotation Estimator.

Implements a basic Hidden Markov Model (HMM) as a segmenter. To read more about the algorithm, see the HMM Wikipedia page.

class tsseg.algorithms.hmm.detector.HMMDetector(emission_funcs=None, transition_prob_mat=None, initial_probs=None)[source]

Bases: BaseSegmenter

Implements a simple HMM fitted with the Viterbi algorithm.

The HMM annotation estimator uses the Viterbi algorithm to fit a sequence of ‘hidden state’ class annotations (represented by an array of integers the same size as the observation sequence) to a sequence of observations.

This is done by finding the most likely path given the emission probabilities (i.e. the probability that a particular observation would be generated by a given hidden state), the transition probabilities (i.e. the probability of transitioning from one state to another or staying in the same state), and the initial probabilities (i.e. the belief about the probability distribution of hidden states at the start of the observation sequence).
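The objective described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the source: x_1, ..., x_m are the observations, z_t is the hidden state at step t, π stands for initial_probs, A for transition_prob_mat, and f_k for the k-th entry of emission_funcs.

```latex
\hat{z}_{1:m}
  = \operatorname*{arg\,max}_{z_1,\dots,z_m}\;
    \pi_{z_1}\, f_{z_1}(x_1)
    \prod_{t=2}^{m} A_{z_{t-1} z_t}\, f_{z_t}(x_t)
```

The Viterbi algorithm finds this maximizer by dynamic programming instead of enumerating all n^m candidate paths.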

Current assumptions/limitations of this implementation:
  • the time series points are assumed to be equally spaced.

  • it only works on univariate data.

  • the emission parameters and transition probabilities are assumed to be known.

  • if no initial probs are passed, uniform probabilities are assigned (i.e. rather than the stationary distribution).

  • requires and returns np.ndarrays.

_fit is currently empty as the parameters of the probability distribution are required to be passed to the algorithm.

_predict - first the transition_probability and transition_id matrices are calculated. These are both n x m matrices, where n is the number of hidden states and m is the number of observations. The transition probability matrix records the probability of the most likely sequence in which observation m is assigned to hidden state n. The transition_id matrix records the state that precedes hidden state n in that most likely path. This logic is mostly carried out by the helper function _calculate_trans_mats. Next, these matrices are used to calculate the most likely path by backtracing from the final most likely state through the ids that preceded it. This logic is done via the helper function hmm_viterbi_label.
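The two steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the function names viterbi, gaussian_pdf and safe_log are hypothetical, probabilities are accumulated in log space to avoid underflow, and the matrices are stored as n x m nested lists to match the description.

```python
import math


def gaussian_pdf(x, loc, scale):
    """Density of a normal distribution N(loc, scale**2) at x."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs, emission_funcs, trans_mat, initial_probs):
    """Return the most likely hidden-state path (hypothetical helper)."""
    n, m = len(emission_funcs), len(obs)
    if m == 0:
        return []
    # trans_prob[s][t]: log-probability of the best path ending in state s at step t.
    trans_prob = [[float("-inf")] * m for _ in range(n)]
    # trans_id[s][t]: state preceding s at step t on that best path.
    trans_id = [[0] * m for _ in range(n)]
    for s in range(n):
        trans_prob[s][0] = safe_log(initial_probs[s]) + safe_log(emission_funcs[s](obs[0]))
    # Forward pass: fill both matrices column by column.
    for t in range(1, m):
        for s in range(n):
            scores = [trans_prob[r][t - 1] + safe_log(trans_mat[r][s]) for r in range(n)]
            best = max(range(n), key=scores.__getitem__)
            trans_id[s][t] = best
            trans_prob[s][t] = scores[best] + safe_log(emission_funcs[s](obs[t]))
    # Backtrace from the most likely final state.
    path = [0] * m
    path[-1] = max(range(n), key=lambda s: trans_prob[s][m - 1])
    for t in range(m - 1, 0, -1):
        path[t - 1] = trans_id[path[t]][t]
    return path


centers = [3.5, -5.0]
emission_funcs = [lambda x, c=c: gaussian_pdf(x, c, 0.25) for c in centers]
trans_mat = [[0.25, 0.75], [0.666, 0.333]]
obs = [3.7, 3.2, 3.4, 3.6, -5.1, -5.2, -4.9]
labels = viterbi(obs, emission_funcs, trans_mat, [0.5, 0.5])
print(labels)  # prints [0, 0, 0, 0, 1, 1, 1]
```

The demo reuses the emission centers, transition matrix and observations from the class docstring example, with uniform initial probabilities.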

Parameters:
  • emission_funcs (list | None) – The list should be of length n (the number of hidden states). It is either a list of callables [fx_1, fx_2] with signature fx_1(X) -> float, or a list of callables with matched keyword arguments [(fx_1, kwarg_1), (fx_2, kwarg_2)] with signature fx_1(X, **kwargs) -> float (or a list with some mixture of the two). Each callable should return a probability when passed a single observation. All functions should be properly normalized PDFs over the same space as the observed data.

  • transition_prob_mat (ndarray | None) – Each row should sum to 1 in order to be properly normalized (i.e. the j’th column in the i’th row represents the probability of transitioning from state i to state j).

  • initial_probs (ndarray | None) – An array of probabilities that the sequence of hidden states starts in each of the hidden states. If passed, it should be of length n, the number of hidden states, and should match the length of both the emission funcs list and the transition_prob_mat. The initial probs should reflect prior beliefs. If None is passed, each hidden state will be assigned an equal initial probability.

emission_funcs

The functions to use in calculating the emission probabilities. Taken from the __init__ param of same name.

Type:

list, shape = [num_hidden_states]
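As an illustration of the two accepted entry forms (a bare callable, or a callable paired with keyword arguments), the sketch below normalizes a mixed list into plain one-argument callables. The helper resolve_emission_funcs and the stand-in gaussian_pdf are hypothetical names introduced here; how the detector handles the list internally is not shown in this documentation.

```python
import math
from functools import partial


def gaussian_pdf(x, loc=0.0, scale=1.0):
    """Density of a normal distribution N(loc, scale**2) at x."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def resolve_emission_funcs(emission_funcs):
    """Turn each entry into a callable taking a single observation (hypothetical helper)."""
    resolved = []
    for entry in emission_funcs:
        if callable(entry):
            # Bare callable with signature f(x) -> float.
            resolved.append(entry)
        else:
            # (callable, kwargs) pair with signature f(x, **kwargs) -> float.
            func, kwargs = entry
            resolved.append(partial(func, **kwargs))
    return resolved


# One entry of each form: N(0, 1) as a bare callable, N(1, 1) as a pair.
mixed = [gaussian_pdf, (gaussian_pdf, {"loc": 1.0, "scale": 1.0})]
funcs = resolve_emission_funcs(mixed)
probs = [f(0.0) for f in funcs]
print(probs)
```

With scipy installed, scipy.stats.norm.pdf could take the place of gaussian_pdf, as in the class docstring example.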

transition_prob_mat

Matrix of transition probabilities from hidden state to hidden state. Taken from the __init__ param of same name.

Type:

2D np.ndarray, shape = [num_states, num_states]
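Because the matrix is expected to be square with rows summing to 1, it can be worth checking a candidate before passing it in. The following is a minimal sketch; is_row_stochastic is a hypothetical helper and the tolerance is an arbitrary choice, not something the package is known to enforce.

```python
import math


def is_row_stochastic(mat, tol=1e-8):
    """Check that mat is square, non-negative, and that each row sums to 1."""
    n = len(mat)
    for row in mat:
        if len(row) != n:
            return False
        if any(p < 0.0 for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True


default_mat = [[0.9, 0.1], [0.1, 0.9]]  # default listed in the parameter table
bad_mat = [[0.5, 0.4], [0.1, 0.9]]      # first row sums to 0.9
print(is_row_stochastic(default_mat), is_row_stochastic(bad_mat))  # prints True False
```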

initial_probs

Probability over the hidden state identity of the first state. If the __init__ param of same name was passed it will take on that value. Otherwise it is set to be uniform over all hidden states.

Type:

1D np.ndarray, shape = [num_hidden_states]
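The uniform default can be contrasted with the stationary distribution of the transition matrix, which the limitations list mentions as the alternative. The sketch below computes both; uniform_probs and stationary_distribution are hypothetical helpers, and the power iteration assumes the chain is irreducible and aperiodic so that it converges.

```python
def uniform_probs(n):
    """Equal initial probability for each of n hidden states."""
    return [1.0 / n] * n


def stationary_distribution(trans_mat, n_iter=1000, tol=1e-12):
    """Approximate pi with pi = pi @ trans_mat by power iteration."""
    n = len(trans_mat)
    pi = uniform_probs(n)
    for _ in range(n_iter):
        new = [sum(pi[i] * trans_mat[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# For the symmetric default matrix the two choices coincide.
symmetric = stationary_distribution([[0.9, 0.1], [0.1, 0.9]])
# For an asymmetric matrix they differ: here pi is approximately [5/6, 1/6].
asymmetric = stationary_distribution([[0.9, 0.1], [0.5, 0.5]])
print(uniform_probs(2), symmetric, asymmetric)
```

If the stationary distribution better reflects prior beliefs about the first state, it can be passed explicitly as initial_probs.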

num_states

The number of hidden states. Set to be the length of the emission_funcs parameter which was passed.

Type:

int

states

A list of integers from 0 to num_states-1. Integer labels for the hidden states.

Type:

list

num_obs

The length of the observations data. Extracted from data.

Type:

int

trans_prob

Shape [num observations, num hidden states]. The max probability that a given observation is assigned to a given hidden state. Calculated in _calculate_trans_mat and assigned in _predict.

Type:

2D np.ndarray, shape = [num_observations, num_hidden_states]

trans_id

Shape [num observations, num hidden states]. For each observation and hidden state, the id of the state preceding it in the most likely path in which that observation is assigned to that hidden state. Calculated in _calculate_trans_mat and assigned in _predict.

Type:

2D np.ndarray, shape = [num_observations, num_hidden_states]

Examples

>>> from aeon.segmentation import HMMSegmenter
>>> from scipy.stats import norm
>>> from numpy import asarray
>>> # define the emission probs for our HMM model:
>>> centers = [3.5,-5]
>>> sd = [.25 for i in centers]
>>> emi_funcs = [(norm.pdf, {'loc': mean,
...  'scale': sd[ind]}) for ind, mean in enumerate(centers)]
>>> hmm = HMMSegmenter(emi_funcs, asarray([[0.25,0.75], [0.666, 0.333]]))
>>> # generate synthetic data (or of course use your own!)
>>> obs = asarray([3.7,3.2,3.4,3.6,-5.1,-5.2,-4.9])
>>> hmm.fit_predict(obs)
array([0., 0., 0., 0., 1., 1., 1.])

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → HMMDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → HMMDetector

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in predict.

Returns:

self – The updated object.

Return type:

object

Module contents