tsseg.algorithms.hmm package

HMM — Hidden Markov Model state annotation via Viterbi decoding.

Description

Annotates a univariate time series with hidden-state labels using the Viterbi algorithm. The emission distributions, transition matrix and initial probabilities must be provided by the user (no EM learning). This makes the detector suitable as a baseline or when prior distributions are known.

Type: state detection
Supervision: requires known distributions (no learning)
Scope: univariate only

Parameters

Name	Type	Default	Description
`emission_funcs`	list / None	`None`	List of callables (PDFs) for each hidden state. Default: two-state Gaussian `N(0,1)` / `N(1,1)`.
`transition_prob_mat`	ndarray / None	`None`	Row-stochastic transition matrix. Default: `[[0.9,0.1],[0.1,0.9]]`.
`initial_probs`	ndarray / None	`None`	Initial state probabilities. Default: uniform.
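The defaults in the table can be written out explicitly, for example as a starting point for custom values. This is a minimal sketch that assumes each emission callable takes a single observation and returns a density. `gaussian_pdf` is a hypothetical numpy-only helper, and `scipy.stats.norm.pdf` would serve equally well. The values mirror the documented defaults but are not taken from the implementation.

```python
import numpy as np

def gaussian_pdf(x, loc=0.0, scale=1.0):
    """Normal density N(loc, scale**2) evaluated at x (hypothetical numpy-only helper)."""
    z = (x - loc) / scale
    return np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi))

# Two-state Gaussian emissions N(0, 1) and N(1, 1), as listed in the table.
emission_funcs = [
    lambda x: gaussian_pdf(x, loc=0.0, scale=1.0),
    lambda x: gaussian_pdf(x, loc=1.0, scale=1.0),
]
# "Sticky" transition matrix: row i holds P(next state | current state i).
transition_prob_mat = np.array([[0.9, 0.1],
                                [0.1, 0.9]])
# Uniform initial distribution over the hidden states.
n = len(emission_funcs)
initial_probs = np.full(n, 1.0 / n)

# Consistency checks: square, row-stochastic matrix and a normalized prior of length n.
assert transition_prob_mat.shape == (n, n)
assert np.allclose(transition_prob_mat.sum(axis=1), 1.0)
assert initial_probs.shape == (n,) and np.isclose(initial_probs.sum(), 1.0)
```

Following the constructor signature, these could then be passed as `HMMDetector(emission_funcs, transition_prob_mat, initial_probs)`.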

Usage

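import numpy as np
from tsseg.algorithms import HMMDetector

# Univariate series; the default emissions are N(0, 1) and N(1, 1).
X = np.array([0.1, -0.3, 0.2, 1.2, 0.9, 1.1])

detector = HMMDetector()   # default two-state Gaussian
labels = detector.fit_predict(X)   # one hidden-state label per observation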

Implementation: Adapted from aeon. BSD 3-Clause.

Reference: Rabiner (1989), A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE.

Submodules

tsseg.algorithms.hmm.detector module

HMM annotation estimator.

Implements a basic Hidden Markov Model (HMM) as a segmenter. For background on the algorithm, see the Wikipedia article on hidden Markov models.

class tsseg.algorithms.hmm.detector.HMMDetector(emission_funcs=None, transition_prob_mat=None, initial_probs=None)

Bases: BaseSegmenter

Implements a simple HMM whose hidden-state path is decoded with the Viterbi algorithm.

The HMM annotation estimator uses the Viterbi algorithm to assign a sequence of ‘hidden state’ class annotations (an array of integers of the same length as the observations) to a sequence of observations.

This is done by finding the most likely hidden-state path given three sets of parameters: the emission probabilities (the probability that a particular observation is generated by a given hidden state), the transition probabilities (the probability of moving from one state to another or staying in the same state) and the initial probabilities (the prior distribution over hidden states at the start of the observation sequence).
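In the standard notation of Rabiner (1989), write $\pi_j$ for initial_probs[j], $A_{ij}$ for transition_prob_mat[i, j], $b_j(x)$ for the j-th emission density and $x_1, \dots, x_T$ for the observations. The textbook Viterbi recursion below sketches what "most likely path" means here. $\delta_t(j)$ and $\psi_t(j)$ play the roles of the trans_prob and trans_id attributes, although the implementation may store log-probabilities, index from 0 and initialise the first step differently.

```latex
\delta_1(j) = \pi_j \, b_j(x_1), \qquad
\delta_t(j) = \max_{i} \big[ \delta_{t-1}(i) \, A_{ij} \big] \, b_j(x_t), \qquad
\psi_t(j) = \arg\max_{i} \big[ \delta_{t-1}(i) \, A_{ij} \big], \quad t = 2, \dots, T

\hat{s}_T = \arg\max_{j} \delta_T(j), \qquad
\hat{s}_t = \psi_{t+1}(\hat{s}_{t+1}), \quad t = T-1, \dots, 1
```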

Current assumptions and limitations of this implementation:

The time series points are assumed to be equally spaced.
It only works on univariate data.
The emission parameters and transition probabilities are assumed to be known.
If no initial probabilities are passed, uniform probabilities are assigned (rather than the stationary distribution).
It requires and returns np.ndarrays.
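If the stationary distribution is preferred over the uniform default, it can be computed from the transition matrix and passed explicitly as initial_probs. This is a minimal numpy sketch. `stationary_distribution` is a hypothetical helper, not part of tsseg, and it assumes the chain is irreducible so that the stationary distribution is unique.

```python
import numpy as np

def stationary_distribution(transition_prob_mat):
    """Return pi with pi @ P = pi and sum(pi) = 1 for a row-stochastic matrix P.

    pi is the left eigenvector of P for eigenvalue 1, i.e. a right
    eigenvector of P.T. Assumes a unique stationary distribution.
    """
    P = np.asarray(transition_prob_mat, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue 1, up to round-off
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()                    # normalizing also fixes the sign

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = stationary_distribution(P)   # approximately [2/3, 1/3]
```

The result could then be passed as `HMMDetector(emission_funcs, P, pi)`.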

_fit is currently empty, because the parameters of the probability distributions must be passed to the constructor rather than learned from the data.

_predict first computes the trans_prob and trans_id matrices, both of shape (m, n), where m is the number of observations and n is the number of hidden states. trans_prob records, for each observation and hidden state, the probability of the most likely path that assigns that observation to that state. trans_id records the hidden state that precedes that state on the same most likely path. This logic is mostly carried out by the helper function _calculate_trans_mats. These matrices are then used to recover the most likely path by backtracking from the most likely final state through the ids of the states that preceded it. This logic is implemented in the helper function hmm_viterbi_label.
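The two steps can be sketched as one standalone numpy function. This is a textbook log-space Viterbi written for illustration under the assumptions listed above (known parameters, univariate data). The names `viterbi` and `normal_pdf` and their signatures are hypothetical, not the tsseg or aeon API, and details such as the initialisation of the first step may differ from the actual helpers.

```python
import numpy as np

def normal_pdf(x, loc, scale):
    """Hypothetical helper: normal density N(loc, scale**2) at x."""
    z = (x - loc) / scale
    return np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi))

def viterbi(obs, emission_funcs, transition_prob_mat, initial_probs):
    """Most likely hidden-state path for a 1D observation array (log-space Viterbi)."""
    obs = np.asarray(obs, dtype=float)
    num_obs, num_states = len(obs), len(emission_funcs)
    # Log emission probabilities, shape (num_obs, num_states).
    log_emi = np.log(np.array([[f(x) for f in emission_funcs] for x in obs]))
    log_A = np.log(np.asarray(transition_prob_mat, dtype=float))

    # trans_prob[t, j]: best log-probability of any path ending in state j at step t.
    # trans_id[t, j]: predecessor of state j at step t on that best path.
    trans_prob = np.zeros((num_obs, num_states))
    trans_id = np.zeros((num_obs, num_states), dtype=int)
    trans_prob[0] = np.log(np.asarray(initial_probs, dtype=float)) + log_emi[0]
    for t in range(1, num_obs):
        # scores[i, j] = trans_prob[t - 1, i] + log A[i, j]
        scores = trans_prob[t - 1][:, None] + log_A
        trans_id[t] = np.argmax(scores, axis=0)
        trans_prob[t] = np.max(scores, axis=0) + log_emi[t]

    # Backtrack from the most likely final state.
    path = np.zeros(num_obs, dtype=int)
    path[-1] = np.argmax(trans_prob[-1])
    for t in range(num_obs - 1, 0, -1):
        path[t - 1] = trans_id[t, path[t]]
    return path

# Same setup as the class example: states centred at 3.5 and -5 with sd 0.25.
emission_funcs = [lambda x: normal_pdf(x, 3.5, 0.25), lambda x: normal_pdf(x, -5.0, 0.25)]
labels = viterbi([3.7, 3.2, 3.4, 3.6, -5.1, -5.2, -4.9], emission_funcs,
                 np.array([[0.25, 0.75], [0.666, 0.333]]), np.array([0.5, 0.5]))
```

Working in log space turns products of many small probabilities into sums, which avoids numerical underflow on long series.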

Parameters:

emission_funcs (list | None) – A list of length n, the number of hidden states. Each entry is either a callable fx with signature fx(X) -> float, or a pair (fx, kwargs) of a callable and matching keyword arguments, called as fx(X, **kwargs) -> float. The two forms may be mixed, e.g. [fx_1, (fx_2, kwarg_2)]. Each callable should take a single observation and return its probability. All functions should be properly normalized PDFs over the same space as the observed data.
transition_prob_mat (ndarray | None) – An n x n matrix in which the entry in row i, column j is the probability of transitioning from state i to state j. Each row must sum to 1 to be properly normalized.
initial_probs (ndarray | None) – An array of probabilities that the sequence of hidden states starts in each of the hidden states. If passed, it should have length n, matching the length of emission_funcs and the dimensions of transition_prob_mat, and should reflect prior beliefs. If None, each hidden state is assigned an equal initial probability.
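The two accepted forms of emission_funcs can be mixed in one list. This is a minimal sketch of how such a list might be evaluated into a matrix of shape (number of observations, number of states), matching the trans_prob shape documented for this class. `emission_matrix` and `normal_pdf` are hypothetical helpers for illustration, not the library's internals.

```python
import numpy as np

def normal_pdf(x, loc=0.0, scale=1.0):
    """Hypothetical helper: normal density N(loc, scale**2) at x."""
    z = (x - loc) / scale
    return float(np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi)))

def emission_matrix(obs, emission_funcs):
    """Evaluate every emission function at every observation.

    Each entry of emission_funcs is either a bare callable f, called as f(x),
    or a (f, kwargs) pair, called as f(x, **kwargs).
    Returns an array of shape (num_obs, num_states).
    """
    probs = np.empty((len(obs), len(emission_funcs)))
    for j, spec in enumerate(emission_funcs):
        func, kwargs = spec if isinstance(spec, tuple) else (spec, {})
        for t, x in enumerate(obs):
            probs[t, j] = func(x, **kwargs)
    return probs

# Mixed specification: state 0 as a bare callable, state 1 as (callable, kwargs).
emission_funcs = [normal_pdf, (normal_pdf, {"loc": 1.0, "scale": 1.0})]
transition_prob_mat = np.array([[0.9, 0.1], [0.1, 0.9]])
initial_probs = np.array([0.5, 0.5])

# All three parameters must agree on the number of hidden states n.
n = len(emission_funcs)
assert transition_prob_mat.shape == (n, n) and initial_probs.shape == (n,)

E = emission_matrix([0.0, 1.0], emission_funcs)
```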

emission_funcs

The functions used to calculate the emission probabilities. Taken from the __init__ parameter of the same name.

Type:: list, shape = [num_hidden_states]

transition_prob_mat

Matrix of transition probabilities between hidden states. Taken from the __init__ parameter of the same name.

Type:: 2D np.ndarray, shape = [num_states, num_states]

initial_probs

Probability distribution over the hidden state at the first observation. If the __init__ parameter of the same name was passed, it takes that value; otherwise it is uniform over all hidden states.

Type:: 1D np.ndarray, shape = [num_hidden_states]

num_states

The number of hidden states, set to the length of the emission_funcs parameter.

Type:: int

states

A list of integers from 0 to num_states-1. Integer labels for the hidden states.

Type:: list

num_obs

The number of observations, taken from the input data.

Type:: int

trans_prob

Entry [t, j] is the maximum probability of any hidden-state path that assigns observation t to hidden state j. Calculated in _calculate_trans_mats and assigned in _predict.

Type:: 2D np.ndarray, shape = [num_observations, num_hidden_states]

trans_id

Entry [t, j] is the id of the hidden state that precedes state j on the most likely path assigning observation t to state j. Calculated in _calculate_trans_mats and assigned in _predict.

Type:: 2D np.ndarray, shape = [num_observations, num_hidden_states]

Examples

>>> from aeon.segmentation import HMMSegmenter
>>> from scipy.stats import norm
>>> from numpy import asarray
>>> # define the emission probs for our HMM model:
>>> centers = [3.5,-5]
>>> sd = [.25 for i in centers]
>>> emi_funcs = [(norm.pdf, {'loc': mean,
...  'scale': sd[ind]}) for ind, mean in enumerate(centers)]
>>> hmm = HMMSegmenter(emi_funcs, asarray([[0.25,0.75], [0.666, 0.333]]))
>>> # generate synthetic data (or of course use your own!)
>>> obs = asarray([3.7,3.2,3.4,3.6,-5.1,-5.2,-4.9])
>>> hmm.fit_predict(obs)
array([0., 0., 0., 0., 1., 1., 1.])

set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') → HMMDetector

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: axis (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for axis parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') → HMMDetector

Configure whether metadata should be requested to be passed to the predict method.