tsseg.algorithms.hmm package
HMM — Hidden Markov Model state annotation via Viterbi decoding.
Description
Annotates a univariate time series with hidden-state labels using the Viterbi algorithm. The emission distributions, transition matrix and initial probabilities must be provided by the user (no EM learning). This makes the detector suitable as a baseline or when prior distributions are known.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| emission_funcs | list / None | None | List of callables (PDFs) for each hidden state. Default: two-state Gaussian. |
| transition_prob_mat | ndarray / None | None | Row-stochastic transition matrix. Default: |
| initial_probs | ndarray / None | None | Initial state probabilities. Default: uniform. |
Usage
import numpy as np
from tsseg.algorithms import HMMDetector
X = np.array([0.1, -0.2, 0.0, 4.9, 5.2, 5.1])  # illustrative univariate series
detector = HMMDetector()  # default two-state Gaussian
labels = detector.fit_predict(X)
Implementation: Adapted from aeon. BSD 3-Clause.
Reference: Rabiner (1989), A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE.
Submodules
tsseg.algorithms.hmm.detector module
HMM annotation Estimator.
Implements a basic Hidden Markov Model (HMM) as a segmenter. To read more about the algorithm, see the HMM Wikipedia page.
- class tsseg.algorithms.hmm.detector.HMMDetector(emission_funcs=None, transition_prob_mat=None, initial_probs=None)[source]
Bases: BaseSegmenter
Implements a simple HMM fitted with the Viterbi algorithm.
The HMM annotation estimator uses the Viterbi algorithm to fit a sequence of 'hidden state' class annotations (represented by an array of integers the same size as the observation) to a sequence of observations.
This is done by finding the most likely path given the emission probabilities (i.e. the probability that a particular observation would be generated by a given hidden state), the transition probabilities (i.e. the probability of transitioning from one state to another or staying in the same state) and the initial probabilities (i.e. the belief about the probability distribution of hidden states at the start of the observation sequence).
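The path search described above can be sketched in a few lines. This is a minimal, dependency-free illustration of Viterbi decoding in log space, not the code used by this package: the names `gaussian_pdf` and `viterbi` are hypothetical, plain lists stand in for np.ndarrays, and the parameter values are borrowed from the docstring example with uniform initial probabilities assumed.

```python
import math


def gaussian_pdf(x, loc, scale):
    """Density of a normal distribution with mean `loc` and std `scale`."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def safe_log(p):
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi(obs, emission_funcs, trans, init):
    """Return the most likely hidden-state path for `obs`."""
    n_states = len(emission_funcs)
    # score[t][j]: best log-probability of any path ending in state j at step t
    # back[t][j]: the state at step t-1 on that best path
    score = [[safe_log(init[j]) + safe_log(emission_funcs[j](obs[0]))
              for j in range(n_states)]]
    back = [[0] * n_states]
    for x in obs[1:]:
        prev = score[-1]
        row, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: prev[i] + safe_log(trans[i][j]))
            row.append(prev[best_i] + safe_log(trans[best_i][j])
                       + safe_log(emission_funcs[j](x)))
            ptr.append(best_i)
        score.append(row)
        back.append(ptr)
    # Backtrace from the most likely final state through the stored pointers.
    state = max(range(n_states), key=lambda j: score[-1][j])
    path = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Two Gaussian states centred at 3.5 and -5 (values from the docstring example).
emission_funcs = [
    lambda x: gaussian_pdf(x, 3.5, 0.25),
    lambda x: gaussian_pdf(x, -5.0, 0.25),
]
trans = [[0.25, 0.75], [0.666, 0.333]]
init = [0.5, 0.5]  # uniform initial probabilities (assumed)
obs = [3.7, 3.2, 3.4, 3.6, -5.1, -5.2, -4.9]

path = viterbi(obs, emission_funcs, trans, init)
print(path)  # [0, 0, 0, 0, 1, 1, 1]
```

Working in log space avoids underflow when many small probabilities are multiplied along a long series. Whether the packaged estimator does the same is not stated here.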
- Current assumptions/limitations of this implementation:
  - the spacing of time series points is assumed to be equal.
  - it only works on univariate data.
  - the emission parameters and transition probabilities are assumed to be known.
  - if no initial probs are passed, uniform probabilities are assigned (i.e. rather than the stationary distribution).
  - it requires and returns np.ndarrays.
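To illustrate the difference between the uniform default and the stationary distribution mentioned in the list above, the sketch below approximates the stationary distribution of a transition matrix by power iteration. The helper `stationary_distribution` and the example matrix are hypothetical and not part of this package; the method assumes the chain is ergodic, so that a unique stationary distribution exists.

```python
def stationary_distribution(trans, n_iter=1000, tol=1e-12):
    """Approximate pi satisfying pi = pi @ trans by power iteration."""
    n = len(trans)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new = [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [p / total for p in new]  # guard against rounding drift
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Illustrative row-stochastic matrix: state 0 is "sticky", state 1 is not.
trans = [[0.9, 0.1], [0.5, 0.5]]
uniform = [0.5, 0.5]
stationary = stationary_distribution(trans)
print(uniform, stationary)
```

For this matrix the stationary distribution is (5/6, 1/6), so a uniform prior gives state 1 three times the weight it has in the long run. If that matters for a given series, the result could be passed as initial_probs (converted to an np.ndarray).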
_fit is currently empty as the parameters of the probability distribution are required to be passed to the algorithm.
_predict first calculates the transition_probability and transition_id matrices. Both are n x m matrices, where n is the number of hidden states and m is the number of observations. The transition probability matrix records the probability of the most likely sequence in which observation m is assigned to hidden state n. The transition_id matrix records the state that precedes hidden state n in that most likely path. This logic is mostly carried out by the helper function _calculate_trans_mats. Next, these matrices are used to calculate the most likely path, by backtracing from the final most likely state through the ids that preceded it. This logic is done via the helper function hmm_viterbi_label.
- Parameters:
emission_funcs (list | None) – List should be of length n (the number of hidden states). Either a list of callables [fx_1, fx_2] with signature fx_1(X) -> float, or a list of callables and matched keyword arguments for those callables [(fx_1, kwarg_1), (fx_2, kwarg_2)] with signature fx_1(X, **kwargs) -> float (or a list with some mixture of the two). The callables should take a value and return a probability when passed a single observation. All functions should be properly normalized PDFs over the same space as the observed data.
transition_prob_mat (ndarray | None) – Each row should sum to 1 in order to be properly normalized (i.e. the j'th column in the i'th row represents the probability of transitioning from state i to state j).
initial_probs (ndarray | None) – An array of probabilities that the sequence of hidden states starts in each of the hidden states. If passed, it should be of length n, the number of hidden states, and should match the length of both the emission funcs list and the transition_prob_mat. The initial probs should reflect prior beliefs. If None is passed, each hidden state is assigned an equal initial probability.
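The two accepted forms of emission_funcs, and the row-sum requirement on transition_prob_mat, can be illustrated with a small sketch. The helpers `resolve_emission_funcs` and `is_row_stochastic` are hypothetical and only show one way such inputs might be normalized and checked. They are not functions of this package, and the estimator's own validation may differ.

```python
import math
from functools import partial


def gaussian_pdf(x, loc=0.0, scale=1.0):
    """Density of a normal distribution with mean `loc` and std `scale`."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def resolve_emission_funcs(emission_funcs):
    """Turn every entry into a one-argument callable fx(x) -> float."""
    resolved = []
    for entry in emission_funcs:
        if callable(entry):
            resolved.append(entry)  # bare callable: use as-is
        else:
            func, kwargs = entry  # (callable, kwargs) pair: bind the kwargs
            resolved.append(partial(func, **kwargs))
    return resolved


def is_row_stochastic(mat, tol=1e-8):
    """Check that all entries are non-negative and each row sums to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in mat
    )


# A mixture of the two forms: one bare callable, one (callable, kwargs) pair.
emission_funcs = [
    gaussian_pdf,
    (gaussian_pdf, {"loc": 3.0, "scale": 0.5}),
]
funcs = resolve_emission_funcs(emission_funcs)
densities = [f(3.0) for f in funcs]

ok = is_row_stochastic([[0.25, 0.75], [0.5, 0.5]])
not_ok = is_row_stochastic([[0.25, 0.75], [0.666, 0.333]])
print(densities, ok, not_ok)
```

Under this strict tolerance a row such as [0.666, 0.333] fails the check, because it sums to 0.999. Rounded thirds are better written as 2/3 and 1/3.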
- emission_funcs
The functions to use in calculating the emission probabilities. Taken from the __init__ param of same name.
- Type:
list, shape = [num_hidden_states]
- transition_prob_mat
Matrix of transition probabilities from hidden state to hidden state. Taken from the __init__ param of same name.
- Type:
2D np.ndarray, shape = [num_states, num_states]
- initial_probs
Probability over the hidden state identity of the first state. If the __init__ param of same name was passed it will take on that value. Otherwise it is set to be uniform over all hidden states.
- Type:
1D np.ndarray, shape = [num_hidden_states]
- num_states
The number of hidden states. Set to be the length of the emission_funcs parameter which was passed.
- Type:
- trans_prob
Shape [num observations, num hidden states]. The max probability that that observation is assigned to that hidden state. Calculated in _calculate_trans_mat and assigned in _predict.
- Type:
2D np.ndarray, shape = [num_observations, num_hidden_states]
- trans_id
Shape [num observations, num hidden states]. The id of the state preceding each observation in the most likely path in which that observation is assigned to that hidden state. Calculated in _calculate_trans_mat and assigned in _predict.
- Type:
2D np.ndarray, shape = [num_observations, num_hidden_states]
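In the notation of Rabiner (1989), trans_prob and trans_id play the roles of the Viterbi quantities δ and ψ. The block below is a sketch of that standard recursion, with π the initial probabilities, a_ij the transition probabilities, b_j the emission density of state j and x_1, ..., x_T the observations. Whether the implementation stores raw probabilities, log-probabilities or rescaled values is an assumption left open here.

```latex
\begin{aligned}
\delta_1(j) &= \pi_j \, b_j(x_1), \\
\delta_t(j) &= \max_{i} \bigl[ \delta_{t-1}(i) \, a_{ij} \bigr] \, b_j(x_t), \qquad t = 2, \dots, T, \\
\psi_t(j) &= \arg\max_{i} \bigl[ \delta_{t-1}(i) \, a_{ij} \bigr], \qquad t = 2, \dots, T, \\
\hat{q}_T &= \arg\max_{j} \delta_T(j), \qquad \hat{q}_t = \psi_{t+1}(\hat{q}_{t+1}), \quad t = T-1, \dots, 1.
\end{aligned}
```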
Examples
>>> from aeon.segmentation import HMMSegmenter
>>> from scipy.stats import norm
>>> from numpy import asarray
>>> # define the emission probs for our HMM model:
>>> centers = [3.5,-5]
>>> sd = [.25 for i in centers]
>>> emi_funcs = [(norm.pdf, {'loc': mean,
...  'scale': sd[ind]}) for ind, mean in enumerate(centers)]
>>> hmm = HMMSegmenter(emi_funcs, asarray([[0.25,0.75], [0.666, 0.333]]))
>>> # generate synthetic data (or of course use your own!)
>>> obs = asarray([3.7,3.2,3.4,3.6,-5.1,-5.2,-4.9])
>>> hmm.fit_predict(obs)
array([0., 0., 0., 0., 1., 1., 1.])
- set_fit_request(*, axis: bool | None | str = '$UNCHANGED$') -> HMMDetector
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_predict_request(*, axis: bool | None | str = '$UNCHANGED$') -> HMMDetector
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.