Hidden Markov Model in Machine Learning

In the vast landscape of machine learning, Hidden Markov Models (HMMs) stand as powerful tools for modeling sequential data, making them particularly useful in various applications such as speech recognition, bioinformatics, and finance. In this blog, we will dive into the intricacies of HMMs, explore their applications, work through a simple example on paper, and guide you through the steps to solve HMM problems.

Fig. Hidden Markov Model

1. What is HMM?

The Hidden Markov Model (HMM) is a statistical model employed to characterize systems undergoing changes in unobservable states over time. This model is based on the assumption that there exists an underlying process with hidden states, each associated with a known outcome. The model specifies probabilities governing transitions between these hidden states and the emission of observable symbols.

Fundamentally, an HMM serves as a statistical model capturing a system with concealed states, where each state releases observable symbols with specific probabilities. It elucidates the probabilistic connection between a sequence of observations and a sequence of hidden states. The states, though concealed, influence the observable symbols. The model posits that the system undergoes transitions between these hidden states over time. It finds application in scenarios where the underlying system or process generating the observations is obscured or unknown, hence earning the label "Hidden Markov Model." The HMM is employed for predicting forthcoming observations or categorizing sequences, relying on the concealed process generating the data. 

Due to their remarkable ability to handle uncertainty and temporal dependencies, HMMs find applications across various industries such as finance, bioinformatics, and speech recognition. Their adaptability makes them valuable for modeling dynamic systems and predicting future states based on observed sequences. Some applications of HMM include:

1. Speech Recognition: HMMs have been instrumental in transforming speech signals into meaningful text by modeling the sequential nature of spoken language.

2. Bioinformatics: In genomics, HMMs are used for gene prediction, identifying functional elements, and aligning biological sequences.

3. Finance: In finance, HMMs can be employed to model market regimes and predict changes in financial markets.

2. The Weather Example

Consider a scenario where you have a hypothetical friend who lives in a hypothetical city, and that city has only three kinds of weather - rainy, sunny and cloudy. Your friend has only two moods - happy or sad - depending on the weather in his city. This is a very standard example for explaining HMMs. Now remember that you live in a different city, so you do not know the weather in your friend's city, but you do know his mood.

Markov chains

A Markov chain, by definition, is a process that consists of a finite number of states with the Markovian property and some transition probabilities pij, where pij is the probability of the process moving from state i to state j. In the simple terms of our example, it is the way we represent the probabilities of the weather and your friend's mood. Here is the representation of our example:

Fig. HMM Example

Here, the weather and the mood are called states, the arrows are the transitions from one state to another, and the weight on an arrow is the probability of making that transition. The state you are in is the current state and the state you move to is the future state, so the weighted arrow gives you the probability of moving from your current state to the future state. For example, if the current weather is rainy, there is a 30% probability that it will be cloudy tomorrow. Similarly, if it is sunny today, there is an 80% chance that your friend is happy.

Some properties to remember for Markov models:
1. The future state depends only on the current state, not on any previous state or the sequence of states that occurred before.
2. The sum of the weights of all the outgoing arrows from a state should be 1, since these weights are probabilities.

Suppose you perform a random walk on this graph: pick a start state, move to the next state at random according to the transition probabilities, repeat for some number of steps, and record how often each state is visited. If you repeat this activity a very large number of times, say a hundred thousand steps, the visit frequencies stop fluctuating from run to run and converge to fixed values; the chain reaches an equilibrium state. A small simulation of this idea is sketched below.
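Here is a minimal sketch of that random walk in Python. The transition probabilities are the ones implied by the worked example later in this post (the figure itself is not reproduced here), so treat the exact numbers as illustrative assumptions rather than authoritative values.

import random

states = ["Sunny", "Cloudy", "Rainy"]

# Transition probabilities reconstructed from the example below (assumed values)
transition = {
    "Sunny":  {"Sunny": 0.7, "Cloudy": 0.3, "Rainy": 0.0},
    "Cloudy": {"Sunny": 0.4, "Cloudy": 0.2, "Rainy": 0.4},
    "Rainy":  {"Sunny": 0.2, "Cloudy": 0.3, "Rainy": 0.5},
}

def random_walk(start, n_steps):
    """Walk the chain for n_steps and return the fraction of time spent in each state."""
    counts = {s: 0 for s in states}
    current = start
    for _ in range(n_steps):
        counts[current] += 1
        weights = [transition[current][s] for s in states]
        current = random.choices(states, weights=weights)[0]
    return {s: counts[s] / n_steps for s in states}

# The longer the walk, the closer these frequencies get to the equilibrium
# values quoted later in this post (roughly 0.509, 0.273 and 0.218).
print(random_walk("Rainy", 100_000))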

A hidden Markov model has various components, which include:

1. States (Hidden and Observable): Hidden states represent the underlying structure of the system, while observable states are the ones we can directly measure. In our example the weather is the hidden state, since it cannot be known directly, whereas your friend's mood is the observable state, since it can be observed directly.

A probability distribution is employed to model the connection between hidden states and observations in the Hidden Markov Model (HMM). This model relies on two sets of probabilities: transition probabilities, governing state transitions, and emission probabilities, dictating the likelihood of observations given specific states.

2. Transition Probabilities: The likelihood of moving from one hidden state to another. For example, the probability of tomorrow being a sunny day given that today is a cloudy day is a transition probability of 40%. Writing these transitions as an adjacency matrix gives the transition matrix, which is as follows:

Fig. Transition Probability

3. Emission Probabilities: The probability of observing a particular observable state given the current hidden state. For example, the probability of your friend being sad given that today is a rainy day is 90%. Writing these probabilities as a matrix gives the emission matrix, which is as follows:

Fig. Emission Probability

4. Initial State Distribution: The probabilities of starting in each hidden state. These are exactly the converged equilibrium values discussed above, which in our case, after computation, come out as follows (all three parameter sets are also collected in the short code sketch after this list):

Fig. Initial state distribution
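Collecting the three parameter sets into arrays makes the later calculations easy to reproduce. The numbers below are reconstructed from the worked example in this post (the figures themselves are not shown here), so treat them as illustrative; in particular, the Sunny row of the transition matrix is inferred so that it is consistent with the other quoted values.

import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]   # hidden states
moods = ["Happy", "Sad"]                # observable states

# Initial state distribution (the converged equilibrium values)
start_prob = np.array([0.509, 0.273, 0.218])

# Transition matrix: row = today's weather, column = tomorrow's weather
trans_prob = np.array([
    [0.7, 0.3, 0.0],   # from Sunny (inferred so the row sums to 1)
    [0.4, 0.2, 0.4],   # from Cloudy
    [0.2, 0.3, 0.5],   # from Rainy
])

# Emission matrix: row = weather, column = friend's mood (Happy, Sad)
emit_prob = np.array([
    [0.8, 0.2],   # Sunny
    [0.4, 0.6],   # Cloudy
    [0.1, 0.9],   # Rainy
])

# Property 2 from above: every row of probabilities must sum to 1
assert np.allclose(trans_prob.sum(axis=1), 1.0)
assert np.allclose(emit_prob.sum(axis=1), 1.0)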

An HMM can answer various questions, which include:

1. Evaluation — how likely is it that something observable will happen? In other words, what is the probability of a given observation sequence? E.g. what is the probability of your friend being happy, happy and sad on three consecutive days with the weather being sunny, cloudy and sunny respectively? What is the probability of observing this sequence? (A short sketch of the forward algorithm, which answers this kind of question in general, follows this list.)

2. Decoding — what is the reason for the observations that happened? In other words, what is the most probable hidden state sequence given an observation sequence? What is the most likely series of states to generate an observed sequence? E.g. what is the most probable weather sequence if your friend is observed as happy and sad?
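For the evaluation question in its general form - the probability of an observation sequence summed over every possible hidden weather sequence - the forward algorithm mentioned later in this post gives the answer without enumerating all paths. Here is a minimal sketch, again using the parameter values assumed from this example:

import numpy as np

start_prob = np.array([0.509, 0.273, 0.218])          # Sunny, Cloudy, Rainy
trans_prob = np.array([[0.7, 0.3, 0.0],
                       [0.4, 0.2, 0.4],
                       [0.2, 0.3, 0.5]])
emit_prob = np.array([[0.8, 0.2],                      # columns: Happy, Sad
                      [0.4, 0.6],
                      [0.1, 0.9]])

def forward(observations):
    """Return P(observation sequence), summed over all hidden weather sequences."""
    alpha = start_prob * emit_prob[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans_prob) * emit_prob[:, obs]
    return alpha.sum()

HAPPY, SAD = 0, 1
print(forward([HAPPY, HAPPY, SAD]))   # about 0.12 with the values assumed above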

3. Solving the Questions

1] What is the probability of your friend being happy, happy and sad on three consecutive days given the weather as sunny, cloudy and sunny respectively? What is the probability of observing this sequence?

This can be represented as follows:

Fig. Question represented in diagram

What we are essentially asked to calculate is the joint probability of the mood sequence Y and the weather sequence X, so let us write this as

P(X = Sunny, Cloudy, Sunny, Y = Happy, Happy, Sad)

This joint probability is the product of six probabilities (one start probability, two transition probabilities and three emission probabilities), so it can be expanded as:

P(X, Y) = P(X1 = Sunny) · P(Y1 = Happy | X1 = Sunny)
          · P(X2 = Cloudy | X1 = Sunny) · P(Y2 = Happy | X2 = Cloudy)
          · P(X3 = Sunny | X2 = Cloudy) · P(Y3 = Sad | X3 = Sunny)

Here, P(X1 = Sunny) is the start probability, and the remaining values can be read off the transition and emission matrices.

Which gives us,

P(X, Y) = 0.509 * 0.8 * 0.3 * 0.4 * 0.4 * 0.2 ≈ 0.003909

So there is roughly a 0.39% chance that your friend will be happy, happy and sad if the weather has been sunny, cloudy and sunny on those three consecutive days.
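The same calculation can be written as a short script. The probability values are the ones assumed from this example (the figures are not reproduced here):

# Weather sequence: Sunny, Cloudy, Sunny; mood sequence: Happy, Happy, Sad
start_sunny = 0.509            # initial probability of Sunny

p_happy_given_sunny = 0.8      # emission
p_cloudy_given_sunny = 0.3     # transition
p_happy_given_cloudy = 0.4     # emission
p_sunny_given_cloudy = 0.4     # transition
p_sad_given_sunny = 0.2        # emission

p_joint = (start_sunny
           * p_happy_given_sunny
           * p_cloudy_given_sunny
           * p_happy_given_cloudy
           * p_sunny_given_cloudy
           * p_sad_given_sunny)

print(round(p_joint, 6))       # 0.003909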

2] What is the most probable weather sequence if your friend is observed as happy and sad?

Fig. Question represented in diagram

The question marks that you see can be replaced by a number of different possible weather sequences, such as Sunny, Sunny or Sunny, Cloudy or Cloudy, Sunny, and so on. We want to find the one with the maximum probability. Since there are 3 hidden states and 2 positions in the observed sequence, there are 3^2 = 9 possibilities in total.

The naive approach is to calculate the probabilities of all 9 possible combinations and pick the one with the maximum value. You can also reduce the amount of calculation with dynamic programming: the Viterbi algorithm solves this decoding problem efficiently (the closely related forward algorithm does the same for the evaluation problem), and a short implementation is sketched at the end of this section.

Let's go with the naive approach here to find the solution. Each expression below is the joint probability of a weather sequence together with the observed moods (Happy, Sad); since the probability of the observations is the same for every candidate, the weather sequence with the largest joint probability is also the most probable one given the observations. Our 9 probabilities go as follows:

1. Sunny and Sunny -

P(Sunny, Sunny, Happy, Sad) = P(Sunny) · P(Happy | Sunny)
                              · P(Sunny | Sunny) · P(Sad | Sunny)
                            = 0.509 * 0.8 * 0.7 * 0.2 = 0.0570

2. Sunny and Cloudy -

P(Sunny, Cloudy, Happy, Sad) = P(Sunny) · P(Happy | Sunny)
                               · P(Cloudy | Sunny) · P(Sad | Cloudy)
                             = 0.509 * 0.8 * 0.3 * 0.6 = 0.0733

3. Sunny and Rainy -

P(Sunny, Rainy, Happy, Sad) = P(Sunny) · P(Happy | Sunny)
                              · P(Rainy | Sunny) · P(Sad | Rainy)
                            = 0.509 * 0.8 * 0 * 0.9 = 0

4. Cloudy and Sunny -

P(Cloudy, Sunny, Happy, Sad) = P(Cloudy) · P(Happy | Cloudy)
                               · P(Sunny | Cloudy) · P(Sad | Sunny)
                             = 0.273 * 0.4 * 0.4 * 0.2 = 0.0087

5. Cloudy and Cloudy -

P(Cloudy, Cloudy, Happy, Sad) = P(Cloudy) · P(Happy | Cloudy)
                                · P(Cloudy | Cloudy) · P(Sad | Cloudy)
                              = 0.273 * 0.4 * 0.2 * 0.6 = 0.0131

6. Cloudy and Rainy -

P(Cloudy, Rainy, Happy, Sad) = P(Cloudy) · P(Happy | Cloudy)
                               · P(Rainy | Cloudy) · P(Sad | Rainy)
                             = 0.273 * 0.4 * 0.4 * 0.9 = 0.0393

7. Rainy and Sunny -

P(Rainy, Sunny, Happy, Sad) = P(Rainy) · P(Happy | Rainy)
                              · P(Sunny | Rainy) · P(Sad | Sunny)
                            = 0.218 * 0.1 * 0.2 * 0.2 = 0.0009

8. Rainy and Cloudy -

P(Rainy, Cloudy, Happy, Sad) = P(Rainy) · P(Happy | Rainy)
                               · P(Cloudy | Rainy) · P(Sad | Cloudy)
                             = 0.218 * 0.1 * 0.3 * 0.6 = 0.0039

9. Rainy and Rainy -

P(Rainy, Rainy, Happy, Sad) = P(Rainy) · P(Happy | Rainy)
                              · P(Rainy | Rainy) · P(Sad | Rainy)
                            = 0.218 * 0.1 * 0.5 * 0.9 = 0.0098

So the highest probability, 0.0733, is for Sunny and Cloudy. Therefore, if you observe your friend as happy and then sad, the most probable weather sequence is Sunny followed by Cloudy.
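Here is a minimal sketch of the dynamic-programming alternative mentioned above, the Viterbi algorithm, using the parameter values assumed from this example. It reproduces the naive result: the most probable weather sequence for the observations (Happy, Sad) is (Sunny, Cloudy).

import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]
start_prob = np.array([0.509, 0.273, 0.218])
trans_prob = np.array([[0.7, 0.3, 0.0],
                       [0.4, 0.2, 0.4],
                       [0.2, 0.3, 0.5]])
emit_prob = np.array([[0.8, 0.2],       # columns: Happy, Sad
                      [0.4, 0.6],
                      [0.1, 0.9]])

def viterbi(observations):
    """Return the most probable hidden state sequence and its joint probability."""
    n, T = len(states), len(observations)
    prob = np.zeros((T, n))             # best joint probability ending in each state
    back = np.zeros((T, n), dtype=int)  # backpointers to recover the best path

    prob[0] = start_prob * emit_prob[:, observations[0]]
    for t in range(1, T):
        for j in range(n):
            candidates = prob[t - 1] * trans_prob[:, j]
            back[t, j] = candidates.argmax()
            prob[t, j] = candidates.max() * emit_prob[j, observations[t]]

    # Trace the best path backwards from the most probable final state
    path = [int(prob[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return [states[i] for i in path], float(prob[-1].max())

HAPPY, SAD = 0, 1
print(viterbi([HAPPY, SAD]))   # (['Sunny', 'Cloudy'], ~0.0733)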



