Let us assume that there are two types of people: smart people and dumb people. Smart people’s test scores are normally distributed around 80% and dumb people’s test scores are normally distributed around 40%. If we observe the test score of one person, how do we know whether they are smart or dumb? If we see a score of 85%, we are pretty sure they are smart. A dumb person might have had a good day, but this would be a low-probability event. Similarly, if we saw a score of 35%, we would be fairly certain that the person is dumb, even though there is a small probability that a smart person had a bad day. If we see a score of 62%, however, then it is very difficult to distinguish whether the person is smart or dumb. But how can we quantify the probability that a person is of a certain type?
One way of doing this is with finite mixture models. Jim Hamilton’s Time Series Analysis book has a good explanation of this topic, and I will review that material here.
Each type (e.g.: how smart the person is) will be designated as st=1,2,…, or N. Let us assume that there is an observed variable yt (e.g.: the test score) which is distributed according to a N(μj,σj²). What we want to know is: given that we observe yt, what is the probability that the observation came from a person of type st=j?
Let us assume that we know the density of yt is:
- f(yt|st=j;θ)=(2πσj²)^(-1/2) * exp{-(yt – μj)²/(2σj²)}
There is also some underlying distribution of types.
- P(st=j;θ)=λj
- θ=(μ1,…,μN,σ1,…,σN,λ1,…,λN)
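To make the conditional density concrete, here is a minimal Python sketch. The parameter values used in the demo (smart mean 0.80, dumb mean 0.40, a common standard deviation of 0.10) are illustrative assumptions from the test-score story, not estimates from data:

```python
import math

def normal_pdf(y, mu, sigma):
    """f(y | s=j; theta): density of y under N(mu, sigma^2)."""
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

# Assumed illustrative parameters for the two types:
mu_smart, mu_dumb, sigma = 0.80, 0.40, 0.10

# A score of 85% is far more likely under the "smart" density
print(normal_pdf(0.85, mu_smart, sigma))
print(normal_pdf(0.85, mu_dumb, sigma))
```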
From Bayes Rule, we know that:
- P(A and B)=P(A|B)*P(B), which implies
- f(yt,st=j;θ)=λj*(2πσj²)^(-1/2) * exp{-(yt – μj)²/(2σj²)}
The unconditional density can be found as follows:
- f(yt;θ)=Σj=1 to N f(yt,st=j;θ)
- f(yt;θ)=λ1*(2πσ1²)^(-1/2) * exp{-(yt – μ1)²/(2σ1²)} +…+ λN*(2πσN²)^(-1/2) * exp{-(yt – μN)²/(2σN²)}
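The unconditional density is just the λ-weighted sum of the component densities. A short Python sketch (the normal density is re-defined here so the snippet stands alone):

```python
import math

def normal_pdf(y, mu, sigma):
    """f(y | s=j; theta): density of y under N(mu, sigma^2)."""
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def mixture_pdf(y, lambdas, mus, sigmas):
    """Unconditional density f(y; theta) = sum_j lambda_j * f(y | s=j; theta)."""
    return sum(l * normal_pdf(y, m, s)
               for l, m, s in zip(lambdas, mus, sigmas))

# With the assumed test-score parameters, the mixture puts mass near both 40% and 80%
print(mixture_pdf(0.62, [0.5, 0.5], [0.80, 0.40], [0.10, 0.10]))
```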
Now we can use maximum likelihood estimation techniques to find the θ which will maximize:
- maxθ L(θ)=Σt=1 to T log f(yt;θ)
- s.t.: λ1 + λ2 +…+ λN=1
- s.t: λj≥0
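A standard way to carry out this constrained maximization is the EM algorithm, which keeps the λj non-negative and summing to one by construction. Here is a rough pure-Python sketch; the initialization scheme, iteration count, and simulated test-score data are all my own assumptions, not from the source:

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def em_fit(ys, n_types, n_iter=200):
    """Estimate theta = (mus, sigmas, lambdas) by EM for a normal mixture."""
    # Crude initialization: spread the means across the data range (assumed scheme)
    lo, hi = min(ys), max(ys)
    mus = [lo + (hi - lo) * (j + 1) / (n_types + 1) for j in range(n_types)]
    sigmas = [max((hi - lo) / n_types, 1e-3)] * n_types
    lambdas = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # E-step: posterior P(s_t = j | y_t; theta) for each observation
        posts = []
        for y in ys:
            joint = [lambdas[j] * normal_pdf(y, mus[j], sigmas[j])
                     for j in range(n_types)]
            total = sum(joint)
            posts.append([p / total for p in joint])
        # M-step: posterior-weighted updates of lambda, mu, sigma
        for j in range(n_types):
            w = sum(p[j] for p in posts)
            lambdas[j] = w / len(ys)
            mus[j] = sum(p[j] * y for p, y in zip(posts, ys)) / w
            var = sum(p[j] * (y - mus[j])**2 for p, y in zip(posts, ys)) / w
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return mus, sigmas, lambdas

# Simulated scores from the two assumed types (dumb ~ N(0.40, 0.05^2), smart ~ N(0.80, 0.05^2))
random.seed(0)
scores = ([random.gauss(0.40, 0.05) for _ in range(200)]
          + [random.gauss(0.80, 0.05) for _ in range(200)])
mus, sigmas, lambdas = em_fit(scores, 2)
print(sorted(mus))  # estimated component means, near 0.40 and 0.80
```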
Once we have the MLE estimate of θ, we can figure out the probability that observation yt came from a person of type st=j. Using Bayes’ theorem again, we know that:
- P(st=j|yt;θ)=f(yt,st=j;θ)/f(yt;θ)=λj*f(yt|st=j;θ)/f(yt;θ)
This value represents the probability, given the observed data, that the unobserved type responsible for observation t was type j. For example, “…if an observation yt=0, one could be virtually certain that the observation had come from a N(0,1) distribution rather than a N(4,1) distribution, so that P(st=1|yt;θ) for that date would be near unity. If instead yt were around 2.3, it is equally likely that the observation might have come from either regime so that P(st=1|yt;θ) for such an observation would be close to 0.5.”
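Hamilton’s N(0,1)-versus-N(4,1) example can be sketched directly from the posterior formula above. I assume equal mixing weights here, which puts the 50/50 point exactly at the midpoint, 2.0 (Hamilton’s quoted crossover of about 2.3 corresponds to his parameter values):

```python
import math

def normal_pdf(y, mu, sigma):
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def posterior(y, j, lambdas, mus, sigmas):
    """P(s_t = j | y_t; theta) = lambda_j * f(y | s=j; theta) / f(y; theta)."""
    joint = [l * normal_pdf(y, m, s) for l, m, s in zip(lambdas, mus, sigmas)]
    return joint[j] / sum(joint)

# Two regimes: N(0,1) and N(4,1), assumed equal weights
lambdas, mus, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
print(posterior(0.0, 0, lambdas, mus, sigmas))  # near unity: almost surely regime 1
print(posterior(2.0, 0, lambdas, mus, sigmas))  # 0.5: either regime equally likely
```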
Most of the above content is from:
- James D. Hamilton (1994) Time Series Analysis, Princeton University Press, Princeton, NJ; pp. 685-689.