Finite Mixture Models

Let us assume that there are two types of people: smart people and dumb people. Smart people’s test scores are normally distributed around 80%, and dumb people’s test scores are normally distributed around 40%. If we observe one person’s test score, how do we know whether they are smart or dumb? If we see a score of 85%, we are fairly sure they are smart. A dumb person might have had a good day, but that would be a low-probability event. Similarly, if we saw a score of 35%, we would be fairly certain that the person is dumb, even though there is a small probability that a smart person had a bad day. If we see a score of 62%, however, it is very difficult to tell whether the person is smart or dumb. But how can we quantify the probability that a person is of a certain type?

One way of doing this is with finite mixture models. Jim Hamilton’s Time Series Analysis book has a good explanation of this topic, and I will review that material here.

Each type (e.g.: how smart the person is) will be designated as st = 1, 2, …, or N. Let us assume that there is an observed variable yt (e.g.: the test score) which, when st = j, is distributed N(μj, σj²). What the researcher wants to know is: given that we observe yt, what is the probability that the observation came from a person of type st = j?

Let us assume that we know the conditional density of yt is:

  • f(yt|st=j;θ) = (2πσj²)^(-1/2) · exp{-(yt - μj)²/(2σj²)}

There is also some underlying distribution of types.

  • P(st=j;θ) = λj
  • θ = (μ1,…,μN, σ1²,…,σN², λ1,…,λN)
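To make this concrete, the two-type test-score example can be written down directly. Only the two means (80% and 40%) come from the text; the standard deviations and mixing weights below are assumptions for illustration:

```python
import math

# Parameter vector theta = (mu's, sigma's, lambda's) for the smart/dumb example.
# The means come from the text; sd = 10 points and equal weights are assumed.
mu = [80.0, 40.0]     # mu_1 = smart mean, mu_2 = dumb mean
sigma = [10.0, 10.0]  # assumed
lam = [0.5, 0.5]      # assumed P(s_t = j)

def component_density(y, j):
    """f(y_t | s_t = j; theta): normal density with mean mu_j and sd sigma_j."""
    return (2 * math.pi * sigma[j] ** 2) ** -0.5 * \
        math.exp(-(y - mu[j]) ** 2 / (2 * sigma[j] ** 2))
```

For a score of 85, component_density(85, 0) is far larger than component_density(85, 1), which is the informal comparison made in the opening paragraph.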

From Bayes’ rule, we know that:

  • P(A and B) = P(A|B)·P(B), which implies
  • f(yt,st=j;θ) = λj · (2πσj²)^(-1/2) · exp{-(yt - μj)²/(2σj²)}

The unconditional density can be found as follows:

  • f(yt;θ) = Σj=1 to N f(yt,st=j;θ)
  • f(yt;θ) = λ1·(2πσ1²)^(-1/2)·exp{-(yt - μ1)²/(2σ1²)} + … + λN·(2πσN²)^(-1/2)·exp{-(yt - μN)²/(2σN²)}
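A minimal sketch of this unconditional density for the two-type example; as before, the means come from the text while the σ’s and λ’s are assumed:

```python
import math

mu, sigma, lam = [80.0, 40.0], [10.0, 10.0], [0.5, 0.5]  # sigmas/lambdas assumed

def mixture_density(y):
    """f(y_t; theta) = sum over j of lambda_j * f(y_t | s_t = j; theta)."""
    total = 0.0
    for m, s, l in zip(mu, sigma, lam):
        total += l * (2 * math.pi * s ** 2) ** -0.5 * \
            math.exp(-(y - m) ** 2 / (2 * s ** 2))
    return total

# Sanity check: a weighted sum of densities with weights summing to one
# is itself a density, so it should integrate to (approximately) one.
approx_integral = sum(mixture_density(i * 0.1) * 0.1
                      for i in range(-1000, 2200))
```

Because the λj sum to one, f(yt;θ) is a proper density; the Riemann sum above should come out very close to 1.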

Now we can use maximum likelihood estimation techniques to find the θ which will maximize:

  • maxθ L(θ) = Σt=1 to T log f(yt;θ)
  • s.t.: λ1 + λ2 + … + λN = 1
  • s.t.: λj ≥ 0
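In practice this constrained maximization is usually carried out with the EM algorithm, a standard tool for mixture likelihoods that I am substituting here since the text does not spell out the optimizer. The sketch below fits the two-type example on simulated scores; the sample size, true parameters, starting values, and iteration count are all assumptions:

```python
import math
import random

random.seed(0)
# Simulated test scores: half from N(40, 10), half from N(80, 10) (assumed truth).
data = [random.gauss(40, 10) for _ in range(200)] + \
       [random.gauss(80, 10) for _ in range(200)]

def norm_pdf(y, m, s):
    return (2 * math.pi * s * s) ** -0.5 * math.exp(-(y - m) ** 2 / (2 * s * s))

# Crude starting values for theta.
mu, sigma, lam = [30.0, 90.0], [15.0, 15.0], [0.5, 0.5]
for _ in range(100):
    # E-step: P(s_t = j | y_t; theta) for each observation.
    resp = []
    for y in data:
        w = [lam[j] * norm_pdf(y, mu[j], sigma[j]) for j in (0, 1)]
        tot = sum(w)
        resp.append([wj / tot for wj in w])
    # M-step: re-estimate theta from the responsibilities.
    for j in (0, 1):
        nj = sum(r[j] for r in resp)
        lam[j] = nj / len(data)
        mu[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
        sigma[j] = math.sqrt(
            sum(r[j] * (y - mu[j]) ** 2 for r, y in zip(resp, data)) / nj)
```

Each EM iteration increases L(θ), and because the M-step averages the responsibilities, the updated λj automatically satisfy both constraints above.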

Once we have the MLE estimate of θ, we can figure out the probability that observation yt came from a person of type st=j. Using Bayes’ theorem again, we know that:

  • P(st=j|yt;θ) = f(yt,st=j;θ)/f(yt;θ) = λj·f(yt|st=j;θ)/f(yt;θ)
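With estimated parameters in hand, the posterior type probabilities for the scores discussed at the start can be computed directly. The parameters below stand in for the MLE; as before, the σ’s and λ’s are assumptions:

```python
import math

mu, sigma, lam = [80.0, 40.0], [10.0, 10.0], [0.5, 0.5]  # sigmas/lambdas assumed

def posterior(y, j):
    """P(s_t = j | y_t; theta) = lambda_j * f(y_t|s_t=j; theta) / f(y_t; theta)."""
    w = [l * (2 * math.pi * s * s) ** -0.5 *
         math.exp(-(y - m) ** 2 / (2 * s * s))
         for m, s, l in zip(mu, sigma, lam)]
    return w[j] / sum(w)
```

Under these assumed parameters, posterior(85, 0) is nearly 1 (almost certainly smart), posterior(35, 0) is nearly 0 (almost certainly dumb), and posterior(62, 0) is in between, matching the intuition from the opening example.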

This value represents the probability, given the observed data, that the unobserved type responsible for observation t was type j. For example, “…if an observation yt=0, one could be virtually certain that the observation had come from a N(0,1) distribution rather than a N(4,1) distribution, so that P(st=1|yt;θ) for that date would be near unity. If instead yt were around 2.3, it is equally likely that the observation might have come from either regime so that P(st=1|yt;θ) for such an observation would be close to 0.5.”
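The quoted numbers can be checked with a small calculation. The quote does not state the mixing weights: with equal weights the 50/50 point would fall at yt = 2.0, while λ1 ≈ 0.8 puts it near 2.3, so λ1 = 0.8 is assumed here:

```python
import math

# Two-regime example from the quote: N(0,1) vs N(4,1).
# The mixing weights are NOT given in the quote; lam = [0.8, 0.2] is an
# assumption chosen so the 50/50 crossover lands near y_t = 2.3.
mu, sigma, lam = [0.0, 4.0], [1.0, 1.0], [0.8, 0.2]

def posterior1(y):
    """P(s_t = 1 | y_t; theta) for the two-regime example."""
    w = [l * math.exp(-(y - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
         for m, s, l in zip(mu, sigma, lam)]
    return w[0] / sum(w)
```

Under this assumed λ, posterior1(0) is near unity and posterior1(2.3) is close to 0.5, as the quote describes.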

Most of the above content is from:

  • James D. Hamilton (1994) Time Series Analysis, Princeton University Press, Princeton, NJ; pp. 685-689.