Let us assume that there are two types of people: smart people and dumb people. Smart people’s test scores are normally distributed around 80% and dumb people’s test scores are normally distributed around 40%. If we observe the test score of one person, how do we know whether they are smart or dumb? If we see a score of 85%, we are pretty sure they are smart. A dumb person might have had a good day, but this would be a low-probability event. Similarly, if we saw a score of 35%, we would be fairly certain that the person is dumb, even though there is a small probability that a smart person had a bad day. If we see a score of 62%, however, then it is very difficult to distinguish whether the person is smart or dumb. But how can we quantify the probability that a person is of a certain type?
One way of doing this is with finite mixture models. Jim Hamilton’s Time Series Analysis book has a good explanation of this topic, and I will review that material here.
Each type (e.g.: how smart the person is) will be designated as st=1,2,…, or N. Let us assume that there is an observed variable yt (e.g.: the test score) which is distributed according to a N(μj,σj²). What we want to know is: given that we observe yt, what is the probability that the observation came from a person of type st=j?
Let us assume that we know the density of yt is:
- f(yt|st=j;θ)=(2πσj²)^(-1/2) * exp{-(yt – μj)²/(2σj²)}
There is also some underlying distribution of types.
- P(st=j;θ)=λj
- θ=(μ1,…,μN,σ1,…,σN,λ1,…,λN)
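To make the conditional density concrete, here is a minimal Python sketch. The parameter values used in the demo (smart mean 0.80, dumb mean 0.40, a common standard deviation of 0.10) are illustrative assumptions from the test-score story, not estimates from data:

```python
import math

def normal_pdf(y, mu, sigma):
    """f(y | s=j; theta): density of y under N(mu, sigma^2)."""
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

# Assumed illustrative parameters for the two types:
mu_smart, mu_dumb, sigma = 0.80, 0.40, 0.10

# A score of 85% is far more likely under the "smart" density
print(normal_pdf(0.85, mu_smart, sigma))
print(normal_pdf(0.85, mu_dumb, sigma))
```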
From Bayes Rule, we know that:
- P(A and B)=P(A|B)*P(B), which implies
- f(yt,st=j;θ)=λj*(2πσj²)^(-1/2) * exp{-(yt – μj)²/(2σj²)}
The unconditional density can be found as follows:
- f(yt;θ)=Σj=1 to N f(yt,st=j;θ)
- f(yt;θ)=λ1*(2πσ1²)^(-1/2) * exp{-(yt – μ1)²/(2σ1²)} +…+ λN*(2πσN²)^(-1/2) * exp{-(yt – μN)²/(2σN²)}
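The unconditional density is just the λ-weighted sum of the component densities. A short Python sketch (the normal density is re-defined here so the snippet stands alone):

```python
import math

def normal_pdf(y, mu, sigma):
    """f(y | s=j; theta): density of y under N(mu, sigma^2)."""
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def mixture_pdf(y, lambdas, mus, sigmas):
    """Unconditional density f(y; theta) = sum_j lambda_j * f(y | s=j; theta)."""
    return sum(l * normal_pdf(y, m, s)
               for l, m, s in zip(lambdas, mus, sigmas))

# With the assumed test-score parameters, the mixture puts mass near both 40% and 80%
print(mixture_pdf(0.62, [0.5, 0.5], [0.80, 0.40], [0.10, 0.10]))
```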
Now we can use maximum likelihood estimation techniques to find the θ which will maximize:
- maxθ L(θ)=Σt=1 to T log f(yt;θ)
- s.t.: λ1 + λ2 +…+ λN=1
- s.t: λj≥0
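A standard way to carry out this constrained maximization is the EM algorithm, which keeps the λj non-negative and summing to one by construction. Here is a rough pure-Python sketch; the initialization scheme, iteration count, and simulated test-score data are all my own assumptions, not from the source:

```python
import math
import random

def normal_pdf(y, mu, sigma):
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def em_fit(ys, n_types, n_iter=200):
    """Estimate theta = (mus, sigmas, lambdas) by EM for a normal mixture."""
    # Crude initialization: spread the means across the data range (assumed scheme)
    lo, hi = min(ys), max(ys)
    mus = [lo + (hi - lo) * (j + 1) / (n_types + 1) for j in range(n_types)]
    sigmas = [max((hi - lo) / n_types, 1e-3)] * n_types
    lambdas = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # E-step: posterior P(s_t = j | y_t; theta) for each observation
        posts = []
        for y in ys:
            joint = [lambdas[j] * normal_pdf(y, mus[j], sigmas[j])
                     for j in range(n_types)]
            total = sum(joint)
            posts.append([p / total for p in joint])
        # M-step: posterior-weighted updates of lambda, mu, sigma
        for j in range(n_types):
            w = sum(p[j] for p in posts)
            lambdas[j] = w / len(ys)
            mus[j] = sum(p[j] * y for p, y in zip(posts, ys)) / w
            var = sum(p[j] * (y - mus[j])**2 for p, y in zip(posts, ys)) / w
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return mus, sigmas, lambdas

# Simulated scores from the two assumed types (dumb ~ N(0.40, 0.05^2), smart ~ N(0.80, 0.05^2))
random.seed(0)
scores = ([random.gauss(0.40, 0.05) for _ in range(200)]
          + [random.gauss(0.80, 0.05) for _ in range(200)])
mus, sigmas, lambdas = em_fit(scores, 2)
print(sorted(mus))  # estimated component means, near 0.40 and 0.80
```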
Once we have the MLE estimate of θ, we can figure out the probability that observation yt came from a person of type st=j. Using Bayes’ theorem again, we know that:
- P(st=j|yt;θ)=f(yt,st=j;θ)/f(yt;θ)=λj*f(yt|st=j;θ)/f(yt;θ)
This value represents the probability, given the observed data, that the unobserved type responsible for observation t was type j. For example, “…if an observation yt=0, one could be virtually certain that the observation had come from a N(0,1) distribution rather than a N(4,1) distribution, so that P(st=1|yt;θ) for that date would be near unity. If instead yt were around 2.3, it is equally likely that the observation might have come from either regime so that P(st=1|yt;θ) for such an observation would be close to 0.5.”
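Hamilton’s N(0,1)-versus-N(4,1) example can be sketched directly from the posterior formula above. I assume equal mixing weights here, which puts the 50/50 point exactly at the midpoint, 2.0 (Hamilton’s quoted crossover of about 2.3 corresponds to his parameter values):

```python
import math

def normal_pdf(y, mu, sigma):
    return (2 * math.pi * sigma**2) ** -0.5 * math.exp(-(y - mu)**2 / (2 * sigma**2))

def posterior(y, j, lambdas, mus, sigmas):
    """P(s_t = j | y_t; theta) = lambda_j * f(y | s=j; theta) / f(y; theta)."""
    joint = [l * normal_pdf(y, m, s) for l, m, s in zip(lambdas, mus, sigmas)]
    return joint[j] / sum(joint)

# Two regimes: N(0,1) and N(4,1), assumed equal weights
lambdas, mus, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
print(posterior(0.0, 0, lambdas, mus, sigmas))  # near unity: almost surely regime 1
print(posterior(2.0, 0, lambdas, mus, sigmas))  # 0.5: either regime equally likely
```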
Most of the above content is from:
- James D. Hamilton (1994) Time Series Analysis, Princeton University Press, Princeton, NJ; pp. 685-689.