ESD: Willard Manning

Today I will review the insightful lecture of Willard Manning at European Science Days. Manning is most famous for his work with the RAND Health Insurance Experiment.

Problems with Healthcare Data

There are 4 major econometric problems one must consider when trying to analyze health care cost and utilization data:

  1. There is a large mass of individuals with zero utilization (or expenditures) during a given time period,
  2. Consumption among those with any care is very skewed (e.g.: visits, hospitalizations, expenditures),
  3. The dependent variable often responds in a non-linear manner to many covariates,
  4. The demand response to covariates may change with the level of demand (e.g.: outpatient versus inpatient care, or low versus high levels of use).

Log or Box-Cox Transformations

While OLS is easy to use, it can produce out-of-range predictions (i.e.: yhat = xβhat < 0). Because health care data are skewed, many researchers log the dependent variable in order to obtain a more symmetric distribution of errors. The tradeoff is that, although one gains precision and robustness, no one is interested in log-scale results per se; the estimates must be retransformed to the raw scale to be useful.
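The out-of-range problem is easy to reproduce. The sketch below uses made-up, strictly positive "cost" data (the numbers are an assumption for illustration, not from the lecture) and compares a raw-scale OLS fit with a log-scale fit. Because the toy data have no error term, exponentiating the log-scale fit needs no retransformation correction here.

```python
import numpy as np

# Hypothetical skewed, strictly positive "cost" data (not from the lecture).
x = np.arange(10, dtype=float)
y = np.exp(0.5 * x)

X = np.column_stack([np.ones_like(x), x])  # intercept plus one covariate

# OLS on the raw scale: fitted values can fall below zero.
beta_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat_raw = X @ beta_raw

# OLS on the log scale: exp() of the fitted values is positive by construction.
beta_log, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
yhat_log = np.exp(X @ beta_log)

print("smallest raw-scale prediction:", yhat_raw.min())
print("smallest log-scale prediction:", yhat_log.min())
```

The raw-scale line is pulled up by the large values in the right tail, so it dips below zero at the low end of x, while the log-scale fit cannot.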

The Box-Cox transformation of y is as follows:

  • $(y^{\lambda} - 1)/\lambda = x\beta + \varepsilon$, if λ≠0
  • $\log(y) = x\beta + \varepsilon$, if λ=0

One estimates λ by maximum likelihood. Under the assumed normal errors, this tends to pick the transformation that leaves the residuals closest to symmetric.
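As a rough illustration of the estimation step, here is a minimal sketch that profiles the normal log-likelihood over a grid of λ values. The function names, the grid, and the simulated data are all assumptions made for the example; in practice one would use a packaged routine.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform of strictly positive y."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Profile log-likelihood of lambda under normal errors (constants dropped)."""
    z = boxcox_transform(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = np.mean(resid ** 2)
    n = len(y)
    # The (lam - 1) * sum(log y) term is the Jacobian of the transformation.
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))

def estimate_lambda(y, X, grid=None):
    """Grid-search MLE of lambda (hypothetical helper for this sketch)."""
    if grid is None:
        grid = np.linspace(-1.0, 2.0, 301)
    ll = np.array([profile_loglik(lam, y, X) for lam in grid])
    return grid[np.argmax(ll)]

# Simulated log-normal data, so the true lambda is 0 (the log model).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.7, size=n))

lam_hat = estimate_lambda(y, X)
print("estimated lambda:", lam_hat)
```

With data generated from a log model, the estimate should land near zero, which is the case where Box-Cox reduces to the log transformation.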

Log Example

Using a log transformation implies that second moments often matter. For instance, let us assume $\log(y)\,|\,g \sim N(\mu_g, \sigma_g^2)$, where the treatment group is g = A, B. Then we know

  • $E(y|g=A) = \exp[\mu_A + 0.5\sigma_A^2]$
  • $E(y|g=A)/E(y|g=B) = \exp[(\mu_A - \mu_B) + 0.5(\sigma_A^2 - \sigma_B^2)]$

We can see from the second equation that the second moment of the distributions matters if there is heteroskedasticity, but not if there is homoskedasticity (i.e.: $\sigma_A = \sigma_B = \sigma$).
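A quick numerical check makes the point concrete. The parameter values below are hypothetical and chosen only for illustration; the sketch compares the ratio of means under equal and unequal log-scale variances, and confirms the unequal-variance case by simulation.

```python
import numpy as np

def lognormal_mean(mu, sigma):
    """E(y) when log(y) ~ N(mu, sigma^2)."""
    return np.exp(mu + 0.5 * sigma ** 2)

mu_a, mu_b = 1.2, 1.0  # hypothetical log-scale means for groups A and B

# Homoskedastic case: the variance terms cancel in the ratio.
ratio_homo = lognormal_mean(mu_a, 0.8) / lognormal_mean(mu_b, 0.8)

# Heteroskedastic case: the variance terms do not cancel.
ratio_hetero = lognormal_mean(mu_a, 1.0) / lognormal_mean(mu_b, 0.6)

# Ratio one would report by exponentiating the log-scale difference alone.
naive = np.exp(mu_a - mu_b)

# Simulation check of the heteroskedastic ratio.
rng = np.random.default_rng(1)
y_a = np.exp(rng.normal(mu_a, 1.0, size=1_000_000))
y_b = np.exp(rng.normal(mu_b, 0.6, size=1_000_000))
ratio_sim = y_a.mean() / y_b.mean()

print("naive exp(mu_a - mu_b):", naive)
print("homoskedastic ratio:   ", ratio_homo)
print("heteroskedastic ratio: ", ratio_hetero)
print("simulated ratio:       ", ratio_sim)
```

In the homoskedastic case the ratio equals exp(μ_A − μ_B); in the heteroskedastic case that same quantity understates the raw-scale ratio whenever group A has the larger log-scale variance.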

Marginal Effects with log transformation

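Calculating marginal effects with non-linear econometric formulations is often difficult. For instance, in the log model we know that $E(y|x) = \exp(x\beta)\,E\{\exp(\varepsilon)|x\}$. This implies that the marginal effect of a covariate $x_k$ is:

  • $\partial E(y|x)/\partial x_k = \exp(x\beta)\,[\beta_k E\{\exp(\varepsilon)|x\} + \partial E\{\exp(\varepsilon)|x\}/\partial x_k]$

This is much more complicated than the incorrect formulation $\partial E(y|x)/\partial x_k = \exp(x\beta)\beta_k$, which ignores the error term entirely.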
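To see how far apart the two formulas can be, the sketch below assumes a single covariate and normal log-scale errors whose variance depends on x through log σ²(x) = d0 + d1·x. That variance function and all parameter values are assumptions made for the example. Under normality, E{exp(ε)|x} = exp(0.5σ²(x)), so both terms of the marginal effect can be written in closed form and checked against a finite difference.

```python
import numpy as np

beta0, beta1 = 0.5, 0.3   # hypothetical mean parameters
d0, d1 = -1.0, 0.4        # hypothetical: log sigma^2(x) = d0 + d1 * x

def sigma2(x):
    return np.exp(d0 + d1 * x)

def smear(x):
    """E{exp(eps)|x} for eps|x ~ N(0, sigma2(x))."""
    return np.exp(0.5 * sigma2(x))

def mean_y(x):
    """E(y|x) = exp(x*beta) * E{exp(eps)|x}."""
    return np.exp(beta0 + beta1 * x) * smear(x)

def marginal_effect(x):
    """Full marginal effect, including the derivative of the smearing term."""
    d_smear = smear(x) * 0.5 * d1 * sigma2(x)
    return np.exp(beta0 + beta1 * x) * (beta1 * smear(x) + d_smear)

def naive_effect(x):
    """The incorrect shortcut exp(x*beta) * beta_k."""
    return np.exp(beta0 + beta1 * x) * beta1

x0, h = 1.0, 1e-6
numeric = (mean_y(x0 + h) - mean_y(x0 - h)) / (2 * h)  # central difference

print("full marginal effect:", marginal_effect(x0))
print("finite difference:   ", numeric)
print("naive shortcut:      ", naive_effect(x0))
```

With these values the shortcut gives roughly half of the true marginal effect, because it drops both the level of the smearing term and its slope in x.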

Generalized Linear Model Approach

In this method, one searches for the β’s that solve the following estimating equation:

  • $\sum_i \frac{\partial \mu(x_i\beta)}{\partial \beta} V(x_i)^{-1} (y_i - \mu(x_i\beta)) = 0$

In practice, one usually assumes a log link, so that $\mu(x\beta) = \exp(x\beta)$. A variance structure is assumed so that $\mathrm{Var}(y|x) = \alpha[E(y|x)]^{\gamma}$. Particular values of γ correspond to some standard parametric distributions:

  • Gaussian NLS: γ=0
  • Poisson: γ=1
  • Gamma: γ=2
  • Inverse Gaussian (Wald): γ=3.
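With a log link and a power variance function, the estimating equation reduces to Σ x_i μ_i^(1−γ) (y_i − μ_i) = 0, which can be solved by Fisher scoring. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the first column of X is assumed to be a constant, and the data are simulated. It is not a substitute for a packaged GLM routine.

```python
import numpy as np

def estimating_equation(X, y, beta, gamma):
    """Left-hand side of the GLM estimating equation for mu = exp(X beta),
    with variance proportional to mu**gamma (alpha cancels out)."""
    mu = np.exp(X @ beta)
    return X.T @ (mu ** (1.0 - gamma) * (y - mu))

def fit_log_link_glm(X, y, gamma, n_iter=50, tol=1e-10):
    """Fisher scoring for a log-link GLM with power variance (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # assumes the first column of X is the constant
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (mu ** (1.0 - gamma) * (y - mu))
        info = X.T @ (X * (mu ** (2.0 - gamma))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated gamma-distributed outcome with E(y|x) = exp(0.5 + 0.3 x).
rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + 0.3 * x)
y = rng.gamma(shape=2.0, scale=mu_true / 2.0)

beta_poisson = fit_log_link_glm(X, y, gamma=1.0)  # Poisson-type variance
beta_gamma = fit_log_link_glm(X, y, gamma=2.0)    # gamma-type variance

print("gamma=1 estimates:", beta_poisson)
print("gamma=2 estimates:", beta_gamma)
```

Both choices of γ recover the mean parameters here because the mean function is correctly specified; the variance assumption mainly affects efficiency.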

Two Part Models

To this point, we have been focusing on the skewness problem and been ignoring the fact that many of the observations also clump at zero. We can decompose the expected value as follows:

  • E(y|x) = P(y>0)*E{y|y>0} + P(y=0)*0 = P(y>0)*E{y|y>0}

Now we must estimate P(y>0) and E(y|y>0) separately. The first term can be estimated with a probit model, P(y>0)=Φ(xα). For the second term, one can model log(y) among those with y>0 in order to take the skewness into account.

If the log-scale error term is normally distributed, then:

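  • $\hat{y} = \Phi(x\hat{\alpha}) \exp(x\hat{\beta} + 0.5\hat{\sigma}^2)$, where β and σ are estimated from the data.

If the log-scale error term is not normally distributed, then one can use the following formulation:

  • $\hat{y} = \Phi(x\hat{\alpha}) \exp(x\hat{\beta}) D$
  • D is Duan’s (JASA 1983) smearing estimator:
  • $D = N^{-1}\sum_i \exp(\hat{\varepsilon}_i) = N^{-1}\sum_i \exp[\ln(y_i) - x_i\hat{\beta}_{OLS}]$, where the sum runs over the N observations with y>0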
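The two-part prediction with smearing can be sketched in a few lines. Everything below is an illustrative assumption: the helper names, the simulated data, and the single binary treatment covariate. With one binary covariate, a probit's fitted probabilities equal the share of users in each group, so the sketch uses those shares for the first part rather than fitting a probit.

```python
import numpy as np

def duan_smearing(log_y, X):
    """OLS of log(y) on X among cases with y > 0; returns (beta, D)."""
    beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    resid = log_y - X @ beta
    D = np.mean(np.exp(resid))  # average of exponentiated OLS residuals
    return beta, D

def two_part_prediction(p_pos, X, beta, D):
    """yhat = P(y>0|x) * exp(x beta) * D."""
    return p_pos * np.exp(X @ beta) * D

# Simulated spending with a mass at zero and a skewed positive part.
rng = np.random.default_rng(2)
n = 20_000
treat = rng.integers(0, 2, size=n)
uses_care = rng.random(n) < np.where(treat == 1, 0.8, 0.6)
log_spend = 5.0 + 0.3 * treat + rng.normal(scale=0.8, size=n)
y = np.where(uses_care, np.exp(log_spend), 0.0)

X = np.column_stack([np.ones(n), treat])
pos = y > 0

# First part: share of users in each treatment group.
p_hat = np.array([pos[treat == g].mean() for g in (0, 1)])[treat]

# Second part: log-scale OLS on users only, plus the smearing factor.
beta_hat, D = duan_smearing(np.log(y[pos]), X[pos])
yhat = two_part_prediction(p_hat, X, beta_hat, D)

for g in (0, 1):
    print(f"group {g}: predicted {yhat[treat == g][0]:.1f}, "
          f"sample mean {y[treat == g].mean():.1f}")
print("smearing factor D:", D)
```

Because the simulated errors are normal with σ = 0.8, D should be close to exp(0.5 × 0.64); with non-normal errors D would differ from that value, which is the case the smearing estimator is designed for.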

Count Data

Count data are very common in health economics: the number of doctor visits, hospitalizations, and ER visits are all counts. Poisson and negative binomial regressions are frequently recommended for these types of data.