Many research questions require healthcare economists to measure the effect of various patient, physician or market-level characteristics on specific health events. Oftentimes, these events are discrete in nature. For instance, doctor’s visits, ER visits, and hospitalizations are all discrete events.
To properly estimate the effect of certain characteristics on the number of such events, count models are needed. The most frequently used count model is the Poisson regression. I describe the Poisson regression in detail here. [Note: In the case of a 0/1 event, logit or probit regressions are appropriate.] The drawback of the Poisson regression is that it requires the conditional mean and variance of the dependent variable to be equal, whereas health care counts are often overdispersed (their variance exceeds their mean).
An alternative is to use a negative binomial regression, which relaxes this restriction by allowing the variance to exceed the mean. Today, I describe how one estimates a negative binomial regression.
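As a rough first check on the Poisson restriction, one can compare the sample variance of a count variable with its sample mean. The Python sketch below (standard library only) computes that ratio; the function name `dispersion_ratio` and the visit counts are hypothetical. Because it ignores covariates, it only speaks to unconditional overdispersion, whereas the Poisson assumption concerns the mean and variance conditional on the characteristics.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean for a list of counts.

    Values near 1 are consistent with the Poisson restriction (mean == variance);
    values well above 1 point to overdispersion. Covariates are ignored, so this
    is only a rough, unconditional check.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero")
    return variance(counts) / m  # statistics.variance uses the n - 1 denominator


# Hypothetical doctor's-visit counts for ten patients
visits = [0, 0, 1, 0, 2, 7, 0, 1, 12, 3]
print(round(dispersion_ratio(visits), 2))
```

For these hypothetical counts the ratio is 6, far from the value of 1 the Poisson model implies, which is the kind of pattern that motivates a negative binomial specification.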
Negative Binomial Regression
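A negative binomial regression models the probability that a person with characteristics $x_i$ experiences exactly $y_i$ events (e.g., doctor's visits, hospitalizations). One can characterize this probability as follows:
$$\Pr(Y_i = y_i \mid x_i) = \frac{\Gamma(y_i + m_i)}{\Gamma(y_i + 1)\,\Gamma(m_i)} \left(\frac{m_i}{\lambda_i + m_i}\right)^{m_i} \left(\frac{\lambda_i}{\lambda_i + m_i}\right)^{y_i}$$
where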
- $y_i$ = dependent variable (e.g., doctor's visits)
- $\lambda_i = \exp(x_i\beta)$
- $m_i = \delta^{-1}\lambda_i^{2-P} = \exp[(2-P)x_i\beta - \ln\delta]$
- $\Gamma$ = the gamma function
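To make the formula concrete, here is a minimal Python sketch of the NB-P probability. The values of β, the covariate x_i, δ and P are illustrative assumptions, not taken from any data set, and the function name `nbp_pmf` is hypothetical. It evaluates the log of the expression with `math.lgamma` and then exponentiates, which avoids overflow in the gamma terms.

```python
import math


def nbp_pmf(y, lam, delta, P):
    """Pr(Y = y | x) under the NB-P model.

    lam   : conditional mean, lambda_i = exp(x_i * beta), must be > 0
    delta : dispersion parameter, must be > 0
    P     : power parameter (P = 1 gives NB-1, P = 2 gives NB-2)
    """
    m = lam ** (2 - P) / delta  # m_i = delta^{-1} * lambda_i^{2-P}
    # Work on the log scale with lgamma to avoid overflow in the gamma terms
    log_p = (math.lgamma(y + m) - math.lgamma(y + 1) - math.lgamma(m)
             + m * math.log(m / (lam + m))
             + y * math.log(lam / (lam + m)))
    return math.exp(log_p)


# Hypothetical values: a single covariate x with coefficient beta
beta, x = 0.5, 2.0
lam = math.exp(x * beta)  # lambda_i = exp(x_i * beta)
probs = [nbp_pmf(y, lam, delta=0.8, P=1.5) for y in range(200)]
print(sum(probs))                                # approximately 1
print(sum(y * p for y, p in enumerate(probs)))  # approximately lam
```

Summing over a long range of counts should give a total near 1 and a mean near $\lambda_i$, a quick check that $m_i$ and $\lambda_i$ enter the formula correctly.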
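The parameters δ and P must be estimated along with β. Because the power P is estimated rather than fixed, this specification is known as the NB-P model; setting P = 1 gives the NB-1 model and setting P = 2 gives the NB-2 model.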
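Since the NB-P probability is a negative binomial with mean $\lambda_i$ and shape $m_i$, its conditional variance is $\lambda_i + \lambda_i^2/m_i = \lambda_i + \delta\lambda_i^P$. The short Python sketch below, using the hypothetical values $\lambda_i = 4$ and $\delta = 0.5$ and hypothetical helper names, shows how P = 1 makes the variance proportional to the mean while P = 2 makes it quadratic in the mean.

```python
def nbp_shape(lam, delta, P):
    """m_i = delta^{-1} * lambda_i^{2-P}."""
    return lam ** (2 - P) / delta


def nbp_variance(lam, delta, P):
    """Var(Y | x) = lambda + lambda^2 / m = lambda + delta * lambda^P."""
    return lam + lam ** 2 / nbp_shape(lam, delta, P)


lam, delta = 4.0, 0.5
# NB-1 (P = 1): variance proportional to the mean, lambda * (1 + delta)
print(nbp_variance(lam, delta, P=1))  # 6.0
# NB-2 (P = 2): variance quadratic in the mean, lambda + delta * lambda^2
print(nbp_variance(lam, delta, P=2))  # 12.0
```

Under NB-2 the shape $m_i = 1/\delta$ is the same for every observation, whereas under NB-1 it scales with $\lambda_i$.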
A truncated version of the NB-P model can be used to analyze strictly positive counts. One obtains this formulation by dividing the probability function by the probability of at least one doctor's visit, $1 - \Pr(y_i = 0 \mid x_i)$.
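A minimal Python sketch of the zero-truncated probability, under hypothetical parameter values ($\lambda_i = 1.2$, $\delta = 0.6$, P = 2) and hypothetical function names: it evaluates the NB-P probability and divides it by one minus the probability of zero events.

```python
import math


def nbp_pmf(y, lam, delta, P):
    """Untruncated NB-P probability, with m = lam^(2-P) / delta."""
    m = lam ** (2 - P) / delta
    return math.exp(math.lgamma(y + m) - math.lgamma(y + 1) - math.lgamma(m)
                    + m * math.log(m / (lam + m))
                    + y * math.log(lam / (lam + m)))


def truncated_nbp_pmf(y, lam, delta, P):
    """Pr(Y = y | x, Y > 0): NB-P probability divided by 1 - Pr(Y = 0 | x)."""
    if y < 1:
        return 0.0  # zero counts are excluded from the truncated model
    p0 = nbp_pmf(0, lam, delta, P)  # Pr(Y = 0 | x) = [m / (lam + m)]^m
    return nbp_pmf(y, lam, delta, P) / (1.0 - p0)


probs = [truncated_nbp_pmf(y, 1.2, 0.6, 2.0) for y in range(300)]
print(sum(probs))  # approximately 1: the positive counts carry all the mass
```

Dividing by $1 - \Pr(y_i = 0 \mid x_i)$ rescales the positive counts so that they alone sum to one.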
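One estimates the parameters δ, P and β by maximizing the following log-likelihood function:
$$\ln L = \sum_i \left[\ln\Gamma(y_i + m_i) - \ln\Gamma(y_i + 1) - \ln\Gamma(m_i) + m_i \ln\left(\frac{m_i}{\lambda_i + m_i}\right) + y_i \ln\left(\frac{\lambda_i}{\lambda_i + m_i}\right) - \ln(1 - s_i)\right]$$
where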
- $s_i = [m_i/(\lambda_i + m_i)]^{m_i} = \Pr(y_i = 0 \mid x_i)$
The last term, $-\ln(1 - s_i)$, is needed only for the truncated model, i.e., when the dependent variable is strictly positive.
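The Python sketch below codes this log-likelihood and maximizes it by a crude grid search, purely for illustration. It assumes a single covariate with no intercept (so $\lambda_i = \exp(x_i\beta)$ with a scalar β); the data, the grid, and the function name `nbp_loglik` are all hypothetical. In practice one would maximize over β, δ and P jointly with a numerical optimizer such as Newton-Raphson or BFGS.

```python
import math


def nbp_loglik(y, x, beta, delta, P, truncated=False):
    """NB-P log-likelihood with one covariate and no intercept (a simplification).

    y, x      : equal-length lists of counts and covariate values
    truncated : if True, subtract ln(1 - s_i); only meaningful when every y_i > 0
    """
    ll = 0.0
    for yi, xi in zip(y, x):
        lam = math.exp(xi * beta)                             # lambda_i
        m = math.exp((2 - P) * xi * beta - math.log(delta))  # m_i
        ln_s = m * math.log(m / (lam + m))                    # ln s_i = ln Pr(y_i = 0)
        ll += (math.lgamma(yi + m) - math.lgamma(yi + 1) - math.lgamma(m)
               + ln_s + yi * math.log(lam / (lam + m)))
        if truncated:
            ll -= math.log(1.0 - math.exp(ln_s))              # -ln(1 - s_i)
    return ll


# Hypothetical data: doctor's visits y and one covariate x
y = [0, 1, 0, 3, 2, 0, 5, 1, 8, 2]
x = [0.1, 0.4, 0.2, 1.0, 0.8, 0.3, 1.5, 0.6, 1.9, 0.9]

# Crude grid search over (beta, delta, P), for illustration only
grid = [(b / 20, d / 10, p)
        for b in range(-20, 41)
        for d in range(1, 31)
        for p in (1.0, 1.5, 2.0)]
best = max(grid, key=lambda g: nbp_loglik(y, x, *g))
print("beta, delta, P =", best)
```

Passing `truncated=True` adds the $-\ln(1 - s_i)$ term; that option only makes sense when every count is at least 1.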
Negative Binomial Regression in Stata, SAS and R
To estimate a negative binomial regression in SAS, use the same procedure as for a Poisson regression (PROC GENMOD), but specify a negative binomial distribution; GENMOD's negative binomial corresponds to the NB-2 model. For instance, to examine how various characteristics affect the number of days a student is absent from school, one could use the following specification.
/* Negative binomial regression of days absent on student characteristics */
proc genmod data = poissonreg;
  model daysabs = male math langarts / dist=negbin link=log;
run;
A more detailed example is here.
In R, one can use the glm.nb function from the MASS package to estimate a negative binomial (NB-2) regression.
- Farbmacher, H. (2012). Extensions of hurdle models for overdispersed count data. Health Economics. doi:10.1002/hec.2892