A common rule of thumb says: if the dependent variable is continuous, use OLS; if it is binary, use a logit or probit. But what should you do if your dependent variable is a fraction between 0 and 1? To use a logit or probit, one would have to needlessly collapse the dependent variable into binary form. OLS is also problematic. Because the dependent variable is bounded between 0 and 1, the effect of any explanatory variable x_j cannot be constant over its entire range, so a linear specification is likely to be misspecified. Additionally, the predicted values from an OLS regression often fall outside the range of 0 to 1.
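To make the out-of-range problem concrete, here is a minimal sketch using made-up data: eleven observations whose fractional outcome follows a logistic curve in a single regressor. The data and variable names are illustrative assumptions, not taken from any paper.

```python
import math

# Hypothetical data: a fractional outcome that follows a logistic curve in x.
xs = list(range(11))                                  # x = 0, 1, ..., 10
ys = [1.0 / (1.0 + math.exp(-(x - 5))) for x in xs]   # every y is strictly inside (0, 1)

# Closed-form simple OLS: slope = cov(x, y) / var(x).
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fitted values from the straight line at the observed x values.
fitted = [intercept + slope * x for x in xs]
print("fitted at x=0: ", round(fitted[0], 3))
print("fitted at x=10:", round(fitted[-1], 3))
```

Even though every observed y lies inside the unit interval, the fitted line dips below 0 at the low end of x and rises above 1 at the high end.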
A paper by Papke and Wooldridge (1996) examines potential econometric alternatives when your dependent variable is fractional.
LOG-ODDS RATIO
One option for modeling a fractional response variable is to transform the dependent variable into a log-odds ratio. For instance:
- E(log[y/(1-y)]|x) = xβ
This model is simple and can be estimated with OLS once the dependent variable is transformed. It only works, however, when the dependent variable is strictly between 0 and 1. [If y=0, you get log(0), which is −∞; if y=1, you get log(1/0), which is +∞.] Additionally, using this framework, it is difficult to recover E(y|x). Under the model specified above, with ν denoting the error term of the log-odds equation:
- E(y|x)=∫ {exp(xβ+ν)/[1+exp(xβ+ν)]} * f(ν|x)dν
If the errors are independent of the explanatory variables (i.e., ν⊥x), one can use Duan’s (1983) smearing technique, which replaces the unknown f(•) with the empirical distribution of the OLS residuals. If not, one must make functional form assumptions about the conditional distribution of the error term.
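The log-odds approach with a smearing correction can be sketched as follows. This is a rough illustration under stated assumptions, not a reference implementation: the data are simulated, the model has one regressor plus an intercept, the errors are drawn independently of x, and the helper names (`ols_simple`, `smeared_mean`) are hypothetical.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ols_simple(xs, ys):
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Simulated data (assumption): log[y/(1-y)] = 0.5 + 1.0*x + v, with v independent of x.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(500)]
ys = [logistic(0.5 + 1.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

# Step 1: OLS on the log-odds (requires 0 < y < 1 for every observation).
log_odds = [math.log(y / (1.0 - y)) for y in ys]
b0, b1 = ols_simple(xs, log_odds)

# Step 2: keep the OLS residuals as an empirical stand-in for the distribution of v.
resid = [lo - (b0 + b1 * x) for lo, x in zip(log_odds, xs)]

def smeared_mean(x):
    """Smearing estimate of E(y|x): average the back-transform over all residuals."""
    return sum(logistic(b0 + b1 * x + v) for v in resid) / len(resid)

x_new = 1.5
naive = logistic(b0 + b1 * x_new)   # plugs in v = 0 and ignores the error term
smeared = smeared_mean(x_new)       # averages over the residual distribution
print("coefficients:", round(b0, 3), round(b1, 3))
print("naive back-transform:", round(naive, 4), " smeared estimate:", round(smeared, 4))
```

The naive back-transform and the smeared estimate generally differ because the logistic function is nonlinear, which is exactly why E(y|x) cannot be read off the OLS fit directly.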
QUASI-LIKELIHOOD METHODS
Papke and Wooldridge support using quasi-likelihood methods. Assume the following relationship:
- E(y|x) = G(xβ)
where 0<G(z)<1 for all z∈ℜ. The most popular choice for G(z) is the logistic function, G(z)=exp(z)/[1+exp(z)]. In this model, one can estimate the parameters β by maximizing the following Bernoulli log-likelihood function:
- l_i(β) ≡ y_i log[G(x_i β)] + (1-y_i) log[1-G(x_i β)]
This method has several advantages. First, it is fairly easy to estimate. Second, because the Bernoulli log-likelihood is a member of the linear exponential family, the quasi-MLE of β is consistent whenever the conditional mean E(y|x) = G(xβ) is correctly specified, and the estimator is asymptotically normal. The standard GLM variance assumption that accompanies this model is:
- Var(y_i|x_i) = σ² * G(x_i β)[1-G(x_i β)]
Papke and Wooldridge (1996) also describe how to compute the asymptotic variance of the estimator of β.
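As a rough illustration of the quasi-likelihood approach, the sketch below maximizes the Bernoulli quasi-log-likelihood by Newton-Raphson for a logistic G with an intercept and one regressor. Everything here is an assumption made for the example: the data are simulated, the function names are hypothetical, and the standard errors use the textbook GLM and sandwich formulas for the logistic case, which may differ in detail from the estimators in the paper.

```python
import math
import random

def G(z):
    """Logistic mean function."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(xs, ys, iters=25):
    """Maximize sum_i l_i(b) for the mean G(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g = [G(b0 + b1 * x) for x in xs]
        # Score: sum_i (y_i - G_i) * (1, x_i)
        s0 = sum(y - gi for y, gi in zip(ys, g))
        s1 = sum((y - gi) * x for x, y, gi in zip(xs, ys, g))
        # Negative Hessian: sum_i G_i (1 - G_i) * (1, x_i)(1, x_i)'
        w = [gi * (1.0 - gi) for gi in g]
        a00 = sum(w)
        a01 = sum(wi * x for wi, x in zip(w, xs))
        a11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = a00 * a11 - a01 * a01
        # Newton step: b <- b + A^{-1} * score, with the 2x2 inverse written out.
        b0 += (a11 * s0 - a01 * s1) / det
        b1 += (a00 * s1 - a01 * s0) / det
    return b0, b1

def matmul2(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Simulated fractions (assumption): each y is the share of successes in 5 Bernoulli
# trials with probability G(0.5 + 1.0*x), so E(y|x) = G(0.5 + 1.0*x) and the
# sample contains exact 0s and 1s, which the log-odds transform cannot handle.
rng = random.Random(1)
n = 2000
xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
ys = [sum(rng.random() < G(0.5 + 1.0 * x) for _ in range(5)) / 5 for x in xs]

b0, b1 = fit_fractional_logit(xs, ys)
g = [G(b0 + b1 * x) for x in xs]
u = [y - gi for y, gi in zip(ys, g)]      # residuals
w = [gi * (1.0 - gi) for gi in g]         # G(1 - G) evaluated at the estimates

# Estimate sigma^2 from weighted (Pearson) residuals, with k = 2 parameters.
sigma2 = sum(ui * ui / wi for ui, wi in zip(u, w)) / (n - 2)

# A = sum_i G_i(1 - G_i) x_i x_i'  and  B = sum_i u_i^2 x_i x_i'  (logistic case).
a00, a01 = sum(w), sum(wi * x for wi, x in zip(w, xs))
a11 = sum(wi * x * x for wi, x in zip(w, xs))
det = a00 * a11 - a01 * a01
a_inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
q00 = sum(ui * ui for ui in u)
q01 = sum(ui * ui * x for ui, x in zip(u, xs))
q11 = sum(ui * ui * x * x for ui, x in zip(u, xs))
b_mat = [[q00, q01], [q01, q11]]

# GLM variance (relies on the variance assumption) vs. sandwich variance (does not).
se_glm = [math.sqrt(sigma2 * a_inv[j][j]) for j in range(2)]
robust = matmul2(matmul2(a_inv, b_mat), a_inv)
se_robust = [math.sqrt(robust[j][j]) for j in range(2)]

print("estimates:", round(b0, 3), round(b1, 3), " sigma^2:", round(sigma2, 3))
print("GLM s.e.:", [round(s, 4) for s in se_glm], " robust s.e.:", [round(s, 4) for s in se_robust])
```

Because the simulated outcome here is a share out of 5 trials, its conditional variance is G(1-G)/5, so the estimate of σ² should land in the neighborhood of 0.2. The sandwich standard errors are included because they remain valid even if the variance assumption fails, as long as the conditional mean is correctly specified.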
- Papke LE and Wooldridge JM (1996) “Econometric methods for fractional response variables with an application to 401(k) plan participation rates”, Journal of Applied Econometrics, v11:619-632.