**Ordinary Least Squares**

If you have studied basic statistics, it's likely that you have come across the ordinary least squares (OLS) estimation technique. OLS attempts to minimize the squared distance between the dependent variable (*y*) and a linear prediction of y (y_hat = **x****β**). The parameter vector ***β_ols*** minimizes this distance. The most important assumption in order for ***β_ols*** to reflect the true parameters in the population is for the regressors to be uncorrelated with the error terms (cov(**x**,e)=**0**). Sometimes this is not the case. The assumption fails if:

- There are omitted variables which are correlated with the regressors (**x**).
- We have a system of simultaneous equations.
- There is an errors-in-variables problem.
- The system has a lagged dependent variable with a serially correlated disturbance.
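To make the first failure case concrete, here is a minimal simulation sketch in Python with NumPy. The data-generating process, the variable names (`w`, `x`, `y`) and the helper `ols` are illustrative assumptions, not part of the original discussion. The omitted variable `w` is correlated with the regressor `x`, so leaving it out pushes the OLS slope away from its true value of 2.0.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y = X b + e (X must already contain a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 50_000

w = rng.normal(size=n)                # variable that will be omitted
x = 0.8 * w + rng.normal(size=n)      # regressor correlated with w
y = 1.0 + 2.0 * x + 1.5 * w + rng.normal(size=n)  # true slope on x is 2.0

const = np.ones(n)
b_short = ols(np.column_stack([const, x]), y)     # w omitted: cov(x, e) != 0
b_long = ols(np.column_stack([const, x, w]), y)   # w included: assumption holds

print("slope on x with w omitted: ", round(b_short[1], 3))
print("slope on x with w included:", round(b_long[1], 3))
```

In this setup the short regression converges to 2.0 + 1.5 · cov(x, w)/var(x), which is roughly 2.73, rather than to 2.0.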

**Instrumental Variables**

One solution to these problems is to use an Instrumental Variables (IV) technique. A question remains as to when OLS is appropriate and when IV is best. OLS will generally give smaller standard errors (and thus is more precise), so it is to be preferred when the **β_OLS** estimates are unbiased.
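The trade-off can be illustrated with a small Monte Carlo sketch in Python with NumPy. Everything here is an assumption made for illustration: the single-instrument design, the coefficient values, and the helper names `ols`, `two_sls` and `simulate`. The IV estimator is implemented as two-stage least squares.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; works for a vector y or a matrix of several outcomes."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(X, Z, y):
    """2SLS: replace X with its fitted values from a regression on Z, then run OLS."""
    X_hat = Z @ ols(Z, X)
    return ols(X_hat, y)

def simulate(endogenous, rng, n=500, reps=500):
    """Return OLS and IV slope estimates over many samples; the true slope is 1.0."""
    b_ols, b_iv = np.empty(reps), np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)          # instrument
        e = rng.normal(size=n)          # structural error
        v = rng.normal(size=n)
        if endogenous:
            v = v + 0.8 * e             # makes x correlated with e
        x = 0.5 * z + v
        y = 1.0 * x + e
        X = np.column_stack([np.ones(n), x])
        Z = np.column_stack([np.ones(n), z])
        b_ols[r] = ols(X, y)[1]
        b_iv[r] = two_sls(X, Z, y)[1]
    return b_ols, b_iv

rng = np.random.default_rng(1)
exo_ols, exo_iv = simulate(False, rng)
endo_ols, endo_iv = simulate(True, rng)

for label, a, b in [("exogenous x ", exo_ols, exo_iv), ("endogenous x", endo_ols, endo_iv)]:
    print(f"{label}: OLS median {np.median(a):.3f} (sd {a.std():.3f}), "
          f"IV median {np.median(b):.3f} (sd {b.std():.3f})")
```

When x is exogenous, both estimators center on 1.0 but the IV estimates are more dispersed. When x is endogenous, OLS centers on the wrong value while IV stays close to 1.0.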

**Hausman Endogeneity Test**

To test whether the IV or OLS regression technique is best, one can use the Hausman endogeneity test. Let us try to estimate the following equation:

- (1) y1 = **x1** * **δ** + y2 * α + e

Let the vector **z**=(**x1**,**x2**) be the set of all exogenous variables. The vector **x1** contains the exogenous regressors that appear in equation 1, and **x2** contains our instruments, which do not. Since **z** is exogenous, we know E(**z**′ * e) = **0**. We believe the variable y2 to be endogenous.

One example of an endogenous y2 would be a wage equation where y1 is the individual’s wage and y2 is the number of hours worked. We would think that full-time workers would earn more than part-time workers, so hours would affect wage. On the other hand, when a worker’s wage is higher (assuming leisure is a normal good) one would expect the individual to work more hours. In this example we have dual causation.

To conduct the Hausman test, we first find the linear projection of y2 on **z** using OLS.

- (2) y2 = **z** * **π** + v

Since the error term from the first equation (*e*) is uncorrelated with **z** by assumption, y2 is endogenous if and only if E(v * e) ≠ 0. We can test whether the structural error *e* is correlated with the reduced form error *v* using the following equation:

- (3) e = ρ * v + u

If we plug equation 3 into equation 1, we have:

- (4) y1 = **x1** * **δ** + y2 * α + ρ * v + u

In empirical data, however, *v* is not observed. Nevertheless, we can estimate *v_hat* by saving the residuals from our OLS regression in equation 2 and plugging these numbers into equation 4 in place of *v*. The final equation is:

- (5) y1 = **x1** * **δ** + y2 * α + ρ * (v_hat) + u

We can now consistently estimate **δ**, α, and ρ using OLS. Using the usual OLS t-statistic, we can test the null hypothesis that ρ = 0. If we fail to reject the null, then there is no evidence of an endogeneity problem and one should use an OLS estimation strategy. If we reject the null in favor of ρ ≠ 0, then the instrumental variables technique is best. One can also use a heteroskedasticity-robust t-statistic for testing ρ if one suspects heteroskedasticity.
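The procedure in equations 2 through 5 can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not a reference implementation: the helper names (`ols_with_se`, `hausman_t`, `make_data`), the coefficient values and the sample size are all assumptions, and the standard errors are the conventional homoskedastic ones rather than the robust variant.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return b, se, resid

def hausman_t(y1, y2, x1, z):
    """t-statistic on v_hat in equation 5; x1 and z must each include a constant."""
    _, _, v_hat = ols_with_se(z, y2)             # equation 2: first-stage residuals
    X_aug = np.column_stack([x1, y2, v_hat])     # regressors of equation 5
    b, se, _ = ols_with_se(X_aug, y1)
    return b[-1] / se[-1]                        # last coefficient is rho

def make_data(rho, rng, n=2000):
    """Simulate equations 1 to 3 with one exogenous regressor and one instrument."""
    const = np.ones(n)
    x1_var = rng.normal(size=n)
    x2_var = rng.normal(size=n)                  # instrument excluded from equation 1
    v = rng.normal(size=n)
    e = rho * v + rng.normal(size=n)             # equation 3
    y2 = 0.5 * x1_var + 1.0 * x2_var + v         # equation 2
    y1 = 1.0 + 0.5 * x1_var + 2.0 * y2 + e       # equation 1
    x1 = np.column_stack([const, x1_var])
    z = np.column_stack([const, x1_var, x2_var])
    return y1, y2, x1, z

rng = np.random.default_rng(42)
t_exog = hausman_t(*make_data(0.0, rng))         # y2 exogenous: rho = 0
t_endog = hausman_t(*make_data(0.7, rng))        # y2 endogenous: rho = 0.7

for label, t in [("rho = 0.0", t_exog), ("rho = 0.7", t_endog)]:
    verdict = "reject exogeneity, use IV" if abs(t) > 1.96 else "fail to reject, use OLS"
    print(f"{label}: t = {t:.2f} -> {verdict}")
```

With ρ = 0.7 the t-statistic should be far above the 5% critical value of 1.96. With ρ = 0 it behaves like a draw from a standard normal distribution, so it will fall inside ±1.96 in about 95% of samples.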

A similar set of procedures can be extended to the case where y2 is a vector. Instead of a t-test on the residual *v_hat*, in the vector case we would have to perform an F-test (**ρ**=**0**) on a vector of residuals ***v_hat***. To see how to perform Hausman tests in the Stata statistical package, look at this paper by Baum, et al.
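The vector case can be sketched the same way in Python with NumPy. The example below assumes two endogenous regressors and two excluded instruments. The helper names (`ssr`, `hausman_f`, `make_data`) and all coefficient values are hypothetical. The F-statistic compares the sum of squared residuals with and without the first-stage residuals.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return resid @ resid

def hausman_f(y1, Y2, x1, z):
    """F-statistic for the joint null that every first-stage residual has a zero coefficient."""
    Pi_hat = np.linalg.lstsq(z, Y2, rcond=None)[0]   # one first stage per column of Y2
    V_hat = Y2 - z @ Pi_hat
    X_r = np.column_stack([x1, Y2])                  # restricted model: no residuals
    X_u = np.column_stack([X_r, V_hat])              # unrestricted model: residuals added
    n, k = X_u.shape
    q = V_hat.shape[1]                               # number of restrictions
    ssr_r, ssr_u = ssr(X_r, y1), ssr(X_u, y1)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

def make_data(rho, rng, n=2000):
    """Simulate a model with two endogenous regressors; rho has one entry per regressor."""
    const = np.ones(n)
    x1_var = rng.normal(size=n)
    X2 = rng.normal(size=(n, 2))                     # two excluded instruments
    V = rng.normal(size=(n, 2))                      # reduced-form errors
    e = V @ rho + rng.normal(size=n)                 # structural error
    Y2 = 0.5 * x1_var[:, None] + X2 + V
    y1 = 1.0 + 0.5 * x1_var + Y2 @ np.array([2.0, -1.0]) + e
    x1 = np.column_stack([const, x1_var])
    z = np.column_stack([const, x1_var, X2])
    return y1, Y2, x1, z

rng = np.random.default_rng(3)
f_exog = hausman_f(*make_data(np.array([0.0, 0.0]), rng))
f_endog = hausman_f(*make_data(np.array([0.7, 0.0]), rng))

# The 5% critical value of F with 2 and many degrees of freedom is about 3.00.
for label, f in [("rho = (0, 0)  ", f_exog), ("rho = (0.7, 0)", f_endog)]:
    print(f"{label}: F = {f:.2f}, exceeds 3.00: {f > 3.00}")
```

Only one of the two regressors is endogenous in the second dataset, yet the joint test should still reject clearly, which is the behavior one wants from a test of the whole vector.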

**Summary**

- Perform the first-stage regression of the endogenous variable (y2) on **z**.
- Calculate the residuals from this equation and include them as an additional regressor in the original estimation equation.
- Run OLS on this new equation and perform a t-test for the coefficient on the first-stage residuals.
- If one fails to reject the null hypothesis, then there is no evidence of an endogeneity problem and OLS should be used. If one rejects the null hypothesis, then endogeneity is a problem and one should use an IV estimation strategy.

**References**

Wooldridge, Jeffrey; *Econometric Analysis of Cross Section and Panel Data*, MIT Press, London, (c) 2002, pp. 118-122.
