# Model Fit for a Logistic Regression

How do you know if your model fits the data well? When applying an OLS regression, the standard metric is R-squared (R²). If your dependent variable is binary, however, most researchers prefer a logistic regression. In that case, how do you know if your model fits the data well?

One option is the Pearson chi-squared statistic.  This statistic relies on Pearson residuals calculated as:

• $r_j = \dfrac{y_j - \pi_j}{\sqrt{\pi_j (1 - \pi_j)}}$

where $y_j$ is the value of the dependent variable for observation j, and $\pi_j$ is the predicted probability from the logistic regression for observation j.

The summary statistic based on these residuals is the Pearson chi-square statistic:

• $\chi^2 = \sum_j r_j^2$
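
As a minimal sketch of the calculation, the snippet below computes the Pearson residual for each observation as (observed minus predicted) divided by the square root of predicted times (1 minus predicted), and then sums the squared residuals. The function names and the small data set are hypothetical and chosen only for illustration. In practice the fitted probabilities would come from your estimated logistic regression, and each must lie strictly between 0 and 1.

```python
import math


def pearson_residuals(y, p):
    """Pearson residual (y_j - pi_j) / sqrt(pi_j * (1 - pi_j)) for each observation.

    y: observed 0/1 outcomes; p: fitted probabilities strictly between 0 and 1.
    """
    return [(yj - pj) / math.sqrt(pj * (1.0 - pj)) for yj, pj in zip(y, p)]


def pearson_chi2(y, p):
    """Pearson chi-square statistic: the sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(y, p))


# Hypothetical outcomes and fitted probabilities (illustration only)
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.5, 0.5]

residuals = pearson_residuals(y, p)
chi2 = pearson_chi2(y, p)
print(residuals, chi2)
```

Keeping the residuals as a separate list makes it easy to inspect individual observations with large values, not only the summary statistic.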

Another option is the Hosmer-Lemeshow test. This test has the advantage that it doesn't treat all observations as the same. Instead, it calculates the fit of the model stratified by group. These groups are often deciles (or other percentile groupings) based on the fitted values. One can calculate this statistic as:

$C = \sum_{g=1}^{G} \dfrac{(o_g - n_g \pi_g)^2}{n_g \pi_g (1 - \pi_g)}$

where $o_g$ is the number of observed events in group g, $\pi_g$ is the average predicted probability for the observations in group g, and $n_g$ is the number of observations in group g. The value of C is well approximated by the chi-square distribution with G-2 degrees of freedom.
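
The grouped calculation can be sketched as follows. This is an illustrative implementation, not a reference one: the function name `hosmer_lemeshow`, the equal-size grouping rule, and the eight-observation data set are assumptions made for the example. Statistical packages may form groups and handle ties differently. The sketch also assumes each group's average predicted probability is strictly between 0 and 1.

```python
def hosmer_lemeshow(y, p, n_groups=10):
    """Return (C, degrees_of_freedom) for the grouped fit statistic.

    Observations are sorted by fitted probability and split into
    n_groups groups of near-equal size (an assumed grouping rule).
    """
    n = len(y)
    if n < n_groups:
        raise ValueError("need at least one observation per group")
    pairs = sorted(zip(p, y))  # order observations by fitted probability
    c_stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(group)                       # observations in group g
        o_g = sum(yj for _, yj in group)       # observed events in group g
        pi_g = sum(pj for pj, _ in group) / n_g  # average fitted probability
        c_stat += (o_g - n_g * pi_g) ** 2 / (n_g * pi_g * (1.0 - pi_g))
    return c_stat, n_groups - 2


# Hypothetical data: 8 observations split into 4 groups of 2
y = [0, 0, 0, 1, 1, 0, 1, 1]
p = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]

c_stat, df = hosmer_lemeshow(y, p, n_groups=4)
print(c_stat, df)
```

To turn C into a p-value, compare it with the upper tail of a chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(c_stat, df)` if SciPy is available.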

Hosmer and Lemeshow (2000) state that the advantage of their statistic is that “…it provides a single, easily interpretable value that can be used to assess fit.  The great disadvantage is that in the process of grouping we may miss an important deviation from fit due to a small number of individual data points.”  The authors recommend an analysis of individual residuals as well as applying the Hosmer-Lemeshow test.