Model Fit for a Logistic Regression

How do you know if your model fits the data well? For an OLS regression, the standard metric is R-squared (R²). When the dependent variable is binary, however, most researchers prefer a logistic regression. If you choose a logistic rather than an OLS approach, how do you assess the fit of the model?

One option is the Pearson chi-square statistic. This statistic relies on Pearson residuals, calculated as:

r_j = (y_j - π_j) / √[π_j (1 - π_j)]

where y_j is the value of the dependent variable for observation j, and π_j is the predicted probability from the logistic regression for observation j.

The summary statistic based on these residuals is the Pearson chi-square statistic:

χ² = ∑_j r_j²
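As a rough illustration, the calculation can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard definition of the Pearson residual (the raw residual divided by the binomial standard deviation √[π_j (1 - π_j)]) and a statistic equal to the sum of the squared residuals. The function names and the data are hypothetical, made up for this example; in practice the fitted probabilities would come from your logistic regression.

```python
import math


def pearson_residuals(y, p):
    """Pearson residuals for binary outcomes y and fitted probabilities p."""
    # Assumes every fitted probability lies strictly between 0 and 1.
    return [(yj - pj) / math.sqrt(pj * (1.0 - pj)) for yj, pj in zip(y, p)]


def pearson_chi2(y, p):
    """Pearson chi-square statistic: the sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(y, p))


# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.5, 0.5]

chi2 = pearson_chi2(y, p)
print(round(chi2, 4))  # 0.25 + 0.25 + 1.0 + 1.0 = 2.5
```

Larger values indicate that the observed outcomes sit farther from the fitted probabilities.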

Another option is the Hosmer-Lemeshow test. Rather than pooling all observations together, this test assesses the fit of the model group by group. The groups are often deciles (or other percentile groupings) of the fitted values. The statistic is calculated as:

C = ∑_{g=1}^{G} (o_g - n_g π_g)² / [n_g π_g (1 - π_g)]

where G is the number of groups, o_g is the number of observed events in group g, π_g is the average predicted probability for the observations in group g, and n_g is the number of observations in group g. The distribution of C is well approximated by a chi-square distribution with G-2 degrees of freedom.
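To make the grouping step concrete, here is a minimal sketch in Python. It assumes the statistic is the sum over groups of the squared difference between observed and expected events, divided by n_g π_g (1 - π_g), and it forms groups of nearly equal size after sorting by fitted value. The function name, the grouping rule, and the data are assumptions for illustration; statistical packages may handle ties and group boundaries differently.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (C, degrees of freedom) for outcomes y and fitted probabilities p."""
    n = len(y)
    if n < groups:
        raise ValueError("need at least one observation per group")

    # Sort by fitted probability so each group holds similar predictions.
    pairs = sorted(zip(p, y))

    c_stat = 0.0
    for g in range(groups):
        # Integer slice boundaries give groups of nearly equal size.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        o_g = sum(yj for _, yj in chunk)         # observed events in the group
        pi_g = sum(pj for pj, _ in chunk) / n_g  # mean fitted probability
        # Assumes 0 < pi_g < 1; otherwise the denominator is zero.
        c_stat += (o_g - n_g * pi_g) ** 2 / (n_g * pi_g * (1.0 - pi_g))

    return c_stat, groups - 2


# Hypothetical data, for illustration only: six observations, three groups.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]

c_stat, df = hosmer_lemeshow(y, p, groups=3)
print(round(c_stat, 4), df)
```

A p-value can then be read from the upper tail of the chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(c_stat, df)` if SciPy is available. A small p-value suggests a poor fit.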

Hosmer and Lemeshow (2000) state that the advantage of their statistic is that “…it provides a single, easily interpretable value that can be used to assess fit. The great disadvantage is that in the process of grouping we may miss an important deviation from fit due to a small number of individual data points.” The authors recommend an analysis of individual residuals as well as applying the Hosmer-Lemeshow test.
