What is a Pseudo R-squared?

When running an ordinary least squares (OLS) regression, one common metric to assess model fit is the R-squared ($R^2$). It is calculated as follows.

  • $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

The dependent variable is y, the predicted value from the OLS regression is ŷ, and the average value of y across all observations is ȳ. The subscript i indexes observations, and each sum runs over all of them.
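As a minimal sketch of the calculation above, the ratio can be computed directly from the observed and predicted values. The function name `r_squared` and the sample numbers are hypothetical, chosen only for illustration.

```python
def r_squared(y, y_hat):
    """Return 1 - SSE/SST for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    # Numerator: squared errors of the fitted model (unexplained variability).
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Denominator: squared deviations of y from its mean (total variability).
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical observations and fitted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_hat))  # approximately 0.98
```

Predicting the mean for every observation makes the numerator equal to the denominator, so the function returns 0 in that case.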

One can interpret the $R^2$ metric in a variety of ways. UCLA’s Institute for Digital Research and Education explains them as follows:

  1. R-squared as explained variability – The denominator of the ratio can be thought of as the total variability in the dependent variable, or how much y varies from its mean. The numerator of the ratio can be thought of as the variability in the dependent variable that is not predicted by the model. Thus, this ratio is the proportion of the total variability unexplained by the model. Subtracting this ratio from one results in the proportion of the total variability explained by the model. The more variability explained, the better the model.
  2. R-squared as improvement from null model to fitted model – The denominator of the ratio can be thought of as the sum of squared errors from the null model–a model predicting the dependent variable without any independent variables. In the null model, each y value is predicted to be the mean of the y values. Consider being asked to predict a y value without having any additional information about what you are predicting. The mean of the y values would be your best guess if your aim is to minimize the squared difference between your prediction and the actual y value. The numerator of the ratio would then be the sum of squared errors of the fitted model. The ratio is indicative of the degree to which the model parameters improve upon the prediction of the null model. The smaller this ratio, the greater the improvement and the higher the R-squared.
  3. R-squared as the square of the correlation – The term “R-squared” is derived from this definition. R-squared is the square of the correlation between the model’s predicted values and the actual values. This correlation can range from -1 to 1, and so the square of the correlation then ranges from 0 to 1. The greater the magnitude of the correlation between the predicted values and the actual values, the greater the R-squared, regardless of whether the correlation is positive or negative.
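The three readings above give the same number for an OLS fit that includes an intercept. The sketch below checks this on a small made-up data set: it fits a one-predictor regression in closed form, then compares one minus the ratio of fitted to null squared errors with the squared correlation between actual and predicted values. The helper names (`fit_simple_ols`, `pearson`) and the data are assumptions for illustration.

```python
import math


def fit_simple_ols(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]
y_bar = sum(y) / len(y)

sse_fitted = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # fitted model
sse_null = sum((yi - y_bar) ** 2 for yi in y)                 # null model: predict the mean

r2_ratio = 1.0 - sse_fitted / sse_null   # interpretations 1 and 2
r2_corr = pearson(y, y_hat) ** 2         # interpretation 3
print(r2_ratio, r2_corr)
```

The two printed values agree up to floating-point rounding. The agreement depends on the model being fit by least squares with an intercept, which is part of why the definitions diverge once the model is a logit.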

So then what is a pseudo R-squared? When running a logistic regression, many people would like a similar goodness of fit metric. An R-squared value does not exist, however, for logit regressions since these regressions rely on “maximum likelihood estimates arrived at through an iterative process. They are not calculated to minimize variance, so the OLS approach to goodness-of-fit does not apply.” However, there are a few variations of a pseudo R-squared which are analogs to the OLS R-squared. For instance:

  • Efron’s Pseudo R-Squared. $R^2 = 1 - \frac{\sum_i (y_i - \hat{\pi}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\hat{\pi}_i$ are the model’s predicted probabilities.
  • McFadden’s Pseudo R-Squared. $R^2 = 1 - \frac{\ln \hat{L}(M_{\text{full}})}{\ln \hat{L}(M_{\text{intercept}})}$, where $\hat{L}$ is the estimated likelihood. This measure is one minus the ratio of two log likelihoods. The numerator is the log likelihood of the fitted logit model and the denominator is the log likelihood of a model with only an intercept. McFadden’s Pseudo R-Squared is the one Stata reports by default for a logit regression.
  • McFadden’s Pseudo R-Squared (adjusted). $R^2_{\text{adj}} = 1 - \frac{\ln \hat{L}(M_{\text{full}}) - K}{\ln \hat{L}(M_{\text{intercept}})}$. This approach is similar to the one above, but it penalizes a model for including too many predictors, where K is the number of regressors in the model. Because of this penalty, McFadden’s adjusted Pseudo R-squared can take negative values.
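The three measures above can be sketched in a few lines, given the 0/1 outcomes and the predicted probabilities from a logit model. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the probabilities below are made up rather than taken from a fitted model, and the intercept-only model is represented by predicting the sample mean of y for every observation (which is what its maximum likelihood fit produces).

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def efron_r2(y, p):
    """Efron: one minus squared prediction errors over squared deviations from the mean."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def mcfadden_r2(y, p, k=0):
    """McFadden: one minus the log likelihood ratio; k > 0 gives the adjusted version."""
    y_bar = sum(y) / len(y)
    ll_full = log_likelihood(y, p)
    # Assumption: the intercept-only model predicts the sample mean for everyone.
    ll_null = log_likelihood(y, [y_bar] * len(y))
    return 1.0 - (ll_full - k) / ll_null


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1, 0, 1]
p = [0.2, 0.3, 0.7, 0.8, 0.4, 0.6]

print(efron_r2(y, p))          # Efron
print(mcfadden_r2(y, p))       # McFadden
print(mcfadden_r2(y, p, k=1))  # adjusted, one regressor
print(mcfadden_r2(y, p, k=3))  # adjusted, three regressors: negative here
```

With only six observations, a penalty of K = 3 is already larger than the improvement in log likelihood, which is how the adjusted measure ends up below zero.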

There are a number of other Pseudo R-Squared approaches that are listed on the UCLA IDRE website.


1 Comment

  1. Given that GLMs are usually fit using iteratively reweighted least squares, whereby a weighted least squares is carried out on a transformed scale and the weights are optimized to approximate 1/variance, why is it that one could not just simply report the R2 associated with the weighted LS fit of the last iteration of the IRLS algorithm? See here for a reproducible example:
