## The problem with odds ratios

Many researchers use logit models to estimate the effect of specific variables on a binary (i.e., 0 or 1) outcome.  How are these models derived?  How are odds ratios calculated?  What are the problems with odds ratios?  I answer all these questions in this post, following a lovely summary by Norton and Dowd (2018). Deriving…
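As a minimal sketch of where the odds ratio comes from (using made-up coefficient values, not estimates from any paper): under the logit model, the odds of the outcome are exp(β0 + β1·x), so a one-unit increase in x multiplies the odds by exp(β1), regardless of the starting value of x.

```python
import math

# Hypothetical logit coefficients (illustrative only)
beta0, beta1 = -2.0, 0.5  # intercept and slope on a single covariate x

def prob(x):
    """P(y=1|x) under the logit model."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))

def odds(x):
    """Odds of the outcome at x: p/(1-p), which equals exp(beta0 + beta1*x)."""
    p = prob(x)
    return p / (1 - p)

# The odds ratio for a one-unit increase in x is constant: exp(beta1)
or_x = odds(1.0) / odds(0.0)
print(round(or_x, 4), round(math.exp(beta1), 4))  # the two agree
```

Note the odds ratio exp(β1) is the same everywhere on the x axis, even though the marginal effect on the *probability* is not; that gap is one source of the interpretation problems the post discusses.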

## Longitudinal Modelling of Healthcare Expenditures: Challenges and Solutions

Previous analyses, such as Basu and Manning (2009), have addressed the problem of the large mass of health care expenditures at \$0. In typical economic analyses, we assume that the dependent variable is normally distributed. In the case of health care expenditures, however, a large number of people have \$0 expenditures (i.e., healthy individuals). Further, among sick individuals that…
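A quick simulation (with hypothetical prevalence and spending parameters, not real survey data) illustrates the decomposition that motivates two-part models for expenditures with a mass at \$0: the overall mean splits exactly into the probability of any spending times the mean among spenders.

```python
import random

random.seed(0)
# Simulated expenditures: a mass at $0 plus a right-skewed positive part
n = 10_000
y = [0.0 if random.random() < 0.4          # ~40% healthy: $0 spending
     else random.lognormvariate(7, 1)      # skewed positive spending
     for _ in range(n)]

pos = [v for v in y if v > 0]
p_pos = len(pos) / n                # Pr(y > 0): "first part" target
mean_pos = sum(pos) / len(pos)      # E[y | y > 0]: "second part" target
overall_mean = sum(y) / n           # E[y]

# Two-part identity: E[y] = Pr(y > 0) * E[y | y > 0]
print(abs(overall_mean - p_pos * mean_pos) < 1e-6)
```

A two-part model estimates the two factors separately (e.g., a logit for any spending, then a GLM on spenders), which handles the zero mass that a single normal-errors regression cannot.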

## Identifying high quality providers in the presence of heterogeneous preferences

Why is it so difficult for health care payers to identify a “best” provider?  A paper by Gutacker and Street (2017) explains: There are two key elements that complicate assessment of how well public sector organisations are doing their job (Besley & Ghatak, 2003; Dixit, 2002). First, they lack a single overarching objective against which…

## Stratified Covariate Balancing

When selection bias is an issue, many researchers use propensity score matching to ensure that observable differences in patient characteristics are balanced between individuals who receive a given treatment and those who do not.  If unobservable characteristics are correlated with observable characteristics, propensity score matching generally works well. Cases where propensity score matching does not work well include…
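To make the balancing idea concrete, here is a minimal stratification sketch on simulated data. For simplicity it stratifies on the single covariate directly (with one covariate, the covariate is itself a valid balancing score); a real analysis would stratify on an estimated propensity score over many covariates.

```python
import math
import random

random.seed(1)
# Simulate selection on an observable: higher x -> more likely treated
n = 5_000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    p_treat = 1 / (1 + math.exp(-x))
    t = 1 if random.random() < p_treat else 0
    data.append((x, t))

def mean(vals):
    return sum(vals) / len(vals)

# Raw imbalance: treated patients have systematically higher x
raw_gap = (mean([x for x, t in data if t == 1])
           - mean([x for x, t in data if t == 0]))

# Stratify into quintiles of x and compare within strata
data.sort(key=lambda d: d[0])
k = n // 5
gaps = []
for s in range(5):
    stratum = data[s * k:(s + 1) * k]
    tr = [x for x, t in stratum if t == 1]
    co = [x for x, t in stratum if t == 0]
    gaps.append(mean(tr) - mean(co))
strat_gap = mean(gaps)

print(abs(strat_gap) < abs(raw_gap))  # stratification shrinks the imbalance
```

Within each stratum, treated and control patients have similar x by construction, so within-stratum outcome comparisons are less confounded by x.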

## What is a Pseudo R-squared?

When running an ordinary least squares (OLS) regression, one common metric to assess model fit is the R-squared (R²). The R² metric is calculated as follows: R² = 1 − [Σᵢ(yᵢ − ŷᵢ)²] / [Σᵢ(yᵢ − ȳ)²]. The dependent variable is y, the predicted value from the OLS regression is ŷ, and the average value of y across all observations…
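The R² formula above can be computed by hand on a tiny made-up data set: fit the OLS line, form the residual and total sums of squares, and take one minus their ratio.

```python
# Illustrative data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS slope and intercept for y = a + b*x (closed form)
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # Σ(y - ŷ)²
ss_tot = sum((yi - ybar) ** 2 for yi in y)               # Σ(y - ȳ)²
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # close to 1 for this nearly-linear data
```

For logit and other maximum-likelihood models there is no residual sum of squares in this sense, which is why the various pseudo R² measures exist.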

## Optimal Matching Techniques

In randomized controlled trials, participants are randomized to different groups where each group receives a unique intervention (or control). This process ensures that, in expectation, any differences in the outcomes of interest are due entirely to the interventions under investigation. While RCTs are useful, they are expensive to run, are highly controlled and suffer from their own…
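As a toy illustration of matching on observational data, here is a greedy 1:1 nearest-neighbor match on a single score (hypothetical IDs and values). This is a simplification: true *optimal* matching minimizes the total distance across all pairs at once (e.g., via network-flow or assignment algorithms) rather than matching greedily.

```python
# (id, covariate or propensity score) -- hypothetical values
treated = [(1, 0.30), (2, 0.55), (3, 0.80)]
controls = [(10, 0.28), (11, 0.50), (12, 0.90), (13, 0.10)]

pairs = []
available = dict(controls)           # controls not yet matched
for tid, tscore in treated:
    # pick the closest still-available control (greedy, without replacement)
    cid = min(available, key=lambda c: abs(available[c] - tscore))
    pairs.append((tid, cid))
    del available[cid]

print(pairs)  # [(1, 10), (2, 11), (3, 12)]
```

Greedy matching can leave later treated units with poor matches; optimal matching avoids this by solving for all pairs jointly, at some computational cost.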

## Berkson’s Paradox

Berkson’s paradox arises when, given two independent events, you consider only outcomes where at least one of them occurs; conditional on that selection, the events become negatively dependent.  More technically, this paradox occurs when there is ascertainment bias in a study design. Let me provide an example. Consider the case where patients can have diabetes or HIV.  Assume that patients have a positive probability of…
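A short simulation (with made-up 10% prevalences for each condition) shows the paradox directly: diabetes and HIV are generated independently, but once the sample is restricted to patients with at least one condition, their covariance turns negative.

```python
import random

random.seed(2)
n = 100_000
sample = []
for _ in range(n):
    d = random.random() < 0.10   # diabetes (hypothetical prevalence)
    h = random.random() < 0.10   # HIV, generated independently of diabetes
    if d or h:                   # ascertainment: only patients with at least
        sample.append((d, h))    # one condition enter the study

m = len(sample)
p_d = sum(d for d, _ in sample) / m
p_h = sum(h for _, h in sample) / m
p_dh = sum(d and h for d, h in sample) / m
cov = p_dh - p_d * p_h           # covariance within the selected sample

print(cov < 0)  # True: negative dependence appears after selection
```

Intuitively, within the selected sample, learning a patient does *not* have diabetes makes HIV more likely (it must explain why they were sampled), so the two conditions look negatively related even though they are independent in the population.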

## AA and selection bias

This video discusses whether Alcoholics Anonymous (AA) actually improves the outcomes of alcoholics who attend its meetings.  More broadly, the video’s discussion of the AA treatment effect serves as an example for expounding on some fundamental statistical issues such as selection bias, randomization, intention to treat, marginal effects, instrumental variables, and others.

## LOWESS Curves

Oftentimes when doing data analysis, you want to find the relationship between two variables.  The first step is typically to plot a scatterplot.  To better understand this relationship, however, it is useful to fit a line to the scatterplot.  Most commonly, this is done with a simple linear regression (i.e., ordinary least squares (OLS)…
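A LOWESS fit replaces the single global OLS line with many local weighted fits. The sketch below evaluates such a fit at one point; it is a bare-bones version (tricube weights over the nearest fraction of points, a single pass) and omits the robustifying iterations of Cleveland's full LOWESS algorithm.

```python
import math

def lowess_point(x0, xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at x0 (minimal LOWESS sketch)."""
    n = len(xs)
    k = max(2, int(frac * n))
    # bandwidth = distance to the k-th nearest neighbor of x0
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    # tricube kernel weights; points beyond the bandwidth get weight 0
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    # weighted least squares for y = a + b*x
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    a = (swy - b * swx) / sw
    return a + b * x0

# Nonlinear toy data: a straight OLS line would miss the wiggle
xs = list(range(10))
ys = [x + math.sin(x) for x in xs]
smoothed_at_5 = lowess_point(5, xs, ys)
```

Evaluating `lowess_point` over a grid of x values and connecting the results traces out the smooth curve you would overlay on the scatterplot.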

## What are regression trees?

Regression trees are a way to partition your explanatory variables to (potentially) better predict an outcome of interest.  Regression trees start with an outcome (let’s call it y) and a vector of explanatory variables (X).  Simple Example: For instance, let y be health care spending, X=(X1,X2) where X1 is the patient’s age and X2 is the patient’s…
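The core step of a regression tree, choosing a split, can be sketched in a few lines on hypothetical age-and-spending data: try each cut point and keep the one minimizing the within-group squared error. A full tree would search over all variables and recurse on each resulting partition.

```python
# Hypothetical data: age (X1) and annual spending in $1000s (y)
ages = [25, 30, 35, 60, 65, 70]
spending = [1.0, 1.2, 0.9, 5.0, 5.5, 6.0]

def sse(vals):
    """Sum of squared errors around the group mean (0 for an empty group)."""
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

best = None  # (cut point, total error)
for cut in ages:
    left = [s for a, s in zip(ages, spending) if a <= cut]
    right = [s for a, s in zip(ages, spending) if a > cut]
    err = sse(left) + sse(right)
    if best is None or err < best[1]:
        best = (cut, err)

print(best[0])  # 35: the split separating young low spenders from older high spenders
```

Each leaf then predicts the mean outcome of its group, which is why trees produce step-function predictions rather than a smooth line.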