## Statistical Power

What is power? Merriam-Webster defines power as the “possession of control, authority, or influence over others.” The power I will talk about today, however, is statistical power. Statistical power is the probability that a statistical test rejects the null hypothesis when the null hypothesis is in fact false. For instance, in the U.S. judicial system, the null hypothesis…
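As a rough illustration of the definition above, here is a minimal sketch of the power of a two-sided, one-sample z-test. The function name `z_test_power` and its parameters (`effect`, `sigma`, `n`, `alpha`) are hypothetical choices for this example, and it assumes normally distributed data with a known standard deviation.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability that a two-sided one-sample z-test rejects the null.

    effect: assumed true difference between the mean and its null value
    sigma:  assumed known standard deviation of one observation
    n:      sample size
    alpha:  significance level of the test
    """
    z = NormalDist()                      # standard normal distribution
    crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    shift = effect * n ** 0.5 / sigma     # mean of the test statistic under the alternative
    # Reject when the statistic falls above +crit or below -crit.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


if __name__ == "__main__":
    for n in (10, 32, 100):
        print(n, round(z_test_power(0.5, 1.0, n), 3))
```

When the true effect is zero, the function returns the significance level itself, and power rises toward 1 as the sample size or the effect grows.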

## Regression Discontinuity

Regression discontinuity is an econometric method that has become popular in recent years. Let me give you an example where regression discontinuity would be valid. Let us say that all students who score 1000 or more on their SATs matriculate at Ivy U and all students who score below 1000 attend college at State…

How do you estimate the specific effect that smoking has on the probability of being hospitalized? If smokers on average have lower income and less educational achievement, is smoking truly causing the increase in hospitalization, or could the covariates fully or partially explain the increased hospitalization rates? A paper by Kleinman and Norton suggests using…

## The History of Least Squares

Let us say you have 10 observations of 2 different variables. How do you determine which of the observations to use? Should you throw out the outliers? Should you only include the most similar values? Do more observations increase or decrease the amount of measurement error? These questions can be answered by the discipline of statistics.…

## ANOVA

Let us say that you are a hospital administrator. You are very clever and have come up with a system to score the quality of the work done by the physicians at your hospital. To simplify things, let's assume that you only have 3 physicians who work at your hospital. The physicians’ scores are as…
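A comparison of three physicians' scores can be sketched with a one-way ANOVA F statistic. The function name `one_way_f` and the scores below are hypothetical, made up only for illustration; they are not the scores from the post.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric scores."""
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical quality scores for three physicians (illustrative only).
    hypothetical_scores = [[71, 75, 79], [80, 84, 88], [65, 70, 75]]
    print(one_way_f(hypothetical_scores))
```

A large F value suggests that the differences between physicians are large relative to the variation within each physician's own scores; judging significance requires comparing F to the F distribution with `k - 1` and `n - k` degrees of freedom.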

## Heckman’s “Econometric Causality”

Nobel laureate James Heckman has a nice summary of how applied econometricians and policy researchers should define causality. I have excerpted some of the more interesting points below. On the source of randomness in a sample: “One reason why many statistical models are incomplete is that they do not specify the sources of randomness generating…

## Sample Selection vs. Two-part Model

Much health care data is characterized by a large cluster of observations at 0 and a right-skewed distribution of the remaining outcomes. For instance, people who do not get sick generally use \$0 of medical care. Those who do get sick use a varying amount of medical care dollars, but there are a…

## Finite Mixture Models

Let us assume that there are two types of people: smart people and dumb people. Smart people’s test scores are normally distributed about 80% and dumb people’s test scores are normally distributed about 40%. If we observe the test score of one person, how do we know if they are smart or…
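One way to make the question concrete is Bayes' rule applied to a two-component normal mixture. The sketch below uses the means of 80 and 40 from the example; the common standard deviation of 10, the equal prior share of each type, and the function name `prob_smart` are assumptions added for illustration.

```python
from statistics import NormalDist


def prob_smart(score, prior_smart=0.5, sd=10.0):
    """Posterior probability that a person is the high-scoring type.

    prior_smart: assumed share of the high-scoring type in the population
    sd:          assumed common standard deviation of both score distributions
    """
    smart = NormalDist(mu=80, sigma=sd)   # mean taken from the example
    dumb = NormalDist(mu=40, sigma=sd)    # mean taken from the example
    weighted_smart = prior_smart * smart.pdf(score)
    weighted_dumb = (1 - prior_smart) * dumb.pdf(score)
    return weighted_smart / (weighted_smart + weighted_dumb)


if __name__ == "__main__":
    for s in (40, 55, 60, 65, 80):
        print(s, round(prob_smart(s), 4))
```

Under these assumptions a score of 60 is equally likely to come from either type, and the posterior moves quickly toward 0 or 1 as the score moves away from that midpoint. In a real finite mixture model the means, variances, and mixing shares would be estimated from the data rather than fixed in advance.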

## Serial Correlation and the Durbin-Watson Statistic

What is the effect of a country’s GDP on health? What about the effect of a country’s literacy rate on infant mortality rates? Often researchers try to answer these questions using time-series data. With time-series data, we have observations of a few units (e.g., countries or individuals) over many years. Let the subscript i represent the individual…
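For reference, the Durbin-Watson statistic is computed from a single time-ordered series of regression residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below is a minimal version; the function name `durbin_watson` and the example residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered list of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical residuals that drift slowly, as with positive serial correlation.
    example = [0.5, 0.6, 0.4, 0.5, -0.3, -0.4, -0.5, -0.2]
    print(durbin_watson(example))
```

The statistic lies between 0 and 4 and is approximately 2(1 − ρ), where ρ is the first-order autocorrelation of the residuals. Values near 2 suggest little serial correlation, values toward 0 suggest positive serial correlation, and values toward 4 suggest negative serial correlation.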