Is my distribution normal?

How can you tell if you have a normal distribution?  For instance, assume you have data on the results of a drug relative to a placebo.  You know the mean and standard deviation of the data, but that does not necessarily imply that the data is distributed in a normal fashion.

How can you test this?  What if you think your drug trial results follow a uniform distribution?  Or what if you want to compare these drug trial results against a crazy probability distribution you just invented?

There is a simple way to do this: use the Kolmogorov-Smirnov test (K-S test).  The K-S test compares the empirical cumulative distribution function (CDF) observed in the data against any CDF of your choosing.  The K-S statistic is the largest absolute difference between your empirical CDF and the assumed CDF over all values of x.

Mathematically, the CDF for your empirical distribution is:

  • $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I[X_i \le x]$

The Kolmogorov-Smirnov statistic for a given CDF F(x) is equal to:

  • $D_n = \sup_x |F_n(x) - F(x)|$
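
As a rough illustration of the two formulas above, here is a minimal Python sketch that computes the K-S statistic for a sample against a normal CDF.  The function names (`normal_cdf`, `ks_statistic`) and the sample values are made up for this example; they do not come from any drug trial.  The sketch assumes the reference CDF is continuous, so the supremum is reached at one of the data points, just before or just after the jump in the empirical CDF.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Return D_n = sup_x |F_n(x) - F(x)| for a continuous reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so compare F(x) with both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical data, for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
d_n = ks_statistic(sample, normal_cdf)
print(f"D_n = {d_n:.4f}")
```

To test against a different distribution, pass a different function as `cdf`; for a uniform distribution on [0, 1], for example, `lambda x: min(1.0, max(0.0, x))` would do.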

An example of how to calculate the K-S statistic is here.

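One can use the Kolmogorov distribution (tables) and the K-S statistic Dn to determine a p-value: the probability of seeing a difference at least as large as Dn if the data really were drawn from the assumed CDF F(x).  A small p-value is evidence against the assumed distribution.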
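
Instead of looking up a table, one can approximate the tail of the Kolmogorov distribution with its standard series, P(√n·Dn > λ) ≈ 2 Σ{k≥1} (−1)^(k−1) exp(−2k²λ²).  The sketch below is only that, a sketch: the function names and the example values of Dn and n are hypothetical, and the series is a large-sample approximation, so it can be inaccurate for small n.

```python
import math

def kolmogorov_sf(lam, terms=100):
    """Approximate P(sqrt(n) * D_n > lam) under the null hypothesis."""
    if lam < 0.2:
        # The tail probability is within about 1e-12 of 1 here,
        # and the series converges slowly for very small lam.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to [0, 1] to guard against rounding error.
    return min(1.0, max(0.0, 2.0 * total))

def ks_p_value(d_n, n):
    """Large-sample p-value for a one-sample K-S statistic."""
    return kolmogorov_sf(math.sqrt(n) * d_n)

# Hypothetical values, for illustration only.
p = ks_p_value(d_n=0.25, n=30)
print(f"p-value = {p:.4f}")
```

One caveat: this p-value assumes the CDF F(x) was fully specified in advance.  If the mean and standard deviation are estimated from the same data being tested, the standard K-S p-value tends to be too large.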

To perform the Kolmogorov-Smirnov equality-of-distributions test in Stata, one can use the ksmirnov command.

