Kernel density estimation

Let’s say that you have data. This could be, say, data on how many points Giannis Antetokounmpo scored in each game. You could fit a distribution to the data: for instance, if you know the mean and standard deviation, you could estimate a normal distribution. However, the data may not be normal. Or the data could be bounded (e.g., you can’t score negative points in basketball). You could pick a different distribution, but it may be hard to know which one is correct. Instead, you can use a kernel density estimate to create a continuous probability density function (PDF) from discrete data.

Basically, you just put a distribution (with a specified bandwidth) at each data point and sum these distributions to create a new “super distribution”. To make sure the “super” distribution integrates to one, you simply divide by the number of observations. You can also create kernel density estimates in multi-dimensional space.
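To make the idea concrete, here is a minimal sketch of that recipe in Python with Gaussian kernels: one normal bump per data point, summed and divided by the number of observations. The points-per-game numbers are made up for illustration.

```python
import numpy as np

def kde(data, bandwidth, grid):
    """Gaussian kernel density estimate evaluated on a grid.

    Places a normal distribution (std dev = bandwidth) at each data
    point, sums them, and divides by the number of observations so
    the result integrates to one.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian kernel per data point, evaluated at every grid point
    z = (grid[None, :] - data[:, None]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.sum(axis=0) / len(data)

# Hypothetical points-per-game data
points = [26, 31, 28, 34, 22, 29, 33, 27]
grid = np.linspace(10, 50, 401)
pdf = kde(points, bandwidth=3.0, grid=grid)

# Riemann-sum check: the estimate should integrate to roughly one
print(np.sum(pdf) * (grid[1] - grid[0]))
```

The bandwidth controls the trade-off between smoothness and fidelity: a small value leaves a bump at every observation, a large one smears everything into a single hump.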

I was planning to go through an example of how to do this, but webel od already has a great explanatory video on YouTube.

If you want to implement this in R, there is a simple built-in density() function you can use, but for more customizability you can install the kdensity package.
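If you work in Python instead, SciPy offers a comparable off-the-shelf option, scipy.stats.gaussian_kde, which picks a bandwidth for you (Scott’s rule by default) and lets you override it. The data below are again hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

points = [26, 31, 28, 34, 22, 29, 33, 27]  # made-up points-per-game data

kde = gaussian_kde(points)                 # bandwidth via Scott's rule
kde_narrow = gaussian_kde(points, bw_method=0.2)  # or set it yourself

grid = np.linspace(10, 50, 401)
density = kde(grid)                        # evaluate the estimated PDF
```

As with R’s density(), the result is a smooth PDF you can evaluate, integrate, or sample from (gaussian_kde also has a resample method).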

Kernel density estimates are super useful for a variety of applications, including forecasting (by smoothing out data) and performing copula regressions (which I will touch on in a later post).