Often, when doing data analysis, you want to find the relationship between two variables.  The first step is typically to draw a scatterplot.  To better understand this relationship, however, it is useful to fit a line to the scatterplot.  Most commonly, this is done with a simple linear regression fitted by ordinary least squares (OLS).

In many cases, however, the relationship between two variables may not be linear.  Consider the case of y=sin(x) where x ranges from 0 to 2π.  A linear regression would fit a straight line y=β0+β1*x; for x spread evenly over this range, the fitted line slopes gently downward (β1 is roughly -0.3). However, any straight line would fit the data poorly, as the sine curve undulates up and down, reaching a maximum value at sin(π/2)=1 and a minimum value at sin(3π/2)=-1.
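To see how a straight line fares on this example, here is a minimal sketch that fits OLS to y=sin(x) using the closed-form estimates. The choice of 200 evenly spaced, noise-free points is an assumption made purely for illustration.

```python
import math

# 200 evenly spaced x values on [0, 2*pi] and the noiseless curve y = sin(x).
# (The sample size and spacing are illustrative assumptions.)
n = 200
xs = [2 * math.pi * i / (n - 1) for i in range(n)]
ys = [math.sin(x) for x in xs]

# Closed-form OLS estimates for y = b0 + b1 * x.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# R^2: share of the variance in y that the straight line accounts for.
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r_squared:.3f}")
```

Whatever the exact coefficients, the residuals from this fit trace out a systematic wave, which is the sign that a single straight line is the wrong shape for the data.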

One way to better see this relationship is to fit a LOWESS or LOESS regression, also known as a local regression. A local regression creates a smooth curve through your scatterplot by fitting a separate regression within a local subset of the data around each point. Wider subsets (i.e., a larger bandwidth) produce smoother curves that are less sensitive to local outliers, but they can also flatten out real local structure; narrower bandwidths capture local deviations better, but are more likely to overfit the data.
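The bandwidth trade-off can be sketched in a few lines of plain Python. This is a simplified local linear smoother with tricube weights, not a full LOWESS implementation: it omits the robustness reweighting iterations of Cleveland's original procedure. The function names, the `frac` parameter, the noise level, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u**3) ** 3 if u < 1 else 0.0

def local_linear_smooth(xs, ys, frac=0.3):
    """Fitted value at each x from a tricube-weighted linear fit
    over the nearest ceil(frac * n) points (hypothetical helper)."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        # Distance to the k-th nearest neighbor sets the local bandwidth.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Noisy samples of sin(x) on [0, 2*pi]; seed and noise level are arbitrary.
random.seed(0)
n = 100
xs = [2 * math.pi * i / (n - 1) for i in range(n)]
ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]

fit_narrow = local_linear_smooth(xs, ys, frac=0.3)  # follows the curve
fit_wide = local_linear_smooth(xs, ys, frac=1.0)    # much smoother, flattens peaks

def mse_vs_truth(fit):
    """Mean squared error of the fitted values against the true sin(x)."""
    return sum((f - math.sin(x)) ** 2 for f, x in zip(fit, xs)) / len(xs)

mse_narrow = mse_vs_truth(fit_narrow)
mse_wide = mse_vs_truth(fit_wide)
print(f"MSE frac=0.3: {mse_narrow:.4f}   MSE frac=1.0: {mse_wide:.4f}")
```

On this example the wider window oversmooths and misses the peak and trough, while the narrower one tracks the sine shape; with a very small `frac` the fit would instead start chasing the noise.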

What is the difference between LOWESS and LOESS?

A smooth curve through a set of data points obtained with this statistical technique is called a Loess curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the y-axis scattergram criterion variable. When each smoothed value is instead given by a weighted linear least squares regression over the span, the result is known as a Lowess curve; however, some authorities treat Lowess and Loess as synonyms.
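Under the distinction drawn above, the only thing that changes between the two curves is the degree of the local polynomial. The sketch below fits a tricube-weighted polynomial of degree 1 or 2 around a single point. The helper names and the fixed bandwidth `h` are assumptions for illustration; real implementations usually pick the bandwidth from the nearest neighbors.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u**3) ** 3 if u < 1 else 0.0

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (aug[r][m] - s) / aug[r][r]
    return coef

def local_poly_fit(xs, ys, x0, h, degree):
    """Fitted value at x0 from a tricube-weighted polynomial of the given
    degree (1 = local linear, 2 = local quadratic). Hypothetical helper."""
    p = degree + 1
    w = [tricube((x - x0) / h) for x in xs]
    # Weighted normal equations in powers of (x - x0).
    a = [[sum(wi * (x - x0) ** (i + j) for wi, x in zip(w, xs))
          for j in range(p)] for i in range(p)]
    b = [sum(wi * y * (x - x0) ** i for wi, x, y in zip(w, xs, ys))
         for i in range(p)]
    # The constant term is the fitted value at x0 itself.
    return solve(a, b)[0]

# Exact parabola: a local quadratic can reproduce it, a local line cannot.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

linear_at_0 = local_poly_fit(xs, ys, x0=0.0, h=1.0, degree=1)
quadratic_at_0 = local_poly_fit(xs, ys, x0=0.0, h=1.0, degree=2)
print(f"true value: 0, local linear: {linear_at_0:.4f}, local quadratic: {quadratic_at_0:.4f}")
```

At the bottom of the parabola the local linear fit is pulled upward by the curvature on both sides, while the local quadratic fit recovers the true value. This is why a quadratic local fit tends to follow peaks and troughs more closely at the same bandwidth.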

The video below describes how to implement a LOWESS curve in R.  Stata also has a lowess command.
