In previous posts, I have explained how to create bootstrap estimates for a variety of statistics. Doing so is fairly simple and involves a three-step procedure:
- Step 1: Using the observed data, create m bootstrap data sets by random resampling with replacement.
- Step 2: Calculate the statistic of interest for each bootstrap data set.
- Step 3: The bootstrap estimate of the statistic of interest is the average value from Step 2 across all bootstrap samples.
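The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the code from the earlier posts. The function names, the choice of the median as the statistic, m = 2000, and the sample values are all hypothetical.

```python
import random
import statistics


def bootstrap_statistics(data, stat, m=2000, seed=None):
    """Steps 1 and 2: build m resamples of the observed data (with replacement)
    and compute the statistic of interest on each one."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(m)]


def bootstrap_estimate(data, stat, m=2000, seed=None):
    """Step 3: average the statistic across all m bootstrap samples."""
    return statistics.fmean(bootstrap_statistics(data, stat, m, seed))


if __name__ == "__main__":
    # Hypothetical observed data, for illustration only.
    observed = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
    print(bootstrap_estimate(observed, statistics.median, m=2000, seed=1))
```

Each resample has the same size as the observed data, and passing a fixed `seed` makes the result reproducible.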
One question that has not yet been answered is how to calculate the confidence interval for the statistic of interest. A paper by Haukoos and Lewis describes five methods for computing bootstrap confidence intervals: i) normal approximation, ii) percentile, iii) bias-corrected (BC), iv) bias-corrected and accelerated (BCa), and v) approximate bootstrap confidence (ABC) methods.
The normal approximation method is calculated as follows:
- original statistic ± Z × (standard error), where the standard error is the standard deviation of the bootstrap statistics from Step 2
For instance, for a 95% confidence interval, Z = 1.96. Another alternative is the percentile method. To calculate a 95% percentile confidence interval, one simply takes the 2.5th and 97.5th percentiles of the distribution of statistics calculated in Step 2 of the bootstrap procedure.
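Both intervals can be sketched in Python. This sketch assumes the standard error is estimated as the standard deviation of the bootstrap statistics, and it computes percentiles by linear interpolation between order statistics, which is only one of several percentile conventions, so results may differ slightly from a given statistics package. The function names and data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def bootstrap_replicates(data, stat, m=2000, seed=None):
    """Steps 1-2 of the bootstrap: the statistic computed on m resamples."""
    rng = random.Random(seed)
    return [stat(rng.choices(data, k=len(data))) for _ in range(m)]


def percentile(values, p):
    """p-th quantile (0 <= p <= 1), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def normal_ci(theta_hat, reps, level=0.95):
    """Original statistic +/- Z * SE, with SE = std. dev. of the bootstrap replicates."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = statistics.stdev(reps)
    return theta_hat - z * se, theta_hat + z * se


def percentile_ci(reps, level=0.95):
    """E.g. the 2.5th and 97.5th percentiles of the replicates for a 95% CI."""
    alpha = 1 - level
    return percentile(reps, alpha / 2), percentile(reps, 1 - alpha / 2)


if __name__ == "__main__":
    observed = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # hypothetical data
    reps = bootstrap_replicates(observed, statistics.median, m=2000, seed=1)
    print("normal:    ", normal_ci(statistics.median(observed), reps))
    print("percentile:", percentile_ci(reps))
```

The normal interval is always symmetric around the original statistic. The percentile interval follows the shape of the bootstrap distribution, so it can be asymmetric.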
The other bootstrap CI methods are a bit more complex.
The BCa method adjusts for bias in the bootstrapped sampling distribution relative to the actual sampling distribution, and is thus considered a substantial improvement over the percentile method. The BCa confidence interval adjusts the percentiles used in the percentile method based on two coefficients called “bias correction” and “acceleration.” The bias-correction coefficient is based on the proportion of bootstrap estimates that fall below the original estimate, so it adjusts for median bias and skewness in the bootstrap sampling distribution. If the bootstrap sampling distribution is perfectly symmetric about the original statistic, the bias correction is zero. The acceleration coefficient adjusts for nonconstant variance, that is, for the standard error of the statistic changing with the true value of the parameter, and it is usually estimated with the jackknife. The ABC method is an approximation of the BCa method that requires fewer resampled data sets than the BCa method.
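The two BCa coefficients can be sketched in Python using Efron's standard formulas. The bias correction is z0 = Φ⁻¹(share of replicates below the original estimate). The acceleration a is computed from leave-one-out (jackknife) estimates. The adjusted percentiles are Φ(z0 + (z0 + z_q) / (1 − a(z0 + z_q))). This is a hypothetical sketch, not the Stata or SAS implementation. Counting tied replicates as one half is an assumption of this sketch, since the textbook definition counts only replicates strictly below the original estimate.

```python
import math
import random
import statistics
from statistics import NormalDist


def percentile(values, p):
    """p-th quantile (0 <= p <= 1), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bca_ci(data, stat, m=2000, level=0.95, seed=None):
    """Sketch of a BCa interval; returns (lower, upper, z0, a)."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    norm = NormalDist()
    theta_hat = stat(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(m)]

    # Bias correction z0: normal quantile of the share of replicates below theta_hat
    # (ties counted as one half, an assumption of this sketch).
    below = sum(r < theta_hat for r in reps) + 0.5 * sum(r == theta_hat for r in reps)
    p = below / m
    if not 0.0 < p < 1.0:
        raise ValueError("original estimate lies outside the bootstrap distribution")
    z0 = norm.inv_cdf(p)

    # Acceleration a from jackknife (leave-one-out) estimates:
    # a = sum(d^3) / (6 * sum(d^2)^1.5), where d = jackknife mean - jackknife estimate.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    # Adjusted percentile: Phi(z0 + (z0 + z_q) / (1 - a * (z0 + z_q))).
    def adjusted(q):
        zq = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + zq) / (1.0 - a * (z0 + zq)))

    alpha = 1.0 - level
    lower = percentile(reps, adjusted(alpha / 2))
    upper = percentile(reps, adjusted(1.0 - alpha / 2))
    return lower, upper, z0, a
```

When both z0 and a are zero, the adjusted percentiles reduce to alpha/2 and 1 − alpha/2, and the BCa interval becomes the plain percentile interval. For example, `bca_ci([2.1, 3.4, 1.9, 5.6, 4.2], statistics.median, m=2500)` (hypothetical data).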
Programming Bootstrap CIs
In Stata, programming the bootstrap CI using the bias-corrected and accelerated (BCa) method is straightforward:
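bs r(c_1), reps(2500) bca: centile var1
estat bootstrap, bca
The centile command calculates the median of var1 (the 50th centile is its default) and stores it in r(c_1). The bs prefix, a synonym for Stata's bootstrap command, reruns centile on each resampled data set and collects r(c_1), the statistic for which the 95% CI will be calculated. The reps(2500) option indicates that there will be 2500 resampled data sets, the bca option computes the acceleration coefficient, and estat bootstrap, bca then reports the BCa confidence interval.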
Calculating the bootstrapped CI for a Spearman rank correlation between two variables is just as easy:
spearman var1 var2
bs r(rho), reps(2500) bca: spearman var1 var2
estat bootstrap, bca
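A correlation depends on how two variables move together. For that reason, each resample should draw whole observations (pairs of var1 and var2 values), as Stata's bootstrap does when it resamples rows, rather than resampling each variable separately. The Python sketch below shows this pairwise resampling. To keep it short, it uses a percentile CI rather than BCa. The function names are hypothetical, and resamples in which one variable has no variation (where rho is undefined) are simply skipped.

```python
import math
import random


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # ZeroDivisionError if x or y is constant


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(ranks(x), ranks(y))


def percentile(values, p):
    """p-th quantile (0 <= p <= 1), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_spearman_ci(x, y, m=2000, level=0.95, seed=None):
    """Percentile CI for rho, resampling (x, y) pairs so each observation stays intact."""
    rng = random.Random(seed)
    pairs = list(zip(x, y))
    reps = []
    for _ in range(m):
        xs, ys = zip(*rng.choices(pairs, k=len(pairs)))
        try:
            reps.append(spearman(xs, ys))
        except ZeroDivisionError:
            pass  # rho is undefined when a resample has no variation; skip it
    alpha = 1 - level
    return percentile(reps, alpha / 2), percentile(reps, 1 - alpha / 2)
```

For two equal-length lists x and y, `bootstrap_spearman_ci(x, y, m=2500, seed=1)` returns the lower and upper 95% limits.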
Programming the bootstrap in SAS relies on the %boot macro; SAS's documentation for that macro provides additional information on bootstrap programming in SAS.
- Haukoos JS, Lewis RJ. Advanced statistics: bootstrapping confidence intervals for statistics with “difficult” distributions. Acad Emerg Med. 2005 Apr;12(4):360-5.