We health economists deal with medical cost data all the time. One challenge we all face is that medical cost data are often censored. Censoring may occur because the patient dies or, if you are using administrative health insurance claims data, because people switch health plans and leave your sample.
The most common way of dealing with this problem is to drop everyone for whom you do not have complete data and run the analysis only on the people with complete data. In some cases, researchers conduct sensitivity analyses that vary the continuous-enrollment restriction they apply, or stratify the results by whether or not the patient died during the study period.
There is, however, another approach. Lin (2000) first describes a simple approach in which each person either is or is not censored. Only people whose cost data are uncensored are included, but each observation is weighted by the inverse of the probability of remaining uncensored long enough for that person's full cost history to be observed (inverse probability weighting, IPW). Lin also proposes applying the same procedure to individual time partitions. For instance, you could measure the cost incurred in month 1 and apply the IPW, then repeat the procedure for month 2 with a different IPW weight, and so on until you reach the end of your sample frame. Because people who are later censored still contribute data to every partition in which they were observed, summing the partition-specific estimates (or averaging them, if the target is average monthly cost) uses more of the data and gives a more precise estimate of the true average cost across the sample. Instead of just measuring average costs, one can also estimate regression parameters in each partition and sum the regression coefficients across partitions to obtain coefficients for cumulative cost.
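Here is a minimal Python sketch of the two weighting schemes in discrete monthly time, written without third-party packages. It assumes each person's costs are recorded month by month, that a death ends the cost history (so later months are known zeros), and that anyone who leaves the data alive before the analysis horizon is censored; a Kaplan-Meier estimate of the censoring distribution supplies the probability of remaining uncensored. The `Subject` class, the function names, and the convention that a death and a censoring in the same month count as a death are hypothetical simplifications, not Lin's notation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subject:
    """Hypothetical record for one person."""
    monthly_costs: List[float]  # observed costs, month 1 first; length = months followed
    died: bool = False          # True if follow-up ended by death (cost history complete)

    @property
    def months_observed(self) -> int:
        return len(self.monthly_costs)


def censoring_survival(subjects: List[Subject], horizon: int) -> List[float]:
    """Kaplan-Meier estimate of the censoring distribution in discrete months.

    G[m] is the estimated probability of still being under observation in
    month m, i.e. of not being censored in months 1..m-1. G[0] = G[1] = 1.
    """
    G = [1.0] * (horizon + 2)
    for m in range(1, horizon + 1):
        at_risk = sum(1 for s in subjects if s.months_observed >= m)
        # Censored after month m: left the data alive before reaching the horizon.
        censored = sum(1 for s in subjects
                       if s.months_observed == m and not s.died and m < horizon)
        hazard = censored / at_risk if at_risk else 0.0
        G[m + 1] = G[m] * (1.0 - hazard)
    return G


def simple_weighted_mean(subjects: List[Subject], horizon: int) -> float:
    """Simple weighted estimator: complete cases only, each weighted by
    1 / P(uncensored long enough to observe the whole cost history)."""
    G = censoring_survival(subjects, horizon)
    total = 0.0
    for s in subjects:
        if s.died or s.months_observed >= horizon:  # full history observed
            last = min(s.months_observed, horizon)  # month the history ends
            total += sum(s.monthly_costs[:horizon]) / G[last]
    return total / len(subjects)


def partitioned_weighted_mean(subjects: List[Subject], horizon: int) -> float:
    """Partitioned estimator: one weighted mean per monthly partition, summed.
    Censored people still contribute every month they were observed."""
    G = censoring_survival(subjects, horizon)
    n = len(subjects)
    total = 0.0
    for k in range(1, horizon + 1):
        # People who died before month k add a known zero, so only those still
        # observed in month k enter the numerator, each with weight 1 / G[k].
        observed = sum(s.monthly_costs[k - 1] for s in subjects if s.months_observed >= k)
        total += observed / (n * G[k])
    return total


cohort = [
    Subject([100, 100, 100]),
    Subject([100]),                  # censored after month 1 (e.g. switched plans)
    Subject([100, 100, 100]),
    Subject([100, 100], died=True),  # died in month 2
]
print(simple_weighted_mean(cohort, horizon=3))       # ≈ 266.67
print(partitioned_weighted_mean(cohort, horizon=3))  # ≈ 266.67
```

In this toy cohort both estimators return roughly 266.67, but they get there differently: the simple estimator discards the person censored after month 1, while the partitioned estimator still uses that person's month-1 cost. With heterogeneous costs the two estimates generally differ.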
Griffiths et al. (2012) describe the procedure more eloquently in their application using SEER-Medicare data to examine how chemotherapy use affects costs among breast cancer patients. They describe their procedure as follows:
Patients were followed for up to 48 months (partitions) after diagnosis, and their actual total cost was calculated in each partition. We then simulated patterns of administrative and dropout censoring and also added censoring to patients receiving chemotherapy to simulate comparing a newer to older intervention. For each censoring simulation, we performed 1000 IPW regression analyses (bootstrap, sampling with replacement), calculated the average value of each coefficient in each partition, and summed the coefficients for each regression parameter to obtain the cumulative values from 1 to 48 months.
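The core of this procedure, an IPW least-squares regression in each monthly partition with coefficients summed across partitions, can be sketched in plain Python as below. This is an illustration under assumptions, not Griffiths et al.'s code: censoring weights come from a discrete-time Kaplan-Meier estimate of the censoring distribution, people who died before a partition enter it with a known cost of zero, and the censoring-simulation and bootstrap-averaging layers are left out. The `Subject` layout, the small `wls` solver, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Subject:
    """Hypothetical data layout for one person."""
    monthly_costs: List[float]  # observed monthly costs, month 1 first
    covariates: List[float]     # design row, e.g. [1.0 (intercept), chemo indicator]
    died: bool = False          # follow-up ended by death, so later costs are known zeros


def censoring_survival(subjects: List[Subject], horizon: int) -> List[float]:
    """G[m]: Kaplan-Meier probability of not being censored in months 1..m-1."""
    G = [1.0] * (horizon + 2)
    for m in range(1, horizon + 1):
        at_risk = sum(1 for s in subjects if len(s.monthly_costs) >= m)
        censored = sum(1 for s in subjects if len(s.monthly_costs) == m
                       and not s.died and m < horizon)
        G[m + 1] = G[m] * (1.0 - (censored / at_risk if at_risk else 0.0))
    return G


def wls(X: Sequence[Sequence[float]], y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted least squares: solve (X'WX) b = X'Wy by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(wi * xi[r] * xi[c] for xi, wi in zip(X, w)) for c in range(p)] for r in range(p)]
    b = [sum(wi * xi[r] * yi for xi, yi, wi in zip(X, y, w)) for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[r] / A[r][r] for r in range(p)]


def partitioned_ipw_regression(subjects: List[Subject], horizon: int) -> List[float]:
    """Fit one IPW least-squares regression per monthly partition and sum the
    coefficients, giving cumulative-cost coefficients over months 1..horizon."""
    G = censoring_survival(subjects, horizon)
    cumulative = [0.0] * len(subjects[0].covariates)
    for k in range(1, horizon + 1):
        X, y, w = [], [], []
        for s in subjects:
            n_obs = len(s.monthly_costs)
            if n_obs >= k:   # month-k cost observed
                X.append(s.covariates); y.append(s.monthly_costs[k - 1]); w.append(1.0 / G[k])
            elif s.died:     # died before month k: cost is a known zero
                X.append(s.covariates); y.append(0.0); w.append(1.0 / G[n_obs])
            # otherwise censored before month k: excluded from this partition
        beta_k = wls(X, y, w)
        cumulative = [c + bk for c, bk in zip(cumulative, beta_k)]
    return cumulative


# Toy cohort: cost = 100 + 50 * chemo per month, with some people censored early.
cohort = [Subject([100.0 + 50.0 * t] * m, [1.0, float(t)])
          for t in (0, 1) for m in (3, 3, 2, 1)]
print(partitioned_ipw_regression(cohort, horizon=3))  # ≈ [300.0, 150.0]
```

Because the toy costs are exactly linear in the covariates, every partition recovers an intercept of 100 and a chemotherapy coefficient of 50, so the summed (cumulative) coefficients are 300 and 150. The `wls` helper has no safeguard against a singular design, for example a partition in which nobody in one treatment group is observed.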
Whereas Griffiths et al. use linear regression, one can also apply generalized linear models (GLMs). Further, one can use bootstrapping to construct confidence intervals around the coefficients, as follows:
Confidence intervals (CIs) for the cumulative cost coefficients were calculated by using a bootstrap approach, in which the process of performing 48 partitioned regression analyses and summing coefficients across partitions was repeated 1000 times using sampling with replacement from the original cohort.
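Below is a hedged sketch of a percentile bootstrap along these lines: resample people with replacement, rerun the entire partitioned analysis on each resample, and read the interval off the sorted estimates. To stay short, it assumes a model with only an intercept and a 0/1 treatment indicator, for which each partition's weighted least-squares treatment coefficient reduces to the difference between the two groups' weighted mean costs. Re-estimating the censoring weights inside every resample, the record layout, the function names, and the percentile convention are all assumptions of this sketch; resamples in which some month has no observed person in one arm are skipped.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (observed monthly costs, died?, treated 0/1)
Record = Tuple[List[float], bool, int]


def censoring_survival(data: List[Record], horizon: int) -> List[float]:
    """G[m]: Kaplan-Meier probability of not being censored in months 1..m-1."""
    G = [1.0] * (horizon + 2)
    for m in range(1, horizon + 1):
        at_risk = sum(1 for costs, _, _ in data if len(costs) >= m)
        censored = sum(1 for costs, died, _ in data
                       if len(costs) == m and not died and m < horizon)
        G[m + 1] = G[m] * (1.0 - (censored / at_risk if at_risk else 0.0))
    return G


def cumulative_treatment_effect(data: List[Record], horizon: int) -> float:
    """Sum across monthly partitions of the IPW least-squares treatment coefficient.
    With only an intercept and a 0/1 indicator, that coefficient equals the
    treated-minus-untreated difference in weighted mean cost."""
    G = censoring_survival(data, horizon)
    effect = 0.0
    for k in range(1, horizon + 1):
        num, den = [0.0, 0.0], [0.0, 0.0]  # indexed by treatment group
        for costs, died, treated in data:
            if len(costs) >= k:             # month-k cost observed
                w, y = 1.0 / G[k], costs[k - 1]
            elif died:                      # died before month k: known zero cost
                w, y = 1.0 / G[len(costs)], 0.0
            else:                           # censored before month k: excluded
                continue
            num[treated] += w * y
            den[treated] += w
        effect += num[1] / den[1] - num[0] / den[0]
    return effect


def bootstrap_ci(data: List[Record], horizon: int, n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample people with replacement and rerun the
    whole partitioned analysis, censoring weights included, on each resample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))
        try:
            estimates.append(cumulative_treatment_effect(sample, horizon))
        except ZeroDivisionError:
            continue  # some month had no observed person in one arm
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    n = len(estimates)
    return estimates[int(n * alpha / 2)], estimates[min(n - 1, int(n * (1 - alpha / 2)))]
```

Called as `bootstrap_ci(records, horizon=48)`, this returns a (lower, upper) pair for the cumulative treatment coefficient; the default `n_boot=1000` reruns the 48-partition analysis 1000 times, matching the replication count in the quote.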
In short, there are a number of creative ways of dealing with time-censored cost data.
- Lin, D. Y. “Linear regression analysis of censored medical costs.” Biostatistics 1, no. 1 (2000): 35-47.
- Griffiths, Robert I., Michelle L. Gleeson, Mark D. Danese, and Anthony O’Hagan. “Inverse probability weighted least squares regression in the analysis of time-censored cost data: an evaluation of the approach using SEER-Medicare.” Value in Health 15, no. 5 (2012): 656-663.