Longitudinal Modelling of Healthcare Expenditures: Challenges and Solutions

Previous analyses, such as Basu and Manning (2009), have addressed the large mass of health care expenditures at $0. Many standard economic analyses assume that the dependent variable is normally distributed. In the case of health care expenditures, however, a large share of people have $0 in expenditures (typically healthy individuals). Further, among sick individuals who incur positive health care expenditures, the expenditure distribution is typically heavily right-skewed. Approaches such as the two-part model have been used to address this type of spending distribution in cross-sectional data.
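
To make the two-part idea concrete, here is a minimal sketch using made-up annual spending values (not data from either paper). It shows the mass at $0, the right skew among positive values (mean far above the median), and the identity behind the two-part model: the overall mean equals Pr(Y > 0) times the mean among those with positive spending.

```python
from statistics import mean, median

# Hypothetical annual expenditures ($) for 12 people; values are invented for illustration
spending = [0, 0, 0, 0, 0, 120, 250, 310, 480, 900, 2600, 15400]

# Part 1: probability of incurring any expenditure, Pr(Y > 0)
positives = [y for y in spending if y > 0]
p_any = len(positives) / len(spending)

# Part 2: mean expenditure conditional on Y > 0
mean_pos = mean(positives)

# The overall mean factors into the two parts: E(Y) = Pr(Y > 0) * E(Y | Y > 0)
overall = mean(spending)

print(f"share of $0 observations: {1 - p_any:.2f}")
print(f"E(Y | Y > 0) = {mean_pos:.1f}, median among positives = {median(positives):.1f}")
print(f"Pr(Y > 0) * E(Y | Y > 0) = {p_any * mean_pos:.1f}, overall mean = {overall:.1f}")
```

With these values, the mean among positive spenders is about $2,866 while their median is $480, the kind of right skew typical of cost data.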

Moving from cross-sectional to longitudinal data, however, raises additional challenges. As Smith et al. (2018) write:

“There are additional considerations with longitudinal expenditures, and less research compares strategies for modeling longitudinal expenditures. As with any longitudinal outcome, the estimation must incorporate the correlation of repeated measurements (Basu and Manning 2009). Furthermore, the distribution of longitudinal expenditures and the proportion of zeros are dependent upon the timeframe under consideration (e.g., person-month vs. person-year).”
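
The timeframe point can be seen with a small hypothetical example: the same spending, summarized per person-month or per person-year, yields a very different proportion of zeros. The values below are invented for illustration.

```python
# Hypothetical person-month expenditures ($) for three people over one year
monthly = {
    "A": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # no use all year
    "B": [0, 0, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0],       # a single visit
    "C": [40, 0, 0, 1200, 60, 0, 0, 0, 35, 0, 0, 0],  # several episodes
}


def share_zero(values):
    """Proportion of observations equal to $0."""
    return sum(v == 0 for v in values) / len(values)


# Person-month level: every month is one observation
person_months = [v for months in monthly.values() for v in months]
# Person-year level: each person's months are summed into one annual observation
person_years = [sum(months) for months in monthly.values()]

print(f"person-month share of zeros: {share_zero(person_months):.2f}")
print(f"person-year share of zeros:  {share_zero(person_years):.2f}")
```

Here 31 of 36 person-months are $0 (about 86%), but only 1 of 3 person-years is (about 33%).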

Using 2000-2003 VA data for veterans with hypertension, the authors compare the fit of four models:

One-part models. A one-part generalized linear model (GLM) fit using generalized estimating equations (GEEs) “treats the observed expenditures as realizations of a single process, so the model does not distinguish between zero and positive-valued expenditures.” The estimating equation is g(E(Y_ij)) = βX, where Y_ij is the expenditure for person i at time j and g() is the link function, often the log. “Similarly to GLMs fit with quasi-likelihood for cross-sectional expenditures, GLMs fit via GEEs do not require specification of a parametric distribution…Rather, one specifies only the mean and variance. Often, the variance is given as a mean–variance relationship (e.g., variance proportional to the mean, Var(Y_ij) = ρE(Y_ij), where ρ represents a proportionality constant), and a link function, as described above, provides the form of the mean model.” One can implement these regressions using standard software: SAS’s PROC GENMOD with a REPEATED statement or PROC GEE; Stata’s xtgee; or R’s gee or geepack packages.
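
As a minimal sketch of the one-part approach, the function below fits a log-link model with variance proportional to the mean using GEE with an independence working correlation, written directly in NumPy rather than with the SAS, Stata, or R routines named above. The independence working correlation is an assumption chosen for brevity: it gives the same point estimates as a quasi-likelihood GLM, and the cluster-robust (sandwich) standard errors account for correlation among each person's repeated measurements. The function name `fit_one_part_gee`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def fit_one_part_gee(y, X, groups, n_iter=50, tol=1e-10):
    """One-part model: log link, Var(Y) proportional to E(Y), independence
    working correlation. Returns coefficients and cluster-robust SEs."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start the intercept at the overall mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Fisher scoring step; the proportionality constant rho cancels out
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))
    # Sandwich "meat": outer products of each person's summed score, which
    # accounts for correlation among that person's repeated measurements
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ (y[idx] - mu[idx])
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Hypothetical simulated person-month data: zeros plus right-skewed positives,
# and a person-level multiplier exp(u_i) that correlates a person's months
rng = np.random.default_rng(2018)
n_people, n_months = 1500, 6
groups = np.repeat(np.arange(n_people), n_months)
x = rng.normal(size=n_people * n_months)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([5.0, 0.5])
sd_u = 0.5
u = np.repeat(rng.normal(0.0, sd_u, size=n_people), n_months)
any_use = rng.random(x.size) < 0.6  # Pr(Y > 0) = 0.6
# Gamma(shape 2) positives, scaled so that E(Y | x) = exp(X @ true_beta)
scale = np.exp(X @ true_beta + u - sd_u**2 / 2) / (0.6 * 2.0)
y = np.where(any_use, rng.gamma(2.0, scale), 0.0)

beta_hat, se = fit_one_part_gee(y, X, groups)
print("estimates:", beta_hat.round(3), "robust SEs:", se.round(3))
```

Because the model targets E(Y_ij) directly, exponentiating the coefficient on x gives a multiplicative effect on mean expenditures with the zeros included. Exchangeable or autoregressive working correlations, which the packages above support, are not shown here.
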
Uncorrelated two-part models. These models are useful when there is a large share of $0 observations or when researchers are interested in the process through which patients incur any expenditures at all. Two-part models have a binary component, often a logit, modeling the probability of incurring any expenditure, Pr(Y_ij > 0) (equivalently, of having $0 expenditures), and a continuous component modeling the expenditure distribution conditional on positive expenditures. One complication is that the first part is estimated on all observations, whereas the continuous part is estimated only on observations with positive expenditures. Another challenge is that these two model components “…are often correlated over time, such that the probability of incurring any expense is associated with level of expenditures over time. Failure to account for this correlation leads to informative cluster sizes in the second component and biased results.” Here a person's cluster size in the second component is the number of periods in which that person has positive spending, so people who use care more often contribute more observations to the second part.
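
A minimal sketch of an uncorrelated two-part model on simulated person-month data (all values hypothetical): a logit for Pr(Y > 0) fit on all observations, and a log-link model with variance proportional to the mean fit only on the positive observations. The log-link GLM for the positive part is one common choice, assumed here. The two fitted parts are combined as Pr(Y > 0 | x) × E(Y | Y > 0, x). The helper names `fit_glm` and `expected_spending` are hypothetical, and standard errors and within-person correlation are ignored for brevity.

```python
import numpy as np


def fit_glm(y, X, family, n_iter=100, tol=1e-10):
    """Newton / Fisher scoring for two canonical-link models:
    'logit' (binary outcome) or 'log' (log link, variance proportional to mean)."""
    beta = np.zeros(X.shape[1])
    if family == "log":
        beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        if family == "logit":
            mu = 1.0 / (1.0 + np.exp(-eta))
            w = mu * (1.0 - mu)
        else:
            mu = np.exp(eta)
            w = mu
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical person-month data in which the two parts are unrelated
rng = np.random.default_rng(7)
n_people, n_months = 2000, 6
groups = np.repeat(np.arange(n_people), n_months)
x = rng.normal(size=n_people * n_months)
X = np.column_stack([np.ones_like(x), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x)))
any_use = rng.random(x.size) < p_true
y = np.where(any_use, rng.gamma(2.0, np.exp(6.0 + 0.4 * x) / 2.0), 0.0)

# Part 1: Pr(Y > 0), estimated on all observations
alpha_hat = fit_glm((y > 0).astype(float), X, "logit")
# Part 2: E(Y | Y > 0), estimated on positive observations only
pos = y > 0
delta_hat = fit_glm(y[pos], X[pos], "log")


def expected_spending(x_new):
    """Combined mean: Pr(Y > 0 | x) * E(Y | Y > 0, x)."""
    xv = np.array([1.0, x_new])
    p = 1.0 / (1.0 + np.exp(-(xv @ alpha_hat)))
    return p * np.exp(xv @ delta_hat)


sizes = np.bincount(groups[pos], minlength=n_people)
print("part 1:", alpha_hat.round(3), "part 2:", delta_hat.round(3))
print("positive months per person in part 2: min", sizes.min(), "max", sizes.max())
print("E(Y | x=0) =", round(expected_spending(0.0), 1), "E(Y | x=1) =", round(expected_spending(1.0), 1))
```

Each person contributes a different number of observations to the second part (the printed min and max). When the two parts are correlated, those varying cluster sizes become informative, which is the source of bias described above; these simulated data contain no such correlation.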

Correlated conditional two-part (CTP) models. “The correlated conditional two-part (CTP) random-effects model allows for estimation of correlation between the binary and continuous parts of the longitudinal expenditures by specifying a joint random-effects distribution, including correlations/covariance between the random effects.” The model is shown below. The uncorrelated two-part model uses the same two equations but omits the random intercepts b_1i and b_2i, which in the CTP model are assumed to be jointly (multivariate) normally distributed. Although the CTP model can be implemented with some standard statistical packages (e.g., PROC NLMIXED in SAS), it is considerably more complex to fit than the two options above.
- logit(Pr(Y_ij > 0)) = αX + b_1i
- E(log(Y_ij) | Y_ij > 0) = δX + b_2i
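
Fitting the CTP model requires integrating over the random effects (hence tools like PROC NLMIXED), which is beyond a short sketch. Simulating from it is simpler and shows what correlated random intercepts imply. The sketch below omits covariates and uses hypothetical parameter values (random-intercept SDs, their correlation rho, intercepts, and lognormal positive spending with residual SD 1 are all assumptions). It generates person-month data and computes, across people, the correlation between each person's share of months with any spending and their mean log spending in those months.

```python
import numpy as np


def simulate_ctp_correlation(rho, n_people=3000, n_months=12, seed=42):
    """Simulate from a CTP-style model without covariates and return the
    person-level correlation between share of months with any spending and
    mean log spending in those months. All parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    sd1, sd2 = 1.0, 0.8  # SDs of b_1i and b_2i (assumed)
    cov = np.array([[sd1**2, rho * sd1 * sd2],
                    [rho * sd1 * sd2, sd2**2]])
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n_people)
    b1 = np.repeat(b[:, 0], n_months)
    b2 = np.repeat(b[:, 1], n_months)
    person = np.repeat(np.arange(n_people), n_months)

    alpha0, delta0, sigma = -0.5, 6.0, 1.0  # intercepts and residual SD (assumed)
    p_any = 1.0 / (1.0 + np.exp(-(alpha0 + b1)))  # logit(Pr(Y > 0)) = alpha0 + b_1i
    any_use = rng.random(p_any.size) < p_any
    # E(log(Y) | Y > 0) = delta0 + b_2i, with normal noise on the log scale
    log_y = delta0 + b2 + rng.normal(0.0, sigma, size=p_any.size)
    y = np.where(any_use, np.exp(log_y), 0.0)

    positive = (y > 0).astype(float)
    n_pos = np.bincount(person, weights=positive, minlength=n_people)
    sum_log = np.bincount(person, weights=np.where(y > 0, log_y, 0.0), minlength=n_people)
    keep = n_pos > 0  # people with at least one month of spending
    share_pos = n_pos[keep] / n_months
    mean_log_pos = sum_log[keep] / n_pos[keep]
    return np.corrcoef(share_pos, mean_log_pos)[0, 1]


r_corr = simulate_ctp_correlation(rho=0.6)
r_indep = simulate_ctp_correlation(rho=0.0)
print(f"rho = 0.6: person-level correlation {r_corr:.2f}")
print(f"rho = 0.0: person-level correlation {r_indep:.2f}")
```

With rho = 0.6 the person-level correlation is clearly positive, while with rho = 0 it is near zero. This dependence between how often someone spends and how much they spend is what the joint random-effects distribution is meant to capture.
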
Correlated marginalized two-part (MTP) models. These models are similar to the CTP, but the second part models the overall (marginal) mean of expenditures, including the zeros, rather than the mean conditional on positive expenditures, as shown in the example below. Like the CTP, the MTP is computationally difficult to implement, although it has been fit using Bayesian approaches. “Similar to the correlated CTP model, parameter estimates in the binary component are subject-specific estimates. Specifically, exp(αk) represents the subject-specific odds ratio for incurring positive expenditures associated with a one-unit increase in the kth covariate. Parameter estimates in the second component represent effects on the overall mean, and those corresponding to covariates not included as random effects are both subject-specific and population average.”
- logit(Pr(Y_ij > 0)) = αX + b_1i
- log(E(Y_ij)) = δX + b_2i
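
The quoted interpretation can be checked numerically. The sketch below treats the MTP second part as a log-link model for the overall mean with a random intercept only, and averages over hypothetical random-intercept draws (SDs and coefficients are assumed) to compute population-average quantities at x = 0 and x = 1. For the logit part, the population-average odds ratio is smaller than the subject-specific exp(α1). For the overall-mean part, the ratio of population means equals exp(δ1) up to rounding, so δ1 is both subject-specific and population average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000
b1 = rng.normal(0.0, 1.5, size=n_draws)  # draws of b_1i (SD assumed)
b2 = rng.normal(0.0, 0.8, size=n_draws)  # draws of b_2i (SD assumed)

# Binary part: logit(Pr(Y > 0 | b_1i)) = alpha0 + alpha1 * x + b_1i
alpha0, alpha1 = -0.5, 1.0


def pop_prob(x):
    """Population-average Pr(Y > 0) at x, averaging over b_1i."""
    return np.mean(1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * x + b1))))


p0, p1 = pop_prob(0.0), pop_prob(1.0)
pop_or = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

# Overall-mean part: log(E(Y | b_2i)) = delta0 + delta1 * x + b_2i (zeros included)
delta0, delta1 = 6.0, 0.3


def pop_mean(x):
    """Population-average overall mean E(Y) at x, averaging over b_2i."""
    return np.mean(np.exp(delta0 + delta1 * x + b2))


ratio = pop_mean(1.0) / pop_mean(0.0)
print(f"exp(alpha1) = {np.exp(alpha1):.3f}, population-average OR = {pop_or:.3f}")
print(f"exp(delta1) = {np.exp(delta1):.4f}, ratio of population means = {ratio:.4f}")
```

The random intercept factors out of the log-scale mean as a common multiplier, which is why the ratio is unchanged; the logistic function does not allow that factoring, so averaging attenuates the odds ratio.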

You can read the paper for the results of the authors’ application to the VA data. The authors also provide useful guidance on model selection in the discussion section.

First, is there interest in what influences the probability of incurring expenditures? If so, a two-part model may be appropriate. Second, is the primary interest in the overall mean expenditures of the entire population, or in the level of expenditures conditional on any being incurred? If the former, the one-part GLM or the MTP model is preferable; if the latter, one should consider a CTP model. The uncorrelated two-part model should only be considered if the analyst is confident that the two components are uncorrelated, which is often an untenable assumption.
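
One way to encode this guidance is the small helper below. It is a hypothetical illustration of one reading of the discussion, not a function from the paper; in particular, returning the MTP model when the overall mean is the target and the probability of any spending is also of interest is an interpretation of how the two questions combine.

```python
def suggest_model(interest_in_any_use: bool, target: str,
                  confident_uncorrelated: bool = False) -> str:
    """Hypothetical helper encoding one reading of the model-selection guidance.

    target: "overall" for mean expenditures in the whole population (zeros included),
            "conditional" for expenditure levels among those who incur any.
    """
    if target == "overall":
        # Overall mean: MTP if drivers of any use also matter, else one-part GLM or MTP
        return "MTP model" if interest_in_any_use else "one-part GLM or MTP model"
    if target == "conditional":
        # Conditional level: CTP, unless the two parts are credibly uncorrelated
        return "uncorrelated two-part model" if confident_uncorrelated else "CTP model"
    raise ValueError("target must be 'overall' or 'conditional'")


print(suggest_model(interest_in_any_use=True, target="overall"))
print(suggest_model(interest_in_any_use=True, target="conditional"))
```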

Basu, Anirban, and Willard G. Manning. “Issues for the Next Generation of Health Care Cost Analyses.” Medical Care 47, no. 7 Suppl 1 (2009): S109–S114.
Smith, Valerie A., Matthew L. Maciejewski, and Maren K. Olsen. “Modeling Semicontinuous Longitudinal Expenditures: A Practical Guide.” Health Services Research (2018).
