Q: What is the effect of the number of patient comorbidities on health care spending?

A: This is a simple analysis to do. One could just run a regression with cost as the dependent variable and the number of comorbidities as an independent variable.
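To make this concrete, here is a minimal Python sketch of that regression. The data are made up purely for illustration and the helper name `ols_fit` is hypothetical; in practice you would use a statistics package. Note that a single slope assumes every additional comorbidity adds the same amount to cost.

```python
# Made-up data for illustration only: comorbidity counts and annual costs.
comorbidities = [0, 1, 1, 2, 3, 3, 4, 5]
costs = [500.0, 900.0, 1100.0, 1600.0, 2300.0, 2100.0, 3000.0, 3800.0]


def ols_fit(x, y):
    """Ordinary least squares for y = a + b * x with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_fit(comorbidities, costs)
# One slope = one constant cost increment per additional comorbidity.
print(f"intercept = {intercept:.1f}; extra cost per comorbidity = {slope:.1f}")
```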

Q: What if the effect of the number of comorbidities is non-linear? For instance, going from 1 to 2 comorbidities may not increase health care costs as much as going from 2 to 3 comorbidities, and so on.

A: Good point. In that case, you could include a series of indicator variables for the number of comorbidities. In other words, you would have an indicator for the patient having exactly 1 comorbidity, exactly 2 comorbidities, exactly 3 comorbidities, and so on.
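A minimal sketch of the indicator approach, again on made-up data with hypothetical helper names. It relies on a standard property of ordinary least squares: with an intercept plus one indicator for each non-reference category, the fitted coefficients equal differences in group means, so no matrix algebra is needed. Zero comorbidities is assumed to be the reference group.

```python
from collections import defaultdict

# Made-up data for illustration only.
comorbidities = [0, 1, 1, 2, 2, 3, 3, 3]
costs = [500.0, 900.0, 1100.0, 1500.0, 1700.0, 2600.0, 2400.0, 2800.0]


def indicator_columns(counts, levels):
    """One 0/1 column per level: 1 if the patient has exactly that many comorbidities."""
    return {k: [1 if c == k else 0 for c in counts] for k in levels}


def dummy_coefficients(counts, y, reference=0):
    """
    OLS estimates for intercept + one indicator per non-reference level.
    For this saturated design, the intercept is the reference-group mean and
    each indicator coefficient is (group mean - reference-group mean).
    """
    groups = defaultdict(list)
    for c, yi in zip(counts, y):
        groups[c].append(yi)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    intercept = means[reference]
    coefs = {k: means[k] - intercept for k in sorted(means) if k != reference}
    return intercept, coefs


X = indicator_columns(comorbidities, levels=[1, 2, 3])
intercept, coefs = dummy_coefficients(comorbidities, costs)
print("intercept (0 comorbidities):", intercept)
for k, b in coefs.items():
    # Each level gets its own effect, so increments need not be equal.
    print(f"exactly {k} comorbidities: +{b:.1f}")
```

With these made-up numbers the step from 2 to 3 comorbidities is larger than the step from 1 to 2, which a single linear slope could not capture.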

Q: This is a nice, elegant solution. One drawback, however, is that it treats all comorbidities as equivalent. For instance, consider two sets of individuals. The first set has diabetes and hypertension; the second set has diabetes and schizophrenia. In both cases the number of comorbidities is 2, but the latter group is likely to have higher costs.

A: Well, in this case you could simply include indicator variables for your disease combinations of interest. For instance, if you are interested in (D)iabetes, (H)ypertension, and (S)chizophrenia, the indicators would be D only, H only, S only, DH, DS, HS, and DHS.
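A minimal sketch of how those combination indicators could be built, using the D/H/S example. The function names are hypothetical; the key point is that the indicators are mutually exclusive, so each patient falls into exactly one combination, and patients with none of the three diseases form the reference group.

```python
from itertools import combinations

# D = diabetes, H = hypertension, S = schizophrenia (the diseases of interest).
DISEASES = ["D", "H", "S"]


def combination_labels(diseases):
    """Every non-empty combination of the diseases, e.g. 'D', 'DH', 'DHS'."""
    labels = []
    for size in range(1, len(diseases) + 1):
        for combo in combinations(diseases, size):
            labels.append("".join(combo))
    return labels


def combination_indicators(patient_conditions, diseases):
    """
    Mutually exclusive 0/1 indicators: the one matching the patient's exact set
    of diseases of interest is 1. Patients with none of them get all zeros and
    serve as the reference group.
    """
    key = "".join(d for d in diseases if d in patient_conditions)
    return {label: int(label == key) for label in combination_labels(diseases)}


print(combination_labels(DISEASES))
# A patient with diabetes and schizophrenia is coded DS = 1, everything else 0.
print(combination_indicators({"D", "S"}, DISEASES))
```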

Q: That works well in your simple example. In practice, however, most researchers focus on a large number of diseases. In that case, the number of disease interactions can get very large.
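How large? With k diseases there are 2^k - 1 non-empty combinations, so every disease added roughly doubles the number of indicators. A quick check in Python (the function name is just for illustration):

```python
from math import comb


def n_combination_indicators(k):
    """Number of non-empty combinations of k diseases: sum of C(k, r) = 2**k - 1."""
    return sum(comb(k, r) for r in range(1, k + 1))


for k in (3, 10, 20, 30):
    print(f"{k:>2} diseases -> {n_combination_indicators(k):,} combination indicators")
```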

A: Excellent point. In addition to the large number of covariates, there typically are very few observations for any given disease combination. A study by Eckardt et al. (2016) had a sample of 1,050 patients between ages 65 and 85, and there were 1,047 unique combinations of comorbidities.
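To see why cell sizes collapse, you can count how many patients share each exact combination of conditions. A minimal sketch on a handful of hypothetical patients (not data from the study); a combination observed for only one patient gives an indicator whose coefficient rests on a single observation.

```python
from collections import Counter

# A handful of hypothetical patients (not data from Eckardt et al.).
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension"},
    {"diabetes", "schizophrenia"},
    {"hypertension", "osteoporosis", "asthma/COPD"},
    {"diabetes", "hypertension", "obesity"},
    {"osteoporosis"},
]


def cell_sizes(patients):
    """Number of patients observed with each exact combination of conditions."""
    return Counter(frozenset(conditions) for conditions in patients)


sizes = cell_sizes(patients)
singletons = sum(1 for count in sizes.values() if count == 1)
print(f"{len(sizes)} unique combinations among {len(patients)} patients")
print(f"{singletons} combinations are observed for exactly one patient")
```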

Q: Wow! That is a big problem. So, what is the solution?

A: One approach would be to use a hierarchical model to reduce the dimensionality. However, there is no clear nesting structure among patient comorbidities, so this approach generally will not work here.

Q: Any other suggestions?

A: Well, Eckardt and co-authors use a finite mixture model. Finite mixture models are a broad family that includes latent class models, which you may be more familiar with.

Q: Can you describe what they did?

A: Below is a quote from their paper:

We used a finite mixture of generalized regression techniques in order to statistically learn from data. Thus, opposite to mixtures of densities, we applied an extended finite mixture of regressions— also known as clusterwise regression— to capture unobserved heterogeneity of the regression coefficients. Thus, instead of taking specific patterns of single disease combinations into account, the aim of our study was to detect components of diseases within arbitrary morbidity patterns that influence healthcare costs in elderly patients with multiple chronic conditions.

Using the Akaike information criterion (AIC) to choose the number of components, they settled on a four-component mixture model in which each component had its own gamma distribution.
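The idea of clusterwise regression can be sketched in Python, with an important caveat: this is not the authors' model. Eckardt et al. used gamma components for costs, while the sketch below swaps in Gaussian components with a single covariate, fitted by the EM algorithm on simulated data, because the updates then reduce to weighted least squares. All function names are hypothetical. The AIC comparison at the end illustrates choosing the number of components; a lower AIC is preferred.

```python
import math
import random


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def fit_mixture_of_regressions(x, y, n_components, n_iter=100, seed=0):
    """EM for a finite mixture of Gaussian linear regressions (clusterwise regression)."""
    rng = random.Random(seed)
    n = len(y)
    # Start from random soft assignments of patients to components.
    resp = []
    for _ in range(n):
        r = [rng.random() + 1e-3 for _ in range(n_components)]
        resp.append([ri / sum(r) for ri in r])
    history = []
    for _ in range(n_iter):
        # M-step: weighted regression, residual variance and weight per component.
        params = []
        for k in range(n_components):
            w = [resp[i][k] for i in range(n)]
            a, b = weighted_line(x, y, w)
            var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sum(w)
            params.append((sum(w) / n, a, b, var))
        # E-step: posterior probability that each patient belongs to each component.
        loglik = 0.0
        for i in range(n):
            dens = [p * normal_pdf(y[i], a + b * x[i], v) for p, a, b, v in params]
            total = sum(dens)
            loglik += math.log(total)
            resp[i] = [d / total for d in dens]
        history.append(loglik)
    return params, history


def aic(loglik, n_components):
    # Per component: intercept, slope, variance; plus K - 1 free mixing weights.
    n_params = 3 * n_components + (n_components - 1)
    return 2 * n_params - 2 * loglik


# Simulated data: two hidden groups with different cost-comorbidity slopes.
data_rng = random.Random(42)
x, y = [], []
for _ in range(200):
    xi = data_rng.uniform(0, 5)
    if data_rng.random() < 0.6:
        yi = 1.0 + 0.5 * xi + data_rng.gauss(0, 0.5)
    else:
        yi = 6.0 + 2.0 * xi + data_rng.gauss(0, 0.5)
    x.append(xi)
    y.append(yi)

for k in (1, 2):
    params, history = fit_mixture_of_regressions(x, y, n_components=k)
    print(f"K={k}: AIC = {aic(history[-1], k):.1f}")
    for weight, a, b, var in params:
        print(f"  weight={weight:.2f} intercept={a:.2f} slope={b:.2f} var={var:.2f}")
```

Each patient is not hard-assigned to a group; the E-step gives a probability of membership in each component, and every component gets its own regression coefficients.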

Q: And what did they find?

A: Again, I quote from their study.

As expected, mean costs tend to increase with an increasing number of comorbidities in case of component 1 (group 1). A similar but less pronounced tendency is shown by component 4 (group 4). On the contrary, in component 3 (group 3) the mean costs tend to decrease with the number of comorbidities. This might be a hint that component 3 (group 3) captured the most expensive single disease leading to high mean costs with low numbers of comorbidities, while the mean costs in component 1 (group 1) and component 4 (group 4) were mainly caused by cumulative effects. Especially in component 1 each additional disease caused an increase in mean costs. In contrast, mean costs within component 4 are at a significantly lower level.

Q: I’m confused. Is this at all useful?

A: I agree that these summary statistics on their own are not all that useful. However, you can combine the components to get results of interest. For instance, the authors found that “diagnosed obesity, osteoporosis and cerebral ischemia/chronic stroke cause positive effects on costs.” Additionally, for each component, you can see which diseases matter most for costs.

For example, while asthma/COPD is ranked 16 (17%) in component 1, it is ranked 9 in all other components (24% to 26%). Similarly, prostatic hyperplasia achieved rank 22 (11%) in component 1 in contrast to the ranks 10 (component 2, 24%), 12 (component 3, 22%) and 14 (component 4, 18%).
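As for combining components: in a finite mixture with constant mixing weights, the overall expected cost is the weight-averaged mean of the components, so one way to summarize a disease's combined effect is the change in that average when the disease is switched on. A minimal sketch, assuming a log link and made-up coefficients; the names are hypothetical, and this is not necessarily how Eckardt et al. computed their reported effects.

```python
import math

# Hypothetical two-component model with a log link; these numbers are NOT
# estimates from Eckardt et al.
components = [
    {"weight": 0.4, "intercept": 7.0, "beta": {"obesity": 0.20, "osteoporosis": 0.10}},
    {"weight": 0.6, "intercept": 8.0, "beta": {"obesity": 0.05, "osteoporosis": 0.30}},
]


def component_mean(comp, conditions):
    """Expected cost within one component: exp(intercept + sum of disease effects)."""
    eta = comp["intercept"] + sum(comp["beta"].get(c, 0.0) for c in conditions)
    return math.exp(eta)


def marginal_mean(components, conditions):
    """Overall expected cost: component means averaged with the mixing weights."""
    return sum(c["weight"] * component_mean(c, conditions) for c in components)


def combined_effect(components, disease, baseline=()):
    """Change in overall expected cost when `disease` is added to `baseline`."""
    with_disease = marginal_mean(components, tuple(baseline) + (disease,))
    without_disease = marginal_mean(components, tuple(baseline))
    return with_disease - without_disease


for disease in ("obesity", "osteoporosis"):
    print(f"{disease}: {combined_effect(components, disease):+.1f}")
```

Here a disease can matter a lot in one component and little in another, yet still have a clearly positive combined effect once the components are weighted together.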

Q: Interesting stuff, isn’t it?

A: I agree!