Generalizing the Results of Randomized Controlled Trials

The benefit of randomized controlled trials is that one can identify the causal effect of a certain treatment in the absence of selection effects (assuming the randomization is properly done).  However, RCT results often do not generalize to the larger population.  Reasons for this include:

  • The RCT population may not be representative of the population at large.  Frequently, RCT participants have only the disease being treated; in the real world, patients suffer from multiple comorbidities and thus the RCT treatment may be less effective in practice.
  • Medication adherence is often lower in practice than during an RCT.  “For example, the Women Take Pride study assessed group versus self-directed behavioral interventions for women with heart disease (Janevic et al., 2003) and found much higher adherence rates for the preferred interventions (Long, Little and Lin, 2008).”
  • The patients who choose a certain treatment in practice may not be similar to the patients assigned to that treatment in the RCT.  For instance, sick patients outside an RCT may not choose an intensive treatment if the side effects are too severe.

How can we address these issues?  The Healthcare Economist explores some options from the literature.

Randomized Preference Trials

To address some of these problems, a paper by Marcus et al. proposes the use of a doubly randomized preference trial (DRPT).  In this set-up, individuals are first randomized into either a randomization arm or a preference arm.  Those in the randomization arm are then randomly assigned to the treatment or control group, while those in the preference arm get to choose whether to receive the treatment or the control.

Assume that z=1 indicates that the patient receives the treatment and z=0 indicates that the patient receives the control.  In the DRPT, one must also indicate whether the patient is in the randomization arm (w=1) or the preference arm (w=0).

A typical RCT measures the following:

  • E(y1|z=1, w=1) – E(y0|z=0, w=1)

However, one can also measure the effect of randomization on the outcome. Specifically:

  • E(y1|z=1, w=1) – E(y1|z=1, w=0)
  • E(y0|z=0, w=1) – E(y0|z=0, w=0)

where the first expression above indicates the effect of randomization versus preference for those who receive the treatment and the second expression indicates the effect of randomization versus preference for those who receive the control.
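The three contrasts above can be sketched in a few lines of Python. This is only an illustration: the function name, the record layout and the toy numbers are hypothetical, and the code takes raw differences in cell means without any adjustment for patient characteristics.

```python
from statistics import mean


def drpt_contrasts(records):
    """Raw DRPT contrasts from a list of (y, z, w) tuples.

    y is the outcome, z=1 means treatment and z=0 control,
    w=1 means randomization arm and w=0 preference arm.
    """
    def cell_mean(z, w):
        # Mean outcome in one (z, w) cell; raises StatisticsError if the cell is empty.
        return mean(y for (y, zi, wi) in records if zi == z and wi == w)

    return {
        # E(y1|z=1, w=1) - E(y0|z=0, w=1): the usual RCT contrast
        "treatment_effect_randomized": cell_mean(1, 1) - cell_mean(0, 1),
        # E(y1|z=1, w=1) - E(y1|z=1, w=0): randomization vs. preference, treated
        "randomization_effect_treated": cell_mean(1, 1) - cell_mean(1, 0),
        # E(y0|z=0, w=1) - E(y0|z=0, w=0): randomization vs. preference, controls
        "randomization_effect_control": cell_mean(0, 1) - cell_mean(0, 0),
    }


# Hypothetical toy data, two patients per cell (not from any study).
toy = [
    (10.0, 1, 1), (12.0, 1, 1),
    (6.0, 0, 1), (8.0, 0, 1),
    (13.0, 1, 0), (15.0, 1, 0),
    (5.0, 0, 0), (7.0, 0, 0),
]
result = drpt_contrasts(toy)
```

In this made-up example, treated patients who chose the treatment do better than those assigned to it, so the randomization effect among the treated is negative.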

To control for differences in patient characteristics between the two arms, Marcus et al. use propensity score matching, which is based on the probability that individuals sort into the treatment rather than the control.

Other Forms of Preference Randomization

“Zelen (1990) proposed a randomized consent design that first randomizes subjects to the treatment and control conditions. Subjects randomized to treatment are then asked to give consent to receive the treatment. The subject is given the treatment if the subject gives consent and is given the control otherwise.”

Another type of randomization is the partially randomized preference trial (PRPT).  “Generally, most trials exclude subjects who do not give consent for randomization, which is another factor that reduces the generalizability of results from randomized trials. In the PRPT, subjects who give consent for randomization are randomized to treatment versus control conditions. Those who do not give consent for randomization are instructed to choose treatment or control conditions and are followed similarly to those in the randomized portion.”


In many cases, the sample population is not representative of the national population. One method to adjust the estimates is to weight the observations in the sample so that the weighted sample resembles the national population. One example is inverse probability of treatment weighting (IPTW), which gives each individual a weight equal to the inverse of his or her propensity score. The IPTW method does have some drawbacks, however. According to Stuart et al. (2011) “the results can be somewhat unstable, especially if there are extreme weights, and the method is more sensitive to the specification of the propensity score model than are other propensity score approaches.”
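A minimal sketch of inverse propensity weighting follows. It assumes the propensity scores have already been estimated by some model; the function names, outcomes and scores are hypothetical and chosen only to show how one small score can dominate the result.

```python
def inverse_probability_weights(propensities):
    """Return 1/p for each propensity score p, which must lie in (0, 1]."""
    for p in propensities:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"propensity score out of range: {p}")
    return [1.0 / p for p in propensities]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical outcomes and estimated propensity scores for four sample members.
outcomes = [2.0, 4.0, 6.0, 8.0]
propensities = [0.5, 0.5, 0.25, 0.05]

weights = inverse_probability_weights(propensities)
unweighted = sum(outcomes) / len(outcomes)
weighted = weighted_mean(outcomes, weights)

# Share of the total weight carried by the observation with the smallest score.
largest_share = max(weights) / sum(weights)
```

Here the last individual, with a score of 0.05, carries roughly 70 percent of the total weight, which is the kind of extreme weight that makes the estimate unstable.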

Another method is to form a small number of subclasses and group individuals with similar propensity scores into these classes. For instance, this procedure can be accomplished using propensity score deciles. “However, subclassification approaches can suffer from having too few subclasses and thus insufficient bias reduction.”
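As a rough sketch of subclassification, the snippet below sorts individuals by propensity score and splits them into equal-sized groups. The function name and the scores are hypothetical, and rank-based bins are just one simple way to form the subclasses; setting n_subclasses to 10 would give deciles.

```python
from collections import Counter


def assign_subclasses(scores, n_subclasses):
    """Return a subclass label (0 .. n_subclasses-1) for each propensity score.

    Individuals are ranked by score and split into bins of (nearly) equal size,
    so individuals with similar scores land in the same subclass.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * n_subclasses // len(scores)
    return labels


# Hypothetical propensity scores for six individuals, split into three subclasses.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
labels = assign_subclasses(scores, 3)

# Number of individuals in each subclass.
sizes = Counter(labels)
```

Estimates would then be computed within each subclass and combined, a step this sketch leaves out.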

Stuart et al. propose another alternative: full matching. “Full matching forms a relatively large number of subclasses, where in our use each subclass will have at least one member of the sample and at least one member of the target population, but the ratio of sample to population members in each subclass can vary. The subclasses reflect the fact that some areas of the propensity score space will have relatively few sample members and many population members, whereas other areas will have relatively few population members and many sample members. Full matching has been shown to be optimal in terms of reducing propensity score differences within subclasses (Rosenbaum, 1991).”

