How to choose a Bayesian prior

Bayesian analysis is increasingly common in health economic research.  To apply Bayesian models, however, you need to select a prior distribution.  How do you select your prior?  Andrew Gelman (of the excellent Statistical Modeling, Causal Inference, and Social Science blog) provides some advice on selecting a prior on the stan-dev GitHub website.  I review some of his recommendations below.

Types of priors:

  • Flat prior;
  • Super-vague but proper prior: normal(0, 1e6);
  • Weakly informative prior, very weak: normal(0, 10);
  • Generic weakly informative prior: normal(0, 1);
  • Specific informative prior: depends on the application, but an example would be normal(0.4, 0.2).

These priors would of course need to be rescaled for a given problem; the examples above assume that the key parameters are close to unit scale (e.g., 0 is the average test score and 1 represents a 1 SD increase in test score, or 0 is zero dose and 1 is a standard dose of a drug).
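To get a feel for how different these priors are, the sketch below computes the central 95% interval implied by each one, using only Python's standard library. It assumes, following Stan's convention, that the second argument of normal(...) is a standard deviation; the helper name central_interval is hypothetical and introduced here only for illustration.

```python
from statistics import NormalDist

# Priors from the list above, read as normal(mean, standard deviation).
# Treating the second argument as an SD is an assumption (Stan's convention).
priors = {
    "normal(0, 1e6)": NormalDist(0.0, 1e6),
    "normal(0, 10)": NormalDist(0.0, 10.0),
    "normal(0, 1)": NormalDist(0.0, 1.0),
    "normal(0.4, 0.2)": NormalDist(0.4, 0.2),
}


def central_interval(dist, mass=0.95):
    """Return the (lower, upper) bounds holding `mass` of the prior probability."""
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


intervals = {name: central_interval(dist) for name, dist in priors.items()}

for name, (lower, upper) in intervals.items():
    print(f"{name:18s} 95% of prior mass in [{lower:.3g}, {upper:.3g}]")
```

On a unit scale, normal(0, 1) puts 95% of its mass within roughly ±1.96, whereas normal(0, 1e6) allows effects in the millions, which is why the latter carries essentially no information about a parameter expected to be of order 1.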

  • Be transparent about your assumptions. If you use informative priors, you need to state why you believe they are better than uninformative ones.  Clearly, informative priors can be helpful, especially if there is a large literature from similar studies.  However, priors can also be chosen strategically to bias results in a certain direction.  Thus, every informative prior needs a clear justification for why it is better than a vague or weakly informative prior (see example of Section 4.1 of this paper).
  • Only use uniform priors if the parameter range is truly restricted.  For instance, if you know a parameter must fall between 0 and 1, a normal distribution will place some probability on values below 0 and above 1, and thus a uniform or flat prior may be preferred.  On the other hand, if you use a uniform prior to cover the range you merely think is most likely, you will artificially restrict the parameter to that range.  Example: If you believe a parameter could be anywhere from 0 to 1, but these are not hard bounds, use normal(0.5, 0.5) instead of uniform(0,1).
  • Super-weak priors can be helpful for diagnosing model problems. If you pick super-weak priors (e.g., N(0, 1000)) for all parameters, then the priors provide essentially no information.  This may be a helpful first step for diagnosing any major problems with your model, even if you believe stronger priors are better.  At the very least, including super-weak priors as a sensitivity analysis shows how the results would change if you did not impose stronger priors.
  • Publication bias and available evidence.  One approach often taken is to use previous estimates from the literature as a prior for a specific parameter.  In theory, this makes sense.  In practice, however, this prior could be biased if there is publication bias.  Thus, one may want to shrink the prior towards zero or inflate the standard error (e.g., see Edlin factor adjustment).
  • Fat tails.  If you think a normal distribution seems reasonable, but suspect that there may be fat tails in your distribution, consider a Student's t distribution.  Fewer degrees of freedom will result in fatter tails.
  • Try to make the parameters scale free.  For instance, if your model looks at age and income, age is generally measured between 0 and 100 whereas income is measured between 0 and $1m (or, in some cases, $1 billion or more).  Thus, a one unit change in age is very different from a one unit change in income.  Some approaches to ‘de-scale’ parameters would be: (i) subtract the mean and divide by the standard deviation to get a z-score, (ii) in a regression, take logs of (positive-constrained) predictors and outcomes, so that the coefficients of interest can be interpreted as elasticities, (iii) divide by the average or typical number of events.
  • Don’t be overconfident in your prior.  A good procedure is to take your best guess at the prior, and then increase the dispersion.   The cost of an uninformative prior is that you put all of the weight on your actual data, however noisy; the cost of too strong a prior is that you let assumptions rather than data do most of the work.  Most researchers would prefer to let the data do the talking.
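The advice above on sensitivity analysis, publication bias, and overconfident priors can be illustrated with a minimal sketch. It assumes the simplest possible setting: a single parameter with a normal prior and a normally distributed estimate with known standard error, so the posterior has a closed form. The data values (an estimate of 0.8 with standard error 0.4), the function name normal_posterior, and the shrinkage factor of 0.5 standing in for an Edlin-style adjustment are all hypothetical choices made for illustration, not values from Gelman's recommendations.

```python
def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update: returns (posterior mean, posterior SD).

    Assumes a normal prior and a normal likelihood with known standard error.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * estimate)
    return post_mean, post_var ** 0.5


# Hypothetical study result on a unit scale (illustrative numbers only).
estimate, std_error = 0.8, 0.4

# Hypothetical shrinkage factor applied to a literature-based prior mean
# to allow for publication bias (an Edlin-style adjustment).
edlin_factor = 0.5

# Each prior is (mean, standard deviation).
priors = {
    "super-vague normal(0, 1e6)": (0.0, 1e6),
    "very weak normal(0, 10)": (0.0, 10.0),
    "generic weak normal(0, 1)": (0.0, 1.0),
    "informative normal(0.4, 0.2)": (0.4, 0.2),
    "informative, shrunk normal(0.2, 0.2)": (0.4 * edlin_factor, 0.2),
}

posteriors = {
    name: normal_posterior(mean, sd, estimate, std_error)
    for name, (mean, sd) in priors.items()
}

for name, (post_mean, post_sd) in posteriors.items():
    print(f"{name:38s} posterior mean {post_mean:.3f}, SD {post_sd:.3f}")
```

With these made-up numbers, the super-vague prior returns the raw estimate (about 0.80), the generic weakly informative prior pulls it modestly towards zero (about 0.69), and the informative prior moves it most of the way to the prior mean (0.48). Reporting such a table alongside the main results lets readers see how much of the conclusion comes from the prior rather than the data.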
