Useful Data Files for Simulation Modelling

In recent years, researchers have created a number of complex simulation models to estimate future trends and the impacts of policy changes.  Some of the more prominent models include:

  • RAND Comprehensive Assessment of Reform Efforts (COMPARE)
  • CBO’s Health Insurance Simulation Model (HISim)
  • Urban Institute’s Health Insurance Reform Simulation Model (HIRSM)
  • Future Elderly Model (FEM)

These models use a variety of data sources.  For instance:

…the RAND COMPARE microsimulation model uses the Survey of Income and Program Participation (SIPP) as the principal base dataset. Data from the Medical Expenditure Panel Survey (MEPS) regarding health care utilization and medical expenditures are statistically matched to SIPP respondents based on age, insurance status, health status, region, and income, and individuals are assigned to synthetic firms with particular characteristics and health benefit offerings using data from the Kaiser/HRET survey and based on region, firm size, and industry…Additional data regarding expected birth and death rates and rates of immigration may be used to project the population forward over time.
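
To make the idea of statistical matching concrete, here is a minimal sketch of one simple approach: a random "hot-deck" match within cells. It is not RAND's actual procedure, which the passage above does not spell out. The function name, the field names, the choice of three matching variables, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical matching variables. The quoted passage names age, insurance
# status, health status, region, and income; only three are used here.
MATCH_KEYS = ("age_group", "insured", "region")


def statistical_match(recipients, donors, keys=MATCH_KEYS, seed=0):
    """Give each recipient the spending of a random donor from the same cell."""
    rng = random.Random(seed)

    # Group donor records into cells defined by the matching variables.
    cells = defaultdict(list)
    for donor in donors:
        cells[tuple(donor[k] for k in keys)].append(donor)

    matched = []
    for rec in recipients:
        pool = cells.get(tuple(rec[k] for k in keys))
        # If no donor shares the cell, leave spending missing rather than guess.
        spend = rng.choice(pool)["med_spend"] if pool else None
        matched.append({**rec, "med_spend": spend})
    return matched


# Toy records with made-up values (not real SIPP or MEPS data).
sipp_like = [
    {"id": 1, "age_group": "18-34", "insured": True, "region": "West"},
    {"id": 2, "age_group": "65+", "insured": True, "region": "South"},
    {"id": 3, "age_group": "35-64", "insured": False, "region": "West"},
]
meps_like = [
    {"age_group": "18-34", "insured": True, "region": "West", "med_spend": 1200.0},
    {"age_group": "18-34", "insured": True, "region": "West", "med_spend": 800.0},
    {"age_group": "65+", "insured": True, "region": "South", "med_spend": 9500.0},
]

matched = statistical_match(sipp_like, meps_like)
```

In practice a model would likely use more matching variables, survey weights, and a fallback rule for empty cells (for example, dropping the least important variable and trying again).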

What data sources are available to model changes in the supply of providers? An article by Gresenz, Auerbach, and Duarte (2013) provides an overview that answers just this question.

Type 1 Data: Supply of Providers

Type 1 data are unit-level data on the current supply of providers, including characteristics of the providers.  The characteristics can describe the providers themselves (e.g., primary care vs. specialists, for-profit vs. non-profit hospitals) and also the patient populations they treat (e.g., share of Medicaid patients).

Key data files for estimating current provider supply include:

  • American Medical Association’s (AMA) Masterfile. All allopathic and most osteopathic medical students in the United States are entered into the Masterfile’s records and are contacted every several years throughout their lifetimes.  Core elements include information about the physician’s age, location, contact information, detailed specialty, some practice and employer characteristics, and major function (e.g., patient care, research, education, administration).  However, variables such as hours worked, income, detailed practice setting, characteristics of patients seen, or time spent in various activities are not collected.  Summary information from the AMA Masterfile is available in HRSA’s Area Resource File (ARF).
  • State licensing data from state regulatory agencies. These data vary substantially across states in breadth and quality, and there is little or no coordination across states to standardize data elements or to aggregate data (although the Health Resources and Services Administration, HRSA, is undertaking efforts toward such standardization).
  • Current Population Survey (CPS). The CPS contains more labor supply-relevant data than the AMA Masterfile, such as hours worked, family characteristics, and industry. Physicians are identified in the survey if they report their primary occupation as physician (or, if out of the labor force, if they identify physician as their predominant occupation over the past 5 years). However, the sample size is relatively small (on the order of 1,000 physicians per year) and information by physician specialty is not available.
  • American Community Survey (ACS). This large, nationally representative survey replaced the long-form Census and has collected information on roughly 10,000 physicians per year since then. Geographic location is available at the MSA level.
  • Community Tracking Study (CTS). Administered by the Center for Studying Health System Change, the CTS includes a physician survey that has been fielded every 4 years since 1996 (although it may not be continued in the future).  The CTS physician sample was clustered within 60 metropolitan areas in the United States, but in 2008 the sampling frame was changed to obtain a representative sample of physicians across the United States. Physicians are asked detailed questions about their practice size, type and organization, the financial incentives they face, the payer mix of the patients they see, and the percentage of their (or their organization’s) revenues derived from different payers. The physician component of the CTS is also known as the Health Tracking Physician Survey (HTPS).
  • National Ambulatory Medical Care Survey (NAMCS).  These data contain practice and organizational details for office-based physicians but exclude anesthesiologists, radiologists, and pathologists. The sample goes back to 1973.  Although the NAMCS has county-level information, those data are restricted-use.
  • Other sources.  IMS Health, SK&A, and the Medical Group Management Association (MGMA) assemble provider data primarily for marketing purposes.

Type 2 Data: Estimating Future Demand

Typically, these models assume that provider supply will respond to demand for medical services.  One can project demand for medical services using demographic projections.  On the demand side, the inputs may include age-adjusted death rates, birth rates, and immigration rates.  One can use these statistics to create a synthetic population for some future date.

On the supply side, sources on the pipeline of new physicians include:

  • Association of American Medical Colleges (AAMC) collects information on medical school applicants, enrollments, and graduates.
  • National Resident Matching Program (NRMP) publishes counts of physician residents by year and residency type (also in the September issue of JAMA).
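
As a rough illustration of how death, birth, and immigration rates can be used to roll a population forward, here is a minimal cohort-component sketch. It is not taken from any of the models named above. The function name, the three age groups, and all of the rates are hypothetical, and it assumes that each age group is as wide as one projection step and that birth rates are not sex-specific.

```python
def project_one_step(pop, death_rate, birth_rate, net_migrants):
    """Advance a population by one projection step.

    pop          -- people in each age group (the last group is open-ended)
    death_rate   -- share of each age group that dies during the step
    birth_rate   -- births per person in each age group during the step
    net_migrants -- net immigrants added to each age group at the end of the step
    """
    n = len(pop)
    births = sum(p * b for p, b in zip(pop, birth_rate))
    survivors = [p * (1.0 - d) for p, d in zip(pop, death_rate)]

    nxt = [0.0] * n
    nxt[0] = births                    # newborns enter the youngest group
    for age in range(n - 1):
        nxt[age + 1] += survivors[age]  # everyone else ages by one group
    nxt[n - 1] += survivors[n - 1]      # the open-ended top group stays put

    return [x + m for x, m in zip(nxt, net_migrants)]


# Made-up inputs for three age groups (young, middle, old).
pop = [100.0, 100.0, 100.0]
death_rate = [0.0, 0.1, 0.5]
birth_rate = [0.0, 0.2, 0.0]
net_migrants = [0.0, 5.0, 0.0]

next_pop = project_one_step(pop, death_rate, birth_rate, net_migrants)
```

Calling the function repeatedly, each time feeding in the previous result, gives the projected age distribution at any future date. Demand for services could then be approximated by applying age-specific utilization rates to that distribution.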


Type 3 Data: Behavioral Parameters

Type 3 data includes not only primary data sources that can be used to estimate behavioral parameters but also secondary data sources that include published estimates of relevant parameters. The specific behavioral parameters selected depend on the impacts the model is trying to estimate.

