Difference-in-Differences Correlation Structures

About

This app explores correlation structures in longitudinal data with an eye toward difference-in-differences (DID) studies. It has two main parts: simulated data and real data. The goal is to understand how common correlation structures look in data plots. The app plots simulated data and the correlations among observations over time.

Simulated Data

You can modify the inputs such as the number of time points, the number of units, sources of variation, and more. A full list of parameters is found on the Simulation tab and a full description of each is found on the Help tab.

  1. Play around with the inputs and see how the simulated data change.
    Q1) What parameters are most influential?
    Q2) Does the data-generating model behave the way you expect?
    Q3) Do these correlation structures resemble your “mental model” of how outcomes evolve over time?

  2. Input parameters derived from some of your own real data.
    Q1) How do the simulated data compare to your data?
    Q2) Do the correlation matrices look like you expect?

  3. Consider modeling approaches in light of these time-series correlation structures.
    Q1) What do these longitudinal error covariances imply about the stability of trends over time?
    Q2) What do they imply about the plausibility of DID assumptions?

Real Data

To supplement the simulated data, we provide data from three sources:

  1. From Dartmouth Health Atlas: annual estimates of ambulatory-care sensitive admissions per 1000 Medicare beneficiaries in each state, adjusted for age, sex, and race.
  2. From MarketScan claims data: estimated annual per beneficiary average total spending among commercial enrollees in each Metropolitan Statistical Area (MSA), adjusted for age, sex, and chronic conditions.
  3. From Medicare claims: estimated annual per beneficiary average total spending among fee-for-service enrollees in each Hospital Referral Region (HRR), adjusted for age, sex, race, dual eligibility, HCC score, chronic conditions, educational attainment in the 65-and-older population in each beneficiary's ZIP, and the percent living in poverty among the 65-and-older population in each beneficiary's ZIP.

Issues and Contact Information

To report bugs or to give suggestions/feedback, please email Bret Zeldow or Laura Hatfield.

Details on inputs to the simulated data

Markets [m=1,…,M]: The level at which the hypothetical intervention is applied, such as states or hospitals.

Observations [i=1,…,I]: Individual observations within each market at each time point, such as residents of a state or patients within a hospital.

Time points [t=1,…,T]: Times at which observations are made

First time point in post-treatment period: Must be between 2 and T

SD of market shocks [σT]: Roughly, how unstable the market-level mean trajectory is over time. Large values = “bumpy” market-level mean trajectories.

AR(1) market shocks [ρT]: Roughly, how correlated market-level shocks are across time. Large values = less “bumpy” trajectories, because each shock resembles the one before it; the shocks persist rather than being pulled back toward zero. A correlation parameter of 0 implies independent shocks within markets over time.
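
As a rough illustration of what σT and ρT control, the sketch below builds the covariance matrix of one market's shocks. It assumes the usual stationary AR(1) form, in which the covariance between times s and t is σT² ρT^|s−t|. The function name ar1_covariance and the example values are hypothetical and are not taken from the app.

```python
def ar1_covariance(sigma_t, rho_t, n_times):
    """Return the T x T matrix with entries sigma_t^2 * rho_t^|s - t|.

    Assumes sigma_t is the marginal SD of the shocks at every time point.
    """
    return [
        [sigma_t ** 2 * rho_t ** abs(s - t) for t in range(n_times)]
        for s in range(n_times)
    ]


# rho_t = 0 gives a diagonal matrix (independent shocks over time);
# rho_t near 1 gives off-diagonal entries that decay slowly with the lag.
for row in ar1_covariance(sigma_t=2.0, rho_t=0.5, n_times=4):
    print([round(v, 3) for v in row])
```

Under this assumed form, every diagonal entry equals σT², and entries shrink by a factor of ρT for each additional time step between two observations.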

SD of obs within markets [σI]: Standard deviation of individual observations around their market-level means. Large values = big spread of within-market observations. Since the plots show observed market means, making this very large can overwhelm the market-level parameters.

Overall intercept [β0]: Mean of the outcome at the midpoint of the time period

Overall slope [β1]: Overall national trend, which is linear. The single slope parameter β1 equals the change in the outcome over the pre-intervention time period.

SD of market intercepts [σ0]: Variability of market fixed effects. Large values = more spread among the markets at any time point

SD of market slopes [σ1]: Variability of market time slopes. Large values = more variation among market trends around the national trend

Correlation between market level and slope [ρM]: Correlation between these two market-specific parameters. Positive values = markets with high levels have more positive slopes (relative to the national trend)
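
For reference, the three market-level parameters σ0, σ1, and ρM can be combined into a 2×2 variance matrix in the standard way for a bivariate Normal distribution. The label Σ_M is introduced here for illustration only and is not notation used by the app.

```latex
\Sigma_M =
\begin{pmatrix}
\sigma_0^2 & \rho_M \, \sigma_0 \sigma_1 \\
\rho_M \, \sigma_0 \sigma_1 & \sigma_1^2
\end{pmatrix}
```

The diagonal entries are the variances of the market intercepts and slopes, and the off-diagonal entry is their covariance, so ρM = 0 makes the two independent.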

We simulate data for observation i in market m in year t using the following process:

  1. Draw market-level intercepts and slopes from a bivariate Normal distribution with mean 0 and 2x2 variance matrix formed using σ0, σ1, and ρM
  2. Compute market-level trends: add the overall intercept β0 to the market-level intercepts to obtain β0m; add the overall slope β1 to the market-level slopes to obtain β1m; compute β0m + z β1m where z is a sequence from -1 to 1 of length T
  3. Draw market-level shocks from T-variate Normal distribution with mean 0 and AR(1) variance matrix formed using σT and ρT
  4. Draw observation errors from univariate Normal distribution with mean 0 and standard deviation σI
  5. Add together the market-level trends, market-level shocks, and observation errors
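
The five steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the app's actual code. The function name simulate, its argument names, and its default values are all hypothetical. It also assumes that the AR(1) variance matrix has entries σT² ρT^|s−t|, which lets the shocks be drawn recursively instead of from a T-variate Normal directly.

```python
import math
import random


def simulate(n_markets=5, n_obs=20, n_times=6, beta0=10.0, beta1=1.0,
             sigma0=1.0, sigma1=0.5, rho_m=0.3,
             sigma_t=0.5, rho_t=0.6, sigma_i=1.0, seed=1):
    """Return y[m][t][i] following steps 1-5 (illustrative sketch only)."""
    rng = random.Random(seed)
    # z: n_times equally spaced values from -1 to 1 (requires n_times >= 2)
    z = [-1 + 2 * t / (n_times - 1) for t in range(n_times)]
    y = []
    for m in range(n_markets):
        # Step 1: correlated intercept and slope deviations (bivariate Normal)
        u0, u1 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = sigma0 * u0
        b1 = sigma1 * (rho_m * u0 + math.sqrt(1 - rho_m ** 2) * u1)
        # Step 2: market-level trend beta0m + z * beta1m
        trend = [(beta0 + b0) + zt * (beta1 + b1) for zt in z]
        # Step 3: AR(1) shocks with marginal SD sigma_t (assumed stationary)
        shocks = [sigma_t * rng.gauss(0, 1)]
        for t in range(1, n_times):
            innov = sigma_t * math.sqrt(1 - rho_t ** 2) * rng.gauss(0, 1)
            shocks.append(rho_t * shocks[-1] + innov)
        # Steps 4-5: add observation errors to trend plus shock
        y.append([[trend[t] + shocks[t] + rng.gauss(0, sigma_i)
                   for i in range(n_obs)] for t in range(n_times)])
    return y


y = simulate()
# Observed market means at each time point, as in the app's plots
means = [[sum(obs) / len(obs) for obs in market] for market in y]
```

Setting every SD to zero in this sketch reproduces the national linear trend exactly, which is a quick way to check how each source of variation changes the picture.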

To compute these quantities from your own data:

  1. Fit a simple linear model with time and unit fixed effects. (Alternatively, subtract the time-period averages and unit averages from each observation.)
  2. Compute residuals from the model above. Then compute the empirical variance-covariance matrix of the collection of residual vectors.
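
The two steps above can be sketched as follows for a balanced panel with one value per unit and time point. The function name residual_covariance and the nested-list input format are assumptions made for this example. The sketch takes the demeaning route rather than fitting a model, adds the grand mean back so the residuals are centered at zero, and divides by the number of units minus one, which is one common convention.

```python
def residual_covariance(y):
    """y[m][t]: outcome for unit m at time t (balanced panel, no missing data).

    Returns (residuals, T x T empirical covariance of the residual vectors).
    """
    n_units, n_times = len(y), len(y[0])
    unit_mean = [sum(row) / n_times for row in y]
    time_mean = [sum(y[m][t] for m in range(n_units)) / n_units
                 for t in range(n_times)]
    grand = sum(unit_mean) / n_units
    # Two-way demeaning; matches unit + time fixed effects when balanced
    resid = [[y[m][t] - unit_mean[m] - time_mean[t] + grand
              for t in range(n_times)] for m in range(n_units)]
    # Residuals already average to zero across units at each time point,
    # so the cross-products need no further centering
    cov = [[sum(resid[m][s] * resid[m][t] for m in range(n_units))
            / (n_units - 1)
            for t in range(n_times)] for s in range(n_times)]
    return resid, cov


# Toy example: two units observed at two time points
resid, cov = residual_covariance([[1.0, 2.0], [2.0, 1.0]])
```

Each row of resid is one unit's residual vector over time, and cov[s][t] estimates the covariance between residuals at times s and t.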

To use the quantities computed from the three real data examples, hit the “Extract Simulation Parameters” button at the bottom of the Real Data tab and the parameters for that data set will be transferred over as inputs on the Simulated Data tab.