G3. Gauge data fitness

This step asks whether the available EHR data are suitable for estimating the target estimand. EHR data may be large, but size alone does not mean the data are appropriate for the question.

Researchers should assess whether the sample, variables, measurement timing, follow-up period, and data quality are sufficient for the intended research task. If the data are not fit for purpose, the question, estimand, data source, or analysis plan may need to be revised.

Description

Sample Requirements

The sample should be drawn from the target population or contain sufficient information about the selection mechanism to enable standardisation or reweighting to that population.
Sample size should be sufficient to estimate the estimand with the desired level of precision
The period of observation is sufficient for the target estimand (e.g. calendar time coverage sufficient for trend analysis, or follow-up long enough for cumulative incidence estimation)
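
As a rough illustration of the precision requirement above, the sketch below computes the sample size needed to estimate a prevalence within a chosen margin of error, using the normal (Wald) approximation. The function name, the anticipated prevalence, and the margin are illustrative assumptions and not part of the framework; rare outcomes or clustered data would likely need a different method.

```python
import math
from statistics import NormalDist


def required_n_for_prevalence(expected_prevalence, margin, confidence=0.95):
    """Smallest n giving a Wald confidence interval half-width <= margin.

    Hypothetical helper for illustration only.
    """
    if not 0 < expected_prevalence < 1:
        raise ValueError("expected_prevalence must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = expected_prevalence * (1 - expected_prevalence)
    return math.ceil(z ** 2 * variance / margin ** 2)


# Assumed inputs: 10% anticipated prevalence, +/- 1 percentage point.
n_needed = required_n_for_prevalence(0.10, 0.01)
```

Comparing `n_needed` with the number of eligible patients in the EHR extract gives a first indication of whether the data can support the intended precision.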

Variable Requirements

Key health state, event, exposure, or practice of interest and all key auxiliary variables are available and accurately measured
Variable definitions, coding practices, and recording completeness are consistent across data sources, sites, and time periods (e.g. diagnostic codes are harmonised across datasets, and any secular changes in coding or recording practices are unlikely to produce artefactual trends)
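
One simple way to screen for secular changes in recording is to look for abrupt year-on-year jumps in how often a code is recorded. The sketch below is a minimal example under stated assumptions: the function name, the 50% threshold, and the yearly rates are all hypothetical, and a flagged year is only a prompt to investigate coding or recording changes, not evidence of an artefact.

```python
def flag_recording_shifts(rate_by_year, max_relative_change=0.5):
    """Return (year, previous_rate, rate) for abrupt year-on-year changes.

    rate_by_year maps a calendar year to the recorded rate of one code.
    The threshold is an arbitrary illustrative choice.
    """
    flagged = []
    years = sorted(rate_by_year)
    for previous, current in zip(years, years[1:]):
        before, after = rate_by_year[previous], rate_by_year[current]
        if before == 0:
            # Any recording after a year with none is treated as a shift.
            if after > 0:
                flagged.append((current, before, after))
            continue
        if abs(after - before) / before > max_relative_change:
            flagged.append((current, before, after))
    return flagged


# Made-up recorded rates per 1,000 patients for a single diagnostic code.
rates = {2012: 4.1, 2013: 4.3, 2014: 4.2, 2015: 8.9, 2016: 9.1}
shifts = flag_recording_shifts(rates)
```

In this made-up series only 2015 is flagged, because the recorded rate roughly doubles relative to 2014.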

Signal Discovery

Sample Requirements

The sample should be drawn from the target population, or contain sufficient information to enable transportation or reweighting of estimates to that population.
Sample size should be sufficient to detect desired effect sizes after multiple testing correction, accounting for case-control imbalance where applicable (e.g. effective sample size in case-control designs)
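
To make the two quantities in the last requirement concrete, the sketch below computes a commonly used approximation to the effective sample size of an unbalanced case-control comparison, 4 / (1/n_cases + 1/n_controls), together with a Bonferroni-adjusted significance threshold. The function names and the example counts are assumptions for illustration; other multiple-testing corrections may be more appropriate for a given analysis.

```python
import math


def effective_sample_size(n_cases, n_controls):
    """Approximate effective sample size of a case-control comparison.

    Algebraically equal to 4 / (1/n_cases + 1/n_controls); it matches
    the total sample size when cases and controls are balanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must be positive")
    return 4 * n_cases * n_controls / (n_cases + n_controls)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical study: 1,000 cases, 9,000 controls, one million tests.
n_eff = effective_sample_size(1_000, 9_000)
threshold = bonferroni_threshold(0.05, 1_000_000)
```

With these assumed counts the 10,000 participants carry roughly the information of a balanced study of 3,600, which is the figure to use when judging detectable effect sizes.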

Variable Requirements

Key variables (i.e. exposures/variants and outcomes/traits) are consistently defined, available, and accurately measured across all data sources and cohorts
Where the signal of interest takes time to emerge, the period of observation is sufficient to detect it (e.g. the duration of follow-up in a pharmacovigilance analysis is sufficient for relevant adverse outcomes to occur)

Factual Prediction

Sample Requirements

The sample should be drawn from the target population, or contain sufficient information to enable transportation or reweighting of estimates to that population.
Sample size should be sufficient to estimate baseline risk with desired precision, while minimising overfitting and optimism (e.g. using pmsampsize or equivalent)
Period of observation is sufficient to observe the outcome within the intended prediction horizon (e.g. follow-up long enough to observe 5-year cardiovascular events)

Variable Requirements

Outcome, treatments, and all key predictors are available, accurately measured, and consistently defined across the study period and any external validation samples
All predictors are available at the reference time point (i.e. measured at or before the landmark time for prognostication, or available at the point of screening for classification)
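
The requirement that predictors be available at the reference time point can be checked directly against measurement dates. The sketch below is a minimal example; the function name, the predictor names, and the dates are hypothetical and only illustrate the comparison with the landmark.

```python
from datetime import date


def predictors_after_landmark(measured_on, landmark):
    """Return names of predictors whose measurement falls after the landmark.

    measured_on maps a predictor name to the date of the measurement
    that would be used in the model.
    """
    return sorted(name for name, when in measured_on.items() if when > landmark)


# Made-up record for one patient.
record = {
    "systolic_bp": date(2020, 1, 10),
    "hba1c": date(2020, 3, 2),
    "smoking_status": date(2019, 11, 5),
}
late = predictors_after_landmark(record, landmark=date(2020, 2, 1))
```

Any predictor returned here would not have been known at the time of prediction, so it needs an earlier measurement or should be dropped from the model.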

Counterfactual Prediction

Sample Requirements

The sample should be drawn from the target population, or contain sufficient information to enable transportation or reweighting of estimates to that population.
Sample size should be sufficient to estimate counterfactual risks with desired precision, while minimising overfitting and optimism, and with adequate observations per confounder to avoid sparse data bias
Period of observation is sufficient to observe the outcome within the intended prediction horizon, with sufficient prior observation time to establish baseline exposure status

Variable Requirements

The outcome, all hypothetical treatment strategies, and all confounders are available and accurately measured, with consistent definitions and coding practices across data sources, sites, and time periods
Variables are measured with sufficient timing and frequency to establish the correct causal ordering between the hypothetical treatment strategies and the outcome.
All hypothetical treatment strategies of interest are observed across all relevant confounder strata
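
The last requirement can be screened with a simple cross-tabulation of treatment strategies by confounder strata. The sketch below lists the combinations that never occur in the data; the function name, the strata, and the strategies are hypothetical, and with many or continuous confounders a model-based check would be needed instead.

```python
from collections import Counter
from itertools import product


def empty_strategy_cells(rows, strategies):
    """Return (stratum, strategy) pairs with no observations.

    rows is an iterable of (stratum, strategy) pairs, one per patient.
    """
    counts = Counter(rows)
    strata = sorted({stratum for stratum, _ in counts})
    return [
        (stratum, strategy)
        for stratum, strategy in product(strata, strategies)
        if counts[(stratum, strategy)] == 0
    ]


# Made-up data: no untreated patients are observed among those aged 65+.
rows = [
    ("age<65", "statin"),
    ("age<65", "no_statin"),
    ("age>=65", "statin"),
]
gaps = empty_strategy_cells(rows, strategies=["statin", "no_statin"])
```

An empty cell means the data hold no information about that strategy in that stratum, so predictions there would rest entirely on extrapolation.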

Causal Effect Estimation

Sample Requirements

The sample should be drawn from the target population, or contain sufficient information to enable transportation or reweighting of estimates to that population.
Sample size should be sufficient to estimate the estimand with desired precision, with adequate observations per confounder to avoid sparse data bias
Period of observation is sufficient to observe the outcome following exposure (e.g. follow-up long enough for the outcome to accrue, and sufficient prior observation time to establish baseline exposure status and exclude prevalent users)
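
The prior observation requirement can be expressed as a lookback (washout) check for each patient. The sketch below is one possible formulation under stated assumptions: the function name, the argument names, and the 365-day window are illustrative choices, not values given by the framework.

```python
from datetime import date, timedelta


def is_new_user(registration_start, index_exposure, earlier_exposures,
                washout_days=365):
    """True if the patient was observed for the whole washout window
    before the index exposure and had no exposure recorded in it.
    """
    window_start = index_exposure - timedelta(days=washout_days)
    has_full_lookback = registration_start <= window_start
    exposed_in_window = any(
        window_start <= when < index_exposure for when in earlier_exposures
    )
    return has_full_lookback and not exposed_in_window


# Made-up patients, all with an index exposure on 1 January 2020.
index = date(2020, 1, 1)
observed_long_enough = is_new_user(date(2018, 1, 1), index, [])
too_little_lookback = is_new_user(date(2019, 6, 1), index, [])
prevalent_user = is_new_user(date(2018, 1, 1), index, [date(2019, 9, 1)])
```

Patients who fail for lack of lookback cannot be classified either way, which is a data fitness problem in its own right and distinct from being a known prevalent user.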

Variable Requirements

Exposure, outcome, any mediators, and all key confounders are available and accurately measured, with consistent definitions and coding practices across data sources, sites, and time periods.
Variables are measured with sufficient timing and frequency to establish the correct causal ordering between exposure and outcome
Exposure (and any mediators) varies within the sample and across all relevant confounder strata

By the end of this step, you should have:

Assessed whether the sample reflects the target population, or whether estimates can be transported or reweighted to it
Confirmed that key variables are available and measured with sufficient accuracy
Checked whether timing and follow-up support the estimand
Identified gaps in variable availability, measurement, or observation periods
Decided whether to proceed, refine the question, enrich the data, or seek another data source
