O4.
Outline Sources of Error, Bias, & Threats to Validity
EHR studies are vulnerable to specific sources of error, bias, and threats to validity. These may arise from who appears in the data, how variables are recorded, missingness, data source heterogeneity, sparse data, confounding, or incorrect time alignment.
This step helps researchers identify the major threats before analysis and describe how each will be prevented, reduced, assessed, or reported.
Description
Selection
Non-representative sampling and/or participation
The study sample does not represent the target population, either because the contributing healthcare systems are a non-random subsample of the population of interest, or because of informative presence*. When the probability of presence in the data is related to the health state, event, exposure, or practice of interest, occurrence estimates will be biased.
Collider restriction fallacy
When both the primary variable of interest and a stratifying variable are related to presence in the data, whether directly or through shared or intermediate causes, the apparent pattern of occurrence across strata may be misleading. Common instances include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
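To make this concrete, here is a minimal simulation sketch of Berkson-type collider restriction (Python with numpy; every probability is an illustrative assumption, not an estimate from real data). Disease and comorbidity are independent in the population, but conditioning on presence in the EHR inflates prevalence and manufactures an inverse pattern across strata:

```python
# Illustrative simulation of collider restriction (Berkson's bias).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

disease = rng.random(n) < 0.10       # health state of interest
comorbidity = rng.random(n) < 0.20   # stratifying variable, independent of disease

# Informative presence: either condition raises the chance of appearing in the EHR.
present = rng.random(n) < (0.10 + 0.30 * disease + 0.30 * comorbidity)

print("True prevalence:                ", round(disease.mean(), 3))
print("EHR prevalence:                 ", round(disease[present].mean(), 3))
print("EHR prevalence, comorbidity:    ", round(disease[present & comorbidity].mean(), 3))
print("EHR prevalence, no comorbidity: ", round(disease[present & ~comorbidity].mean(), 3))
```

Although the two conditions are independent by construction, the sampled prevalence is inflated overall and differs sharply between comorbidity strata.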
Measurement error
Systematic measurement error in the primary variable
The health state/event of interest is subject to systematic measurement error, leading to misleading estimates of occurrence.
Differential measurement error across an auxiliary variable
The health state/event of interest is subject to systematic measurement error that varies across levels of one or more auxiliary variables (including data source, time, or calendar period), leading to misleading estimates within strata and misleading comparisons between strata.
Missingness
Data are missing for the health state/event of interest or for auxiliary/stratifying variables, and the probability of missingness is related to the variable itself or to other variables of interest (e.g., due to informative observation processes). When the analysis requires follow-up over time (e.g., describing incidence or survival), loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the health state or event of interest (e.g., transferring care, death captured in a different system). Informative missingness can bias occurrence estimates and distort comparisons across strata.
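As a concrete illustration of an informative observation process, a minimal sketch (Python with numpy; the test-ordering model and all coefficients are assumed): a laboratory test is ordered more often for sicker patients, so the complete-case mean overstates the population mean.

```python
# Illustrative simulation of informative missingness in a recorded lab value.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

severity = rng.normal(0, 1, n)                        # latent illness severity
hba1c = 5.5 + 0.8 * severity + rng.normal(0, 0.3, n)  # true values for everyone

# The test is ordered more often for sicker patients (informative observation).
measured = rng.random(n) < 1 / (1 + np.exp(-(severity - 1)))

print("True mean HbA1c:          ", round(hba1c.mean(), 2))
print("Complete-case mean HbA1c: ", round(hba1c[measured].mean(), 2))
```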
Data source heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. This introduces uncertainty in pooled estimates, and can bias estimates within strata or distort comparisons across strata when these differences are related to one or more auxiliary variables of interest (e.g., temporal trends in occurrence may be artefacts of changing coding practices rather than genuine secular changes).
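For example, improved coding sensitivity alone can manufacture an apparent secular trend. A minimal sketch (Python with numpy; the prevalence and sensitivities are assumed purely for illustration):

```python
# Illustrative simulation: constant true prevalence, but recorded prevalence
# 'rises' because coding sensitivity improved between calendar periods.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
true_prevalence = 0.08  # identical in both periods

for period, sensitivity in [("2010-2014", 0.55), ("2015-2019", 0.80)]:
    disease = rng.random(n) < true_prevalence
    coded = disease & (rng.random(n) < sensitivity)  # only coded cases are observed
    print(period, "recorded prevalence:", round(coded.mean(), 3))
```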
Sparse data
Insufficient observations within strata lead to unstable or biased estimates of stratum-specific occurrence (e.g., in standardisation, MAIHDA, or MrP).
Confounding
N/A
Time alignment
Lead-time bias
When comparing time-to-event between groups or over time, differences may be misleading if the timing of the index event (e.g., diagnosis) itself varies between groups or over time. For example, if diagnosis is made progressively earlier due to screening, apparent survival time may appear to improve over time even if the true course of disease is unchanged.
Immortal time fallacy
When describing outcomes within strata of a time-dependent variable (e.g., 30-day mortality by duration of treatment), differences between strata may be misleading because strata requiring longer durations are only observable for individuals who survived to that point.
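A minimal sketch of the immortal time fallacy (Python with numpy; all parameters are assumed): death times are generated with no treatment effect whatsoever, yet 30-day mortality looks markedly lower in the longer-duration stratum, simply because reaching a long duration requires surviving that long.

```python
# Illustrative simulation of the immortal time fallacy.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

death_day = rng.exponential(scale=20, size=n)     # no treatment effect by construction
planned_course = rng.uniform(1, 14, n)            # intended treatment duration
duration = np.minimum(death_day, planned_course)  # treatment stops at death

long_course = duration >= 7                       # only survivors can reach 7 days
dead_30 = death_day <= 30

print("30-day mortality, duration >= 7 days:", round(dead_30[long_course].mean(), 3))
print("30-day mortality, duration <  7 days:", round(dead_30[~long_course].mean(), 3))
```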
Signal Discovery
Selection
Type 2 selection bias (generalisability bias)
The sample is not a census or random sample from the target population, the distribution of certain effect modifiers differs between the sample and the target population, and the analytic sample cannot be reweighted to represent the target population. Detected signals may not replicate in the broader population, or true signals may be missed due to insufficient representation of relevant subgroups.
Type 1 selection bias (collider restriction bias)
When both the scanning variable (exposure, genotype, or phenotype) and the outcome of interest are related to presence in the data, whether directly or through shared or intermediate causes, spurious signals may be detected. Common instances include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
Measurement error
Outcome measurement error
The outcome is measured with error, which may be non-differential or differential. Non-differential error can attenuate true associations below detection thresholds after multiple testing correction; differential error can generate spurious signals or mask true signals (see the sketch after this block).
Scanning variable measurement error
The scanning variable (exposure, genotype, or phenotype) is measured with error, which may be non-differential or differential. Non-differential error attenuates associations, increasing the risk of missed signals; differential error can generate spurious signals in any direction.
Dependent measurement errors
There is correlated measurement error in both the scanning variable and outcome, e.g. because both are derived from the same clinical documentation process. Dependent errors can generate spurious signals or mask true signals.
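A minimal sketch of the attenuation mechanism (Python with numpy; risks, sensitivity, and specificity are assumed): non-differential misclassification of the outcome pulls the observed odds ratio toward the null, which can drop a true signal below a stringent discovery threshold.

```python
# Illustrative simulation: non-differential outcome misclassification
# attenuates a true exposure-outcome association.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
exposed = rng.random(n) < 0.3
outcome = rng.random(n) < np.where(exposed, 0.06, 0.03)   # true OR around 2

# Misclassified outcome: sensitivity 0.6, specificity 0.98, identical in
# exposed and unexposed (i.e., non-differential).
observed = np.where(outcome, rng.random(n) < 0.6, rng.random(n) < 0.02)

def odds_ratio(y, x):
    a, b = (y & x).sum(), (~y & x).sum()
    c, d = (y & ~x).sum(), (~y & ~x).sum()
    return (a * d) / (b * c)

print("True OR:    ", round(odds_ratio(outcome, exposed), 2))
print("Observed OR:", round(odds_ratio(observed, exposed), 2))
```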
Missingness
Data are missing for the outcome, scanning variable, or covariates, and the probability of missingness is related to the signal of interest. When the analysis requires follow-up over time, loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the scanning variable or outcome. Informative missingness can generate spurious signals or mask true signals.
Data source heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. This introduces uncertainty and can generate spurious signals or mask true signals when these differences covary with the scanning variable or outcome.
Sparse data
Insufficient observations for specific exposure-phenotype combinations lead to unstable association estimates, inflated effect sizes (i.e., the ‘winner's curse’), and an excess of false positives.
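A minimal sketch of the winner's curse (Python with numpy; the effect size and standard error are assumed): every test has exactly the same true effect, but the estimates that clear a stringent significance threshold are systematically inflated.

```python
# Illustrative simulation of the winner's curse under multiple testing.
import numpy as np

rng = np.random.default_rng(4)
n_tests, true_beta, se = 10_000, 0.10, 0.05

beta_hat = rng.normal(true_beta, se, n_tests)   # estimates scatter around truth
selected = np.abs(beta_hat / se) > 4.0          # stringent discovery threshold

print("True effect:                 ", true_beta)
print("Mean estimate, all tests:    ", round(beta_hat.mean(), 3))
print("Mean estimate, 'discoveries':", round(beta_hat[selected].mean(), 3))
```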
Confounding
Unobserved baseline confounding
One or more common causes of the scanning variable and outcome are not captured in the data, generating spurious signals. Important instances include population stratification in GWAS (where ancestry-related genetic variation correlates with both the variant and the outcome through shared environmental or demographic pathways), confounding by indication in pharmacovigilance, and shared lifestyle or environmental confounders in ExWAS.
Residual baseline confounding
One or more available baseline confounders are poorly measured or have been coarsened (e.g. dichotomised), meaning conditioning does not fully remove confounding. For example, principal component adjustment for population stratification in GWAS may not fully capture ancestry structure, leading to residual confounding by population stratification.
Time alignment
Lead-time bias
When the scanning variable influences the timing of the index event (e.g., a drug triggers earlier diagnostic investigation), spurious signals may be detected that reflect differential detection timing rather than genuine associations with the outcome.
Immortal time bias
Scanning variable definitions that require a period of time to be satisfied (e.g., "at least 7 days of drug exposure") guarantee that exposed individuals have survived event-free for that period, potentially generating spurious protective signals or masking harmful signals.
Prevalent user bias
Scanning for signals among prevalent users of an exposure means that adverse outcomes occurring before the observation window are uncaptured, selecting for individuals who survived and tolerated the exposure. This can mask true signals or attenuate genuine associations.
Factual Prediction
Selection
Non-representative development sample
The case-mix, predictor distributions, and/or outcome frequency in the development sample do not match the intended deployment population, either because the contributing healthcare systems are unrepresentative or because of informative presence*. Predictive performance will degrade when the model is applied to populations with different characteristics.
Collider path learning
When presence in the data is related to both the outcome and one or more predictors, whether directly or through shared or intermediate causes, the model may learn associations that are induced by the selection process. Such associations will transport to identically selected populations but may not transport to populations with different selection properties, even when case-mix appears similar. Common structures that lead to such associations include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
Measurement error
Outcome measurement error
The outcome is measured with systematic error, meaning the model learns to predict the true outcome incorrectly. Predictive accuracy will be degraded regardless of the deployment context.
Predictor measurement error
One or more predictors are measured with error. The model will perform as expected only when the same measurement processes operate in the deployment context. When error structures differ between development and deployment populations (e.g., due to different coding practices, diagnostic technologies, or documentation standards), predictive performance will degrade, even when case-mix appears similar (see the sketch after this block).
Dependent measurement errors
There is correlated measurement error in both the predictors and outcome, e.g. because both variables are measured by the same clinician during the same clinical encounter. The model may learn associations that are partly artefacts of shared measurement processes. Such associations will transport to identically measured populations, but may not transport to populations with different measurement processes.
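A minimal sketch of this threat (Python; assumes numpy and scikit-learn are available; the noise levels are illustrative): a model developed where a predictor is recorded accurately degrades when deployed where the same predictor is recorded more noisily, even though the underlying case-mix is identical.

```python
# Illustrative simulation: predictor measurement error differs between the
# development and deployment sites, degrading discrimination at deployment.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)

def make_site(n, noise_sd):
    state = rng.normal(0, 1, n)                          # true patient state
    y = (rng.random(n) < 1 / (1 + np.exp(-2 * state))).astype(int)
    x = state + rng.normal(0, noise_sd, n)               # recorded predictor
    return x.reshape(-1, 1), y

X_dev, y_dev = make_site(20_000, noise_sd=0.2)           # careful measurement
X_dep, y_dep = make_site(20_000, noise_sd=1.5)           # noisier recording

model = LogisticRegression().fit(X_dev, y_dev)
print("AUC, development site:", round(roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]), 2))
print("AUC, deployment site: ", round(roc_auc_score(y_dep, model.predict_proba(X_dep)[:, 1]), 2))
```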
Missingness
Data are missing for the outcome or one or more predictors, and the probability of missingness is related to the outcome and/or predictors. When the analysis requires follow-up over time, loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the outcome or predictors. Informative missingness can degrade predictive performance, particularly if the missingness mechanism differs between development and deployment contexts.
Data source heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. The model may learn associations that reflect an average across heterogeneous settings rather than the relationships present in any single context, potentially degrading predictive performance when deployed in a specific setting.
Sparse data
Insufficient sample size and/or covariate separation leads to overfitted models with inflated apparent performance that are not externally valid.
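A minimal sketch of overfitting (Python; assumes numpy and scikit-learn): with pure-noise predictors the true AUC of any model is 0.5, yet a model fitted to a small development sample shows much higher apparent performance.

```python
# Illustrative simulation: small sample + many predictors = optimism.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n_dev, n_test, p = 100, 10_000, 50

# Pure-noise predictors: no model can truly beat AUC 0.5.
X_dev, y_dev = rng.normal(size=(n_dev, p)), rng.integers(0, 2, n_dev)
X_test, y_test = rng.normal(size=(n_test, p)), rng.integers(0, 2, n_test)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
print("Apparent AUC:", round(roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]), 2))
print("Test AUC:    ", round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 2))
```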
Confounding
N/A
Time alignment
Inconsistent landmark time
Lack of a clearly defined landmark time means model predictions correspond to an unknown weighted average of predictions at different clinical moments (e.g., after symptom onset, after screening, after formal diagnosis).
Survivorship conditioning
Predictors that require a period of observation to be measured (e.g., treatment response, biomarker trajectories) implicitly require survival to that point, restricting the prediction to a subpopulation of survivors and potentially limiting applicability.
Counterfactual Prediction
Selection
Type 2 selection bias (generalisability bias)
The sample is not a census or random sample from the target population, the distribution of certain effect modifiers differs between the sample and the target population, and the analytic sample cannot be reweighted to represent the target population (e.g., due to absence of survey weights or insufficient auxiliary data to construct them). Counterfactual predictions will not generalise to the intended deployment population.
Type 1 selection bias (collider restriction bias)
When both the hypothetical intervention and outcome of interest are related to presence in the data, whether directly or through shared or intermediate causes, the predicted outcomes under the intervention will be biased. Common instances include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
Measurement error
Outcome measurement error
The outcome is measured with systematic error, meaning the model learns to predict the outcome under the specified intervention conditions incorrectly.
Intervention measurement error
The intervention variables are measured with error, meaning the counterfactual predictions reflect a different intervention than the one specified and may be biased in any direction.
Dependent measurement errors
There is correlated measurement error in both the intervention variables and outcome, e.g. because both are measured by the same clinician during the same clinical encounter. Dependent errors can bias counterfactual predictions in any direction.
Missingness
Data are missing for the outcome, intervention variables, or other adjustment variables, and the probability of missingness is related to the outcome and/or intervention (e.g., due to informative observation processes). When the analysis requires follow-up over time, loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the intervention or outcome (e.g., transferring care, death captured in a different system). Informative missingness can bias counterfactual predictions in any direction.
Data source heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. The estimated causal effects underpinning the counterfactual predictions may reflect an average across heterogeneous settings, biasing predictions for any specific deployment context.
Sparse data
Insufficient sample size, or strong determination of one or more interventions of interest*, leads to few observations for certain intervention-covariate combinations, producing unstable counterfactual predictions.
Note: *Poor covariate overlap may arise because the sample is too small to adequately represent all covariate patterns or because certain covariate patterns strongly predict exposure/intervention status. This second issue cannot be resolved by simply collecting more data.
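A minimal sketch of this second issue (Python with numpy; the 'strength' parameter is an assumed degree to which a single covariate determines the intervention): as determination strengthens, the inverse probability weights underpinning counterfactual predictions explode.

```python
# Illustrative simulation: strong determination of the intervention by a
# covariate makes off-pattern individuals rare and their weights extreme.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x = rng.normal(0, 1, n)

for strength in [0.5, 3.0, 6.0]:
    p_treat = 1 / (1 + np.exp(-strength * x))            # P(intervention | x)
    treated = rng.random(n) < p_treat
    weights = np.where(treated, 1 / p_treat, 1 / (1 - p_treat))
    print(f"strength={strength}: largest weight = {weights.max():,.0f}")
```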
Confounding
Unobserved baseline confounding
One or more baseline common causes of the intervention and outcome are not captured in the data, leading to biased predictions. Important instances in EHR include confounding by indication (where the clinical reason for prescribing an intervention itself influences the outcome status) and protopathic bias (where early undiagnosed symptoms of the outcome influence intervention status).
Residual baseline confounding
One or more available baseline confounders are poorly measured, or have been coarsened (e.g., dichotomised), meaning conditioning does not fully remove confounding (e.g., dichotomised ‘obesity’ does not capture confounding by BMI). Predictions will remain biased, even after conditioning, though typically less so.
Unobserved time-varying confounding
For predictions involving multiple interventions, or sustained or dynamic intervention regimes, one or more time-varying common causes of subsequent intervention decisions and outcome are not captured in the data, leading to biased predictions.
Residual time-varying confounding
For predictions involving multiple interventions, or sustained or dynamic intervention regimes, one or more available time-varying confounders are poorly measured or have been coarsened, meaning predictions will remain biased even after appropriate handling, though typically less so.
Time alignment
Lead-time bias
When the hypothetical intervention influences the timing of the index event (e.g., a screening intervention leads to earlier diagnosis), counterfactual predictions of time-to-event from the index event may overestimate the benefit of the intervention.
Immortal time bias
Intervention definitions that require a period of time to be satisfied (e.g., "at least 7 days of treatment") guarantee that individuals assigned to that intervention have survived event-free for that period, biasing the counterfactual predictions in favour of the intervention.
Prevalent user bias
Basing predictions on prevalent users of the intervention means that adverse outcomes occurring before the observation window are uncaptured. The counterfactual predictions reflect a selected population of survivors and tolerators rather than all individuals who would initiate the intervention.
Causal Effect Estimation
Selection
Type 2 selection bias (generalisability bias)
The sample is not a census or random sample from the target population, the distribution of certain effect modifiers differs between the sample and the target population, and the analytic sample cannot be reweighted to represent the target population (e.g., due to absence of survey weights or insufficient auxiliary data to construct them). Causal effect estimates will not generalise to the target population.
Type 1 selection bias (collider restriction bias)
When both the exposure and outcome of interest are related to presence in the data, whether directly or through shared or intermediate causes, the estimated causal effects will be biased. Common instances include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
Measurement error
Outcome measurement error
The outcome is measured with error, which may be unrelated to the exposure (non-differential) or related to the exposure (differential, e.g., due to surveillance bias, where diagnostic examination is more likely for a particular exposure). Non-differential error can lead to diluted effect estimates for categorical outcomes; differential error can introduce bias in any direction.
Exposure measurement error
The exposure is measured with error, which may be unrelated to the outcome (non-differential) or related to the outcome (differential). Non-differential error generally leads to diluted effect estimates; differential error can introduce bias in any direction.
Dependent measurement errors
There is correlated measurement error in both the exposure and outcome, e.g. because both variables are measured by the same clinician during the same clinical encounter. Dependent errors can introduce bias in any direction.
Effect modifier measurement error
The effect measure modifier of interest is measured with error, which can bias the apparent heterogeneity between groups in any direction.
Mediator measurement error
The mediator is measured with error, which can bias the apparent direct and/or indirect effects in any direction.
Missingness
Data are missing for the exposure, outcome, mediator, effect measure modifier, or other adjustment variables, and the probability of missingness is related to the exposure and/or outcome (e.g., due to informative observation processes). When the analysis requires follow-up over time, loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the exposure or outcome (e.g., transferring care, death captured in a different system). Informative missingness can bias causal effect estimates in any direction.
Data source heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. This introduces uncertainty and can bias causal effect estimates when site or calendar period covaries with the exposure, outcome, or confounders.
Sparse data
Insufficient sample size, or strong determination of the exposure*, leads to poor overlap between exposure groups after conditioning, producing extreme weights, unstable coefficients, and biased effect estimates.
Note: *Poor covariate overlap may arise because the sample is too small to adequately represent all covariate patterns or because certain covariate patterns strongly predict exposure/intervention status. This second issue cannot be resolved by simply collecting more data.
Confounding
Unobserved baseline confounding
One or more baseline common causes of the exposure and outcome are not captured in the data, leading to biased effect estimates. Important instances in EHR include confounding by indication (where the clinical reason for prescribing a treatment itself influences the outcome status) and protopathic bias (where early undiagnosed symptoms of the outcome influence treatment status).
Residual baseline confounding
One or more available baseline confounders are poorly measured, or have been coarsened (e.g. dichotomised), meaning conditioning does not fully remove confounding (e.g., dichotomised ‘obesity’ does not capture confounding by BMI; see the sketch after this block). Effect estimates will remain biased even after conditioning, though typically less so.
Unobserved time-varying confounding
For mediation analyses or studies of sustained or dynamic treatment regimes, one or more time-varying common causes of subsequent exposure and outcome are not captured in the data, leading to biased effect estimates.
Residual time-varying confounding
For mediation analyses or studies of sustained or dynamic treatment regimes, one or more available time-varying confounders are poorly measured or have been coarsened, meaning effect estimates will remain biased, even after appropriate handling, though typically less so.
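A minimal sketch of residual confounding after coarsening (Python with numpy; all coefficients are assumed): the true exposure effect is zero and continuous BMI confounds; adjusting for a dichotomised 'obesity' indicator removes only part of the bias, while adjusting for BMI itself removes it.

```python
# Illustrative simulation: conditioning on dichotomised 'obesity' leaves
# residual confounding by continuous BMI (true exposure effect = 0).
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
bmi = rng.normal(27, 4, n)

# BMI drives both exposure and outcome; the exposure itself does nothing.
exposure = (rng.random(n) < 1 / (1 + np.exp(-(bmi - 27) / 2))).astype(float)
outcome = 0.1 * bmi + rng.normal(0, 1, n)
obese = (bmi >= 30).astype(float)

def exposure_coef(*adjustments):
    X = np.column_stack([np.ones(n), exposure, *adjustments])
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1]  # coefficient on exposure

print("Crude estimate:             ", round(exposure_coef(), 3))
print("Adjusted for obese (yes/no):", round(exposure_coef(obese), 3))
print("Adjusted for continuous BMI:", round(exposure_coef(bmi), 3))
```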
Time alignment
Lead-time bias
When the exposure influences the timing of the index event (e.g., a screening intervention leads to earlier diagnosis), comparing time-to-event from the index event between exposed and unexposed groups may show an apparent benefit even if the exposure has no true effect on the outcome (see the sketch after this block).
Immortal time bias
Exposure definitions that require a period of time to be satisfied (e.g., "at least 7 days of treatment") guarantee that exposed individuals have survived event-free for that period, while no equivalent guarantee exists for unexposed individuals. This creates a biased comparison by excluding early events from the exposed group.
Prevalent user bias
Studying an exposure that began before the observation window (prevalent use) means that any adverse outcomes occurring between exposure initiation and entry into the data have not been captured, selecting for individuals who survived and tolerated the exposure.
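A minimal sketch of lead-time bias (Python with numpy; the disease course and diagnostic delays are assumed): earlier diagnosis stretches survival measured from diagnosis even though the date of death never moves.

```python
# Illustrative simulation: screening shifts diagnosis earlier, lengthening
# measured survival although the disease course is unchanged.
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

onset_to_death = 5 + rng.exponential(5, n)   # years; identical in both groups
delay_symptoms = rng.uniform(3, 4, n)        # late diagnosis, on symptoms
delay_screening = rng.uniform(0, 1, n)       # early diagnosis, via screening

print("Mean survival from diagnosis, symptomatic dx:",
      round((onset_to_death - delay_symptoms).mean(), 2))
print("Mean survival from diagnosis, screening dx:  ",
      round((onset_to_death - delay_screening).mean(), 2))
```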
By the end of this step, you should have:
Identified the main sources of error, bias, and validity threats relevant to the study
Described how each threat could affect the results
Listed planned mitigation strategies
Flagged threats that cannot be fully addressed with the available data
Created a bias and validity assessment plan