Factual Prediction
Predict current or future outcomes under observed or expected care.
R1: Research task identification
Screening and classification
Estimating the probability that a health state is currently present, or the expected current value of a health state.
Prognostication
Estimating the probability of an outcome occurring in the future - or the expected future value of an outcome - conditional on individual characteristics and under observed or expected treatment conditions.
I2: Identify estimand(s)
Carefully describe the target quantity/quantities of interest and all relevant criteria using the appropriate estimand framework
Clinical decision, action, or policy to be informed by the prediction (e.g., whether to initiate treatment, whether to refer for further investigation, risk-stratified screening or resource allocation)
Target population (including the population definition and sampling frame)
Outcome definition (including the prediction horizon where appropriate, e.g. 5-year risk of heart attack)
Intended deployment context, including proposed user (e.g. by a primary care doctor during a routine appointment at age 40 years)
Reference time point (i.e. landmark time) when the prediction will be made
Handling of treatments (e.g. whether predictions reflect outcomes regardless of treatment received, prior to treatment initiation, or with treatment as part of a composite outcome)
Handling of competing events (e.g. whether competing events such as death are handled via cause-specific or subdistribution approaches)
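The handling of competing events changes the numbers, not just the interpretation. The Python sketch below (standard library only) compares the Aalen-Johansen cumulative incidence of the event of interest with a naive 1 - Kaplan-Meier estimate that treats competing events as censored. The event coding (0 = censored, 1 = event of interest, 2 = competing event), the function names and the toy data are illustrative assumptions, not part of this framework.

```python
from collections import Counter


def _grouped(times, events):
    """Yield (time, Counter of event codes) for each distinct time, in time order."""
    data = sorted(zip(times, events))
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = Counter()
        while i < len(data) and data[i][0] == t:
            counts[data[i][1]] += 1
            i += 1
        yield t, counts


def aalen_johansen_cif(times, events, cause=1):
    """Cumulative incidence of `cause`, accounting for competing events.

    Event codes: 0 = censored; any other value = an event of that type.
    Returns [(time, cumulative incidence)] at each distinct event time.
    """
    at_risk = len(times)
    surv = 1.0  # all-cause event-free survival just before the current time
    cif = 0.0
    curve = []
    for t, counts in _grouped(times, events):
        d_all = sum(c for code, c in counts.items() if code != 0)
        if d_all:
            cif += surv * counts[cause] / at_risk
            surv *= 1 - d_all / at_risk
            curve.append((t, cif))
        at_risk -= sum(counts.values())  # events and censorings leave the risk set
    return curve


def naive_one_minus_km(times, events, cause=1):
    """1 - Kaplan-Meier for `cause`, (wrongly) treating competing events as censored."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t, counts in _grouped(times, events):
        if counts[cause]:
            surv *= 1 - counts[cause] / at_risk
            curve.append((t, 1 - surv))
        at_risk -= sum(counts.values())
    return curve


times = [1, 2, 3, 4, 5]
events = [1, 2, 1, 2, 1]  # no censoring: 3 of 5 people have the event of interest
print(aalen_johansen_cif(times, events)[-1])
print(naive_one_minus_km(times, events)[-1])
```

With no censoring, the Aalen-Johansen estimate at t = 5 is 0.6, the observed proportion with the event of interest, whereas the naive estimate reaches 1.0 because people who experienced the competing event are treated as though they could still go on to have the outcome.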
G3: Gauge data fitness
Assess whether the available data are fit for estimating the target estimand(s), against the sample and variable requirements below
Sample Requirements
The sample should be drawn from the target population, or contain sufficient information to enable transportation or reweighting of estimates to that population.
Sample size should be sufficient to estimate baseline risk with desired precision, while minimising overfitting and optimism (e.g. using pmsampsize or equivalent)
Period of observation is sufficient to observe the outcome within the intended prediction horizon (e.g. follow-up long enough to observe 5-year cardiovascular events)
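For a binary outcome, the minimum sample size criteria proposed by Riley et al. (and implemented in pmsampsize) can be sketched as below. This is an illustrative re-implementation under stated assumptions: p is the number of candidate predictor parameters, r2_cs the anticipated Cox-Snell R-squared and prevalence the anticipated outcome proportion, all of which must be justified from prior studies. The function names are hypothetical, and pmsampsize itself should be preferred in practice.

```python
import math


def n_shrinkage(p, r2_cs, s=0.9):
    """Criterion (i): expected uniform shrinkage factor of at least s."""
    return math.ceil(p / ((s - 1) * math.log(1 - r2_cs / s)))


def n_optimism_r2(p, r2_cs, prevalence, delta=0.05):
    """Criterion (ii): apparent and adjusted Nagelkerke R^2 differ by at most delta."""
    phi = prevalence
    ln_l_null_per_n = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    max_r2_cs = 1 - math.exp(2 * ln_l_null_per_n)  # largest attainable Cox-Snell R^2
    s_vh = r2_cs / (r2_cs + delta * max_r2_cs)
    return n_shrinkage(p, r2_cs, s_vh)


def n_overall_risk(prevalence, margin=0.05, z=1.96):
    """Criterion (iii): overall outcome risk estimated to within +/- margin."""
    return math.ceil((z / margin) ** 2 * prevalence * (1 - prevalence))


def minimum_sample_size(p, r2_cs, prevalence):
    """Largest of the three criteria, using their default targets."""
    return max(
        n_shrinkage(p, r2_cs),
        n_optimism_r2(p, r2_cs, prevalence),
        n_overall_risk(prevalence),
    )


# Hypothetical inputs: 10 candidate parameters, anticipated R^2_CS = 0.1, 5% prevalence
print(minimum_sample_size(p=10, r2_cs=0.1, prevalence=0.05))
```

Under these example inputs the shrinkage criterion dominates, giving 850 participants (roughly 43 outcome events at 5% prevalence).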
Variable requirements
Outcome, treatments, and all key predictors are available, accurately measured, and consistently defined across the study period and any external validation samples
All predictors are available at the reference time point (i.e. measured at or before the landmark time for prognostication, or available at the point of screening for classification)
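A minimal sketch of enforcing this requirement when assembling predictors from longitudinal records (e.g. EHR extracts), assuming each measurement carries a date and each person has a landmark date. The record layout and function name are hypothetical, and taking the latest value on or before the landmark is one common choice rather than the only one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Measurement:
    person_id: str
    predictor: str
    measured_on: date
    value: float


def predictors_at_landmark(measurements, landmarks):
    """Latest value of each predictor measured on or before each person's landmark date.

    landmarks: dict mapping person_id -> landmark (prediction) date.
    Values recorded after the landmark are dropped, because they would not be
    available when the prediction is made in practice.
    """
    latest = {}
    for m in measurements:
        if m.person_id not in landmarks or m.measured_on > landmarks[m.person_id]:
            continue
        key = (m.person_id, m.predictor)
        if key not in latest or m.measured_on > latest[key].measured_on:
            latest[key] = m
    return {key: m.value for key, m in latest.items()}


landmarks = {"p1": date(2020, 1, 1)}
measurements = [
    Measurement("p1", "sbp", date(2019, 6, 1), 142.0),
    Measurement("p1", "sbp", date(2019, 11, 1), 138.0),
    Measurement("p1", "sbp", date(2020, 3, 1), 125.0),  # after the landmark: excluded
]
print(predictors_at_landmark(measurements, landmarks))
```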
O4: Outline and consider key sources of error, bias & threats to validity
Consider all potential sources of error, bias and threats to validity and outline mitigation strategies. Table 2 contains prompt questions to help identify major sources of bias and select mitigation strategies.
Selection
Non-representative development sample
The case-mix, predictor distributions, and/or outcome frequency in the development sample do not match the intended deployment population, either because the contributing healthcare systems are unrepresentative or because of informative presence*. Predictive performance will degrade when the model is applied to populations with different characteristics.
Collider path learning
When presence in the data is related to both the outcome and one or more predictors, whether directly or through shared or intermediate causes, the model may learn associations that are induced by the selection process. Such associations will transport to identically selected populations but may not transport to populations with different selection properties, even when case-mix appears similar. Common structures that lead to such associations include Berkson's bias*, index event bias**, survivorship bias***, and M-bias****.
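The collider structure can be illustrated with a small simulation, sketched here in Python under simplifying assumptions: a predictor x and a continuous outcome y are independent standard normal variables in the population, and a person enters the data only if x + y exceeds a threshold (a Berkson-type selection). The threshold, sample size and seed are arbitrary choices.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_selection(n=20_000, threshold=1.0, seed=1):
    """Predictor and outcome are independent in the population, but entry into
    the data requires x + y > threshold, so both influence selection."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # predictor
    y = [rng.gauss(0, 1) for _ in range(n)]  # continuous outcome (e.g. severity)
    selected = [(a, b) for a, b in zip(x, y) if a + b > threshold]
    xs = [a for a, _ in selected]
    ys = [b for _, b in selected]
    return pearson(x, y), pearson(xs, ys)


population_r, selected_r = simulate_selection()
print(f"population r = {population_r:.2f}, selected-sample r = {selected_r:.2f}")
```

The correlation is close to zero in the full population but strongly negative in the selected sample, so a model developed there would learn a predictor-outcome association that exists only under that selection mechanism.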
Measurement
Outcome measurement error
The outcome is measured with systematic error, so the model learns to predict the mismeasured outcome rather than the true outcome. Accuracy in predicting the true outcome will be degraded regardless of the deployment context.
Predictor measurement error
One or more predictors are measured with error. The model will perform as expected only when the same measurement process operates in the deployment context. When error structures differ between development and deployment populations (e.g., due to different coding practices, diagnostic technologies, or documentation standards), predictive performance will degrade, even when case-mix appears similar.
Dependent measurement errors
There is correlated measurement error in both the predictors and outcome, e.g. because both variables are measured by the same clinician during the same clinical encounter. The model may learn associations that are partly artefacts of shared measurement processes. Such associations will transport to identically measured populations, but may not transport to populations with different measurement processes.
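A similar simulation shows how a shared error component can manufacture an association. In this hypothetical setup the true predictor and outcome are independent with unit variance, and the same error term (variance sigma_e^2) is added to both recorded values, so the expected correlation of the recorded values is sigma_e^2 / (1 + sigma_e^2), i.e. 0.5 when sigma_e = 1. Sample size, error SD and seed are arbitrary.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_shared_error(n=20_000, error_sd=1.0, seed=2):
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    true_y = [rng.gauss(0, 1) for _ in range(n)]  # independent of true_x
    # One error term shared by both recordings, e.g. the same clinician and encounter
    shared = [rng.gauss(0, error_sd) for _ in range(n)]
    recorded_x = [a + e for a, e in zip(true_x, shared)]
    recorded_y = [b + e for b, e in zip(true_y, shared)]
    return pearson(true_x, true_y), pearson(recorded_x, recorded_y)


true_r, recorded_r = simulate_shared_error()
print(f"true r = {true_r:.2f}, recorded r = {recorded_r:.2f}")
```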
Missing Data
Data are missing for the outcome or one or more predictors, and the probability of missingness is related to the outcome and/or predictors. When the analysis requires follow-up over time, loss to follow-up or informative censoring occurs when individuals leave the observation window for reasons related to the outcome or predictors. Informative missingness can degrade predictive performance, particularly if the missingness mechanism differs between development and deployment contexts.
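The effect of outcome-related missingness on a complete-case analysis can be sketched as follows. The probabilities are invented for illustration: 20% outcome prevalence, with the predictor missing for 50% of people who have the outcome but only 10% of those who do not. The expected complete-case prevalence is 0.2 × 0.5 / (0.2 × 0.5 + 0.8 × 0.9), about 0.12, so a model fitted to complete cases would systematically under-predict risk in the full population.

```python
import random


def simulate_informative_missingness(n=20_000, prevalence=0.2,
                                     p_missing_case=0.5, p_missing_noncase=0.1,
                                     seed=3):
    """Outcome prevalence in the full sample vs among complete cases, when the
    predictor is more often missing for people who have the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        x = rng.gauss(1.0 if y else 0.0, 1.0)  # predictor, higher on average in cases
        p_miss = p_missing_case if y else p_missing_noncase
        rows.append((None if rng.random() < p_miss else x, y))
    full = sum(y for _, y in rows) / n
    complete = [y for x, y in rows if x is not None]
    return full, sum(complete) / len(complete)


full_prev, complete_case_prev = simulate_informative_missingness()
print(f"full sample: {full_prev:.3f}, complete cases: {complete_case_prev:.3f}")
```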
Data Source Heterogeneity
Data are pooled from multiple healthcare systems or across calendar periods with different measurement practices and/or case-mix. The model may learn associations that reflect an average across heterogeneous settings rather than the relationships present in any single context, potentially degrading predictive performance when deployed in a specific setting.
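One way to check whether a pooled model masks between-setting differences is to compare observed and expected event counts within each contributing system; the same function applies to calendar periods when assessing temporal stability. The sketch assumes predicted risks from an already-fitted model and a grouping label per person; the names and toy data are hypothetical.

```python
from collections import defaultdict


def observed_expected_by_group(groups, risks, outcomes):
    """Observed events divided by expected events (sum of predicted risks) per group.

    A ratio near 1 suggests good calibration-in-the-large in that group;
    above 1 the model under-predicts there, below 1 it over-predicts.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for g, r, y in zip(groups, risks, outcomes):
        observed[g] += y
        expected[g] += r
    return {g: observed[g] / expected[g] for g in expected}


groups = ["A"] * 10 + ["B"] * 10
risks = [0.15] * 20                          # pooled model: same risk for everyone
outcomes = [1, 1] + [0] * 8 + [1] + [0] * 9  # 2/10 events in A, 1/10 in B
print(observed_expected_by_group(groups, risks, outcomes))
print(sum(outcomes) / sum(risks))            # pooled O/E
```

Here the pooled O/E ratio is 1, yet the model under-predicts in site A (O/E about 1.33) and over-predicts in site B (about 0.67).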
Data Sparsity
Insufficient sample size and/or covariate separation leads to overfitted models with inflated performance that are not externally valid.
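The inflation described here can be quantified with bootstrap optimism correction, one of the internal validation approaches listed under R5. The Python sketch below repeats the whole modelling procedure in each bootstrap sample and subtracts the average optimism from the apparent C-statistic. The learner is a deliberately simple, hypothetical stand-in (each predictor weighted by its case/non-case mean difference); in a real analysis `fit` and `predict` would wrap the actual modelling pipeline, including any variable selection.

```python
import random


def c_statistic(scores, outcomes):
    """P(score of a random case > score of a random non-case); ties count 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


def fit(X, y):
    """Hypothetical learner: weight = mean in cases minus mean in non-cases."""
    cases = [row for row, yi in zip(X, y) if yi == 1]
    controls = [row for row, yi in zip(X, y) if yi == 0]
    return [sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls)
            for j in range(len(X[0]))]


def predict(weights, X):
    return [sum(w * xj for w, xj in zip(weights, row)) for row in X]


def optimism_corrected_c(X, y, n_boot=50, seed=0):
    rng = random.Random(seed)
    apparent = c_statistic(predict(fit(X, y), X), y)
    optimism = []
    n = len(y)
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # need both cases and non-cases to fit and evaluate
        w = fit(Xb, yb)  # refit in the bootstrap sample
        # performance in the bootstrap sample minus performance in the original data
        optimism.append(c_statistic(predict(w, Xb), yb) - c_statistic(predict(w, X), y))
    return apparent, apparent - sum(optimism) / n_boot


rng = random.Random(42)
n, p = 100, 20
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]  # outcome unrelated to every predictor
apparent, corrected = optimism_corrected_c(X, y)
print(f"apparent C = {apparent:.2f}, optimism-corrected C = {corrected:.2f}")
```

Because the outcome is pure noise here, the apparent C-statistic is inflated, and the optimism-corrected value should fall back towards 0.5.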
Confounding
N/A
Time Zero Alignment
Inconsistent landmark time
Lack of a clearly defined landmark time means model predictions correspond to an unknown weighted average of predictions at different clinical moments (e.g., after symptom onset, after screening, after formal diagnosis).
Survivorship conditioning
Predictors that require a period of observation to be measured (e.g., treatment response, biomarker trajectories) implicitly require survival to that point, restricting the prediction to a subpopulation of survivors and potentially limiting applicability.
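Building a landmark dataset makes both points explicit. In this sketch, with hypothetical field names, follow-up is measured in years from cohort entry, only people still event-free and under observation at the landmark are retained, and the outcome is an event within the prediction horizon after the landmark. People censored inside the horizon are flagged rather than silently treated as non-events.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    follow_up_end: float  # years from cohort entry to the event or to censoring
    had_event: bool       # True if follow_up_end is the time of the outcome event


def landmark_outcomes(people, landmark, horizon):
    """Binary outcome within (landmark, landmark + horizon] for people at risk at the landmark.

    Returns {pid: True | False | None}. None means censored inside the horizon: the
    outcome is unknown and should be handled with a time-to-event model or inverse
    probability of censoring weights rather than treated as a non-event.
    """
    end = landmark + horizon
    outcomes = {}
    for p in people:
        if p.follow_up_end <= landmark:
            continue  # event or censoring before the landmark: not in the landmark population
        if p.had_event and p.follow_up_end <= end:
            outcomes[p.pid] = True
        elif p.follow_up_end >= end:
            outcomes[p.pid] = False  # observed event-free for the whole horizon
        else:
            outcomes[p.pid] = None
    return outcomes


cohort = [
    Person("a", 0.5, True),    # event before the landmark: excluded
    Person("b", 3.0, True),    # event within the horizon
    Person("c", 10.0, False),  # event-free beyond the horizon
    Person("d", 2.0, False),   # censored inside the horizon
]
print(landmark_outcomes(cohort, landmark=1.0, horizon=5.0))
```

The exclusion of person a is the survivorship conditioning described above: a model developed on this dataset predicts only for people who reach the landmark event-free.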
_________________________
* Berkson’s bias = A type of selection bias that occurs when both primary variables of interest (e.g. an exposure and outcome) directly influence entry into the sample
** Index event bias = A type of selection bias that occurs when a primary variable of interest (e.g. the outcome) is only possible among people who have experienced a qualifying event that is directly influenced by another variable of interest (e.g. the exposure), and the primary variable is also related to the qualifying event through shared causes.
*** Survivorship bias = A type of selection bias that occurs when a primary variable of interest (e.g. the exposure) directly influences survival to study entry, and another variable of interest (e.g. the outcome) is also related to survival through shared causes.
**** M-bias = A type of selection bias that occurs when there are unmeasured causes of two primary variables of interest (e.g. the exposure and outcome) that also cause study entry. In EHR data, this often arises through informed presence bias, where presence in the dataset is influenced by factors (e.g. healthcare utilisation, socioeconomic position) that are also linked to the primary variables of interest.
R5: Run appropriate analysis
Select and conduct analyses that are suitable for estimating your target estimands in the available data
Choose modelling approach based on outcome type, sample size relative to number of candidate predictors, interpretability requirements, and deployment constraints (e.g., regression-based, machine learning, or ensemble approaches)
Choose and implement appropriate analytical methods for the competing event and intercurrent event strategies specified in the estimand (e.g., cause-specific or subdistribution models for competing events; composite endpoint or treatment policy strategies for intercurrent events)
For each source of error, bias, or validity threat identified in the O4 step (above), specify the analytical strategy and mitigation approach, documenting the details in Table 2.
Apply shrinkage or penalization in high-dimensional settings
Use bootstrap resampling or cross-validation for internal validation and optimism correction
Assess discrimination and calibration in internal and external validation samples
Evaluate and address miscalibration revealed by external validation
When the goal is to evaluate the incremental predictive value of additional variables, use nested model comparison methods (e.g., likelihood ratio tests, change in C-statistic, net reclassification improvement, decision curve analysis)
*Standard regression adjustment is inappropriate when time-varying confounders are affected by prior treatment, as conditioning on these variables simultaneously blocks causal pathways and introduces collider bias.
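Decision curve analysis compares a model's net benefit with the default strategies of intervening in everyone or no one across a range of risk thresholds. A minimal sketch, assuming predicted risks are already available and using the standard net benefit definition TP/n - FP/n × t/(1 - t) at threshold t; the function names and toy data are invented for illustration.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene in everyone."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(risks, outcomes, thresholds):
    """Rows of (threshold, model, treat all, treat none) net benefit."""
    return [(t, net_benefit(risks, outcomes, t), net_benefit_treat_all(outcomes, t), 0.0)
            for t in thresholds]


risks = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 1]
for row in decision_curve(risks, outcomes, [0.25, 0.5]):
    print("t=%.2f  model=%.3f  treat all=%.3f  treat none=%.1f" % row)
```

In this toy example the model offers higher net benefit than treating everyone at both thresholds (about 0.44 vs 0.33 at t = 0.25, and 0.33 vs 0 at t = 0.5). For incremental value, the same curve would be drawn for the models with and without the additional variables.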
O6: Outline and assess assumptions
Clearly outline the assumptions behind your results and conduct appropriate sensitivity analyses
Deployment validity
Assess whether the case-mix, predictor distributions, and outcome prevalence in the development data are representative of the intended deployment population. Consider whether learned predictor-outcome associations may depend on selection mechanisms or measurement processes specific to the development setting. Discuss potential sources of miscalibration when applying the model to new settings. Evaluate model performance in external validation samples where available.
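When external validation shows miscalibration-in-the-large, the simplest update re-estimates only the intercept, keeping the original linear predictor as an offset; discrimination is unchanged because the update is monotone. For a logistic model, the maximum likelihood intercept shift is the value that makes the mean recalibrated risk equal the observed event proportion, which the sketch below finds by bisection. It assumes predicted risks strictly between 0 and 1, and the function names are hypothetical; a fuller logistic recalibration would also re-estimate the calibration slope.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def intercept_update(risks, outcomes, lo=-10.0, hi=10.0, tol=1e-10):
    """Shift a on the logit scale so that mean(expit(logit(p) + a)) equals the
    observed event proportion (the intercept-only maximum likelihood update)."""
    target = sum(outcomes) / len(outcomes)

    def mean_risk(a):
        return sum(expit(logit(p) + a) for p in risks) / len(risks)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_risk(mid) < target:
            lo = mid  # still under-predicting on average: shift further up
        else:
            hi = mid
    return (lo + hi) / 2


def recalibrate(risks, a):
    return [expit(logit(p) + a) for p in risks]


risks = [0.1] * 10
outcomes = [1, 1] + [0] * 8
a = intercept_update(risks, outcomes)
print(round(a, 3), round(recalibrate(risks, a)[0], 3))
```

Here every predicted risk of 0.1 meets an observed proportion of 0.2, so the shift is logit(0.2) - logit(0.1) = ln 2.25, about 0.81, and the recalibrated risks become 0.2.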
Temporal stability
Assess whether predictor-outcome relationships are likely to remain stable over time, considering changes in coding practices and evolving patient populations.
Predictor availability
Confirm that all model predictors are realistically available at the intended point of prediction in clinical practice and that no predictors require a period of observation to be measured. Discuss implications if predictors are missing or measured differently in deployment settings.
Competing events and censoring assumptions
For time-to-event outcomes, state the assumptions underlying the chosen approach to competing events and censoring (e.g., non-informative censoring, independence of competing events) and discuss whether these assumptions are plausible in the intended deployment context.
Missing data assumptions
State the assumed missing data mechanism (MCAR, MAR, or MNAR) and justify the chosen analytical approach. Assess sensitivity of estimates to missing data assumptions and analytical approach if missing data are substantial.
Model assumptions
Report and check all model-specific assumptions (e.g. linearity, proportional hazards for survival models).
U7: Use appropriate language
Describe and interpret aims, methods, and results in terms of predicted outcomes and model performance, avoiding associational, risk factor, or causal language.
Examples:
Aim: 'We aimed to develop a model to predict 5-year risk of Y in population Z'
Methods: 'Model discrimination and calibration were assessed using the C-statistic and calibration plots across risk deciles'
Results: 'The predicted 5-year risk of Y was X% (95% CI…); the model achieved good discrimination (C-statistic …) and was well-calibrated across risk deciles'.
Discussion: 'These findings suggest our model can accurately predict 5-year risk of Y in population Z, with the addition of biomarker X improving discrimination (change in C-statistic:…)'
Examples to Avoid:
Throughout:
'We aimed to identify risk factors for Y' (risk factor language is unclear and should be avoided)
'X was associated with higher predicted risk of Y' (associational language obscures the predictive aim)
'X was an independent predictor of Y' (implies causal interpretation of model coefficients)
'These findings suggest X increases the risk of Y' (causal language inappropriate for predictive study)
S8: Satisfy reporting and transparency standards
Follow current best practice and relevant reporting guidelines for reporting study details and results.
Pre-register a study protocol and statistical analysis plan (e.g. on OSF) before data access, clearly stating the factual prediction estimands of interest.
Make analytical code available (e.g., as a supplement to the publication, alongside the protocol, or in a public repository), having reviewed it for disclosive content
Provide a data availability statement describing the process for obtaining access to the source data. Report summary-level information including sample flow diagrams and baseline sample characteristics
Follow RECORD and TRIPOD+AI reporting guidelines. Use PROBAST to assess and report risk of bias