Dangerous projections: How poor survival model choices with immature data can misguide HTA decisions and impact patients
How much can we trust survival projections when the data are still immature? In health technology assessments (HTAs), the choice of model can make the difference between robust evidence and misleading conclusions.
The importance of survival projections in HTA
When the value of an oncology innovation is assessed in a health technology assessment (HTA), the evaluation frequently centres on cost-effectiveness analysis (CEA) or cost-utility analysis (CUA). These frameworks weigh the benefit a new treatment provides to patients against its costs. That benefit is typically quantified as life-years gained, derived from overall survival (OS), or as years lived without disease progression, derived from progression-free survival (PFS); in CUA, these are further adjusted for health-related quality of life. Projecting life-years gained over a lifetime horizon is fundamental to HTA because the benefits of a new therapy can accumulate over extended periods, especially for treatments used in earlier disease stages or those with curative potential.
Importantly, these life-year gains are distinct from differences in median OS or PFS. Immunotherapies in melanoma provide a clear example. Although median PFS differs by only a few months between CTLA-4 antibodies and anti-PD-1 therapies, patients receiving anti-PD-1 therapy spend, on average, substantially longer without disease progression. This is because anti-PD-1 therapies are associated with a PFS plateau: more than 25% of patients remain progression-free at 66 months, so they accrue more progression-free years than patients treated with CTLA-4 antibodies. In cases like this, median PFS fails to capture the average benefit a treatment provides, and lifetime projections extending well beyond the median survival time are needed.
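To make the median-versus-mean distinction concrete, the sketch below compares two hypothetical survival curves whose medians differ by only two months: an exponential curve with no plateau, and a mixture cure curve in which a fixed fraction of patients never has the event. The parameters (medians of 3 and 5 months, a 30% plateau, a 120-month horizon) are illustrative assumptions, not values from any melanoma trial, and the mixture cure form is just one simple way to represent a plateau.

```python
import math

# Hypothetical illustration: similar medians, very different (restricted) means.
# All parameter values are assumptions chosen for illustration only.

def exponential_rate(median):
    """Hazard rate of an exponential curve with the given median."""
    return math.log(2) / median

def cure_model_rate(median, cure_fraction):
    """Rate for S(t) = pi + (1 - pi) * exp(-rate * t) such that S(median) = 0.5.

    Requires cure_fraction < 0.5, otherwise the median is never reached.
    """
    return -math.log((0.5 - cure_fraction) / (1 - cure_fraction)) / median

def cure_model_survival(t, rate, cure_fraction):
    return cure_fraction + (1 - cure_fraction) * math.exp(-rate * t)

def rmst_exponential(rate, horizon):
    """Area under exp(-rate * t) from 0 to horizon (closed form)."""
    return (1 - math.exp(-rate * horizon)) / rate

def rmst_cure_model(rate, cure_fraction, horizon):
    """Area under the mixture cure curve from 0 to horizon (closed form)."""
    return cure_fraction * horizon + (1 - cure_fraction) * rmst_exponential(rate, horizon)

HORIZON = 120.0  # months (assumed 10-year horizon)
rate_a = exponential_rate(median=3.0)                      # no plateau
rate_b = cure_model_rate(median=5.0, cure_fraction=0.30)   # 30% plateau

mean_a = rmst_exponential(rate_a, HORIZON)
mean_b = rmst_cure_model(rate_b, 0.30, HORIZON)
print(f"Median difference: {5.0 - 3.0:.1f} months")
print(f"Restricted mean A: {mean_a:.1f} months, B: {mean_b:.1f} months")
```

Here the medians differ by 2 months, while the restricted means are roughly 4.3 and 38.8 months: almost all of the difference comes from the plateau, which a median cannot see.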
As a result, HTA processes place substantial emphasis on projecting lifetime survival benefits. These projections rely on data from randomised controlled trials (RCTs). However, RCTs often have limited follow-up at the time of market entry, which leads to high rates of right-censoring in the survival data. Consequently, when initial regulatory and HTA decisions must be made, the available survival data are frequently immature (ie, a substantial proportion of patients have not yet experienced the event and are right-censored).
Increasing reliance on immature survival data in oncology HTA
To estimate long-term survival outcomes, parametric survival functions are fitted to the observed data and extrapolated; each function embodies an assumption about the shape of the hazard over time. Current recommendations from the United Kingdom's National Institute for Health and Care Excellence (NICE) advocate fitting six standard parametric models to survival data: exponential, Weibull, Gompertz, log-logistic, log-normal, and generalised gamma. These models are evaluated on their goodness of fit, through visual inspection and statistics such as the Akaike Information Criterion (AIC), and on the plausibility of their extrapolated tails, judged against external data, expert opinion, and biological reasoning. If none of the standard parametric models is suitable for the observed survival data, more flexible and complex approaches, such as parametric spline models, can be used. It is important to recognise that these flexible models were designed to fit the observed survival data better, particularly when the data exhibit complex hazard patterns, as is the case with immunotherapies. This modelling process is essential for generating robust long-term survival projections, which ultimately support decisions that affect patient access to treatments and the allocation of healthcare resources.
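The goodness-of-fit comparison can be sketched with maximum likelihood under right-censoring. The example below is a minimal, standard-library-only illustration covering two of the six models (exponential and Weibull): the Weibull shape is found by maximising the profile likelihood with a golden-section search. The function names, the simulated data, and the shape search bounds are our own assumptions; in practice, established packages such as the flexsurv R package fit all six standard distributions and can report AIC directly.

```python
import math
import random

# Each record is (time, event): event = 1 if observed, 0 if right-censored.
# Times must be strictly positive.

def fit_exponential(data):
    """Closed-form MLE of the exponential rate; returns (rate, log-likelihood)."""
    d = sum(e for _, e in data)
    total_time = sum(t for t, _ in data)
    rate = d / total_time
    return rate, d * math.log(rate) - rate * total_time

def weibull_profile_loglik(shape, data):
    """Weibull log-likelihood with the rate profiled out.

    Model: S(t) = exp(-rate * t**shape); for fixed shape, rate_hat = d / sum(t**shape).
    """
    d = sum(e for _, e in data)
    sum_t_shape = sum(t ** shape for t, _ in data)
    sum_log_t_events = sum(math.log(t) for t, e in data if e == 1)
    rate = d / sum_t_shape
    return d * math.log(rate) + d * math.log(shape) + (shape - 1) * sum_log_t_events - d

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2

def fit_weibull(data):
    """Returns (shape, rate, log-likelihood); shape searched on the log scale in [0.05, 20]."""
    log_shape = golden_section_max(lambda x: weibull_profile_loglik(math.exp(x), data),
                                   math.log(0.05), math.log(20.0))
    shape = math.exp(log_shape)
    rate = sum(e for _, e in data) / sum(t ** shape for t, _ in data)
    return shape, rate, weibull_profile_loglik(shape, data)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Simulated data (assumption): exponential times, administratively censored at 15.
random.seed(42)
raw = [random.expovariate(0.1) for _ in range(200)]
data = [(min(t, 15.0), int(t <= 15.0)) for t in raw]

_, ll_exp = fit_exponential(data)
shape, _, ll_wei = fit_weibull(data)
print(f"AIC exponential: {aic(ll_exp, 1):.1f}")
print(f"AIC Weibull:     {aic(ll_wei, 2):.1f} (shape {shape:.2f})")
```

Because the exponential is the Weibull with shape 1, the Weibull log-likelihood can never be lower; AIC's parameter penalty decides whether the extra flexibility is worth it. Note that AIC only scores fit within the observed follow-up and says nothing about how the tails behave beyond it.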
In oncology, this modelling process has been hampered by a growing reliance on immature survival data. For example, 41% of the cancer single technology appraisals conducted by NICE between 2015 and 2017 were based on immature survival data, rising to 56% for appraisals published between 2018 and 2022.
This trend toward less mature data in HTA submissions raises important questions about optimal model selection. While standard parametric models remain widely used, flexible models such as spline models have been adopted increasingly in recent HTA submissions. Although these models often fit the observed data better, a better fit does not necessarily translate into more reliable long‑term projections.
Current guidelines recommend selecting the model that best fits the available data, while also checking long-term accuracy against external data, expert opinion, and biological reasoning. However, particularly for single-arm trials and when intervention and comparator arms are fitted independently, long-term external data are often scarce, and expert opinion and biological reasoning may not be sufficiently robust on their own. In such situations, real‑world data can provide valuable supplementary evidence for assessing tail plausibility and strengthening long‑term projections.
At ISPOR Europe 2025 in Glasgow, we examined whether, when survival data are limited, survival extrapolations should prioritise fitting the existing data or reducing future uncertainty. We focused on the standard parametric models most widely used in HTA submissions.
Prioritising long-term accuracy over fit
We presented an analysis that used OS and PFS data from a diverse set of recent oncology trials with sufficiently mature data. The required data were obtained by digitising the published Kaplan–Meier (KM) survival curves and reconstructing pseudo-patient-level datasets, including censoring, following the methodology recommended by NICE. The trials selected for this study were CLEAR, CM-649 (CheckMate 649), COU-AA-301, KEYNOTE-A39, KEYNOTE-A39 (8/8/24), SUNLIGHT, and TROPiCS-02. For each trial, we created additional, more immature datasets by artificially right-censoring the data at varying event thresholds (60%–70%, 50%, 30%, and 20%): once the specified threshold was reached, all remaining patients were censored, simulating datasets with increasing levels of immaturity.
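The artificial-censoring step can be expressed as a short function. This is a minimal sketch under one reading of the method: we assume each threshold is the fraction of the full dataset's events to retain, and that everyone still at risk when the last retained event occurs is censored at that time. The function names and record format are hypothetical.

```python
from typing import List, Tuple

# Each record is (time, event): event = 1 if observed, 0 if right-censored.
Record = Tuple[float, int]

def censor_at_event_fraction(data: List[Record], fraction: float) -> List[Record]:
    """Simulate an earlier data cut by keeping only a fraction of the observed events.

    Assumption: `fraction` is the share of the full dataset's events to retain.
    Patients still at risk at the cutoff are censored at the cutoff time.
    """
    event_times = sorted(t for t, e in data if e == 1)
    n_keep = min(len(event_times), max(1, round(fraction * len(event_times))))
    if not event_times or n_keep == len(event_times):
        return list(data)  # nothing to remove: keep the original data cut
    cutoff = event_times[n_keep - 1]
    # Events tied at the cutoff time are all kept.
    return [(t, e) if t <= cutoff else (cutoff, 0) for t, e in data]

def maturity(data: List[Record]) -> float:
    """Share of patients with an observed event (one common maturity measure)."""
    return sum(e for _, e in data) / len(data)

full = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1), (6.0, 0)]
for frac in (0.5, 0.3):
    cut = censor_at_event_fraction(full, frac)
    print(frac, cut, f"maturity={maturity(cut):.2f}")
```

Applying the function repeatedly to the same reconstructed dataset with decreasing fractions yields the nested series of increasingly immature datasets described above.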
We then performed survival extrapolation in line with standard HTA submission guidelines, using the five standard parametric models most widely used in HTA submissions: generalised gamma, Weibull, exponential, log-normal, and log-logistic. These models were fitted to every dataset to predict long-term survival outcomes. The predictive accuracy of each extrapolation was assessed using the restricted mean survival time (RMST): the mean survival time up to a specified time horizon (equivalently, the area under the survival curve up to that horizon), which in our case was the maximum observed follow-up of each KM curve. To quantify the accuracy of the survival projections, we calculated the absolute average difference, the relative difference, and the squared error between the RMST of the published KM curve and the RMST predicted by each parametric model.
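The accuracy metrics described above can be sketched as follows: a Kaplan–Meier estimator, the KM RMST as the area under the step function up to the horizon, the RMST of a parametric curve by numerical integration, and the three error measures. This is an illustrative, standard-library-only implementation; the function names, the trapezoidal integration, the hypothetical fitted curve, and the sign convention (prediction minus KM) are our assumptions rather than the exact procedure used in the study.

```python
import math
from typing import Callable, List, Tuple

Record = Tuple[float, int]  # (time, event): event = 1 observed, 0 right-censored

def kaplan_meier(data: List[Record]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, survival just after that time) steps."""
    steps = []
    surv = 1.0
    for t in sorted({t for t, e in data if e == 1}):
        at_risk = sum(1 for ti, _ in data if ti >= t)
        events = sum(1 for ti, ei in data if ti == t and ei == 1)
        surv *= 1 - events / at_risk
        steps.append((t, surv))
    return steps

def rmst_km(steps: List[Tuple[float, float]], tau: float) -> float:
    """Area under the KM step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

def rmst_parametric(surv_fn: Callable[[float], float], tau: float, n: int = 10_000) -> float:
    """Area under a parametric survival curve from 0 to tau (trapezoidal rule)."""
    h = tau / n
    total = 0.5 * (surv_fn(0.0) + surv_fn(tau))
    total += sum(surv_fn(i * h) for i in range(1, n))
    return total * h

def accuracy_metrics(pred: float, observed: float) -> dict:
    """Absolute, relative, and squared differences (prediction minus KM)."""
    diff = pred - observed
    return {"absolute": abs(diff), "relative": diff / observed, "squared": diff ** 2}

data = [(1.0, 1), (2.0, 1), (3.0, 0), (4.0, 1)]
tau = max(t for t, _ in data)  # horizon = maximum observed follow-up
km = rmst_km(kaplan_meier(data), tau)
pred = rmst_parametric(lambda t: math.exp(-0.3 * t), tau)  # hypothetical fitted exponential
print(accuracy_metrics(pred, km))
```

Averaging these per-dataset metrics across trials, endpoints, and censoring levels gives summary accuracy figures for each function.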
As expected, and as illustrated in Figure 1, the extrapolated RMST varied widely for models estimated with limited follow-up (eg, with more than 25% of events censored). Across the datasets and censoring levels explored, projections differed by up to 60% from the RMST of the more mature KM data. This is unsurprising and consistent with previous similar work, but the size of this potential error matters: it could translate into a doubling or halving of CEA or CUA results once more mature data become available, radically changing the HTA conclusion and potentially affecting reimbursement decisions and patient access. When applied to mature data, most functions performed similarly, with average differences of less than 2.5% between projected and KM RMST, particularly for the best-fitting function. This suggests that goodness of fit to the available data has less influence on outcomes than the choice of a function with reliable long‑term projection behaviour.
Figure 1. Relative RMST differences across censoring levels and extrapolation functions
Key: RMST – restricted mean survival time.
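A short worked example shows why a moderate projection error in one arm can double or halve a cost-effectiveness result. Assume, for illustration only, that the incremental cost ΔC is fixed (in practice, costs also depend on projected survival) and that projected mean survival is 3.0 life-years with the new treatment and 2.0 with the comparator. The incremental cost-effectiveness ratio (ICER) is then:

$$
\text{ICER} = \frac{\Delta C}{\Delta E} = \frac{\Delta C}{E_1 - E_0} = \frac{\Delta C}{3.0 - 2.0} = \frac{\Delta C}{1.0}
$$

If more mature data later show the new treatment's mean survival to be 2.5 life-years (about 17% lower), then:

$$
\text{ICER}' = \frac{\Delta C}{2.5 - 2.0} = \frac{\Delta C}{0.5} = 2 \times \text{ICER}
$$

Because the ICER depends on the difference between arms, a 17% error in a single arm halves the incremental benefit and doubles the ICER.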
Interestingly, the functions differed in long-term accuracy (Figure 2). The log-normal and exponential functions each showed approximately ±6% overall uncertainty, whereas the generalised gamma function became increasingly unstable as the level of censoring increased. Most functions tended to underestimate long-term survival, although the exponential function generally overestimated survival gains. Among the underestimating models, the log-normal and log-logistic functions produced the most accurate long-term estimates on average. These findings raise an important question for selecting survival models for long-term projections: should we prioritise fitting the available KM data, potentially ignoring the uncertainty inherent in immature datasets, or choose models that yield better long-term projections on average? An alternative to current guidelines might be to select functions for their long-term predictive accuracy rather than for their short-term fit alone.
Figure 2. Average RMST difference vs. KM (0%–10%, 10%–30%, 30%–50%, >50%) across censoring levels and extrapolation functions
Key: KM – Kaplan–Meier; RMST – restricted mean survival time.
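The two selection philosophies contrasted above can be written down side by side. In this hypothetical sketch, one rule picks the model with the lowest AIC on the current (immature) dataset, while the other picks the model with the lowest mean absolute relative RMST error across benchmark datasets where mature data are available. The mean signed error is also reported, since it separates systematic under- or overestimation from overall accuracy. Model names and numbers are placeholders, not results from the analysis.

```python
from statistics import mean
from typing import Dict, List

def summarise_errors(rel_errors: List[float]) -> Dict[str, float]:
    """rel_errors: (predicted RMST - KM RMST) / KM RMST across benchmark datasets."""
    return {
        "bias": mean(rel_errors),                 # < 0 means systematic underestimation
        "mae": mean(abs(e) for e in rel_errors),  # average size of the error
    }

def select_by_fit(aic_by_model: Dict[str, float]) -> str:
    """Conventional rule: lowest AIC on the dataset at hand."""
    return min(aic_by_model, key=aic_by_model.get)

def select_by_long_term_accuracy(errors_by_model: Dict[str, List[float]]) -> str:
    """Alternative rule: lowest mean absolute relative error on benchmark datasets."""
    return min(errors_by_model, key=lambda m: summarise_errors(errors_by_model[m])["mae"])

# Placeholder inputs for illustration only (hypothetical models and values).
aic_by_model = {"model_a": 512.3, "model_b": 515.0}
errors_by_model = {"model_a": [-0.30, -0.25, 0.05], "model_b": [-0.05, 0.04, -0.06]}

print("Chosen by fit:", select_by_fit(aic_by_model))
print("Chosen by long-term accuracy:", select_by_long_term_accuracy(errors_by_model))
for name, errs in errors_by_model.items():
    print(name, summarise_errors(errs))
```

With these placeholder inputs the two rules disagree, which is the situation the question above is about. The accuracy-based rule is only as informative as the resemblance between the benchmark datasets and the decision problem at hand.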
These results emphasise the need to reassess methodological guidelines as circumstances change. As survival data become more immature and hazard functions become more complex, it's worth reconsidering whether traditional approaches are still appropriate. Delivering accurate HTA requires continuous improvement to ensure patients receive optimal treatment and healthcare resources are used effectively. At Cencora, advancing innovation and expertise remains a priority.
Sources listed below.
Disclaimer:
This article summarises Cencora’s understanding of the topic based on publicly available information at the time of writing (see listed sources) and the authors’ expertise in this area. Any recommendations provided in the article may not be applicable to all situations and do not constitute legal advice; readers should not rely on the article in making decisions related to the topics discussed.
Sources
- Bakker LJ, Thielen FW, Redekop WK, Groot CU, Blommestein HM. Extrapolating empirical long-term survival data: the impact of updated follow-up data and parametric extrapolation methods on survival estimates in multiple myeloma. BMC Med Res Methodol. 2023;23(1):132. doi: 10.1186/s12874-023-01952-2. PMID: 37248477; PMCID: PMC10226243.
- Bullement A, Meng Y, Cooper M, et al. A review and validation of overall survival extrapolation in health technology assessments of cancer immunotherapy by the National Institute for Health and Care Excellence: how did the initial best estimate compare to trial data subsequently made available? J Med Econ. 2019;22(3):205-214. doi: 10.1080/13696998.2018.1547303. Epub 2018 Nov 30. PMID: 30422080.
- Everest L, Blommaert S, Chu RW, Chan KKW, Parmar A. Parametric survival extrapolation of early survival data in economic analyses: a comparison of projected versus observed updated survival. Value Health. 2022;25(4):622-629. doi: 10.1016/j.jval.2021.10.004. Epub 2021 Nov 24. PMID: 35365306.
- Fizazi K, Scher HI, Molina A, et al. Abiraterone acetate for treatment of metastatic castration-resistant prostate cancer: final overall survival analysis of the COU-AA-301 randomised, double-blind, placebo-controlled phase 3 study. Lancet Oncol. 2012;13(10):983-992. doi: 10.1016/S1470-2045(12)70379-0. Epub 2012 Sep 18. Erratum in: Lancet Oncol. 2012;13(11):e464. Erratum in: Lancet Oncol. 2014;15(9):e365. PMID: 22995653.
- Gibbons CL, Latimer NR. Prevalence of immature survival data for anticancer drugs presented to the National Institute for Health and Care Excellence between 2018 and 2022. Value Health. 2025;28(3):406-414. doi: 10.1016/j.jval.2024.11.013. Epub 2024 Dec 24. PMID: 39725010.
- Gray J, Sullivan T, Latimer NR, et al. Extrapolation of survival curves using standard parametric models and flexible parametric spline models: comparisons in large registry cohorts with advanced cancer. Med Decis Making. 2021;41(2):179-193. doi: 10.1177/0272989X20978958. Epub 2020 Dec 22. PMID: 33349137.
- Janjigian YY, Shitara K, Moehler M, et al. First-line nivolumab plus chemotherapy versus chemotherapy alone for advanced gastric, gastro-oesophageal junction, and oesophageal adenocarcinoma (CheckMate 649): a randomised, open-label, phase 3 trial. Lancet. 2021;398(10294):27-40. doi: 10.1016/S0140-6736(21)00797-2. Epub 2021 Jun 5. PMID: 34102137; PMCID: PMC8436782.
- Kang J, Cairns J, Latimer NR, Duffield S, Grieve R. An assessment of the maturity of cancer survival data used in economic models for the National Institute for Health and Care Excellence’s single technology appraisals. Value Health. 2025;28(11):1705-1713. doi: 10.1016/j.jval.2025.07.010. Epub 2025 Jul 22. PMID: 40706705.
- Latimer N. NICE DSU Technical Support Document 14: Undertaking Survival Analysis for Economic Evaluations Alongside Clinical Trials—Extrapolation with Patient-Level Data. Sheffield (UK): Decision Support Unit, ScHARR, University of Sheffield; 2011.
- Leleu H, Carette J, Berkovitch B. Balancing Fit and Accuracy: Evaluating Survival Model Projections with Immature Data in Health Technology Assessments. ISPOR EU25. 9-12 November 2025. Glasgow, UK.
- Motzer R, Alekseev B, Rha SY, et al. Lenvatinib plus pembrolizumab or everolimus for advanced renal cell carcinoma. N Engl J Med. 2021;384(14):1289-1300. doi: 10.1056/NEJMoa2035716. Epub 2021 Feb 13. PMID: 33616314.
- Powles T, Valderrama BP, Gupta S, et al. Enfortumab vedotin and pembrolizumab in untreated advanced urothelial cancer. N Engl J Med. 2024;390(10):875-888. doi: 10.1056/NEJMoa2312117. PMID: 38446675.
- Prager GW, Taieb J, Fakih M, et al. Trifluridine-tipiracil and bevacizumab in refractory metastatic colorectal cancer. N Engl J Med. 2023;388(18):1657-1667. doi: 10.1056/NEJMoa2214963. PMID: 37133585.
- Rugo HS, Bardia A, Marmé F, et al. Overall survival with sacituzumab govitecan in hormone receptor-positive and human epidermal growth factor receptor 2-negative metastatic breast cancer (TROPiCS-02): a randomised, open-label, multicentre, phase 3 trial. Lancet. 2023;402(10411):1423-1433. doi: 10.1016/S0140-6736(23)01245-X. Epub 2023 Aug 23. PMID: 37633306.
- van Not OJ, van den Eertwegh AJM, Jalving H, et al. Long-term survival in patients with advanced melanoma. JAMA Netw Open. 2024;7(8):e2426641. doi: 10.1001/jamanetworkopen.2024.26641. PMID: 39141388; PMCID: PMC11325208.
- Zhu Y, Liu K, Zhu H, Li S, Yuan D. Enfortumab vedotin plus pembrolizumab for previously untreated locally advanced or metastatic urothelial carcinoma: a cost-effectiveness analysis. Ther Adv Med Oncol. 2025;17:17588359241295544. doi: 10.1177/17588359241295544. PMID: 39776535; PMCID: PMC11705323.
