ESTRO 2026 - Abstract Book PART II

Physics - Machine learning and AI algorithms

Material/Methods:

Two previously published DL NTCP models, for dysphagia [2] and xerostomia [3], were retrained on 965 and tested on 241 head and neck cancer (HNC) patients treated with definitive (chemo)radiotherapy. Each model preserved its original architecture and hyperparameters and was trained both as a standard deterministic model and with three UQ approaches: Monte Carlo dropout, deep ensembles, and test-time augmentation (TTA). For each sampling method, three uncertainty measures (binary entropy, variance, and mutual information) were used to derive uncertainty values. Predictive performance and calibration of uncertainty estimates were assessed on an independent test set.

Results:

For both NTCP models, incorporating UQ did not compromise model performance. For dysphagia, both Monte Carlo dropout and deep ensembles maintained performance comparable to the baseline model (AUC range = 0.86-0.87), while TTA led to a moderate decline (AUC = 0.80). For xerostomia, performance was comparable across methods (AUC range = 0.71-0.72). Monte Carlo dropout and deep ensembles produced the most reliable uncertainty estimates (Figure 2B), showing strong calibration between predictive certainty and accuracy. In contrast, TTA exhibited poor calibration for both models, with large variability in reliability across uncertainty metrics. Variance and binary entropy provided the most consistent and well-calibrated uncertainty estimates, although the latter typically overestimated uncertainty.

Conclusion:

UQ methods can provide well-calibrated indicators of prediction reliability in DL NTCP models, and can thus strengthen confidence in their predictions and improve their clinical interpretability. Establishing standardized evaluation of UQ methods is a critical step towards reliable and trustworthy clinical implementation of DL NTCP models.

References:

[1] K. A. Wahid et al., “Artificial Intelligence Uncertainty Quantification in Radiotherapy Applications - A Scoping Review,” Radiother. Oncol., vol. 0, no. 0, Sep. 2024.
[2] S. P. M. de Vette et al., “Deep learning NTCP model for late dysphagia after radiotherapy for head and neck cancer patients based on 3D dose, CT and segmentations,” Radiother. Oncol., vol. 213, p. 111169, Dec. 2025.
[3] H. Chu et al., “3D deep learning Normal
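
Illustrative note: the three uncertainty measures named in Material/Methods (binary entropy, variance, and mutual information) can be derived from repeated stochastic predictions for a single patient. The abstract does not state the exact formulas used, so the sketch below assumes the standard definitions for a binary outcome: entropy of the mean predicted probability, population variance of the sampled probabilities, and mutual information as the entropy of the mean minus the mean of the per-sample entropies. The function names and the example probabilities are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean, pvariance


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli outcome with probability p."""
    eps = 1e-12  # clip to avoid log(0) at p = 0 or p = 1
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_measures(sampled_probs):
    """Summarise sampled complication probabilities for one patient.

    `sampled_probs` holds one predicted probability per stochastic pass
    (e.g. per dropout mask, ensemble member, or augmented input).
    Assumed definitions, not confirmed by the abstract.
    """
    probs = list(sampled_probs)
    if not probs:
        raise ValueError("at least one sampled probability is required")

    mean_p = fmean(probs)
    entropy_of_mean = binary_entropy(mean_p)
    mean_of_entropies = fmean([binary_entropy(p) for p in probs])

    return {
        "mean_probability": mean_p,
        "binary_entropy": entropy_of_mean,
        "variance": pvariance(probs),
        # Clamp tiny negative values caused by floating-point rounding.
        "mutual_information": max(entropy_of_mean - mean_of_entropies, 0.0),
    }


# Hypothetical samples: passes that agree versus passes that disagree.
agree = uncertainty_measures([0.80, 0.80, 0.80])
disagree = uncertainty_measures([0.10, 0.50, 0.90])

for label, result in (("agree", agree), ("disagree", disagree)):
    print(label, {name: round(value, 4) for name, value in result.items()})
```

Under these assumed definitions, variance and mutual information are near zero when all samples agree, whereas binary entropy stays above zero whenever the mean prediction lies away from 0 or 1.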
