ESTRO 2026 - Abstract Book PART II

S1572

Physics - Autosegmentation


Conclusion: Auto-contours from Contour+ were of good to intermediate quality for the breast and nodal levels and outperformed AI-Rad in both qualitative and quantitative evaluations, with the quantitative and qualitative results correlating with each other. The IMN contours were generally acceptable. Additional dosimetric and AI dose plan analyses of the Contour+ CTVs will be presented at the meeting.

References:
1. Offersen BV, Boersma LJ, Kirkove C, et al. ESTRO consensus guideline on target volume delineation for elective radiation therapy of early stage breast cancer. Radiother Oncol. Jan 2015;114(1):3–10. doi:10.1016/j.radonc.2014.11.030

Keywords: Auto-segmentation, breast cancer, target volumes

Digital Poster 2660

Ground-Truth Comparison of Aleatoric Uncertainty Quantification Methods in Medical Image Segmentation

Gustav Jönsson1, Tommy Löfstedt2, Attila Simko1, Kristina Sandgren1, Joakim Jonsson1, Anders Garpebring1

1 Department of Diagnostics and Intervention, Umeå University, Umeå, Sweden
2 Department of Computing Science, Umeå University, Umeå, Sweden

Purpose/Objective: Uncertainty quantification in medical image segmentation is an essential step toward safer and more adaptive radiotherapy planning, in which model confidence can guide personalized treatment decisions. Recent reviews have identified a lack of ground-truth comparisons in uncertainty quantification studies1. The aim of this work was to compare methods for quantifying data-inherent (aleatoric) uncertainty against known inter-annotator variability, used as a ground-truth reference.

Material/Methods: Three commonly used aleatoric uncertainty quantification methods1 were evaluated: Softmax likelihoods, test-time augmentation (TTA)2, and the Probabilistic U-Net3. The study used the multi-annotator dataset ProstateZones4, which contains two sets of prostatic-zone segmentations based on MR images from PROSTATEx5. These annotations served as the reference for comparing aleatoric uncertainty. For the Softmax and TTA evaluations, a standard U-Net was implemented using the MONAI framework and trained to segment the prostatic zones with a combined Dice and cross-entropy loss. The Probabilistic U-Net6 was implemented and extended to 3D for volumetric data. Performance was evaluated with three metrics: Soft Dice for segmentation accuracy, which measures spatial overlap based on probabilistic rather than binary segmentations; the mean squared error (MSE) between predictive and ground-truth entropy for uncertainty quantification; and the expected calibration error (ECE) for uncertainty calibration. Statistical significance was assessed using the Wilcoxon signed-rank test between all pairs of methods.

Results: The Soft Dice, MSE, and ECE values for each prostatic zone are presented in Table 1. Overall, the Probabilistic U-Net performed best in terms of uncertainty calibration, particularly for regions that are difficult to delineate, such as the urethra. However, it showed a slight decrease in Soft Dice for easier regions, such as the transition zone, compared with the Softmax and TTA models. Figure 1 illustrates this behavior: the Probabilistic U-Net more accurately identifies the uncertain urethral region but also assigns low-level uncertainty across a broader area, resulting in a minor reduction in Soft Dice for otherwise well-defined regions.

Conclusion: The preliminary findings suggest that the Probabilistic U-Net aligns better with multi-annotator variability than the Softmax- and TTA-based methods do. This work demonstrates the usefulness of comparing predicted uncertainties against a ground truth derived from multiple annotations, which clarifies which uncertainty quantification methods are most reliable for specific clinical purposes. By directly relating predicted uncertainty to real inter-observer variability, the study provides an approach for evaluating not only segmentation accuracy but also the relevance of the uncertainty estimates.

References:
1. Huang, L., et al. A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods. Med Image Anal 97, 103223 (2024).
2. Wang, G., et al. Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks. Neurocomputing (Amst) 335, 34-45 (2019).
3. Kohl, S. A probabilistic U-Net for segmentation of ambiguous images. (2019).
4. Holmlund, W., et al. ProstateZones - Segmentations of the prostatic zones and urethra for the PROSTATEx dataset. Sci Data 11, 1097 (2024).
5. Litjens, G., et al. SPIE-AAPM PROSTATEx Challenge Data. (2017).
6. Knegt, S. Probabilistic U-Net in PyTorch. (2020).

Keywords: Uncertainty, ground truth
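Soft Dice is the accuracy metric in the uncertainty abstract (Digital Poster 2660). It scores overlap on probabilities instead of thresholded masks, commonly as 2·Σ p_i g_i / (Σ p_i + Σ g_i). Below is a minimal pure-Python sketch, not the authors' implementation. It assumes three things: flattened per-voxel foreground probabilities for one prostatic zone, a soft reference built by averaging the annotators' binary masks, and a small smoothing term `eps`. The function names `annotator_mean` and `soft_dice` are hypothetical.

```python
from collections.abc import Sequence


def annotator_mean(masks: Sequence[Sequence[int]]) -> list[float]:
    """Soft reference: per-voxel fraction of annotators who marked the voxel.

    Assumption: averaging binary annotator masks is one plausible way to build a
    probabilistic reference; the abstract does not state how it was constructed.
    """
    n_annotators = len(masks)
    return [sum(voxel) / n_annotators for voxel in zip(*masks)]


def soft_dice(pred: Sequence[float], ref: Sequence[float], eps: float = 1e-6) -> float:
    """Soft Dice between a probabilistic prediction and a probabilistic reference.

    Both inputs are flattened per-voxel foreground probabilities for one zone.
    `eps` is a hypothetical smoothing term that keeps empty structures from giving 0/0.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of voxels")
    intersection = sum(p * g for p, g in zip(pred, ref))
    denominator = sum(pred) + sum(ref)
    return (2.0 * intersection + eps) / (denominator + eps)


# Two hypothetical annotators who disagree on the second voxel.
reference = annotator_mean([[1, 1, 0, 0], [1, 0, 0, 0]])
print(reference)  # [1.0, 0.5, 0.0, 0.0]
print(round(soft_dice([0.9, 0.6, 0.1, 0.0], reference), 3))  # 0.774
```

In this example, the voxel marked by only one of the two annotators has a reference value of 0.5. A predicted 0.6 there is therefore credited partially rather than scored as simply right or wrong, which is what makes a soft overlap measure a natural fit for multi-annotator data.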
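The entropy MSE in the uncertainty abstract compares two quantities voxel by voxel: the entropy of the model's predicted class distribution and the entropy of the annotators' label distribution. A well-behaved method should therefore predict high entropy where the annotators disagree. The sketch below is a hedged illustration with three assumptions: natural-log entropy, a ground-truth distribution given by the fraction of annotators choosing each class, and an unweighted mean over voxels. The function names are hypothetical.

```python
import math
from collections.abc import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one voxel's class distribution; 0 * log(0) counts as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def annotator_distribution(labels: Sequence[int], num_classes: int) -> list[float]:
    """Assumed ground-truth distribution at one voxel: fraction of annotators choosing each class."""
    counts = [0] * num_classes
    for label in labels:
        counts[label] += 1
    return [count / len(labels) for count in counts]


def entropy_mse(
    pred_probs: Sequence[Sequence[float]],
    annotator_labels: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean squared error between predictive entropy and inter-annotator entropy over voxels.

    pred_probs[i] is the predicted class distribution at voxel i (e.g. a softmax output);
    annotator_labels[i] holds one integer label per annotator at voxel i.
    """
    if len(pred_probs) != len(annotator_labels) or not pred_probs:
        raise ValueError("need the same, non-zero number of voxels in both inputs")
    squared_errors = [
        (entropy(probs) - entropy(annotator_distribution(labels, num_classes))) ** 2
        for probs, labels in zip(pred_probs, annotator_labels)
    ]
    return sum(squared_errors) / len(squared_errors)


# Voxel 0: annotators agree and the model is confident; voxel 1: annotators disagree,
# but the model is still fully confident.
print(entropy_mse([[0.0, 1.0], [1.0, 0.0]], [[1, 1], [0, 1]], num_classes=2))  # ≈ 0.240
```

In this toy case the second voxel contributes the entire error. Its ground-truth entropy is ln 2, its predictive entropy is 0, and the mean over two voxels is (ln 2)² / 2.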
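ECE, the calibration metric in the uncertainty abstract, works in three steps. It bins voxels by their confidence in the predicted class, takes the gap between mean confidence and observed accuracy within each bin, and averages those gaps weighted by bin size. The abstract does not specify the binning or how correctness is defined against two annotators, so both are assumptions here. The sketch uses ten equal-width bins and accepts either 0/1 correctness against a single reference or the fraction of annotators agreeing with the prediction. The function name is hypothetical.

```python
from collections.abc import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correctness: Sequence[float], n_bins: int = 10
) -> float:
    """Expected calibration error with equal-width confidence bins.

    confidences[i]: confidence in the predicted class at voxel i (e.g. max softmax probability).
    correctness[i]: 1.0 or 0.0 against a single reference label, or (assumption) the fraction
    of annotators who agree with the predicted class.
    """
    if len(confidences) != len(correctness) or not confidences:
        raise ValueError("need the same, non-zero number of voxels in both inputs")
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    count = [0] * n_bins
    for conf, acc in zip(confidences, correctness):
        # Bins are [0, 1/n), [1/n, 2/n), ...; a confidence of exactly 1.0 goes into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        acc_sum[b] += acc
        count[b] += 1
    total = len(confidences)
    # Weighted mean of |accuracy - confidence| over the non-empty bins.
    return sum(
        (n / total) * abs(acc_sum[b] / n - conf_sum[b] / n)
        for b, n in enumerate(count)
        if n > 0
    )


# Two over-confident voxels at 0.6 (only half correct) and two under-confident voxels at 0.9 (all correct).
print(round(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1.0, 1.0, 1.0, 0.0]), 3))  # 0.1
```

ECE values depend on the number of bins and on whether bins are equal-width or equal-mass. For that reason, the binning scheme is usually reported alongside the metric.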
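Test-time augmentation (TTA) is one of the three methods compared in the uncertainty abstract. It estimates aleatoric uncertainty by predicting on several perturbed copies of the input, mapping each prediction back to the original geometry, and measuring the spread of the averaged result. The sketch below is illustrative only and rests on several stand-ins. The image is a hypothetical 1D row of intensities. `model` and `toy_model` are hypothetical substitutes for the trained MONAI U-Net. The augmentations (mirroring plus optional Gaussian intensity noise) are assumed, because the abstract does not list the transformations used.

```python
import math
import random
from collections.abc import Callable, Sequence

# Hypothetical stand-in for an image: a flattened row of voxel intensities.
Image = list[float]


def tta_uncertainty(
    model: Callable[[Image], Image],
    image: Sequence[float],
    n_samples: int = 8,
    noise_std: float = 0.0,
    seed: int = 0,
) -> tuple[Image, Image]:
    """Average foreground probabilities over augmented inputs (test-time augmentation).

    `model` is a hypothetical callable returning per-voxel foreground probabilities.
    Every second sample is mirrored, and the mirroring is undone on the output so all
    predictions are aligned before averaging. Optional Gaussian noise perturbs intensities.
    Returns the mean probability and the binary predictive entropy (natural log) per voxel.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(image)
    for i in range(n_samples):
        flip = i % 2 == 1
        x = list(image[::-1]) if flip else list(image)
        if noise_std > 0.0:
            x = [v + rng.gauss(0.0, noise_std) for v in x]
        y = model(x)
        if flip:
            y = y[::-1]  # inverse spatial transform
        totals = [t + p for t, p in zip(totals, y)]
    mean = [t / n_samples for t in totals]
    entropy = [
        -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p)) if 0.0 < p < 1.0 else 0.0
        for p in mean
    ]
    return mean, entropy


def toy_model(x: Image) -> Image:
    """Hypothetical 'model': logistic mapping of intensity to foreground probability."""
    return [1.0 / (1.0 + math.exp(-8.0 * (v - 0.5))) for v in x]


mean_prob, uncertainty = tta_uncertainty(toy_model, [0.1, 0.5, 0.9], noise_std=0.1)
print([round(u, 3) for u in uncertainty])
```

Undoing the spatial transform before averaging is essential. Without it, mirrored predictions would be averaged against voxels at the wrong positions, which inflates the apparent uncertainty.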
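The uncertainty abstract assesses significance with the Wilcoxon signed-rank test between all pairs of methods. A self-contained sketch of that procedure follows, assuming one paired metric value per test case and method. It computes the exact two-sided p-value by counting rank-sum subsets and, as a simplification, rejects zero or tied differences. The sketch applies no multiple-comparison correction. The method names and scores are hypothetical. For real data with ties, a library routine such as SciPy's `scipy.stats.wilcoxon` is the safer choice.

```python
from collections.abc import Mapping, Sequence
from itertools import combinations


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact p-value of the Wilcoxon signed-rank test for paired samples.

    Simplifying assumption: no zero differences and no tied absolute differences, so the
    null distribution of the rank sum depends only on n.
    """
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty paired samples of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("this sketch does not handle zero or tied differences")
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of subsets of ranks {1..n} summing to s (all 2**n equally likely under H0).
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    # The null distribution is symmetric, so double the smaller tail.
    w = min(w_plus, max_sum - w_plus)
    return min(2 * sum(counts[: w + 1]) / 2**n, 1.0)


def pairwise_wilcoxon(scores: Mapping[str, Sequence[float]]) -> dict[tuple[str, str], float]:
    """Exact Wilcoxon p-value for every pair of methods; scores are paired per test case."""
    return {
        (a, b): wilcoxon_signed_rank_exact(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


scores = {  # hypothetical per-case Soft Dice values, illustrative only
    "softmax": [0.81, 0.77, 0.85, 0.79, 0.88],
    "tta": [0.80, 0.79, 0.82, 0.75, 0.83],
    "prob_unet": [0.75, 0.69, 0.84, 0.72, 0.79],
}
for (method_a, method_b), p_value in pairwise_wilcoxon(scores).items():
    print(f"{method_a} vs {method_b}: p = {p_value:.4f}")
```

With only five paired cases, as in this toy example, the exact test cannot produce a two-sided p-value below 2/2^5 = 0.0625. Small cohorts therefore limit the attainable significance.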
