S1548
Physics - Autosegmentation
ESTRO 2026
Digital Poster 637
In-depth analysis of failure rates and -modes in auto-segmentation of the esophagus in patients treated for lung cancer
Lise B J Thorsen 1,2, Marianne M Knap 1, Tine B Nyeng 1, Torben Aagaard 1, Ditte S Møller 1,2, Anne Holm 1
1 Department of Oncology, Aarhus University Hospital, Aarhus, Denmark.
2 Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Purpose/Objective: Routine use of AI-based tools for auto-segmentation of organs at risk (OAR) in radiation therapy (RT) must be monitored to identify causes of failure and possible areas of improvement. AI tools may be failure-prone when applied to patient populations that differ from those used to develop the tool, i.e. when data bias is present. We report a detailed analysis of failure rates and failure modes for automated delineation of the esophagus in patients with lung cancer.
Material/Methods: A bespoke AI solution (b-AI) for auto-segmentation of thoracic OAR was developed in-house on contrast-enhanced (CE) 4D-CT scans from patients treated for lung cancer. For 20 consecutive months starting in June 2023, thoracic OAR were auto-segmented using this b-AI in all patients treated with RT for lung cancer at Aarhus University Hospital. Long-course RT was planned on CE 4D-CT scans, whereas stereotactic RT was mainly planned on non-CE 4D-CT scans. Uncorrected and corrected b-AI contours were saved and compared for monitoring. Esophageal outliers were defined as having a 2 mm Surface Dice Similarity Coefficient (SDSC) < 0.8 and/or a mean Hausdorff Distance > 2 mm. Two oncologists reviewed all outlier cases to assess failure modes, b-AI failure characteristics and associated traits of anatomical anomalies. The chi-squared test or Fisher's exact test was applied to assess differences in frequencies between CE and non-CE contours.
Results: Among 402 consecutive patients, 53 outliers (13.1%) were confirmed as failures (Table 1). More failures occurred in non-CE than in CE planning CT scans (p = 0.00001). An obvious failure mode was identified in all CE scans, but in only 52% of non-CE scans. The dominant failure mode overall (n = 26) was abnormal anatomy, mainly presenting as aberrant esophageal dimensions or trajectory. Three cases of abnormal anatomy were due to prior esophageal or pulmonary surgery, and eight additional cases were due to gross hiatal herniation. There were no significant differences in AI-failure characteristics or anatomical failure causes between scan types. Most AI-failure characteristics would be reflected in the cranio-caudal extent of the esophageal
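The outlier rule stated under Material/Methods can be illustrated with a minimal sketch. This is not the authors' monitoring code: the class name, field names and example metric values are hypothetical, and only the two thresholds (2 mm SDSC < 0.8, mean Hausdorff Distance > 2 mm) are taken from the abstract. The sketch assumes both metrics have already been computed by comparing the uncorrected and corrected contours.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract's outlier definition.
SDSC_THRESHOLD = 0.8         # 2 mm Surface Dice below this flags an outlier
MEAN_HD_THRESHOLD_MM = 2.0   # mean Hausdorff Distance above this flags an outlier


@dataclass
class ContourMetrics:
    """Agreement metrics for one esophagus contour (hypothetical structure)."""
    case_id: str        # hypothetical identifier
    sdsc_2mm: float     # Surface Dice at 2 mm tolerance, in [0, 1]
    mean_hd_mm: float   # mean Hausdorff Distance in millimetres


def is_outlier(m: ContourMetrics) -> bool:
    """Apply the 'and/or' rule: either criterion alone flags the case."""
    return m.sdsc_2mm < SDSC_THRESHOLD or m.mean_hd_mm > MEAN_HD_THRESHOLD_MM


# Made-up example values, for illustration only (not study data).
cases = [
    ContourMetrics("case-A", sdsc_2mm=0.92, mean_hd_mm=1.1),
    ContourMetrics("case-B", sdsc_2mm=0.74, mean_hd_mm=1.6),
    ContourMetrics("case-C", sdsc_2mm=0.85, mean_hd_mm=2.4),
]

flagged = [c.case_id for c in cases if is_outlier(c)]
print(flagged)
```

In this sketch, flagged cases would then be passed on for manual review; values exactly at a threshold (SDSC of 0.8 or mean Hausdorff Distance of 2 mm) are not flagged, matching the strict inequalities in the definition.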
Conclusion: Uncertainty maps for high-quality contours did not significantly affect manual editing times, with human factors exerting a stronger influence. Improving trust and usability of uncertainty maps may be key to enhancing efficiency; however, their clinical impact requires further assessment per use-case.
References:
[1] Rogowski V, Svalkvist A, Maspero M, Janssen T, Maruccio FC, et al. Impact of deep learning model uncertainty on manual corrections to MRI-based auto-segmentation in prostate cancer radiotherapy. J Applied Clin Med Phys 2025;26:e70221. https://doi.org/10.1002/acm2.70221.
[2] Ferreira Silvério N, Van Den Wollenberg W, Betgen A, Wiersema L, Marijnen C, et al. Evaluation of Deep Learning Clinical Target Volumes Auto-Contouring for Magnetic Resonance Imaging-Guided Online Adaptive Treatment of Rectal Cancer. Advances in Radiation Oncology 2024;9:101483. https://doi.org/10.1016/j.adro.2024.101483.
Keywords: Contouring, editing, uncertainty