ESTRO 2026 - Abstract Book PART II

Autosegmentation PHYSICS

S1544

Physics - Autosegmentation

ESTRO 2026

Digital Poster 142 Automated body composition analysis in borderline resectable pancreatic cancer: robust AI, disappointing clinical translation William Gehin 1 , Aurélien Lambert 2 , Jean-Emmanuel Bibault 3 1 Radiation therapy, Institut de Cancérologie de Lorraine, Vandoeuvre-lès-Nancy, France. 2 Oncology, Institut de Cancérologie de Lorraine, Vandoeuvre-lès-Nancy, France. 3 Radiation therapy, Hôpital Européen Georges Pompidou, Paris, France

Purpose/Objective: Artificial intelligence (AI) research in oncology and radiation oncology is expanding exponentially, with thousands of publications annually. Many works propose AI-based tools for therapeutic decision support, particularly in rare or complex settings such as borderline resectable pancreatic cancer (BRPC), where patients face a difficult balance between intensive multimodal therapy and palliative strategies prioritizing quality of life. Yet clinical translation of such tools remains scarce. We aimed to evaluate whether automated body composition analysis could serve as a prognostic biomarker in BRPC and to illustrate the challenges of AI translation into clinical practice.

Material/Methods: Baseline CT scans from 107 patients with evaluable imaging among 110 randomized in the PRODIGE 44 BRPC trial were analyzed with a fully automated segmentation pipeline integrated in the clinical workflow. The skeletal muscle index (SMI) and additional 2D/3D body composition biomarkers were extracted. Associations with overall survival (OS), progression-free survival (PFS), and completion of multimodal therapy (chemotherapy ± radiotherapy ± surgery) were tested both as continuous variables and using published sarcopenia cut-offs.

Results: Automated segmentation achieved excellent accuracy (Dice >0.9 vs manual reference), confirming technical feasibility. Sarcopenia prevalence varied widely according to the cut-off applied (30–60%).
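The two quantities at the heart of this analysis, mask overlap (Dice) and the skeletal muscle index, can be sketched in plain NumPy. This is an illustration, not the authors' pipeline: the toy masks and the example area/height values are invented; the SMI formula shown is the conventional one (L3 muscle cross-sectional area normalized by height squared).

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(a, b).sum() / denom

def skeletal_muscle_index(muscle_area_cm2: float, height_m: float) -> float:
    """Conventional SMI: muscle cross-sectional area at L3 over height squared (cm^2/m^2)."""
    return muscle_area_cm2 / height_m**2

# toy 2D masks standing in for automated vs. manual muscle segmentations
auto = np.zeros((10, 10), dtype=bool); auto[2:8, 2:8] = True
manual = np.zeros((10, 10), dtype=bool); manual[3:9, 2:8] = True
print(round(dice(auto, manual), 3))                 # -> 0.833
print(round(skeletal_muscle_index(45.0, 1.70), 1))  # -> 15.6 (hypothetical patient)
```

A fixed SMI cut-off applied to such values is exactly where the abstract's 30–60% prevalence spread arises: different published thresholds classify different fractions of the same cohort as sarcopenic.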
Neither SMI nor other continuous 2D/3D body composition biomarkers consistently predicted OS, PFS, or treatment completion. The tool, while technically robust and seamlessly integrable in clinical workflows, did not provide actionable patient stratification in this homogeneous prospective cohort.

Conclusion: This study illustrates the paradox of AI in oncology: technically strong algorithms, addressing legitimate clinical questions, but potentially disappointing clinical utility. Automated sarcopenia analysis failed to stratify BRPC patients for multimodal therapy in a randomized trial setting, despite methodological rigor. This represents a typical example of the current challenges in translational AI research, where only very few tools are rigorously validated and ultimately adopted in routine oncology and radiation oncology practice.

Keywords: sarcopenia, AI, clinical translation

Digital Poster 273 GTV segmentation in MRI guided radiotherapy with promptable foundation models Tom Julius Blöcker 1 , Nikolaos Delopoulos 1 , Miguel A. Palacios 2 , Sebastian Klüter 3 , Juliane Hörner-Rieber 3,4 , Carolin Rippke 3 , Lorenzo Placidi 5 , Luca Boldrini 6,7 , Vincenzo Frascino 7 , Nicolaus Andratschke 8 , Michael Baumgartl 8 , Riccardo Dal Bello 8 , Sebastian N. Marschner 1 , Claus Belka 1,9 , Stefanie Corradini 1,10 , Denis Dudas 1 , Marco Riboldi 11 , Christopher Kurz 1 , Guillaume Landry 1,9 1 Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany. 2 Dept. of Radiation Oncology, Amsterdam UMC, Vrije Universiteit Medical Centre, Amsterdam, Netherlands. 3 Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany. 4 Department of Radiation Oncology, University Hospital Düsseldorf, Düsseldorf, Germany. 5 Medical Physics Unit, Dipartimento di Diagnostica per Immagini e Radioterapia oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy. 6 Institute of Radiology, Università Cattolica del Sacro Cuore, Rome, Italy. 7 Radiation therapy unit, Dipartimento di Diagnostica per Immagini e Radioterapia oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy. 8 Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland. 9 Bavarian Cancer Research Center (BZKF), Partner Site Munich, Munich, Germany. 10 Department of Radiation Oncology, Universitätsklinikum Erlangen, Erlangen, Germany. 11 Department of Medical Physics, Ludwig-Maximilians-Universität (LMU), Munich, Germany

Purpose/Objective: Magnetic resonance imaging (MRI) guided radiotherapy (MRIgRT) requires the delineation of gross tumor volumes (GTV) in daily MRI from MRI-linacs. Specialised models have been developed for automatic segmentation of tumors from specific anatomical regions, but with limited performance, due to complex and varied GTV geometry and often subtle appearance. This study explores promptable foundation models for AI-assisted GTV segmentation across multiple tumor sites.

Material/Methods: Promptable foundation models were driven by six sparse geometric prompt types to produce GTV segmentation masks (Figure 1): a single central point alone (Point) and with six external points (Points7), 2D bounding boxes in the axial plane (Box1) and in 3D (Box3), and 2D masks on one axial slice (Mask1) or three orthogonal slices (Mask3). Three promptable foundation models were evaluated: Segment Anything 2 (SAM2), SAM2 fine-tuned for medical imaging (MedSAM2), and nnInteractive (nnI), an nnUNet-based model trained for interactive medical segmentation. A multi-institutional (N=5) dataset of 580 clinical GTV delineations in volumetric MRI from 0.35T MRI-linacs was compiled, with GTVs from abdomen (56 cases), lung (215), liver (110), pancreas (53), and pelvis (146) sites. Treatment planning MRI images were acquired with various imaging parameters and acquisition settings, reflecting differences in equipment and protocols across the involved institutions. The output of a given model-prompt combination was compared to the ground truth using various metrics, including the Dice similarity coefficient (DSC).

Results: Promptable foundation models achieved segmentation performance with median DSCs of up to 0.85 (nnInteractive-Mask3), with an interquartile range of 0.80-0.89. AI assistance thus surpassed automated domain-specific approaches, such as those of the PANTHER challenge, with mean DSCs up to 0.53 for pancreas GTVs on 1.5T MRI images (1). Prompts with more spatial information (especially three masks in orthogonal planes) yielded better results with reduced variability. This effect was less prominent for nnInteractive and MedSAM2, which were trained/fine-tuned for medical images. These specialised models produced overall better results (median DSC over all prompt types: 0.75 for nnInteractive, 0.70 for MedSAM2, 0.5 for SAM2). Performance showed modest variability across anatomical sites, reflecting differences in MRI contrast and image quality. Performance was highest for liver and lung sites, followed by abdominal and pancreatic sites, and lowest for pelvic cases.

Conclusion: Promptable foundation models can effectively produce GTV delineations on 0.35T MR images from MR-linacs across multiple anatomical sites when provided with high-information prompts. Performance approaches that of specialised, task-specific models, suggesting strong potential for integration into general MRIgRT segmentation workflows.

References: 1. Betancourt Tarifa et al. "Pancreatic Tumor Segmentation in Therapeutic and Diagnostic MRI." (2025) DOI: 10.5281/zenodo.15081832 Leaderboard Task 2. URL: https://panther.grand-challenge.org/evaluation/closed-testing-phase-task-2/leaderboard/

Keywords: MRI-linac, GTV segmentation

Digital Poster 335 Evaluation of SAM2 model performance and its clinical aspects for ultrasound image segmentation Fruzsina Dvorzsak 1 , Maria Prosszer 1 , Alinka Olajos-Horvath 1 , Anna Fronto 1 , Greta Czibere 1 , Tamas Nadasi 1 , Veronika Donka 1 , Dora Bianka Dr Koczka-Balogh 2 , Krisztian Koos 1 1 HC STO-Artificial Intelligence & Machine Learning, GE Healthcare Magyarország Kft., Budapest, Hungary. 2 Department of Obstetrics and Gynecology, Semmelweis University, Budapest, Hungary


Purpose/Objective: Segment Anything Model (SAM) is a powerful framework designed for interactive segmentation across various image types, including medical images such as MRI, CT and ultrasound. In this study, the performance of the model was evaluated on ultrasound images from various gynecological examinations. We used the adapted model, SAM2, for this evaluation. Trained on videos, SAM2 handles dynamic ultrasound data better than SAM. Delineation of selected organs was performed in the 3D Slicer software using a custom plugin.

Material/Methods: 3D Slicer was extended with a plugin to integrate SAM2 via an interactive segmentation module. The model is driven by standard prompts (inside/outside points) and can be limited along the third axis with a bounding box. The workflow starts with generating an initial 2D contour using SAM2, which is refined manually before propagating through slices within the bounding box. Twenty gynecological scans were contoured for three organs (ovary, uterus, endometrium) using two methods: manual contouring tools and SAM2 (with and without manual refinement). The dataset included 2D images, 3D volumes and cineloops. Contours were generated by five observers (except one 3D endometrium case, completed by four observers) and compared against ground-truth contours provided by clinical experts (Figure 1). The Dice Similarity Coefficient (DSC) was calculated for each organ. Annotation time for each case was recorded.
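The plugin's internals are not given in the abstract, so the sketch below only mimics the described workflow: an initial 2D contour, an optional manual refinement step, then propagation through the slices inside the z bounding box. `predict_slice` is a hypothetical stand-in for the SAM2 call (here a simple threshold), and all names are illustrative.

```python
import numpy as np

def predict_slice(volume, idx, fg_points, bg_points):
    """Hypothetical stand-in for a promptable-model call on one slice.
    A real plugin would run SAM2 with the inside/outside point prompts."""
    return volume[idx] > 0.5

def propagate(volume, start_idx, z_min, z_max, fg_points, bg_points, refine=None):
    """Initial contour on one slice, optional manual refinement,
    then propagation through slices limited by the z bounding box."""
    masks = {}
    init = predict_slice(volume, start_idx, fg_points, bg_points)
    if refine is not None:          # the manual-refinement step of the workflow
        init = refine(init)
    masks[start_idx] = init
    for idx in range(z_min, z_max + 1):
        if idx != start_idx:
            masks[idx] = predict_slice(volume, idx, fg_points, bg_points)
    return masks

vol = np.random.default_rng(0).random((5, 8, 8))   # toy volume, 5 slices
m = propagate(vol, start_idx=2, z_min=1, z_max=3, fg_points=[(4, 4)], bg_points=[])
print(sorted(m))  # -> [1, 2, 3]: only slices inside the bounding box are segmented
```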

Figure 1. Example of manual, SAM2 and refined SAM2 contours compared to ground truth.

Results: Table 1 shows that SAM2 with refinement (DSC=0.854) slightly outperforms manual contouring (DSC=0.849). Using SAM2 as a pre-annotation tool reduces total annotation time by 11.35% (from 65.88 to 58.40 minutes). For uterus scans, this reduction is statistically significant (p<0.05) across all data types. On cineloops, SAM2 consistently helps to achieve more accurate contours in less time. In terms of delineation, the endometrium seems to be the most challenging organ (both manually and with SAM2), due to its variable appearance and indistinct boundaries.

Table 1. Average annotation time and quality of manual, SAM2 and refined SAM2 contours.

Conclusion: This study highlights the potential of SAM2 for medical image segmentation. For selected organs, manual correction of SAM2-generated pre-contours can yield higher-quality results than fully manual contouring while reducing annotation time. SAM2 can serve as a baseline for future models, though fine-tuned approaches may offer greater precision. Our findings also emphasize the critical importance of expert knowledge in achieving high-quality contours across imaging applications.

References: Deák-Karancsi, B., Karancsi, Z., Kékesi, Á., Kiss, E. K., Hakim, L. M., Koós, K., & Ruskó, L. (2025). 2511 Evaluation of SAM and SAM2 from clinical perspective for organ delineation. Radiotherapy and Oncology, 206, S4360-S4363. Dong, H., Gu, H., Chen, Y., Yang, J., Chen, Y., & Mazurowski, M. A. (2024). Segment anything model 2: an application to 2d and 3d medical images. arXiv preprint arXiv:2408.00756. Ning, G., Liang, H., Jiang, Z., Zhang, H., & Liao, H. (2023). The potential of 'Segment Anything' (SAM) for universal intelligent ultrasound image guidance. Bioscience Trends, 17(3), 230-233.

Keywords: Segment Anything, organ delineation, ultrasound

Digital Poster 427

Impact of uncertainty prediction on manual editing of rectal cancer CTV auto-segmentations Federica Carmen Maruccio 1 , Rita Simões 1 , Fokie Cnossen 2 , Christian Jamtheim Gustafsson 3,4 , Sanne Conijn 1 , Alice Couwenberg 1 , Suzan Gerrets-van Noord 1 , Inge de Jong 1 , Vivian van Pelt 1 , Lisa Wiersema 1 , Joëlle van Aalst 5 , Jan-Jakob Sonke 1 , Charlotte L. Brouwer 5 , Tomas Janssen 1


1 Department of Radiation Oncology, The Netherlands Cancer Institute, Amsterdam, Netherlands. 2 Department of Artificial Intelligence, Bernoulli Institute of Mathematics, Groningen, Netherlands. 3 Department of Hematology, Oncology, and Radiation Physics, Skåne University Hospital, Lund, Sweden. 4 Department of Translational Medicine, Lund University, Malmö, Sweden. 5 Department of Radiation Oncology, University Medical Center Groningen, Groningen, Netherlands

Purpose/Objective: Uncertainty maps can be used to quantify and visualise the estimated confidence of Deep Learning (DL) models in contouring predictions. It has been hypothesised that such maps can support clinicians during manual review, potentially reducing editing time. However, uncertainty maps are not currently presented in clinical practice, and data regarding their influence on clinical decision-making remain limited [1]. This study investigates the impact of simulated uncertainty maps on clinical behaviour during manual editing of high-quality mesorectum CTV contours in rectal cancer radiotherapy.

Material/Methods: A retrospective dataset of ten rectal cancer patients, used in an earlier assessment of inter-observer variability (IOV) [2], was utilised. Each patient had five independent manual CTV contours, from which one was randomly selected as a surrogate 'DL contour', while the inter-observer variation served as a 'DL uncertainty map'. This design allowed us to focus on the impact of meaningful uncertainty maps using well-defined contours. Six clinicians participated in two editing phases, two months apart. In a within-subject, counterbalanced design, they manually edited 10 contours per phase, presented in random order, 5 with and 5 without the uncertainty maps. Clinicians were told that both contours and maps were created by a DL model. To gather qualitative feedback on user experience, questionnaires were completed during each phase, followed by one-on-one interviews (Figure 1).
Editing times were extracted from screen recordings, and editing amount quantified using the added path length (APL).
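The added path length can be sketched as follows. The abstract does not state the exact formulation, so this uses one common variant (edited-contour boundary pixels that are not part of the original boundary, scaled by in-plane spacing); the authors' implementation may differ, and the toy masks are invented.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary(mask):
    """Edge pixels of a binary mask (mask minus its erosion)."""
    m = mask.astype(bool)
    return m & ~binary_erosion(m)

def added_path_length(original, edited, spacing_mm=1.0):
    """One common APL variant: edited-boundary pixels absent from the
    original boundary, scaled by in-plane pixel spacing (mm)."""
    return np.count_nonzero(boundary(edited) & ~boundary(original)) * spacing_mm

a = np.zeros((12, 12), dtype=bool); a[3:9, 3:9] = True   # 'DL contour'
b = a.copy(); b[3:9, 9] = True                           # edit: widened by one column
print(added_path_length(a, b))  # -> 6.0 (the new column of boundary pixels)
```

An unedited contour yields APL 0, so larger APL values correspond to more extensive manual editing, which is why the metric complements raw editing time.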

Results: Median editing time per patient did not differ with the use of uncertainty maps (4.2 ± 3.3 min vs. 4.1 ± 3.3 min), but was significantly shorter in Phase 2 (3.4 ± 2.1 min) compared to Phase 1 (6.2 ± 3.4 min) (p < 0.01; Figure 2a). Similarly, APL results indicated comparable editing amount with and without uncertainty maps, yet significantly decreased in Phase 2 (1.1 ± 3.5 cm³) compared to Phase 1 (3.0 ± 3.6 cm³) (p < 0.01; Figure 2b). Questionnaire and interview findings suggested that editing behaviour was influenced more by workload, memory and anchoring biases, mind-set, mood, and learning effect from task repetition, rather than by uncertainty maps. Clinicians reported limited trust in the uncertainty maps, using them primarily for confirmation rather than decision-making. Nonetheless, clinicians acknowledged potential value for low-quality contours if trust is established.
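The abstract does not name the statistical test behind the phase comparison; a paired non-parametric test such as the Wilcoxon signed-rank test is one plausible choice for per-patient editing times. The values below are synthetic, chosen only to loosely echo the reported medians (6.2 vs. 3.4 min).

```python
from scipy.stats import wilcoxon

# synthetic per-patient editing times (minutes), Phase 1 vs. Phase 2 -- illustrative only
phase1 = [6.0, 7.1, 5.2, 9.8, 4.4, 6.5, 8.0, 5.9, 7.3, 6.6]
phase2 = [3.1, 4.0, 2.8, 5.5, 2.2, 3.7, 4.1, 3.3, 3.9, 3.4]

stat, p = wilcoxon(phase1, phase2)  # paired, non-parametric
print(p < 0.01)  # -> True: every patient edited faster in Phase 2
```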

Conclusion: Uncertainty maps for high-quality contours did not significantly affect manual editing times, with human factors exerting a stronger influence. Improving trust and usability of uncertainty maps may be key to enhancing efficiency; however, their clinical impact requires further assessment per use-case.

References: [1] Rogowski V, Svalkvist A, Maspero M, Janssen T, Maruccio FC, et al. Impact of deep learning model uncertainty on manual corrections to MRI-based auto-segmentation in prostate cancer radiotherapy. J Applied Clin Med Phys 2025;26:e70221. https://doi.org/10.1002/acm2.70221. [2] Ferreira Silvério N, Van Den Wollenberg W, Betgen A, Wiersema L, Marijnen C, et al. Evaluation of Deep Learning Clinical Target Volumes Auto-Contouring for Magnetic Resonance Imaging-Guided Online Adaptive Treatment of Rectal Cancer. Advances in Radiation Oncology 2024;9:101483. https://doi.org/10.1016/j.adro.2024.101483.

Keywords: Contouring, editing, uncertainty

Digital Poster 637

In-depth analysis of failure rates and modes in auto-segmentation of the esophagus in patients treated for lung cancer Lise B J Thorsen 1,2 , Marianne M Knap 1 , Tine B Nyeng 1 , Torben Aagaard 1 , Ditte S Møller 1,2 , Anne Holm 1 1 Department of Oncology, Aarhus University Hospital, Aarhus, Denmark. 2 Department of Clinical Medicine, Aarhus University, Aarhus, Denmark

Purpose/Objective: Routine use of AI-based tools for auto-segmentation of organs at risk (OAR) in radiation therapy (RT) must be monitored to identify causes of failure and possible areas of improvement. AI tools may be failure-prone when applied in patient populations different from those used to develop the tool, i.e. if data bias is present. We report a detailed analysis of failure rates and modes for automated delineation of the esophagus in patients with lung cancer.

Material/Methods: A bespoke AI solution (b-AI) for auto-segmentation of thoracic OAR was developed in-house on contrast-enhanced (CE) 4D-CT scans from patients treated for lung cancer. For 20 consecutive months starting in June 2023, thoracic OAR were auto-segmented using this b-AI in all patients treated with RT for lung cancer at Aarhus University Hospital. Long-course RT was planned on CE 4D-CT scans, whereas stereotactic RT was mainly planned on non-CE 4D-CT scans. Uncorrected and corrected b-AI contours were saved and compared for monitoring. Esophageal outliers were defined as having 2 mm Surface Dice Similarity Coefficient (SDSC) < 0.8 and/or mean Hausdorff Distance > 2 mm. Two oncologists reviewed all outlier cases to assess failure modes, b-AI failure characteristics and associated traits of anatomical anomalies. Chi-squared or Fisher's exact tests were applied to assess differences in frequencies between CE versus non-CE contours.

Results: Among 402 consecutive patients, 53 outliers (13.1%) were confirmed as failures (Table 1). More failures occurred in non-CE vs. CE planning CT scans (p=0.00001). In all CE scans an obvious failure mode was determined, whereas this was the case in only 52% of non-CE scans. The dominant failure mode overall (n=26) was abnormal anatomy, mainly presenting as aberrant esophageal dimensions or trajectory. Three cases of abnormal anatomy were due to prior esophageal or pulmonary surgery, and eight additional cases were due to gross hiatal herniation. There were no significant differences in AI-failure characteristics or anatomical failure causes between scan types. Most AI-failure characteristics would be reflected in the cranio-caudal extent of the esophageal contour, suggesting that simple thresholds for esophageal dimensions could be utilized to flag possible failures up-front.

Conclusion: While the CE-scan-based b-AI performed well for esophageal contouring in CE scans, performance in non-CE scans was worse and less predictable, demonstrating the clinical importance of data bias in the application of AI tools. Anatomical failure characteristics suggested that cases at high risk of failure could be identified up-front and that development of quantitative thresholds for flagging failures before undertaking corrections may be possible.

Keywords: failure modes, monitoring

Digital Poster 732

Artificial Intelligence vs. Physicians in Cardiac Contouring for Breast Cancer Radiotherapy: A Comparative Analysis of 125 Patients Virginia García Reglero 1,2 , Sara Vázquez González 1 , Luis Ramos García 3 , Oscar Ripol Valentin 3 , Lucía Tueros Farfan 1 , José David González Gómez 1 , Priscila Bernard Contreras 1 , Manuela Bermúdez Zubiría 1 , Alejandro Rodríguez Gutierrez 1 , Elena García Alonso 1 , Amaya Gracia Sanjuan 1 , Daniel José Lueza Gistau 1 , Mireia Villalonga Mur 1 , Magdalena Lafuente Alconchel 1 , Moisés Mira Flores 1 1 Radiation Oncology, H.U Arnau de Vilanova, Lleida, Spain. 2 GREISI, IRB, Lleida, Spain. 3 Radiation Physics Department, H.U Arnau de Vilanova, Lleida, Spain

Purpose/Objective: Accurate delineation of cardiac substructures is crucial in breast cancer radiotherapy to minimize radiation exposure and late cardiac toxicity. The use of artificial intelligence (AI) in contouring has the potential to improve consistency and efficiency, but its accuracy in complex and variable structures, such as the heart chambers and coronary arteries, requires validation. This study aimed to evaluate the concordance between AI-based contouring using the MVision® Contour+ software and manual delineations performed by physicians with different levels of experience.

Material/Methods: A total of 125 patients with breast cancer undergoing radiotherapy were included. Four radiation oncologists (two consultants and two residents) contoured the main cardiac substructures (left and right atria, left and right ventricles, and coronary arteries) on planning CT images. The Dice Similarity Coefficient (DSC) was used to quantify spatial overlap between contours, and interobserver variability (Δmean) was also calculated. Statistical comparison between AI and physicians was performed using paired tests, with p < 0.05 considered significant.

Results: The mean DSC values (±SD) for MVision® vs. physicians were: left atrium 0.827 ± 0.004 vs. 0.813 ± 0.005 (p = 0.19); right atrium 0.778 ± 0.006 vs. 0.777 ± 0.006 (p = 0.93); right ventricle 0.797 ± 0.004 vs. 0.793 ± 0.005 (p = 0.72); left ventricle 0.870 ± 0.004 vs. 0.885 ± 0.004 (p = 0.82); coronary arteries (A_LAD) 0.366 ± 0.010 vs. 0.378 ± 0.013 (p = 0.91). No statistically significant differences were found in any structure. Interobserver Δmean ranged from 0.003–0.008 for cardiac chambers, indicating high reproducibility, and 0.009–0.014 for coronary arteries, reflecting greater variability in smaller and less-defined structures.


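The "paired tests" above are not specified further; a paired t-test per structure is one common reading. The per-patient DSC values below are synthetic, chosen only to mirror the reported outcome of no significant difference.

```python
from scipy.stats import ttest_rel

# synthetic per-patient left-atrium DSCs for AI vs. physician contours
# (illustrative only; the abstract reports means of 0.827 vs. 0.813)
ai  = [0.83, 0.81, 0.84, 0.80, 0.83, 0.82]
doc = [0.82, 0.82, 0.83, 0.81, 0.81, 0.84]

t, p = ttest_rel(ai, doc)  # paired t-test across the same patients
print(p < 0.05)  # -> False: no significant AI-vs.-physician difference
```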
Conclusion: AI-based auto-segmentation with MVision® demonstrated excellent agreement with manual contours for the four cardiac chambers, achieving DSC values >0.78 in all cases and >0.85 for the left ventricle. The algorithm’s performance was comparable to that of experienced physicians, with no significant differences across expertise levels. The coronary arteries showed the lowest DSC values (<0.4), consistent with their small volume and poor visibility on standard planning CT. Both human and AI contours exhibited similar limitations, highlighting the need for multimodality imaging or model retraining for these structures.Overall, these findings support the safe integration of AI-assisted cardiac contouring into clinical workflows. AI provides reliable delineations for major cardiac chambers, reduces contouring time, and offers a consistent starting point for expert review,

Material/Methods We assembled 322 pelvic CT volumes; 215 local datasets and 107 from The Cancer Imaging Archive (TCIA) and split them into two groups (258 train, 64 validation). A lightweight 2D U-Net encoder was first pretrained without labels to predict the subsequent CT slice from tri-planar inputs (axial, sagittal, coronal), then was fine-tuned (Figure 1) for multi-organ segmentation of bladder, femoral heads, penile bulb (PB), rectum, prostate and seminal vesicles (SV). Baselines were identical U-Nets trained from scratch with 1-channel and 3-channel inputs to isolate representation-learning effects from input formatting. Two regimens were studied: a full-label dataset (n=258) and a reduced-label dataset (n=60) to emulate limited-annotation settings. Evaluation on held-out patients used Dice Similarity Coefficient (DSC) and Mean Distance Agreement (MDA, mm). Paired t-tests with Bonferroni correction (significance p<0.0083) were used to assess differences. Implementation remained lightweight, with no teacher–student distillation or pseudo-labelling and training and evaluation were identical across each model.

thereby contributing to improved efficiency, standardization, and patient safety in breast radiotherapy planning. References:

Tsang Y, Hoskin P, Spezi E, Landau D, Lester J, Miles E, Conibear J. Assessment of contour variability in target volumes and organs at risk in lung cancer radiotherapy. Technical Innovations & Patient Support in Radiation Oncology. 2019;10:8–12González-del Portillo E, Hernández-Rodríguez J, Tenllado-Baena E, Fernández-Lara Á, Alonso-Rodríguez O, Matías-Pérez Á, Cigarral-García C, García-Álvarez G, Pérez- Romasanta LA.Cardiac segments dosimetric benefit from deep inspiration breath hold technique for left- sided breast cancer radiotherapy.Rep Pract Oncol Radiother. 2024;29(2): Keywords: Cardiac structures, Contouring variability, IA Learning Anatomy from Unlabelled CT Volumes: An Unsupervised Framework for Improving Prostate Radiotherapy Segmentation Diyana Afrina Hizam, Ngie Min Ung, Marniza Saad, Firdaus Mohd Salleh, Asyraf Muaadz, Li Kuo Tan Department of Clinical Oncology, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia Purpose/Objective To test whether unsupervised next-slice prediction pretraining can improve pelvic CT organ segmentation for prostate radiotherapy when labels are limited. To our knowledge, this is the first report of using a novel lightweight 2D U-Net with next-slice prediction as a general pretraining strategy for CT segmentation. Digital Poster 765

Results The novel lightweight 2D U-Net next-slice pretraining improved boundary accuracy across structures. In the full-label dataset settings, MDA decreased for bladder (0.779 → 0.547 mm), femoral heads (0.900 → 0.669 mm), PB (1.688 → 1.283 mm), rectum (1.443 → 0.994 mm), prostate (1.474 → 1.183 mm), and SV (1.201 → 0.893 mm). In a direct data efficiency comparison, the unsupervised model in the reduced-label dataset settings (n=60) achieved MDA comparable to or better than a fully supervised scratch model trained on the full-label dataset (n=258), with the largest gains for small/low-contrast organs (PB, SV, rectum). Organ-wise MDA (lower is better) for pretrained-reduced versus scratch-full is shown in Figure 2. Using three identical input channels did not improve over a single channel, indicating that the benefit arises from representation learning rather than input format. Several organ-wise MDA reductions remained significant after Bonferroni correction (p<0.0083).

S1551

Physics - Autosegmentation

ESTRO 2026

1 Medical Physics & Clinical engineering, Guy’s & St. Thomas’ NHS Foundation Trust, London, United Kingdom. 2 Urology, Guy’s & St. Thomas’ NHS Foundation Trust, London, United Kingdom. 3 Clinical Oncology, Guy’s & St. Thomas’ NHS Foundation Trust, London, United Kingdom. 4 School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom. 5 AI & Data Analytics, FH Technikum Wien, Vienna, Austria Purpose/Objective: In radiotherapy, accurate segmentation of organs at risk on planning CT scans is critical for effective treatment. A deep-learning (DL) pelvic auto- segmentation model developed at Guy’s and St. Thomas’ NHS Foundation Trust [1] has reduced manual segmentation workload [2], whilst aiming to improve consistency. This study evaluates the model’s performance using quantitative metrics over three distinct stages of clinical implementation – testing, prospective evaluation, and embedded clinical service – to assess its continued utility, accuracy, stability and potential user automation bias over time. Material/Methods: The segmentation model was evaluated at three distinct stages: at model testing (stage 1, datasets n = 10), prospective clinical evaluation (stage 2, n = 20), and embedded clinical service (stage 3, n = 20). Datasets in stage 3 were auto-segmented a median of 88 days after day 1 of clinical implementation (range 63 days to 110 days). Accuracy was assessed using volumetric Dice Similarity Coefficient (vDSC) and surface DSC (sDSC) against clinician-approved structures. Statistical significance between time-points was evaluated using a Mann-Whitney U test (p,U). Results:

Conclusion Unsupervised next-slice prediction pretraining with a novel lightweight 2D U-Net improves pelvic CT segmentation and reduces annotation needs. It achieves clinically relevant performance with fewer labels, with clear benefit for small, low-contrast organs. Because it avoids pseudo-labels and heavy teacher–student frameworks, the method is simple and computationally light, supporting adoption in resource-constrained radiotherapy workflows. References 1. Wolf, D., et al., Self-supervised pre-training with contrastive and masked autoencoder methods for dealing with small datasets in deep learning for medical imaging. Scientific Reports, 2023. 13(1): p. 20260. 2. Isensee, F., et al., nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 2021. 18(2): p. 203- 211. 3. Peng, J., et al., Boundary-aware information maximization for self-supervised medical image segmentation. Medical Image Analysis, 2024. 94: p. 103150. 4. Dominic, J., et al., Improving data-efficiency and robustness of medical imaging segmentation using inpainting-based self-supervised learning. Bioengineering, 2023. 10(2): p. 207. Keywords Next-slice prediction, Unsupervised, Segmentation

Digital Poster 959

Evaluating the Performance and Consistency of a Deep-Learning Pelvic Auto-Segmentation Model Across Clinical Stages in Radiotherapy Evi Markou 1 , Victoria Butterworth 1 , Luis Ribeiro 2 , Priyankah Patel 3 , Benjamin Taylor 3 , Ingrid White 3 , Anna Winship 3 , Andrew King 4 , Isabel Dregely 5 , Sally Barrington 4 , Teresa Guerrero Urbano 3 , Christopher Thomas 1

S1552

Physics - Autosegmentation

ESTRO 2026

Digital Poster 979 Improved Accuracy of Prostate CTV Autosegmentation Using Combined CT and MR Imaging: A Deep Learning Approach Chia-Yu Lai 1 , Yu-Te Wu 1 , Weir-Chiang You 2,3 , Tzu-Hsuan Chen 1 , Chi-Wen Jao 1,4 , Chun-Yi Lin 1 , Mau-Shin Chi 5 , Wei-Kai Lee 6 , Chia-Chi Wen 5 , Chung-Hsien Hsu 5 , Kai-Lin Yang 5,7 1 Institute of Biophotonics, National Yang Ming Chiao Tung University, Taipei City, Taiwan. 2 Department of Post-Baccalaureate Medicine, National Chung Hsing University, Taichung City, Taiwan. 3 Department of Radiation Oncology, Taichung Veterans General Hospital, Taichung City, Taiwan. 4 Department of Research, Shin Kong Wu Ho-Su Memorial Hospital, Taipei City, Taiwan. 5 Department of Radiation Therapy and Oncology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei City, Taiwan. 6 Brain Research Center, National Yang Ming Chiao Tung University, Taipei City, Taiwan. 7 School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan Purpose/Objective: Accurate delineation of the clinical target volume (CTV) is a crucial prerequisite for safe and effective radiotherapy in prostate cancer. However, compared with magnetic resonance imaging (MRI), computed tomography (CT) images have limited soft-tissue contrast, making it difficult to visualize the boundaries between adjacent structures. This study investigates the impact of integrating MRI with CT images on the performance of automatic CTV segmentation, compared with models trained and tested using CT images alone. Material/Methods: This retrospective study included 73 prostate cancer patients who underwent both CT and MRI before receiving definitive radiotherapy between 2013 and 2023. The ground truth contours were derived from RT Structure Set (RTSS) files and manually delineated by experienced radiation oncologists.The 3D U-Net model was trained for automatic segmentation of the prostate CTV. Patients were randomly divided into training and test sets in an 8:2 ratio. 
The models were optimized using the Generalized Dice loss function, trained for 500 epochs, and employed the Adam optimizer with a learning rate of .Performance was evaluated using the Dice similarity coefficient (DSC), recall, precision, and the 95th percentile Hausdorff distance (HD95). Results: For prostate CTV delineation, the model using combined CT and MRI inputs achieved a Dice similarity coefficient (DSC) of 0.76 ± 0.07, a recall of 0.71 ± 0.13, a precision of 0.84 ± 0.09, and an HD95 of 3.30 ± 1.88 mm. In comparison, the CT-only model yielded a DSC

Median vDSC values for stage 1, 2 and 3 time points were as follows: bladder (0.97, 0.99, 1.00), rectum (0.89, 0.99, 0.97), sigmoid (0.73, 0.89, 0.85), and bowel (0.91, 0.97, 0.98). Median sDSC values for stages 1, 2 and 3 were: bladder (0.95, 0.99, 1.00), rectum (0.89, 0.98, 0.95), sigmoid (0.74, 0.86, 0.79), and bowel (0.89, 0.97, 0.97). Significant differences were observed between stage 1 and stages 2 & 3 for all structures. Comparison between stages 1 and 3 revealed significant differences for all structures and metrics except rectum (sDSC) and sigmoid (sDSC). With the exception of the bladder, no differences were found between stages 2 and 3 for any structure or metric. Conclusion: The observed improvement in model performance between testing and later stages is most likely due to the difference in ground truth delineation methods: full manual delineation (model testing, stage 1) versus AI-structure editing (prospective evaluation and routine clinical use). The increase in quantitative scores confirms the model’s utility, as clinically-approved contours can be obtained with AI structure edits rather than full manual delineation. The equivalent performance observed between prospective evaluation and routine clinical use demonstrates consistent model accuracy and stability, with no clear evidence of user automation bias. Regular monitoring will continue. References: [1] Thomas C. Image-Based Deep Learning Enables the Reduction of Gastro-Intestinal Toxicity in Pelvic Radiotherapy. Doctoral thesis, 1 Oct 2023. [2] Ribeiro L et al. Implementation of in-house pelvic radiotherapy auto-contouring in real world clinical practice. RCR Global AI Conference 2025 Proceedings, 2025. Keywords: Autosegmentation, pelvis, metrics


of 0.67 ± 0.11, recall of 0.57 ± 0.14, precision of 0.86 ± 0.09, and HD95 of 5.33 ± 2.76 mm. Paired t-tests demonstrated that the integration of MRI significantly improved the Dice score, recall, and HD95 (p < 0.05), whereas the difference in precision was not statistically significant.
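As an illustration of the paired comparison above, the sketch below applies a paired t-test to hypothetical per-patient Dice scores (the values are invented for demonstration; the study’s per-patient scores are not reported in the abstract):

```python
import numpy as np
from scipy import stats

# Hypothetical per-patient Dice scores for the two models
# (illustrative values only, not the study's data).
dsc_ct_mri = np.array([0.78, 0.71, 0.80, 0.69, 0.75, 0.82, 0.74, 0.77])
dsc_ct_only = np.array([0.70, 0.62, 0.68, 0.55, 0.66, 0.73, 0.60, 0.71])

# Paired t-test: each patient contributes one score per model,
# so the comparison is made on within-patient differences.
t_stat, p_value = stats.ttest_rel(dsc_ct_mri, dsc_ct_only)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Pairing matters here because each patient is scored under both models; an unpaired test would discard that within-patient structure and lose statistical power.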

orientation were optimized by minimizing the voxel-wise Hounsfield unit differences between the image and its mirror reflection, using a robust Huber-loss-based similarity metric applied exclusively to bony structures to prevent large tumor masses from biasing the MSP location (Figure 1). Tumor extension beyond the MSP was quantified as the maximum distance of any contralateral GTV-T voxel. Contralateral lymphatic spread patterns were then evaluated as a function of MSP extension. Patients were grouped into distance cohorts: <0 mm (DC<0, lateralized tumors not extending over the MSP), 0–10 mm (DC<10), 10–20 mm (DC<20), and >20 mm (DC>20).
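The mirror-symmetry idea behind the MSP search can be sketched as follows. This is a simplified, assumed implementation: it searches only a lateral translation on a synthetic volume, whereas the published method also optimizes plane orientation and operates on real HU data:

```python
import numpy as np

def huber(d, delta=100.0):
    """Huber penalty on HU differences: quadratic near zero, linear for outliers."""
    a = np.abs(d)
    return np.where(a <= delta, 0.5 * d**2, delta * (a - 0.5 * delta))

def msp_offset(vol, bone_thresh=200.0, search=5):
    """Find the lateral shift (in voxels) that best mirror-aligns bony anatomy.
    Simplified sketch: only a translation along the left-right axis is
    searched; the clinical method also optimizes the plane's orientation."""
    best, best_loss = 0, np.inf
    for s in range(-search, search + 1):
        shifted = np.roll(vol, s, axis=-1)
        mirrored = shifted[..., ::-1]
        # Restrict the metric to bone so soft-tissue masses cannot bias it.
        mask = (shifted > bone_thresh) | (mirrored > bone_thresh)
        loss = huber(shifted[mask] - mirrored[mask]).mean()
        if loss < best_loss:
            best, best_loss = s, loss
    return best

# Synthetic volume: "bone" columns symmetric about column 6, but the
# image center is column 8, so a +2 voxel shift restores mirror symmetry.
vol = np.zeros((4, 16, 17))
vol[:, :, 3] = vol[:, :, 9] = 600.0
print(msp_offset(vol))
```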

Conclusion: The integration of MRI with CT resulted in statistically significant improvements in Dice score, recall, and HD95 for automatic prostate CTV segmentation, indicating enhanced boundary delineation and spatial accuracy. Although precision remained comparable between the two models, the overall findings suggest that incorporating MRI can meaningfully improve segmentation performance and may support more reliable radiotherapy planning. Keywords: prostate cancer, magnetic resonance imaging, CT

Digital Poster 1128 Automated Mid-Sagittal Plane Detection to Analyze Contralateral Lymphatic Spread in Oropharyngeal Cancer Yoel Pérez Haas, Loris Keller, Roman Ludwig, Noemi Bührer, Esmée L Looman, Panagiotis Balermpas, Jan Unkelbach Radiation Oncology, University Hospital Zurich, Zurich, Switzerland Purpose/Objective: Oropharyngeal squamous cell carcinomas (OPSCC) extending beyond the mid-sagittal plane (MSP) are known to exhibit higher contralateral lymph node involvement than lateralized primary tumors [1]. This study aimed to (1) develop a robust, automated algorithm to detect the MSP from planning CT images, (2) quantify the primary tumor’s contralateral extension as a continuous geometric metric, and (3) analyze how this distance correlates with contralateral lymphatic spread. This work aims to provide a quantitative basis for estimating contralateral involvement risk and guiding elective irradiation

Results: Across all patients, 25% exhibited cLNL involvement (LNL II 22%, LNL III 9%, LNL IV 3%). Contralateral prevalence increased with primary-tumor extension beyond the MSP: 13% for lateralized tumors (DC<0), 23% for small extension (DC<10), 35% for moderate extension (DC<20), and 43% for extensive extension (DC>20) (Figure 2). Level-specific rates also rose with increasing MSP extension: in LNL II, prevalence increased from 11% (DC<0) to 20% (DC<10), 31% (DC<20), and 43% (DC>20); and in LNL III, from 2% (DC<0) to 7.5% (DC<10), 10.4% (DC<20), and 22.7% (DC>20).

decisions in OPSCC. Material/Methods:

A fully automated MSP detection algorithm was developed for head and neck CT imaging and evaluated on 198 OPSCC patients treated at the University Hospital Zurich. The MSP position and


Thomas’ NHS Foundation Trust, London, United Kingdom. 4 AI & Data Analytics, FH Technikum Wien, Vienna, Austria Purpose/Objective: In radiotherapy (RT), accurate segmentation of organs at risk on planning CT scans is essential for optimal treatment planning. Deep-learning (DL) pelvic auto-segmentation models developed at Guy’s and St. Thomas’ NHS Foundation Trust [1][2] have reduced manual segmentation workload [2], with an additional aim of improving consistency. Prior to clinical implementation, qualitative scoring and timing studies were carried out as part of validation of clinical utility [2]. This study utilises efficient scripting to extract quantitative metrics for segmentation accuracy and investigates whether they offer an objective alternative to resource-intensive traditional clinical utility measures. Material/Methods: DL models auto-segmented the bladder, bowel, rectum, sigmoid, femoral heads and penile bulb. Clinicians assigned quality scores (QS) using the MD Anderson Likert scale (1 to 5) [3] and recorded adjustment times (AT) for contour corrections [2]. Quantitative evaluation included the volumetric Dice Similarity Coefficient (vDSC), the surface DSC (sDSC), Hausdorff distance (HD), and Added Path Length (APL). Pearson correlation coefficients (PCC) assessed the relationship between quantitative metrics, QS, and AT. Correlation strength was categorised as insignificant (absolute PCC (PCCabs) < 0.20), weak (0.20 ≤ PCCabs < 0.40), moderate (0.40 ≤ PCCabs < 0.75) or strong (PCCabs ≥ 0.75). Results: Quantitative metrics showed moderate to strong correlations with clinician-assigned QS for all structures except the bowel, and with adjustment times for all structures except the bowel and rectum. vDSC and sDSC metrics demonstrated stronger correlations across all structures than HD and APL, with mean PCC values of 0.70 ± 0.30 (vDSC-QS), 0.71 ± 0.25 (sDSC-QS), -0.55 ± 0.28 (vDSC-AT) and -0.61 ± 0.24 (sDSC-AT).
Notably, sigmoid segmentation exhibited high variability in vDSC (range: 0.44-1.00) and sDSC (range: 0.47-1.00), yet maintained strong correlations with QS (PCC = 0.81 and 0.82, respectively).
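The correlation-strength categories defined above can be expressed directly as a small helper (thresholds taken from the abstract; the function itself is illustrative):

```python
def pcc_strength(pcc):
    """Categorise correlation strength by the absolute Pearson
    coefficient, using the thresholds stated in the abstract."""
    a = abs(pcc)
    if a < 0.20:
        return "insignificant"
    if a < 0.40:
        return "weak"
    if a < 0.75:
        return "moderate"
    return "strong"

# Mean reported correlations between metrics and clinician scores/times:
print(pcc_strength(0.70))   # vDSC vs QS
print(pcc_strength(-0.55))  # vDSC vs AT (negative: better contours need less editing)
print(pcc_strength(0.81))   # sigmoid vDSC vs QS
```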

Conclusion: Automated MSP detection enables objective quantification of tumor extension across the midline and demonstrates a correlation between extension depth and contralateral lymphatic involvement. Continuous quantification of the distance to the MSP may allow more accurate prediction of contralateral involvement patterns than a binary distinction between tumors crossing the MSP or not. References: [1] Ludwig R, Pérez Haas Y, Benavente S, Balermpas P, Unkelbach J (2025). A probabilistic model of bilateral lymphatic spread in head and neck cancer. Scientific Reports, 15(1). https://doi.org/10.1038/s41598-025-99978-7 Keywords: mid-sagittal plane, oropharynx-cancer, lymph nodes

Digital Poster 1143 Evaluating the Clinical Utility of AI-Generated Auto-Segmentation: Correlation Between Quantitative Metrics and Clinician Assessment in Radiotherapy Evi Markou 1 , Victoria Butterworth 1 , Maram Alqarni 2 , Luis Ribeiro 3 , Ajay Aggarwal 3 , Gurdip Azad 3 , Victoria Harris 3 , Simon Hughes 3 , Stephen Morris 3 , Kirsty Morrison 3 , Vinod Mullassery 3 , Lydia Pascal 3 , Priyankah Patel 3 , Benjamin Taylor 3 , Sindu Vivekanandan 3 , Ingrid White 3 , Anna Winship 3 , Andrew King 2 , Isabel Dregely 4 , Sally Barrington 2 , Teresa Guerrero Urbano 3 , Christopher Thomas 1 1 Medical Physics & Clinical Engineering, Guy’s & St. Thomas’ NHS Foundation Trust, London, United Kingdom. 2 School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom. 3 Clinical Oncology, Guy’s & St.


Digital Poster 1274 SpecCTSegNet: A Physics-Informed Attention Network for Automated OAR and Tumor Segmentation in Multi-Energy PC-CBCT for Preclinical Radiotherapy Xinhong Wu 1 , Maxin Chen 1 , Julie Lascaud 1 , Daniel Berthe 2,3 , Franz Pfeiffer 2,3 , Katia Parodi 4 1 Department of Medical Physics, Ludwig-Maximilians-Universität München, Munich, Germany. 2 Department of Physics, Technical University of Munich, Munich, Germany. 3 Munich Institute of Biomedical Engineering, Technical University of Munich, Munich, Germany. 4 Faculty of Physics, Ludwig-Maximilians-Universität München, Munich, Germany Purpose/Objective: Accurate segmentation of organs-at-risk (OARs) and tumors in preclinical radiotherapy research is challenged by the low contrast-to-noise ratio of conventional cone-beam CT (CBCT). Photon-counting CBCT (PC-CBCT) provides multi-energy bin spectral data, offering improved tissue differentiation. We aimed to 1) develop and evaluate a novel deep learning architecture, SpecCTSegNet, designed to exploit this spectral information, and 2) quantify how segmentation accuracy scales with spectral binning (single-, dual-, vs. tri-bin). Material/Methods: We designed SpecCTSegNet, a U-Net-based architecture featuring three key innovations: a physics-informed input block to fuse spectral channels, residual blocks with Squeeze-and-Excitation (SE) attention for adaptive feature recalibration, and an Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale context. A dedicated pipeline was developed to simulate realistic spectral PC-CBCT from 83 digital phantoms with expert-annotated structures (13 OARs + tumors per mouse) [1]. Spectral projections modeled polychromatic X-ray attenuation (65 kVp) through tissue-specific attenuation coefficients, incorporated an analytically generated 80-channel detector response (1-80 keV), and included Poisson noise. After applying a 20 keV threshold, projections were reconstructed using filtered back-projection.
Three datasets were created: single-bin (20-80 keV), dual-bin (low: 20-36 keV, high: 37-80 keV), and tri-bin (low: 20-31 keV, medium: 32-41 keV, high: 42-80 keV). The network was trained independently for each configuration (75 training, 8 test cases) and evaluated using Dice Similarity Coefficient (DSC) and 95th percentile Hausdorff Distance (HD95). Results:
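For reference, the two evaluation metrics named above (DSC and HD95) can be sketched on small binary masks as follows; this brute-force HD95 is illustrative only, since production pipelines typically use distance transforms for efficiency:

```python
import numpy as np

def dice(a, b):
    """Volumetric Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(a, b, spacing=1.0):
    """95th-percentile symmetric Hausdorff distance (brute force, mm).
    Fine for tiny test masks; real pipelines use distance transforms."""
    pa = np.argwhere(a) * spacing
    pb = np.argwhere(b) * spacing
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))

# Two overlapping 4x4 squares, shifted by one column.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
ref = np.zeros((8, 8), bool); ref[2:6, 3:7] = True
print(round(dice(pred, ref), 3))
```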

Conclusion: vDSC and sDSC offer the strongest correlations with clinician scoring and editing time, suggesting their potential as surrogates for clinical validation of AI-generated contours. However, poorer correlations were seen for the bowel, highlighting challenges in using quantitative metrics for large structures that have high quantitative accuracy but contain discrepancies in localised clinically-significant areas, generating low QS and high AT. Encouragingly, the strong correlations observed for the highly variable sigmoid segmentation suggest that quantitative measures can reliably reflect clinical utility. This will enable more efficient assessment of improvements in sigmoid accuracy when refining and retraining the model. Whilst these findings support the use of quantitative metrics to streamline AI segmentation evaluation, further testing is required across additional treatment regions and OARs. References: [1] Thomas C. Image-Based Deep Learning Enables the Reduction of Gastro-Intestinal Toxicity in Pelvic Radiotherapy. Doctoral thesis, 1 Oct 2023. [2] Ribeiro L et al. Implementation of in-house pelvic radiotherapy auto-contouring in real world clinical practice. RCR Global AI Conference 2025 Proceedings, 2025. [3] Baroudi H, et al. Automated Contouring and Planning in Radiation Therapy: What Is ‘Clinically Acceptable’? Diagnostics. 2023; 13(4):667. Keywords: Autosegmentation, pelvis, metrics


mouse scans. Nat Commun 11:5626. Keywords: photon-counting CBCT, deep learning, segmentation

Digital Poster 1393 GTVN auto-segmentation for head and neck cancer; iterative modelling, oncologist evaluation, and pathways to bias assessment Victoria Butterworth 1,2 , Michael Woodward 3,2 , Thomas Young 1,4 , Anil Mistry 3,2 , Christopher Thomas 3,2 , Sarah Misson 3,2 , Delali Adjogatse 1,4 , Mary Lei 1,4 , Philip Touska 5 , Dijana Vilic 3,2 , Andrew King 2 , Teresa Guerrero Urbano 1,4 1 Department of Radiotherapy, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom. 2 School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom. 3 Department of Medical Physics, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom. 4 School of Cancer and Pharmaceutical Sciences, King's College London, London, United Kingdom. 5 Department of Radiology, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom Purpose/Objective: Commercial auto-segmentation tools are widely adopted for organs-at-risk in head and neck radiotherapy, yet a notable gap remains for gross tumour volumes (GTV) [1]. With increasing demands from adaptive workflows and ongoing workforce pressures, accurate and efficient auto-segmentation tools are vital to sustain high-quality patient care and throughput. We therefore developed, iteratively refined and evaluated a nodal GTV (GTVN) model trained on a federated institutional data lake [2]. Material/Methods: Patients were drawn from the retrospective RT-HaND dataset [3]. Eligible cases had oropharynx, hypopharynx, nasopharynx or larynx primaries, treated with definitive RT/chemoRT or cetuximab+RT, and contrast-enhanced planning CT with radiologist peer-reviewed GTVN contours.
Three model stages were developed using nnU-Net v2 (3D_fullres, 1000 epochs, 5-fold cross-validation): Stage 1: node-positive patients only (n=135 train; n=32 test). Stage 2: node-positive + node-negative (n=270 train; same test). Stage 3: post-processing to remove volumes <10 mm equivalent diameter, with metrics recalculated on equivalently processed clinical contours. Primary endpoints were the volumetric Dice Similarity Coefficient (vDSC), Surface Dice Coefficient (SDC), per-node sensitivity and precision. Paired two-tailed Wilcoxon signed-rank tests were used. Two clinical oncologists qualitatively scored Stage 3 outputs using a 5-point Likert scale [4].
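The Stage 3 rule of removing components below a 10 mm volume-equivalent diameter can be sketched as follows (an assumed implementation using `scipy.ndimage`; the study's actual post-processing code is not published in the abstract):

```python
import numpy as np
from scipy import ndimage

def remove_small_nodes(mask, spacing, min_diam_mm=10.0):
    """Drop connected components whose volume-equivalent sphere diameter
    is below min_diam_mm (the Stage 3 post-processing rule).
    spacing: voxel size in mm along each axis."""
    labels, n = ndimage.label(mask)
    voxel_vol = float(np.prod(spacing))
    out = np.zeros_like(mask, dtype=bool)
    for i in range(1, n + 1):
        vol_mm3 = (labels == i).sum() * voxel_vol
        # d = (6V/pi)^(1/3): diameter of a sphere with the same volume.
        diam = (6.0 * vol_mm3 / np.pi) ** (1.0 / 3.0)
        if diam >= min_diam_mm:
            out |= labels == i
    return out

# Example: one large blob (kept) and one tiny blob (removed), 1 mm voxels.
m = np.zeros((20, 20, 20), bool)
m[2:12, 2:12, 2:12] = True    # 1000 mm^3 -> equivalent diameter ~12.4 mm
m[15:17, 15:17, 15:17] = True # 8 mm^3 -> equivalent diameter ~2.5 mm
cleaned = remove_small_nodes(m, (1.0, 1.0, 1.0))
print(cleaned.sum())
```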

The combination of spectral data and a specialized network architecture yielded substantial performance gains (Figure 1). Tri-bin achieved the highest mean DSC (0.906), outperforming dual-bin (0.893) and single-bin (0.885). An ablation study confirmed that SpecCTSegNet's attention mechanisms and multi-scale features were critical, robustly outperforming a standard U-Net (DSC: 0.894 vs. 0.906 with tri-bin). For radiotherapy, tumor precision improved dramatically: HD95 decreased from 0.67 mm (single-bin) to 0.22 mm (tri-bin), a 67% reduction. Notably, the tri-bin framework outperformed both manual expert segmentations (DSC: 0.906 vs. 0.813) [1] and state-of-the-art methods (AIMOS [2]: 0.889) on contrast-enhanced micro-CT, with excellent performance for critical organs: lung (97.3% DSC), heart (97.2%), intestine (94.9%), and liver (92.2%). Conclusion: We developed a physics-informed attention network that substantially improves automated OAR and tumor segmentation in preclinical CBCT, especially with tri-bin spectral data. SpecCTSegNet holds promise as a robust tool for high-fidelity delineation in treatment planning and response assessment, potentially eliminating the need for contrast agents, which can complicate dose calculations. Future work will validate this framework on experimental PC-CBCT data from our system under development. References: [1] Rosenhain S et al. (2018). A preclinical micro-computed tomography database including 3D whole body organ segmentations. Sci Data 5:180294. [2] Schoppe O et al. (2020). Deep learning-enabled multi-organ segmentation in whole-body


Results: Stage 1: Median vDSC 0.78 (IQR: 0.68-0.88). Sensitivity per voxel was 0.77 but notably lower per node (0.66), reflecting the model’s propensity to overlook smaller positive nodes. Higher precision on a voxel basis (0.86 vs 0.76) highlights the model’s ability to delineate the boundaries of detected nodes well, but a susceptibility to segmenting additional nodes. Stage 2: Inclusion of node-negative patients did not significantly change test vDSC, sensitivity or precision per node vs Stage 1 (paired Wilcoxon p>0.05). Stage 3: Post-processing increased per-node precision from a median of 0.83 to 1.0 (W=0.0, p<0.001), whilst sensitivity remained unchanged (median 1.0; W=3.0, p=1.0), reflecting that post-processing mostly removed small false positives.

world Head and Neck Cancer Dataset for Research,” Clin Oncol, vol. 47, p. 103935, Nov. 2025. [4] X. Chen et al., “Deep learning-based automatic segmentation of cardiac substructures for lung cancers,” Radiotherapy and Oncology, vol. 191, p. 110061, Feb. 2024. Keywords: Artificial Intelligence, Bias, multidisciplinary

Digital Poster Highlight 1454

Domain-specific fine-tuning of a 3D foundation model for automated head and neck organ at risk segmentation Yuqing Xia 1 , Samer Jabor 2 , Arkajyoti Roy 3 , Neil Kirby 1 , Nikos Papanikolaou 1 1 Radiation Oncology, University of Texas Health Science Center at San Antonio, San Antonio, USA. 2 Computer Science, St Mary's University, San Antonio, USA. 3 Operations & Analytics, University of Texas at San Antonio, San Antonio, USA Purpose/Objective: Accurate segmentation of organs at risk (OARs) is essential for high-precision head-and-neck (H&N) radiotherapy planning. Conventional convolutional neural network (CNN) architectures, such as nnU-Net, have demonstrated strong performance but require large, homogeneous datasets and substantial training time. Foundation models like the Segment Anything Model (SAM) provide flexible, prompt-based segmentation but have limited ability to capture the complex three-dimensional spatial context characteristic of medical imaging. This study fine-tunes a 3D SAM-Med model (SAM-FT) for automated segmentation of H&N OARs and targets through domain-specific adaptation, interactive prompting, and composite loss optimization. The performance of SAM-FT was compared with SAM-Med3D and nnU-Net in terms of segmentation accuracy, computational efficiency, and clinical applicability for radiotherapy contouring. Material/Methods: A total of 211 anonymized H&N CT datasets containing 20 delineated OARs from our cancer center were preprocessed and standardized. Eighty percent of the data were used for training and twenty percent for testing. SAM-FT was optimized using variable-length interactive prompting (1–20 simulated clicks) and a composite Dice–IoU–Cross-Entropy loss to enhance spatial and boundary accuracy. nnU-Net was trained using its automated configuration pipeline. Model performance was assessed through five-fold cross-validation using the Dice similarity coefficient, Intersection-over-Union (IoU), precision and recall.
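A composite Dice, IoU and cross-entropy loss of the kind described can be sketched as below; the term weights and exact formulation are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def composite_loss(pred, target, w=(1.0, 1.0, 1.0), eps=1e-6):
    """Sketch of a composite Dice + IoU + cross-entropy loss of the kind
    described for SAM-FT; weights w are illustrative, not from the study.
    pred: predicted foreground probabilities in (0, 1); target: binary mask."""
    p, t = pred.ravel(), target.ravel().astype(float)
    inter = (p * t).sum()
    dice_loss = 1.0 - (2.0 * inter + eps) / (p.sum() + t.sum() + eps)
    iou_loss = 1.0 - (inter + eps) / (p.sum() + t.sum() - inter + eps)
    ce = -(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)).mean()
    return w[0] * dice_loss + w[1] * iou_loss + w[2] * ce

# A confident correct prediction drives all three terms toward zero,
# while an inverted prediction is heavily penalised.
t = np.zeros((4, 4)); t[1:3, 1:3] = 1.0
good = np.clip(t, 0.001, 0.999)
bad = np.clip(1 - t, 0.001, 0.999)
print(composite_loss(good, t) < composite_loss(bad, t))
```

Combining overlap-based terms (Dice, IoU) with a voxel-wise cross-entropy is a common design choice: the overlap terms counter class imbalance from small structures, while cross-entropy stabilises gradients at boundaries.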

Qualitative scoring (n=30) for Stage 3 showed a median Likert score of 3 (mean >3 for both clinicians). In cases without false positives/negatives, the median was 4 (range 3-5). vDSC showed weak correlation with Likert scores (ρ=0.19, p=0.32), while SDC correlated significantly (ρ=0.51, p=0.004), aligning better with clinician judgement. Conclusion: This model demonstrates strong performance on larger nodes and substantial precision gains after simple post-processing. Leveraging a federated data lake with rich demographic and clinical metadata enables future examination of bias and supports equitable model deployment. Multidisciplinary buy-in from radiologists and oncologists will be key to setting acceptance thresholds and defining use as a timesaving, editable baseline with acceptable clinical risk [1]. References: [1] Royal College of Radiologists, “Autocontouring in Radiotherapy: Guidance for Clinicians,” 2024. Accessed: Apr. 25, 2025. https://www.rcr.ac.uk/media/rqjlnlny/rcr-auto-contouring-in-radiotherapy-2024.pdf [2] V. Butterworth et al., “Data-centric artificial intelligence and cancer research: construction of a real-world head and neck treatment data repository,” ESMO Real World Data and Digital Oncology, vol. 9, p. 100162, Sep. 2025. [3] T. Young et al., “RT-HaND_C: A Multi-Source, Validated Real-


