ESTRO 2026 - Abstract Book PART II


Physics - Autosegmentation


Poster Discussion 2671

Individual editing patterns of auto-segmentation in clinical use and their relationship with clinical acceptability

Rita Simões 1, Sandra van der Velden 1, Mark J Gooding 2,3, Djamal Boukerroui 2, Peter Remeijer 1, Tomas M Janssen 1

1 Department of Radiation Oncology, Netherlands Cancer Institute, Amsterdam, Netherlands. 2 Inpictura Limited, Abingdon, United Kingdom. 3 Division of Cancer Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom

Purpose/Objective: Organ-of-interest (OOI) auto-segmentations are routinely edited manually, but the absence of objective editing criteria leads to deviations [1] that may reduce efficiency or quality, undermining the purpose of auto-segmentation. These deviations are rarely assessed for clinical acceptability. Building on a prior longitudinal analysis [2], this study aimed to quantify editing variability among five RTTs within one clinic and to evaluate whether the observed differences align with each RTT's perception of clinical acceptability.

Material/Methods: Auto-generated and clinical OOI contours were collected from 195 head and neck patients treated at our institute between January 2023 and June 2024. Clinical contours were based on deep-learning auto-segmentations (Mirada DLC) edited by RTTs, and the responsible RTT was recorded for each patient. The analysis focused on the five RTTs with the highest case volume and on segmentations of the left parotid gland, left submandibular gland, oral cavity and glottis. AIQUALIS (Inpictura Ltd., Abingdon, UK) was used for the analysis. Editing behavior was quantified using the normalized added path length (nAPL: APL divided by the clinical structure path length). For each RTT-structure combination, we selected three cases (minimum, median and maximum nAPL) and showed their clinical segmentations blindly to all RTTs, who classified each as clinically acceptable (1) or unacceptable (0). Adding the scores per RTT reflects the number of contours (ranging from 0 to 3) considered clinically acceptable. An open-ended questionnaire explored sources of editing variability.

Results: RTT5 edited the left parotid significantly less than their colleagues (Fig. 1), and RTT5's contours were rated the least clinically acceptable (average score: 2.2; Fig. 2). RTT3 made more extensive oral cavity edits (Fig. 1), but these were also considered less clinically acceptable (average score: 2.2; Fig. 2). RTT2 showed high local variability in editing the left submandibular gland (Fig. 1, 3D map) and received the lowest overall scores in clinical acceptability (average: 1.8; Fig. 2). RTT1 achieved the highest scores across all structures (2.8–3.0; Fig. 2). RTT4 consistently rated colleagues' delineations lower and expressed uncertainty in several cases (Fig. 2). In the questionnaire, RTT4 confirmed their critical approach and indicated relying on clinical information (such as potential resections and tumor location) when uncertain.
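To illustrate the metric, the following is a minimal per-slice sketch of a normalized added path length, assuming binary masks on a common pixel grid. The function names, the boundary-minus-erosion approximation of the contour, and the zero-tolerance default are illustrative assumptions; this is not the AIQUALIS implementation, which may use tolerance bands and physical path lengths.

```python
import numpy as np
from scipy import ndimage


def boundary(mask):
    """Boundary pixels of a binary mask (mask minus its erosion)."""
    return mask & ~ndimage.binary_erosion(mask)


def napl_2d(auto_mask, clinical_mask, tol_mm=0.0, spacing_mm=1.0):
    """Normalized added path length for one axial slice (illustrative).

    APL is approximated as the count of clinical-contour boundary pixels
    lying farther than `tol_mm` from the auto-contour boundary; nAPL
    divides that by the total clinical boundary length, giving a value
    in [0, 1] (0 = no edits on this slice).
    """
    b_auto = boundary(auto_mask.astype(bool))
    b_clin = boundary(clinical_mask.astype(bool))
    if not b_clin.any():
        return 0.0
    # Distance (in mm) from every pixel to the nearest auto-boundary pixel.
    dist = ndimage.distance_transform_edt(~b_auto, sampling=spacing_mm)
    edited = b_clin & (dist > tol_mm)
    return edited.sum() / b_clin.sum()
```

An unedited slice yields 0, while a contour redrawn from scratch approaches 1; a study-level nAPL would aggregate APL and path length over all slices before dividing.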

Conclusion: Both excessive and minimal editing were associated with lower peer-rated clinical acceptability. To fully realize the benefits of auto-segmentation in practice, quality assurance should address not only model performance but also how the models are used. Identifying deviations in editing behavior should prompt standardization efforts, ensuring that edits are performed in a manner that maximizes workflow efficiency and treatment quality.

References: [1] Nealon, K. A., Han, E. Y., Kry, S. F., Nguyen, C., Pham, M., Reed, V. K. et al. (2024). Monitoring variations in the use of automated contouring software. Practical Radiation Oncology, 14(1), e75-
