ESTRO 2026 - Abstract Book PART II

S1593

Physics - Autosegmentation


Poster Discussion 4101

Automated annotation error detection and correction for manual 2D cine-MRI segmentations using Segment Anything 2

Pia A.W. Görts 1,2, Rob H.N. Tijssen 1,2, Marcel Breeuwer 2,3, Coen W. Hurkmans 1,3

1 Department of Radiation Oncology, Catharina Hospital Eindhoven, Eindhoven, Netherlands. 2 Department of Biomedical Engineering, Technical University Eindhoven, Eindhoven, Netherlands. 3 Department of Electrical Engineering, Technical University Eindhoven, Eindhoven, Netherlands

Purpose/Objective: MR-linacs enable intra-fractional tumour motion tracking via real-time cine MRI. While deformable image registration (DIR) is used clinically for tumour segmentation, deep learning (DL) methods are gaining traction due to their speed and comparable accuracy [1]. The performance of supervised DL models relies heavily on the quality of the annotated data. Manual annotation of cine images is extremely time-consuming and is often done by a single observer, which can lead to annotation errors, making dataset cleaning essential [2]. To reduce manual effort and improve data quality, we propose an automated method for detecting annotation errors in cine MRIs.

Material/Methods: 0.35 T sagittal 2D cine-MRI data of 42 patients (TrackRAD2025 dataset [1]) were used for this study. Our outlier detection tool leverages Segment Anything 2 (SAM-2) to segment entire cine sequences from a single mask prompt. Prediction accuracy depends on prompt quality, which is tied to the manual input label. The method is visualised in Figure 1a: each frame of a sequence of N frames was used as a mask prompt to segment the sequence, yielding N × N SAM-2 predictions per patient. A heatmap of Dice Similarity Coefficient (DSC) scores between SAM-2 outputs and manual labels was used to identify mislabelled outlier frames (Figure 1b). Additionally, a SAM-2 average contour was generated for each frame from the N predictions. Two radiation oncologists conducted a blinded review of 30 consecutive frames from 10 patients. For each frame, they were shown the SAM-2 default contour (using frame 0 as prompt), the SAM-2 average contour, and the original label. Experts could select one, multiple, or none as correct tumour segmentations.

Results: The average percentage of outlier frames identified by the outlier detection tool across the 42 patients was 5.80% ± 3.59%. Nine outlier frames were identified among the 300 frames reviewed. Figure 2 shows how frequently each contour was selected as correct; both observers most frequently selected the SAM-2 average contour. Expert review confirmed the tool's effectiveness, with a specificity of 0.984 ± 0.005 and a precision of 0.889 ± 0.000, where true positives were defined as manual labels that were not selected as a correct segmentation for an outlier frame. Contour differences were most pronounced in outlier frames.

Conclusion: Our method accurately detects annotation errors, validated by expert review, offering a time-efficient alternative to manual dataset cleaning prior to DL model training. Future work will assess its impact on DL model performance.
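The N × N prompting scheme, the DSC heatmap, and the average contour described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the `predictions` array stands in for actual SAM-2 inference output, and the 0.7 outlier threshold and majority-vote averaging are assumptions made here for illustration, not criteria stated in the abstract.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * inter / total if total > 0 else 1.0

def dsc_heatmap(predictions: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Build the N x N DSC matrix.

    predictions[i, j] is the (hypothetical) SAM-2 mask for frame j when
    frame i's manual label was used as the prompt; labels[j] is the
    manual label for frame j.
    """
    n = labels.shape[0]
    heat = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            heat[i, j] = dice(predictions[i, j], labels[j])
    return heat

def flag_outliers(heat: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Flag frames whose manual label disagrees with most SAM-2 outputs.

    Frame j is an outlier candidate when the mean DSC between its manual
    label and the N predictions for that frame is low. (The 0.7 threshold
    is illustrative, not taken from the abstract.)
    """
    return np.flatnonzero(heat.mean(axis=0) < threshold)

def average_mask(predictions: np.ndarray, frame: int) -> np.ndarray:
    """Per-pixel majority vote over the N predictions for one frame --
    a simple stand-in for the SAM-2 average contour."""
    return predictions[:, frame].mean(axis=0) >= 0.5
```

In this sketch an outlier frame produces a low column mean in the heatmap: every prompt yields roughly the same SAM-2 mask for that frame, but the frame's own manual label overlaps poorly with all of them.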

References:
[1] Wang Y et al., Medical Physics 52, 2025. https://doi.org/10.1002/mp.17964
[2]
