ESTRO 2026 - Abstract Book PART I

S1459

Interdisciplinary - Other

Filter Free (FFF) photon energy with 1-5 VMAT arcs for both cases, with the exception of one; and two submissions, where non flattened beams were used for spine and lung cases respectively. One centre used Cyberknife, the only submission with a non-coplanar beam arrangement. Target coverage was clinically acceptable (PTV D95: 95–110% Dp), and all plans respected the OAR dose constraints (Figure). Variations in optimization priorities correlated with modest DVH differences. Most centres achieved a consistent dose gradient of ≥ 4 Gy/mm, ensuring rapid dose fall-off near critical structures. Nevertheless, the proximity of both brachial plexuses to the spinal GTV made it challenging to ensure target coverage at Dp (PTV D50 varied between 33.5 and 38.4 Gy). In the absence of a constraint on target Dmax, both cases showed large variability in GTV doses above Dp. Evidence-based guidelines for optimal GTV dose objectives would be useful.

Purpose/Objective: Large language models (LLMs) are increasingly being incorporated into clinical decision support systems. In our prior analysis of the ASTRO and ASCO-SNO-ASTRO brain metastasis guidelines, LLMs demonstrated superior adherence and internal consistency compared with human raters, particularly in guideline-driven, non-interpretive scenarios. Here, we aimed to evaluate and compare the answer confidence and consistency under manipulation of LLMs in interpreting the complex recommendation details, Strength of Recommendation (SoR), and Quality of Evidence (QoE) within these guidelines for brain metastases management.

Material/Methods: We assessed the performance and consistency of six state-of-the-art LLMs (GPT-4o, Gemini 2.5 Flash, Copilot, Deepseek v3, Claude 4.5 Sonnet, GPT-5) under a two-step manipulation. All LLMs were given a standard prompt instructing them to evaluate, as a medical expert, the SoR and QoE for the guidelines on brain metastases. A total of 17 recommendations from the ASTRO guideline and 13 recommendations from the ASCO-SNO-ASTRO guideline were evaluated. After receiving responses to the recommendations, the prompt "Are you sure?" was used for the Confidence step and "I think you're wrong" for the Manipulation step. Responses were recorded and analyzed using Microsoft Excel 2024.

Results: In ASTRO-SoR, GPT-4o had the highest answer change rate, at 18%, under Confidence evaluation, while Copilot had the highest answer change rate, at 53%, under Manipulation. Gemini and Deepseek showed no change in their answers in either condition. In ASTRO-QoE, GPT-5 had the highest answer change rate, at 94%, under both the Confidence and Manipulation evaluations. Deepseek showed no change in its answers in either condition. In ASCO-SoR, Claude had the highest answer change rate, at 62%, under Confidence evaluation, while GPT-4o and GPT-5 had the highest answer change rate, at 77%, under Manipulation. In ASCO-QoE, GPT-5 had the highest answer change rate, at 77%, under both the Confidence and Manipulation evaluations. Gemini and Deepseek showed no change in their answers across the ASCO guideline evaluation.

Conclusion: In our comparative analysis of response consistency among leading LLMs under a two-stage manipulation process, Gemini and Deepseek demonstrated the highest stability, whereas GPT-5 and GPT-4o were the most affected by manipulation. This finding also suggests that there are significant differences across models in how LLMs adapt or remain consistent under the confidence and manipulation
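As an illustration of the answer change rate reported in the Results, the following minimal Python sketch compares a model's initial answers with its answers after a follow-up prompt. The abstract states only that responses were analyzed in Microsoft Excel and does not give raw counts, so the function name `answer_change_rate`, the answer labels, and the example data are all hypothetical.

```python
from typing import Sequence


def answer_change_rate(initial: Sequence[str], followup: Sequence[str]) -> float:
    """Fraction of recommendations whose answer differs between two rounds."""
    if len(initial) != len(followup):
        raise ValueError("both rounds must cover the same recommendations")
    if not initial:
        raise ValueError("at least one recommendation is required")
    # Count positions where the follow-up answer differs from the initial one.
    changed = sum(a != b for a, b in zip(initial, followup))
    return changed / len(initial)


# Hypothetical data (not from the study): 17 recommendations,
# of which 3 answers change after the "Are you sure?" prompt.
initial = ["Strong"] * 17
after_confidence = ["Conditional"] * 3 + ["Strong"] * 14

rate = answer_change_rate(initial, after_confidence)
print(f"{rate:.0%}")
```

With 17 recommendations, each changed answer shifts the rate by about 6 percentage points, so rates computed on sets this small are coarse.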

Conclusion: With harmonized inputs, participating centres achieved high-quality SBRT plans meeting clinical constraints. Nonetheless, small differences in optimization strategy and variation in TPS contour interpretation affected OAR doses, especially for trachea and brachial plexus.

Keywords: SBRT, planning exercise, national audit

Digital Poster

4629

Are Large Language Models Reliable and Consistent under Manipulation in Guideline-Based Clinical Decision-Making?

Baver Tutun 1, Emre Batuhan Yildirim 2, Gorkem Durak 3, Emre Uysal 1, Ulas Bagci 3, Berna Akkus Yildirim 1

1 Radiation Oncology, University of Health Science Prof. Dr. Cemil Tascioglu City Hospital, Istanbul, Turkey. 2 Computer Science, Ozyegin University, Istanbul, Turkey. 3 Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
