S1385
Interdisciplinary - Health economics & health services research
ESTRO 2026
Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany. 5 BIH Biomedical Innovation Academy, BIH-Charité (Junior) Clinician Scientist Program, Berlin Institute of Health der Charité - Universitätsmedizin Berlin, Berlin, Germany

Purpose/Objective: The rapid growth of oncological publications and trials has created a need for automated methods of knowledge analysis. Large language models (LLMs) show promise in automated classification and data extraction and could be highly valuable for clinical trial data curation. Radiation oncology, with its fast-expanding evidence base and heterogeneous reporting registries such as ClinicalTrials.gov, could benefit greatly from such automated data extraction.

Material/Methods: We evaluated an LLM-based data extraction system on radiation oncology trials registered on ClinicalTrials.gov. 47 questions covering different aspects such as trial design, tumor type, stage, radiotherapy modality, systemic therapy, and outcomes were defined. For given trial summaries, these questions were answered using the data-element-extractor Python library, a framework developed for LLM-based data extraction, utilizing the LLM Rombos-LLM-V2.6-Qwen-14, a model with demonstrated high performance in previous related studies. In a first step, the performance of the system in answering the defined questions was evaluated on a ground truth dataset of 94 randomly selected, manually classified radiotherapy trials. The system performance was assessed using accuracy, precision, recall, and F1-score. In a second step, an analysis of a comprehensive dataset of 4,253 radiotherapy trials was conducted for questions with verified, sufficiently high performance, defined as an F1-score ≥ 80.0% for Boolean questions or an accuracy ≥ 80.0% for non-Boolean questions.

Results: Across all 47 questions, overall accuracy (=exact-match-ratio) was 89.3%. On Boolean questions (n=40), the system achieved an accuracy of 90.2%, precision of 86.6%, recall of 61.0%, and an F1-score of 71.6%. High performance was confirmed for 23 questions, including key questions on tumor type and radiotherapy techniques, though results varied with insufficient performance on individual questions. The automated analysis of the entire dataset identified 76.7% of trials as randomized and 27.3% as multicentric. Chemotherapy was reported in 51.6%, immunotherapy in 24.4%, and hormonal therapy in 4.5% of trials. Regarding radiotherapy techniques, 13.5% of trials included SBRT, 11.2% IMRT, 2.8% brachytherapy, and 2.5% proton therapy. Among tumor entities, the system identified NSCLC (11.5%) and prostate cancer (6.7%) as those most frequently investigated.

Conclusion: LLM-based data extraction systems can achieve high performance in retrieving structured information from study summaries of radiotherapy trials, enabling scalable, automated curation of large databases. This offers a powerful approach for clinical research and facilitates the realization of efficient, transparent and auditable pipelines. Nevertheless, human verification through manual classification of data subsets is crucial to confirm that the system achieves sufficient performance and answers questions as a human researcher would.

References:
Christ SM, Fritsak M, Kobeissi G, et al. Navigating Scientific Progress in Radiation Oncology: Comprehensive Analysis of Clinical Trials From the Past Two Decades Using the ClinicalTrials.gov Database. JCO Glob Oncol. 2025 Apr;11:e2400615. doi: 10.1200/GO-24-00615.
Dennstädt F, Fauser S, Cihoric N, et al. Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study. J Imaging Inform Med. 2025 Sep 10. doi: 10.1007/s10278-025-01659-4.
Dennstädt F, Zink J, Putora PM, et al. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev. 2024 Jun 15;13(1):158. doi: 10.1186/s13643-024-02575-4.

Keywords: Clinical Trials, Automated Data Extraction, LLM

Digital Poster 731
A review of radiotherapy clinical trial provision across two regional specialist cancer centres
Lee Whiteside 1, Sharon McGinn 2, Daniel Hutton 1
1 Radiotherapy, The Christie NHS Foundation Trust, Manchester, United Kingdom. 2 Radiotherapy, The Clatterbridge Cancer Centre, Liverpool, United Kingdom

Purpose/Objective: Research conducted through clinical trials is essential for generating evidence on the efficacy of new therapies. However, clinical trial availability is a key component of patient accessibility. Before patients can be recruited to a study, it must first be open at a local radiotherapy provider. Furthermore, efforts should be made to balance research portfolios to provide broad equity of access to advanced treatments. In England, the Specialised Services Clinical Networks (SSCN) are tasked with promoting inclusive access to radiotherapy within defined geographical regions. This