S2307
Physics - Machine learning and AI algorithms
ESTRO 2026
high accuracy in extracting most clinical variables from initial consultation notes (Figure 2). Variables such as age, sex, smoking history, and histology were extracted with high accuracy (≥ 98%). Tumor location and RT intent were identified correctly in 75.5% and 65.7% of cases, respectively, likely reflecting inconsistent documentation. The model showed the lowest accuracy for the disease stage bin (≥ III), with correct extraction in 57.3% of cases, likely due to variability in staging formats (e.g., TNM versus I–III).
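To illustrate why variable staging formats can complicate a stage bin (≥ III) label, the following is a minimal sketch, not the authors' method. The function name `stage_ge_iii` and the regular expressions are hypothetical. It handles only an explicitly written overall stage group and returns `None` for TNM-only text, because mapping TNM to a stage group requires a staging table that is not reproduced here.

```python
import re
from typing import Optional

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}

# Overall stage group as a Roman numeral, e.g. "Stage IIIB" or "stage IVA".
_ROMAN_RE = re.compile(r"\bstage\s+(IV|III|II|I)[A-C]?\b", re.IGNORECASE)
# Overall stage group as an Arabic digit, e.g. "stage 3".
_DIGIT_RE = re.compile(r"\bstage\s+([1-4])[A-C]?\b", re.IGNORECASE)


def stage_ge_iii(text: str) -> Optional[bool]:
    """Return True/False if an overall stage group is stated, else None.

    TNM-only descriptions (e.g. "cT3N1M0") are deliberately left
    unresolved (None) rather than guessed.
    """
    match = _ROMAN_RE.search(text)
    if match:
        return ROMAN[match.group(1).upper()] >= 3
    match = _DIGIT_RE.search(text)
    if match:
        return int(match.group(1)) >= 3
    return None
```

Under these assumptions, a note that reports only TNM categories yields no bin at all, which is one plausible way that format variability could lower measured accuracy for this variable.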
Digital Poster Highlight 3355

Large language model-based automation of clinical data extraction in esophageal cancer

Chloe Min Seo Choi 1, Sulov Chalise 2, Nikhil P Mankuzhy 3, Andreas Rimner 4, Abraham J Wu 3, Anyi Li 1, Harini Veeraraghavan 1

1 Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA. 2 Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, USA. 3 Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, USA. 4 Department of Radiation Oncology, University of Freiburg, Freiburg, Germany

Purpose/Objective: Prediction of treatment outcomes and radiotherapy (RT)-induced toxicities is an active area of research in radiation oncology. Traditional and deep learning models require large amounts of carefully curated data to provide accurate and generalizable performance. Manual curation of clinical variables is time-consuming. Hence, the goal of this work was to leverage the capabilities of large language models (LLMs) to assist with clinical data curation from free-text patient medical records.

Material/Methods: A retrospective dataset from 143 patients with esophageal cancer who received RT was included in this feasibility study. Initial radiation oncology consultation notes were collected for all patients. An in-house automated structured variable extraction pipeline was developed using text embeddings (text-embedding-ada-002) and a GPT-4–based LLM deployed through the Azure OpenAI Service. Standardized model inputs were generated using the LangChain library in Python (version 3.8.19), consisting of a predefined prompt template combined with each patient’s consultation note (Figure 1). Using this pipeline without any fine-tuning, the following clinical variables were extracted: age, sex, cancer stage bin (≥ III), smoking status, tumor location, histology, and RT intent.
The extracted variables were compared with a clinician-curated ground truth database to evaluate model performance.
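The template-plus-note input and the comparison against curated ground truth can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's pipeline. It uses Python's standard-library `string.Template` as a stand-in for the LangChain prompt template, omits the embedding step and the model call, and the prompt wording, JSON key names, and function names are all hypothetical.

```python
import json
from string import Template

# Variables listed in the abstract; the key spellings are assumptions.
VARIABLES = ["age", "sex", "stage_ge_III", "smoking_status",
             "tumor_location", "histology", "rt_intent"]

# Hypothetical prompt wording; the study's actual template is not given.
PROMPT = Template(
    "Extract the following variables from the consultation note and "
    "answer with a single JSON object using exactly these keys: $keys.\n"
    "Use null when a variable is not documented.\n\n"
    "Consultation note:\n$note"
)


def build_prompt(note: str) -> str:
    """Combine the fixed template with one patient's consultation note."""
    return PROMPT.substitute(keys=", ".join(VARIABLES), note=note.strip())


def parse_response(raw: str) -> dict:
    """Parse a JSON reply, keeping only the expected keys (missing -> None)."""
    data = json.loads(raw)
    return {key: data.get(key) for key in VARIABLES}


def normalize(value) -> str:
    """Build a case- and whitespace-insensitive comparison key."""
    return str(value).strip().lower()


def accuracy_per_variable(extracted: list, truth: list) -> dict:
    """Fraction of patients whose extracted value matches the ground truth."""
    scores = {}
    for var in VARIABLES:
        hits = sum(
            normalize(e.get(var)) == normalize(t.get(var))
            for e, t in zip(extracted, truth)
        )
        scores[var] = hits / len(truth)
    return scores
```

Exact string matching after normalization is a simplifying choice. Free-text variables such as tumor location would probably need a mapping to a controlled vocabulary before comparison.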
Figure 2: Accuracy of clinical variable extraction using an LLM versus manually curated data

Conclusion: This study demonstrated the feasibility of using an LLM to automatically extract clinical variables from unstructured consultation notes in esophageal cancer patients. The model performed well for simple demographic variables but was less accurate for context-dependent ones such as disease stage and RT intent. Inaccuracies in stage and RT intent extraction resulted from variability in documentation. These results suggest that LLMs can streamline data preprocessing for outcome studies, though further domain-specific optimization is needed to improve reliability.

Keywords: large language model, extraction, prediction

Digital Poster 3411

Whole Slide Image Interpretability Agent for Translational Integration in Radiotherapy

Juan Felipe Duran, Martin Vallières, Shirin Abbasinejad Enger

Medical Physics Unit, Department of Oncology, McGill University, Montreal, Canada

Purpose/Objective: Conventional whole slide image (WSI) interpretability in oncology relies on attention heatmaps, which often highlight visually prominent regions without clarifying which tissue compartments drive prognostic performance. We introduce a WSI Interpretability Agent (WSI-IA) that allows a pathologist or a radiation
Figure 1: Overview of the proposed clinical variable extraction framework

Results: The proposed LLM extraction pipeline demonstrated