ESTRO 2026 - Abstract Book PART II


Physics - Image acquisition and processing


Across 15 episodes on a single case, the exploration-exploitation policy yields a clear improvement in both cumulative reward and final score (2.a). Episode-wise trends show a steady increase in the continuous score (2.b), indicating consistent policy refinement. Within episodes, the step-level dynamics stabilize: the differential reward converges to a positive value, meaning each action contributes additively to the global return and the policy reaches a locally steady regime (2.c). Because the optimization objective is dominated by the learned reward model, we expect the RM-score to increase over episodes, showing alignment between the policy updates and the reward signal. Most evaluation metrics move in the desired direction: NCC increases while stdJ and fold rate decrease, and DICE remains essentially stable within an acceptable fluctuation range. Together, these results demonstrate effective learning with improving image alignment quality and regularization (low folds), consistent with a policy that transitions from exploration to exploitation as it converges toward a higher-reward operating point.

Conclusion: The proposed method is able to focus on expert feedback, reduce non-physical deformations, and increase the consistency of propagated contours across phases, with downstream checks on motion-aware planning metrics.

References:
[1] Kaufmann, T., Weng, P., Bengs, V., & Hüllermeier, E. (2024). A Survey of Reinforcement Learning from Human Feedback. arXiv preprint arXiv:2312.14925.
[2] Hu, M., Zhang, J., Matkovic, L., Liu, T., & Yang, X. (2023). Reinforcement learning in medical image analysis: Concepts, applications, challenges, and future directions. Journal of Applied Clinical Medical Physics, 24, e13898. https://doi.org/10.1002/acm2.13898

Keywords: Reinforcement Learning, DIR, Adaptive RT

Verification of metal artifact reduction effectiveness in Varian HyperSight system

Bartosz Pawalowski 1, Ewelina Nowak 2, Zuzanna Wroblewicz 2, Maksymilian Wosicki 2, Tomasz Piotrowski 2,3

1 Medical Physics Department, Greater Poland Cancer Centre, Poznan, Poland. 2 Medical Physics Department, Greater Poland Cancer Centre, Poznań, Poland. 3 Department of Electroradiology, Poznan University of Medical Sciences, Poznan, Poland

Digital Poster 5110

Results: The results are from an example training run (15 episodes × 100 steps) on a single registration case (patient/phase).
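The evaluation metrics named in this abstract (NCC, stdJ, fold rate, DICE) are not defined in the text. The following is a minimal sketch of how such metrics are commonly computed for a deformable registration result. It assumes that stdJ denotes the standard deviation of the Jacobian determinant of the deformation and that fold rate denotes the fraction of voxels with a non-positive determinant. All function names and the displacement-field layout (3, D, H, W) in voxel units are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ncc(fixed, warped):
    """Global normalized cross-correlation between two images (range -1 to 1)."""
    f = fixed - fixed.mean()
    w = warped - warped.mean()
    denom = np.sqrt((f ** 2).sum() * (w ** 2).sum())
    return float((f * w).sum() / denom) if denom > 0 else 0.0


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    return float(2.0 * np.logical_and(a, b).sum() / total) if total > 0 else 1.0


def jacobian_determinant(disp):
    """Determinant of (I + grad u) for a displacement field u of shape (3, D, H, W).

    Assumes displacements are expressed in voxel units (an assumption of this sketch).
    """
    # grads[i, j] = d u_i / d x_j, shape (3, 3, D, H, W)
    grads = np.stack(
        [np.stack(np.gradient(disp[i], axis=(0, 1, 2)), axis=0) for i in range(3)],
        axis=0,
    )
    jac = grads + np.eye(3)[:, :, None, None, None]
    jac = np.moveaxis(jac, (0, 1), (-2, -1))  # shape (D, H, W, 3, 3)
    return np.linalg.det(jac)


def std_jacobian(disp):
    """Standard deviation of the Jacobian determinant (assumed meaning of stdJ)."""
    return float(jacobian_determinant(disp).std())


def fold_rate(disp):
    """Fraction of voxels where the deformation folds (determinant <= 0)."""
    return float((jacobian_determinant(disp) <= 0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 8))
    identity = np.zeros((3, 8, 8, 8))  # zero displacement: no deformation
    print("NCC(image, image):", ncc(image, image))
    print("stdJ (identity):", std_jacobian(identity))
    print("fold rate (identity):", fold_rate(identity))
```

Under these definitions, a smoother and more physically plausible deformation gives a lower stdJ and a fold rate closer to zero, which matches the direction of improvement reported in the abstract.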
