ESTRO 2026 - Abstract Book PART II

S2102

Physics - Image acquisition and processing


Digital Poster Highlight
5066
Reinforcement Learning with Human-Guided Reward Model for Improved MRI/4D-CT Deformable Image Registration
Unai HARLOUCHET 1, Ziad KHEIL 2,1, Emmanuelle Claeys 3, Soleakhena Ken 1,2
1 Engineering and Medical Physics, Institut Universitaire du Cancer de Toulouse – Oncopole Claudius Regaud, Toulouse, France.
2 RADOPT, Centre de Recherches en Cancérologie de Toulouse (CRCT), Toulouse, France.
3 A.D.R.I.A, 6 Institut de Recherche en Informatique de Toulouse (IRIT), Toulouse, France.

and the single-stage method inside the GI contours were 0.72 ± 0.13 and 0.57 ± 0.30 respectively.

Purpose/Objective: We propose to enhance deformable image registration (DIR) through Reinforcement Learning from Human Feedback (RLHF) [1] to produce smooth, topology-preserving deformations and to warp 3D MRI annotations across all 4D-CT phases. We evaluate how human guidance improves deformation plausibility, inter-phase label fidelity, and the overall clinical acceptability of contours and planning decisions, rather than relying on metric-driven optimisation alone.

Material/Methods: We first collect deformation fields predicted by a trained deep neural network (DNN) on 780 CT/MR pairs from different patients and respiratory phases (1.a). Expert raters then assess the visual similarity between the warped 3D MR image and a single phase of the fixed 4D-CT on 160 randomly selected cases (1.b). These ratings are used to train and fine-tune an Attention U-Net-based reward model (RM) that captures human preferences for alignment quality [1]. Reinforcement Learning (RL) [2] is then applied to refine the deformation fields based on the learned reward signal, preserving the initial smoothness, anatomical plausibility, and metric-based alignment while incorporating expert-aligned deformations. The optimization (1.c) emphasizes human-aligned quality while enforcing anatomical plausibility via penalties on foldings and Jacobian dispersion; conventional similarity/overlap terms are used only as auxiliary signals. Iterations stop when the human-aligned score stabilizes under safety constraints [2].
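The plausibility terms named in the methods (penalties on foldings and on Jacobian dispersion) can be illustrated with a short NumPy sketch. This is not the authors' implementation: the array layout, the function names, the weights `w_fold` and `w_disp`, the use of the spread of the log-determinant as the dispersion measure, and the subtraction of the penalty from the reward-model score are all assumptions made for illustration.

```python
import numpy as np


def jacobian_determinant(disp, spacing=(1.0, 1.0, 1.0)):
    """Return det(I + grad u) per voxel for a displacement field u.

    disp: array of shape (3, D, H, W), displacement components in the same
    physical units as `spacing` (assumed layout, not taken from the abstract).
    """
    jac = np.empty(disp.shape[1:] + (3, 3), dtype=np.float64)
    for i in range(3):
        grads = np.gradient(disp[i], *spacing)  # d u_i / d x_j for j = 0, 1, 2
        for j in range(3):
            jac[..., i, j] = grads[j]
    jac += np.eye(3)  # the transform is x + u(x), so add the identity
    return np.linalg.det(jac)


def folding_fraction(disp, spacing=(1.0, 1.0, 1.0)):
    """Fraction of voxels where the deformation folds (determinant <= 0)."""
    return float(np.mean(jacobian_determinant(disp, spacing) <= 0.0))


def plausibility_penalty(disp, spacing=(1.0, 1.0, 1.0), w_fold=1.0, w_disp=0.1):
    """Hypothetical penalty: folding term plus spread of log-determinants."""
    det = jacobian_determinant(disp, spacing)
    folding = np.mean(np.maximum(0.0, -det))  # only negative determinants count
    dispersion = np.std(np.log(np.clip(det, 1e-6, None)))
    return w_fold * folding + w_disp * dispersion


def shaped_reward(human_score, disp, **penalty_kwargs):
    """Assumed combination: reward-model score minus the plausibility penalty."""
    return human_score - plausibility_penalty(disp, **penalty_kwargs)
```

Under these assumptions an identity deformation incurs no penalty, so its shaped reward equals the reward-model score, whereas a field whose determinant is negative everywhere has a folding fraction of 1.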

Method                             Network   Epochs   Gas-Cavity DICE   Global sCT DICE
Two-stage sCT method               Pix2pix   50       0.72 ± 0.13       0.91 ± 0.01
Standard single-stage sCT method   Pix2pix   50       0.57 ± 0.30       0.99 ± 0.00

Table 1. Comparison between the two-stage sCT method and the standard single-stage sCT method. Mean and std values were calculated among test patients.

Conclusion: We demonstrated that the autosegmentation-mediated, DIR-free sCT generation enables excellent anatomical correspondence and improves abdominal gas cavity definition between MRI and sCT images compared to a single-stage MRI-CT DL sCT model. Future work will include a prospective dosimetric impact assessment incorporating assessments of MRgRT treatment plans.

References:
[1] Rippke, C., et al. (2024). A body mass index-based method for “MR-only” abdominal MR-guided adaptive radiotherapy. Zeitschrift für Medizinische Physik, 34(3), 456–467. https://doi.org/10.1016/j.zemedi.2022.12.001
[2] Emami, H., et al. (2021). SA-GAN: Structure-Aware GAN for Organ-Preserving Synthetic CT Generation. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Lecture Notes in Computer Science, vol 12906. Springer, Cham. https://doi.org/10.1007/978-3-030-87231-1_46
[3] Thummerer, A., et al. (2025). SynthRAD2025 Grand Challenge dataset: Generating synthetic CTs for radiotherapy from head to abdomen. Medical Physics, 52. https://doi.org/10.48550/arXiv.2502.17609

Keywords: SyntheticCT, MRI-Only, DeepLearning
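For readers unfamiliar with the DICE values reported in Table 1, the following is a minimal sketch of how a Dice overlap between two binary masks and a mean ± std summary over test patients could be computed. The function names are hypothetical, and both the convention for two empty masks and the use of the sample standard deviation (`ddof=1`) are assumptions; the abstract does not state which conventions were used.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def summarize(scores):
    """Mean and sample standard deviation of per-patient scores."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1))
```

As a usage example, `summarize([dice(m_mri, m_sct) for m_mri, m_sct in pairs])` would give one mean ± std entry, where `pairs` is a hypothetical list of per-patient gas-cavity masks on the MRI and on the sCT.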
