ESTRO 2026 - Abstract Book PART II


Physics - Machine learning and AI algorithms


Material/Methods: Single-institution data from 609 head-and-neck cancer patients (train/validation/test: 487/61/61) were used, each with co-registered planning CT, PET, T1w-MRI, and T2w-MRI scans. A generative model was designed in the latent space of a pre-trained variational autoencoder (VAE) [2], with a 3D MedNeXt U-Net sampler [3] trained using Latent Rectified Flow (LRF) [4], which reformulates generation as a deterministic ordinary differential equation (ODE) for efficient few-step latent sampling. The sampled latents are passed through the VAE decoder to generate quad-modal 3D phantoms at 1 mm³ isotropic resolution (Fig. 1).
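The few-step deterministic sampling described above can be illustrated with a minimal fixed-step Euler integrator. This is only a sketch under stated assumptions: `euler_sample` and `toy_velocity` are hypothetical names, the velocity field is a hand-written stand-in for the trained MedNeXt sampler, and the "latent" is a short list of floats rather than a 3D VAE latent. The abstract does not state which ODE solver or how many steps were used.

```python
import random
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def euler_sample(velocity: Velocity, z0: Vector, num_steps: int) -> Vector:
    """Integrate dz/dt = velocity(z, t) from t=0 (noise) to t=1 with fixed-step Euler."""
    z = list(z0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the toy field below stays finite
        v = velocity(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


def toy_velocity(target: Vector) -> Velocity:
    """Hypothetical stand-in for a trained sampler: the field always points
    straight from the current state to one fixed target latent."""
    def field(z: Vector, t: float) -> Vector:
        return [(ti - zi) / (1.0 - t) for ti, zi in zip(target, z)]
    return field


# Assumed toy setup: a 4-dimensional latent and a fixed random seed.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
target = [0.5, -1.0, 2.0, 0.0]

latent = euler_sample(toy_velocity(target), noise, num_steps=4)
```

Because the update involves no random draws after the initial noise, the same starting noise always maps to the same latent. When the learned trajectories are close to straight lines, as rectified flow encourages, a small number of Euler steps loses little accuracy; in this toy field the paths are exactly straight, so even a single step reaches the target.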

Conclusion: We present a fast, proof-of-concept generative engine that synthesises co-registered, high-resolution quad-modal digital phantoms in the head-and-neck region. The approach enables few-step, deterministic sampling and yields statistically realistic, functionally aligned, patient-like digital phantoms. With further development, it could help pave the way for scalable, diverse virtual patient cohorts, addressing a core demand of future VCTs and of AI algorithm validation.

References: [1] Faivre-Finn C, et al. The concept of virtual clinical trials: A game changer in radiation oncology research? Radiother Oncol. 2025; in press. [2] Guo P, et al. MAISI: Medical AI for Synthetic Imaging. WACV. 2025. [3] Roy S, et al. MedNeXt: Transformer-driven scaling of ConvNets for medical image segmentation. arXiv. 2023;2303.09975. [4] Liu X, et al. Flow straight and fast: Learning to generate and transfer data with rectified flow. ICLR. 2023. [5] Heusel M, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS. 2017. [6] Isensee F, et al. nnU-Net: Self-adapting framework for U-Net-based medical

Evaluation was performed on 106 randomly generated cases to assess: (i) statistical fidelity, quantifying distributional similarity to real data via Fréchet Inception Distance (FID) [5] and Maximum Mean Discrepancy (MMD); and (ii) functional alignment, using an nnU-Net [6] pre-trained on real data to segment the primary tumor and nodal gross tumor volumes (GTVt/n) from paired modalities (CT-PET vs. T2-PET). The Dice Similarity Coefficient (DSC) between the segmentations obtained from the two synthetic modality pairs measured cross-modal consistency.

Results: The generated cohort yielded FID = 86.27 and MMD = 0.23. Visual inspection (Fig. 2) confirmed high realism and geometric consistency between modalities. Functional alignment, evaluated via the downstream segmentation task, yielded a mean aggregated GTV DSC of 0.53 ± 0.35 between synthetic modalities (CT-PET vs. T2-PET), compared to 0.76 ± 0.24 on the real test set.
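For readers unfamiliar with the two simpler metrics named above, the following sketch shows how a DSC between two binary masks and a squared MMD between two feature sets could be computed. It is illustrative only and rests on several assumptions not stated in the abstract: masks are flattened to 0/1 sequences, two empty masks are given a DSC of 1.0, MMD uses a Gaussian (RBF) kernel with an arbitrary bandwidth and the biased estimator, and all data are toy values. The function names `dice` and `mmd2` are hypothetical, and the abstract does not specify how the GTVt and GTVn scores were aggregated.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice Similarity Coefficient 2|A∩B| / (|A| + |B|) for flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def _rbf(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel between two feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def _mean_kernel(xs, ys, bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    return sum(_rbf(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD with an RBF kernel (kernel choice is an assumption)."""
    return (_mean_kernel(xs, xs, bandwidth)
            + _mean_kernel(ys, ys, bandwidth)
            - 2.0 * _mean_kernel(xs, ys, bandwidth))


# Toy masks standing in for GTV segmentations from CT-PET and T2-PET inputs.
ct_pet_mask = [0, 1, 1, 1, 0, 0]
t2_pet_mask = [0, 0, 1, 1, 1, 0]
per_case: List[float] = [dice(ct_pet_mask, t2_pet_mask), dice([1, 1, 0], [1, 1, 0])]
summary = (mean(per_case), stdev(per_case))  # per-case mean and standard deviation

# Toy feature vectors standing in for real and synthetic image features.
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
fidelity_gap = mmd2(real, synthetic)
```

A DSC of 1 means the two segmentations coincide and 0 means they do not overlap. A squared MMD of 0 means the two samples are indistinguishable under the chosen kernel, and larger values indicate a larger distributional gap.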
