Medical Image Translation

Background

Medical imaging plays a central role in almost all areas of medicine.[1] It supports diagnosis, treatment planning, and patient monitoring, but no single imaging modality can capture all relevant information about a patient.[1] Different modalities are therefore used for different purposes: magnetic resonance imaging (MRI) for soft tissue contrast, computed tomography (CT) for fast imaging of bone and tissue density, and positron emission tomography (PET) for metabolic and functional activity.[1][2] In clinical practice, doctors often combine these scans because each one measures different physical or biological properties, and together they give a more complete picture of what is happening in the body. MRI is usually preferred when detailed soft tissue structures such as the brain or muscles are of interest, CT is fast and well suited to emergencies and bone detail, and PET adds information about how metabolically active a tumor or brain region is.[1][2]

However, acquiring all of these images can be costly and time consuming, and repeated CT and PET scans expose the patient to ionizing radiation that can be harmful in excess.[1][3] Each extra scan means more time in the hospital, more appointments to schedule, and more stress for the patient. This is a serious issue for people who need many follow-up scans, such as cancer patients who are imaged repeatedly to track how treatment is working.[3] In addition, data has become increasingly important with the rise of machine learning, and some models require very large training sets to reach high performance.[4] Acquiring this much data by repeatedly scanning real patients is impractical, expensive, and potentially dangerous, since there are both physical risks from radiation and ethical limits on what patients can be asked to do purely for data collection.[3][4]

Because of this, researchers have looked for other ways to obtain this data.[4][5] One popular approach uses AI models to generate images of one modality from another, so that a single real scan can be turned into multiple synthetic scans in different modalities.[5] This idea is often called image translation or cross-modality synthesis, and it aims to keep the underlying anatomy the same while changing the appearance to match a target modality such as CT or PET.[5] In one example, researchers converted an MRI scan into synthetic PET scans.[6] Results like this suggest that if these models are accurate and structurally reliable, the number of physical scans could be reduced and radiation exposure lowered, while still providing rich multimodal information for both clinical decision making and AI model training.[4][5][6]

Traditional Methods

Early work on image translation used convolutional neural networks (CNNs) to predict the synthetic images directly.[7] One study converted MRI images into synthetic CT images using paired MRI and CT scans from patients, where each MRI had a matching CT that served as the training target.[7] The MRI image was passed through a deep convolutional neural network, which predicted the pixel values of the matching CT scan. The predicted CT was then compared with the real CT, and the network weights were updated to reduce the difference. Although this approach produced usable results, it had clear flaws. It tended to average over uncertainty, which is what happens when a regression-style model is forced to produce a single prediction in places where many outputs are plausible.[7][8] This led to overly smooth outputs that blur the fine structures and lesions needed to identify abnormalities in medical images. It also struggled with different scanners and anatomies: performance dropped on images from other sites or acquired with other protocols, which limited the usefulness of these early CNN-based methods outside tightly controlled datasets.[8]
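The averaging effect described above can be illustrated with a toy example. The sketch below is not taken from the cited studies; it assumes a one-dimensional "CT row" with two equally plausible ground truths (the names `candidate_a`, `candidate_b`, and `expected_mse` are hypothetical) and shows that the prediction minimising mean squared error is a blurred ramp, not either sharp edge.

```python
import numpy as np

# Two equally plausible "ground truth" CT rows for the same MRI input:
# a sharp bone edge that may start at pixel 3 or at pixel 5 (toy assumption).
candidate_a = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
candidate_b = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)
targets = np.stack([candidate_a, candidate_b])

def expected_mse(prediction, targets):
    """Mean squared error averaged over all plausible targets."""
    return float(np.mean((targets - prediction) ** 2))

# The single prediction that minimises expected MSE is the pixel-wise mean.
mse_optimal = targets.mean(axis=0)

print("MSE-optimal prediction:", mse_optimal)
print("expected MSE (blurred):", expected_mse(mse_optimal, targets))
print("expected MSE (sharp A):", expected_mse(candidate_a, targets))
```

The MSE-optimal row contains intermediate values of 0.5 where the two candidates disagree, so a pixel-wise regression loss prefers a soft edge over either sharp one. This is one simplified way to see why such models blur fine detail.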

A second line of work used variants of generative adversarial networks (GANs).[7][9] A GAN consists of two convolutional neural networks. The generator produces a synthetic image from the input medical image. The discriminator tries to distinguish the real target image from the synthetic image produced by the generator. The two networks are trained against each other, typically in alternating steps: the discriminator is updated to tell real and synthetic images apart more reliably, and the generator is updated to produce images that the discriminator is more likely to accept as real.[9] One study converted MRI scans to CT scans using this framework: the generator took an MRI as input and tried to output a realistic CT, while the discriminator judged whether that CT looked real when paired with the input MRI.[7][9][10] The resulting images were sharper than those from plain convolutional neural networks, but training required closely aligned image pairs, since even small misalignments between MRI and CT could confuse the discriminator and hurt performance.[7][9][10]
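The opposing objectives of the two networks can be written as a pair of loss functions. The following is a minimal sketch, assuming a conditional GAN with a binary cross-entropy adversarial loss plus an L1 term on paired images; the function names and the `l1_weight` value are illustrative assumptions, and the discriminator outputs are hand-picked probabilities, not the output of a trained network.

```python
import numpy as np

def bce(prob, label):
    """Binary cross-entropy for probabilities in (0, 1) against a 0/1 label."""
    prob = np.clip(prob, 1e-7, 1 - 1e-7)
    return float(-np.mean(label * np.log(prob) + (1 - label) * np.log(1 - prob)))

def discriminator_loss(d_real, d_fake):
    # Low when real (MRI, CT) pairs score near 1 and synthetic pairs score near 0.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake, fake_ct, real_ct, l1_weight=100.0):
    # Low when the discriminator is fooled (scores near 1) and the
    # synthetic CT stays close to the paired real CT.
    adversarial = bce(d_fake, 1.0)
    l1 = float(np.mean(np.abs(fake_ct - real_ct)))
    return adversarial + l1_weight * l1

# Hand-picked discriminator outputs for two image pairs (illustrative only).
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])
real_ct = np.zeros((4, 4))
fake_ct = real_ct + 0.1

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake, fake_ct, real_ct))
```

In an actual training loop, each loss would be minimised in turn with respect to its own network's weights. The L1 term also shows why misaligned pairs are a problem: a spatial shift between the real and synthetic CT is penalised even when the anatomy is correct.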

Although results with plain CNNs and GAN variants were promising, these methods still had clear flaws. The models could hallucinate or omit structures, meaning that they might invent or remove small details such as lesions or vessel branches if those changes made the output look more realistic overall.[4] Generated images were also sometimes blurry, and the models generalized poorly to real-world settings, especially data from new scanners, new hospitals, or slightly different imaging protocols.[8][4] As a result, these models are useful for research, data augmentation, and other AI applications, where additional or sharper images can still help, but they fall short for clinical use, where every structure and boundary must be trustworthy. These limitations in smoothness, hallucination, and generalization are a major reason why newer work has turned to more structurally aware methods and diffusion-based models that try to keep the anatomy intact while still generating high-quality synthetic images.[8][4]

Modern Methods

Diffusion models in image translation

Because of the limitations of GAN-based models, such as training instability and mode collapse, diffusion models have been proposed as an alternative for medical image translation.[11][12][13] Diffusion probabilistic models, which are inspired by nonequilibrium thermodynamics, were popularized for image generation by Denoising Diffusion Probabilistic Models (DDPM).[11] In this framework, data generation is modeled as a parameterized Markov chain that gradually removes noise from an initially noisy signal.

The framework consists of two processes: a forward (diffusion) process and a reverse (denoising) process. During the forward process, Gaussian noise drawn from N(0,I) is added to the image in small increments over a predefined number of timesteps T, eventually transforming it into an image that is almost pure noise.
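The forward process can be sketched in a few lines. This example assumes the linear noise schedule and closed-form sampling used in DDPM, where x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from N(0,I). The image is random placeholder data and the name `q_sample` is chosen here for illustration.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances (linear schedule)
alphas_bar = np.cumprod(1.0 - betas)    # fraction of signal variance kept after each step

def q_sample(x0, t, rng):
    """Draw x_t from q(x_t | x_0) in closed form; t is a 0-based step index."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # placeholder "image" scaled to [-1, 1]

x_early, _ = q_sample(x0, 10, rng)      # still close to the original image
x_late, _ = q_sample(x0, T - 1, rng)    # almost pure Gaussian noise

print("signal kept at t=10: ", alphas_bar[10])
print("signal kept at t=T-1:", alphas_bar[-1])
```

Because the noised image at any timestep can be drawn directly from the clean image, training does not need to simulate the whole chain step by step.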

In the reverse process, a neural network, typically a U-Net, is trained to predict the noise that should be removed at each step to recover the previous, less noisy state. At inference time, a well-trained model starts from pure noise and repeatedly removes noise over T steps to produce a clean image that appears to be sampled from the same distribution as the training data.
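The reverse process can be sketched as a sampling loop. The example below assumes the DDPM ancestral sampling rule with step variance equal to β_t. Because no trained U-Net is available here, it uses a hypothetical `oracle` noise predictor that is given the clean image, purely to check that the update rule is wired correctly.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def reverse_step(x_t, t, predicted_noise, rng):
    """One DDPM ancestral sampling step from step index t to t - 1."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * predicted_noise) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no fresh noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

def sample(predict_noise, shape, rng):
    """Start from pure Gaussian noise and denoise for T steps."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        x = reverse_step(x, t, predict_noise(x, t), rng)
    return x

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # placeholder clean image

def oracle(x_t, t):
    # Stand-in for a trained network: the exact noise, computed by "cheating" with x0.
    return (x_t - np.sqrt(alphas_bar[t]) * x0) / np.sqrt(1.0 - alphas_bar[t])

result = sample(oracle, x0.shape, rng)
print("max abs error vs x0:", np.max(np.abs(result - x0)))
```

With the oracle, the loop recovers `x0` because the predictor already knows it. A trained network has no such access and would instead produce a new image resembling its training data.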

Unpaired image translation

Conditional diffusion models extend diffusion models by incorporating conditions such as text or images, which constrains the images they generate. Learning from unpaired data is important because paired images across different domains are scarce in medical imaging. Much recent work on unpaired medical image translation has focused on adding spatial control and anatomical constraints to the denoising process.[13][12] The learning algorithm of ContourDiff is used here to illustrate how translation can be learned from an unpaired image dataset.[13]

A standard DDPM uses a U-Net to approximate the noise that should be removed in order to generate an image from the target image distribution. In a conditional DDPM, a conditioning input is supplied alongside the noisy image at every denoising timestep, for example a contour mask generated with a Canny filter. Training repeats this process over different images and timesteps until the model converges. During inference, if the model's output domain is MRI, structural information from another modality such as CT can be used as the condition; in ContourDiff, Canny contours of CT scans are used to translate a CT image into an MRI image. This is unpaired image translation because training does not require any cross-domain image pairs.
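A single training step of a contour-conditioned DDPM might look roughly like the sketch below. It is not ContourDiff's actual code: the thresholded gradient-magnitude `edge_map` is a crude stand-in for a Canny filter, stacking the contour with the noisy image as a second channel is one common way to inject a condition (assumed here), and `placeholder_model` is a hypothetical substitute for a trainable U-Net.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def edge_map(image, threshold=0.2):
    """Crude binary contour from gradient magnitude (stand-in for a Canny filter)."""
    gy, gx = np.gradient(image)
    return (np.hypot(gx, gy) > threshold).astype(image.dtype)

def training_loss(model, x0, rng):
    """Noise-prediction loss for one target-domain image at one random timestep."""
    t = int(rng.integers(0, T))
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    # Condition: stack the clean image's own contour with the noisy image as channels.
    model_input = np.stack([x_t, edge_map(x0)])
    predicted = model(model_input, t)
    return float(np.mean((predicted - noise) ** 2))

def placeholder_model(model_input, t):
    # Hypothetical stand-in for a U-Net: always predicts zero noise.
    return np.zeros_like(model_input[0])

rng = np.random.default_rng(0)
mri = np.zeros((32, 32))
mri[8:24, 8:24] = 1.0  # toy "MRI" containing a single square structure

print("contour pixels:", int(edge_map(mri).sum()))
print("loss:", training_loss(placeholder_model, mri, rng))
```

In this sketch, training only ever sees target-domain images together with contours extracted from those same images, so no paired CT is involved. At inference, the contour channel would be computed from the source-domain image instead.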

Implications

With these advances in image translation, relatively accurate translations from MRI to CT and PET are now available.[14][15] Synthetic CT in particular is now accurate enough to support dose calculation for radiotherapy at certain treatment sites, where the dose calculated on synthetic CT is very close to the dose calculated on a conventional planning CT.[14][16][17] In some recent work, synthetic CT has already been integrated into MRI-only planning workflows, so the synthetic image is not just a visual extra but feeds directly into treatment planning and quality assurance.[16][17]

Synthetic images can also be used to train downstream AI models.[15][5] These models perform medical image segmentation, detection, and classification, and they require large amounts of data to work well in practice.[5][18] Synthetic images give researchers a way to expand and balance their datasets, for example by creating more examples of rare lesions or by simulating different scanners and acquisition protocols, and several studies have shown that adding synthetic data can improve performance on real test sets.[18][19] As the technology continues to advance, imaging may become more personalized, faster, and more efficient, since a single real scan could be turned into multiple useful modalities for both clinicians and algorithms.[14][15][5] Downstream models may also improve as substantially more data becomes available, not only from new patients but also from high-quality synthetic images that fill in missing modalities and help models learn richer multimodal representations.[15][5][19]

Limitations

Medical image translation still faces several limitations. First, a synthetic image cannot capture all of the information that a real scan provides.[20] For example, synthetic CT images translated from MRI may show accurate bone outlines and body contours, but they cannot reliably show subtle fractures and calcifications that only a CT scanner can pick up, because those features are not identifiable in the MRI image to begin with. Synthetic images nevertheless have many uses other than diagnosis.

Another limitation is variation in images between hospitals.[12] Hospitals use different scanners, imaging protocols, and reconstruction parameters, which means that images can vary dramatically from one hospital to another. A model trained on one dataset might perform poorly on scans collected elsewhere. This lack of consistency makes it hard to deploy translation models at scale, since each clinical environment may require its own fine-tuning and validation.

A related challenge is the scarcity of ground truth.[21] Without access to many accurately paired images, it is difficult to evaluate how well a translation model is performing. Researchers often rely on indirect metrics such as the Fréchet Inception Distance (FID), the Kernel Inception Distance (KID), or downstream segmentation performance, but none of these fully captures whether a synthetic scan is clinically reliable.[21] This gap between algorithmic success and clinical validation continues to be a major barrier.
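To make concrete what a distribution-level metric such as FID measures, the sketch below computes the Fréchet distance between two sets of feature vectors, each modelled as a Gaussian. In practice the features come from a pretrained Inception network; here they are random placeholder arrays, and the function name `frechet_distance` is chosen for illustration.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the product's eigenvalues.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((500, 4))   # placeholder features of real images
shifted_feats = real_feats + 3.0             # same spread, shifted mean

print("identical sets:", frechet_distance(real_feats, real_feats))
print("shifted sets:  ", frechet_distance(real_feats, shifted_feats))
```

The score compares only the means and covariances of the two feature sets. It says nothing about whether an individual synthetic scan preserves a particular patient's anatomy, which is one reason such metrics cannot replace clinical validation.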

High computational cost is another limitation.[11] Diffusion models require many iterative denoising steps to generate a single image, which is time consuming and resource intensive. Although GPUs and optimization strategies can speed up the process, diffusion sampling is still slower than GAN-based methods, which generate an image in a single forward pass, and this may matter in time-sensitive clinical environments.

Finally, there are broader ethical concerns around the use of synthetic medical data.[20] If a synthetic CT scan is used for radiation therapy planning and the model introduces a subtle boundary error, determining accountability becomes very difficult. Researchers must be careful not to introduce biases or unrealistic patterns that could affect downstream tasks. It is also important that clinicians clearly understand the limitations of synthetic images so that they do not rely on them as if they were true ground truth scans.[20]

While modern techniques have reduced many of the common issues seen in earlier models, image translation is still an evolving field that requires careful validation, responsible deployment, and continued research.

Applications

Medical image translation has a wide range of applications across diagnostics, treatment planning, research, and the development of new AI tools. One example is radiation therapy planning, where clinicians need both MRI and CT information: MRI provides the soft tissue contrast needed to identify tumors and critical structures, while CT provides the electron density information required for dose calculations. If a CT image can be generated from MRI, a separate planning CT scan may no longer be needed, which simplifies the workflow.[22] This can reduce the number of appointments a patient needs and help clinicians prepare treatment plans more quickly.

Another application is in diagnostic imaging. For example, PET scans can identify cancer activity or neurological conditions, but they involve radiation exposure and the injection of radioactive tracers.[22] If high quality synthetic PET can be created from MRI, clinicians may be able to screen or analyze certain conditions without exposing the patient to any radiation. While synthetic PET is not ready to replace real PET scans, it can provide helpful supplemental information in cases where PET is unavailable or not advisable.

Image translation also helps build datasets for machine learning. Many medical imaging projects suffer from missing modalities. A dataset might have thousands of MRI scans but only a handful of CT or PET scans for the same patients.[23] Translation models can fill in these gaps, allowing researchers to train multimodal models that would otherwise be impossible to build. This can improve segmentation, disease classification, and other tasks that benefit from complementary imaging information.

Another application is cross modality fusion, where synthetic modalities are combined with real ones to improve decision making. For example, a radiologist could view synthetic CT alongside real MRI to get both density and soft tissue detail.[22] This fusion can help with identifying fractures, tumors, or structural abnormalities that are more visible in one modality.

Finally, image translation can support research, training, and education. Synthetic multimodal datasets can be used to train medical students, radiologists, or machine learning models by exposing them to a wider variety of imaging appearances, including rare or hard to collect cases. Researchers can investigate how MRI intensity patterns relate to CT density in bone structures, or how certain lesions might appear across different modalities.[23] This helps accelerate the development of new imaging biomarkers.

Overall, the applications of medical image translation extend across nearly every domain where imaging information matters. The ability to generate missing modalities, enhance datasets, and support clinical decision making makes this technology a promising addition to both clinical and research workflows.

References

  1. Bushberg, J. T., Seibert, J. A., Leidholdt, E. M., & Boone, J. M. The Essential Physics of Medical Imaging. Lippincott Williams & Wilkins, 2012.
  2. Cherry, S. R., Sorenson, J. A., & Phelps, M. E. Physics in Nuclear Medicine. Elsevier Health Sciences, 2012.
  3. Smith-Bindman, R. et al. (2009). Radiation dose associated with common CT examinations. Archives of Internal Medicine.
  4. Zhou, S. K. et al. (2021). A review of deep learning in medical imaging. Medical Image Analysis.
  5. Koetzier, L. R. et al. (2024). Generating synthetic data for medical imaging. Radiology.
  6. Theodorou, B. et al. (2022). MRI2PET: PET synthesis from MRI. MICCAI Workshop.
  7. Nie, D. et al. (2017). Medical image synthesis with context-aware GANs. MICCAI.
  8. Wolterink, J. M. et al. (2017). Deep MR to CT synthesis. MICCAI.
  9. Isola, P. et al. (2017). Image-to-image translation with conditional GANs. CVPR.
  10. Cohen, J. P. et al. (2018). Distribution matching losses hallucinate features. MICCAI.
  11. Ho, J. et al. (2020). Denoising Diffusion Probabilistic Models. arXiv.
  12. Guan, H. & Liu, M. (2021). Domain adaptation for medical imaging. arXiv.
  13. Chen, Y. et al. (2025). ContourDiff: Unpaired translation. ML4H.
  14. Kazemifar, S. et al. (2020). Synthetic CT for MRI-only radiotherapy. Medical Physics.
  15. Dayarathna, S. et al. (2024). Deep learning-based multimodal synthesis. Medical Image Analysis.
  16. Kim, H. et al. (2024). Synthetic CT for radiotherapy. Scientific Reports.
  17. Lui, J. C. F. et al. (2025). Synthetic CT dosimetric accuracy. Medical Physics.
  18. Pezoulas, V. C. et al. (2024). Synthetic data review. Informatics in Medicine Unlocked.
  19. Ibrahim, M. et al. (2024). Generative AI for medical data. Systematic review.
  20. Jin, W. et al. (2025). Ethical Medical Image Synthesis. arXiv.
  21. Oktay, O. et al. (2021). Synthetic CT for MRI-only planning. Medical Physics.
  22. Mehranian, A. & Zaidi, H. (2019). Synthetic PET/CT. Eur. J. Nucl. Med. Mol. Imaging.
  23. Chartsias, H. et al. (2020). Factorised representation learning. Medical Image Analysis.
