Medical Image Translation Using Convolutional Neural Networks

Ashwin K. Avula (Student)1, Gian Marco Conte M.D., Ph.D. (Mentor)2, Bradley J. Erickson M.D., Ph.D. (Mentor)2


I. Abstract

Gliomas are a category of tumors that affect the brain and spinal cord. These tumors tend to grow alongside and infiltrate normal brain tissue, which makes surgical removal difficult and complicates treatment.[1] Several Magnetic Resonance Imaging (MRI) sequences are therefore acquired to diagnose these tumors and plan treatment. For example, T1-weighted MRIs emphasize white matter while suppressing fluid, yielding a more anatomical representation of the brain, whereas T2-weighted MRIs highlight fluid and softer tissue such as tumor and inflamed tissue.[2] Both scans contain different but vital information and are necessary for glioma treatment. However, acquiring several different scans from a patient is costly, and the added scanning time may pose greater health risks for patients.[3] In this study, Convolutional Neural Networks (CNNs) and other machine learning techniques were used to build an accurate image translation algorithm. With such an algorithm, just one MRI sequence (a set of scans) could be acquired and converted into all other necessary sequences; for example, T1-weighted MRIs could be translated into T2-weighted MRIs. Hospitals could then run a more streamlined and cost-effective treatment process. Using a CNN architecture adapted from [4], T2-weighted MRIs were synthesized from T1-weighted MRIs with an average absolute error of 0.05142 per translated image.

II. Introduction

i. Image-to-Image Translation

Image-to-image translation is a class of problems in computer vision and graphics in which the goal is to learn the mapping between an input image and an output image.[5] Image translation algorithms perform a series of computations on an input image to create a new output image. Figure 1 illustrates this concept: an image of two horses is generated from an image of two zebras. The computations these algorithms perform are determined using machine learning techniques. Although the concept seems simple, image translation has many meaningful applications, such as object transfiguration and photo enhancement.[5]

Figure 1. Example Image Translation Process [5]

ii. Understanding the Purpose

This study focuses on applying image translation in radiology, particularly to MRI scans from glioma patients. Gliomas are tumors that occur in the brain and spine. Only 25% of patients diagnosed with severe cases of gliomas survive more than a year past diagnosis.[6] Currently, a comprehensive treatment plan for gliomas relies on information obtained from several different MRI sequences. Figure 2 depicts various MRI sequences from one patient and shows how each sequence conveys different information.

Figure 2. Example Information Differences between MRI Sequences [7,8,9]

Although every MRI sequence depicts the same anatomy, each sequence reflects different tissue information. Acquiring several different scans from a patient can cost thousands of dollars, and it can be difficult for patients to lie still for the long periods required. This paper therefore proposes image translation, in which just one MRI sequence is acquired and converted into all other necessary sequences. Hospitals could then run a more streamlined and cost-effective treatment process.

iii. Data

In data science and machine learning, the data itself is often the most valuable part of a project. In this study, the widely used multi-sequence BraTS dataset was used to train a machine learning image translation algorithm.[7,8,9] The BraTS dataset is a publicly available collection of MRI scans from around 300 glioma patients. For each patient it provides four MRI sequences (T1, contrast-enhanced T1, T2, and T2-FLAIR) along with expert tumor segmentation labels. Figure 3 shows sample axial scans from the BraTS dataset. Although several sequences were available for each patient, this study focuses on the T1- and T2-weighted scans, as these are two of the most commonly acquired sequences. As Figure 4 shows, the T1-weighted image emphasises the white matter and anatomical features of the brain, while the T2-weighted image emphasises the cerebrospinal fluid and grey matter.
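The pairing of T1- and T2-weighted slices can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study's actual preprocessing. It assumes each patient's T1 and T2 volumes are already loaded as co-registered NumPy arrays of equal shape, with the axial axis last. The min-max scaling to [0, 1] and the hypothetical `min_foreground` cutoff for near-empty slices are illustrative choices.

```python
import numpy as np


def minmax_normalize(volume):
    """Scale a volume's intensities to [0, 1] (assumed preprocessing step)."""
    lo, hi = float(volume.min()), float(volume.max())
    if hi == lo:
        return np.zeros_like(volume, dtype=np.float32)
    return ((volume - lo) / (hi - lo)).astype(np.float32)


def paired_axial_slices(t1_volume, t2_volume, min_foreground=0.01):
    """Return (T1 slice, T2 slice) pairs taken along the last (axial) axis.

    Assumptions: the volumes are co-registered, equally sized, and have
    background voxels equal to the volume minimum. Slices whose T1 image is
    almost entirely background are skipped; the cutoff is hypothetical.
    """
    if t1_volume.shape != t2_volume.shape:
        raise ValueError("T1 and T2 volumes must be co-registered and equally sized")
    t1 = minmax_normalize(t1_volume)
    t2 = minmax_normalize(t2_volume)
    pairs = []
    for z in range(t1.shape[-1]):
        t1_slice, t2_slice = t1[..., z], t2[..., z]
        if (t1_slice > 0).mean() < min_foreground:
            continue  # mostly empty slice above or below the brain
        pairs.append((t1_slice, t2_slice))
    return pairs
```

The sketch leaves out loading the volumes from the dataset's NIfTI files so that it stays self-contained.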

Figure 3. Sample sequences from three patients (one per row) from the BraTS Dataset [7,8,9]

Figure 4. T1- and T2-weighted MRI Comparison [7,8,9]

iv. Machine Learning Implementation

In this study, convolutional neural networks (CNNs) were used to build a functional and accurate image translation algorithm. Neural networks are layers of computing units loosely modeled on neurons in the human brain; like the brain, they are designed to recognise patterns in data.[10] CNNs in particular are a class of neural networks that specialise in analysing visual imagery. Figure 5 shows a simple CNN and illustrates how it extracts meaningful features from image data.
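The core operation of a CNN layer slides a small kernel over an image to produce a feature map. It can be sketched in a few lines of NumPy. The kernel below is hand-picked to respond to vertical edges and is purely illustrative. In a real CNN the kernel values are learned during training, and libraries such as TensorFlow and Keras provide optimised versions of this operation.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep-learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Non-linearity applied after the convolution: keep positive responses only."""
    return np.maximum(x, 0.0)


# Toy "image": dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge detector; a CNN would learn kernels like this.
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d_valid(image, edge_kernel))
print(feature_map)
```

Each row of the resulting 4 × 4 feature map is [0, 3, 3, 0]. The filter responds only where the dark and bright regions meet, which is the kind of "purposeful feature" Figure 5 refers to.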

Figure 5. Example Convolutional Neural Network [11]

v. Supervised Learning

In their simplest form, CNNs used for image translation are categorised as supervised machine learning algorithms. As shown in Figure 6, supervised machine learning consists of three main phases.

Figure 6. Supervised Machine Learning Diagram [12]

1. Data Acquisition

In phase one, data is acquired for training and validation.

2. Training

In phase two, a model is created and trained to learn the mapping between the input and output variables. During training, input data (the training dataset) and the corresponding outputs (the desired outputs) are fed to the model. The model acts on the input data and generates a prediction; in this implementation, the prediction is the synthesised, or translated, image. Loss metrics then quantify the model's inaccuracy by comparing its prediction to the desired output. These error values are fed back to the model to adjust its weights for the next round of training. This process is one training iteration. A complete pass through the entire training dataset is called an epoch, and in practice many epochs are run to minimise model error.
At its core, machine learning follows the same feedback cycle through which people learn most skills, shown below.

Figure 7. Intuitive Model Training Process [13]

For example, if someone touches a hot tea kettle, their hand gets burnt, and feedback in the form of firing neurons teaches the brain never to do it again. Similarly, when a machine learning model makes a significant mistake in its prediction, the loss metrics return a high loss. The model's weights are then adjusted so that the model "learns" not to repeat the mistake.[14]
3. Validation

Finally, in phase three, validation images are passed through the trained CNN, and the predictions are again compared to the desired outputs to determine the model's final accuracy statistics. The images used for validation are kept separate from the images used during training. This gives an unbiased measure of how the model performs on new data.
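The three supervised-learning phases can be illustrated with a deliberately tiny stand-in for a CNN. In this hypothetical sketch:

- the "model" is a single weight and bias applied to every pixel;
- the paired "T1/T2" images are synthetic 8 × 8 arrays related by the made-up rule y = 1 - x;
- the weights are updated by gradient descent on the mean squared error.

A real CNN has far more weights, and TensorFlow computes its gradients automatically. The loop has the same shape, though: predict, measure the loss, adjust the weights, repeat for many epochs, then evaluate on held-out data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Phase 1: data acquisition (synthetic stand-in for paired T1/T2 slices).
# Each "image" is 8x8; the hidden rule the model must learn is y = 1 - x.
inputs = rng.random((200, 8, 8))
targets = 1.0 - inputs + rng.normal(0.0, 0.01, size=inputs.shape)

# Hold out 20% of the pairs so validation uses images never seen in training.
train_x, val_x = inputs[:160], inputs[160:]
train_y, val_y = targets[:160], targets[160:]

# Phase 2: training. The "model" is one weight and one bias applied per pixel.
w, b = 0.0, 0.0
learning_rate = 0.5
batch_size = 16

for epoch in range(20):  # one epoch = one full pass over the training set
    order = rng.permutation(len(train_x))
    for start in range(0, len(order), batch_size):  # one iteration per batch
        idx = order[start:start + batch_size]
        x, y = train_x[idx], train_y[idx]
        prediction = w * x + b
        error = prediction - y
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Feedback step: move the weights against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

# Phase 3: validation on the held-out pairs.
val_mae = np.mean(np.abs((w * val_x + b) - val_y))
print(f"w={w:.3f}, b={b:.3f}, validation MAE={val_mae:.4f}")
```

Each pass of the inner loop is one training iteration, and each pass of the outer loop is one epoch. After training, `w` and `b` should end up close to -1 and 1, the values of the hidden rule.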

III. Dependencies

In the past, machine learning algorithms required large amounts of storage and state-of-the-art hardware capable of computationally intensive tasks. Today, free cloud computing environments such as Google Colaboratory provide remote GPUs, so deep neural networks can be trained from an everyday consumer laptop.[15] Within Google Colaboratory, Python is used to write and execute the machine learning algorithms; it is a versatile programming language used in data science and many other fields.[16] TensorFlow and Keras, popular machine learning packages for Python, were used for their built-in functions, which allow rapid prototyping and experimentation with neural networks.[17,18] Libraries and environments like these let researchers progress from idea to result with minimal delay.

IV. Methods

i. Loss Metrics

One of the biggest challenges in machine learning is choosing the loss metrics used to train a model. Loss metrics express how incorrect the model's prediction is during training, and choosing an ill-suited metric leads to poor training. For image translation, loss metrics must gauge three main factors: luminance, contrast, and structure. In a medical image:

- luminance relates to the brightness of pixels;
- contrast relates to the spread of intensities that makes detail visible;
- structure relates to the patterns and textures formed by neighbouring pixels.

Together, these features describe the tissue and organ pixel values in a medical image.[19]

In this study, Mean Absolute Error (MAE), Euclidean distance, and the Structural Similarity Index Measure (SSIM) were used as loss metrics during training. MAE and Euclidean distance measure pixel-by-pixel intensity differences between images, so they respond to differences in geometry and structure. SSIM compares images in terms of luminance, contrast, and structure.[20,21,22,23] Figures 8 – 10 give the equations for the three loss metrics. Combining the metrics as a weighted sum lets some be emphasised over others during training, which gives some control over what the model prioritises as it learns.
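The equations in Figures 8 – 10 are not reproduced here, so the standard textbook definitions of the three metrics are given below for reference; the exact form used in this study may differ. The notation is as follows:

- ŷ is the predicted image and y is the ground-truth image, each with N pixels;
- μ and σ² denote means and variances, and σ_{yŷ} is the covariance of the two images;
- C₁ and C₂ are small stabilising constants.

The final line shows one possible way to combine the metrics as a weighted sum. The weights λ₁, λ₂, λ₃ are hypothetical, since the study does not report them. SSIM enters as 1 − SSIM because it equals 1 for identical images.

```latex
\begin{aligned}
\mathrm{MAE}(y, \hat{y}) &= \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \\
d(y, \hat{y}) &= \sqrt{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \\
\mathrm{SSIM}(y, \hat{y}) &= \frac{(2\mu_y \mu_{\hat{y}} + C_1)(2\sigma_{y\hat{y}} + C_2)}{(\mu_y^2 + \mu_{\hat{y}}^2 + C_1)(\sigma_y^2 + \sigma_{\hat{y}}^2 + C_2)} \\
\mathcal{L}(y, \hat{y}) &= \lambda_1 \, \mathrm{MAE}(y, \hat{y}) + \lambda_2 \, d(y, \hat{y}) + \lambda_3 \left( 1 - \mathrm{SSIM}(y, \hat{y}) \right)
\end{aligned}
```

In the standard formulation, SSIM is computed over small local windows and averaged across the image. The constants are C₁ = (0.01L)² and C₂ = (0.03L)² for pixel values with dynamic range L.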

ii. Training

As explained in the Machine Learning Implementation section, model training consists of training loops in which loss metrics fine-tune a model’s weights by calculating error from predictions.
For training, T1- and T2-weighted slices were passed to the training algorithm together as paired data. During training, the model performs the image translation algorithm on an input T1-weighted image and returns another image. The image returned is the model’s prediction; in this case, the predicted corresponding T2-weighted MRI image. After the model creates its prediction, the algorithm determines how accurate the prediction is by comparing it to the desired output, or the actual corresponding T2-image for the input T1-weighted image. From this, the algorithm fine-tunes the CNN’s weights according to how inaccurate the prediction was. Thus, the next time the model creates a prediction, it will hopefully be more accurate since the model has ‘learnt’ from its previous mistakes.
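The comparison step can be sketched as a single loss function that combines the three metrics from the Loss Metrics section. This NumPy version is an illustration only. It uses a simplified SSIM computed over the whole image as one window, and the weights `w_mae`, `w_dist`, and `w_ssim` are hypothetical placeholders, not the values used in this study.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(pred - target)))


def euclidean_distance(pred, target):
    """Square root of the summed squared pixel differences."""
    return float(np.sqrt(np.sum((pred - target) ** 2)))


def global_ssim(pred, target, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of the
    usual sliding-window SSIM). Returns 1.0 for identical images."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(numerator / denominator)


def combined_loss(pred, target, w_mae=1.0, w_dist=0.01, w_ssim=1.0):
    """Weighted sum of the three metrics; the default weights are hypothetical."""
    return (w_mae * mae(pred, target)
            + w_dist * euclidean_distance(pred, target)
            + w_ssim * (1.0 - global_ssim(pred, target)))
```

For actual training, the loss must be written with TensorFlow operations so that gradients can flow back through the CNN. TensorFlow's built-in `tf.image.ssim`, for instance, computes the windowed SSIM.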
In this study, several models with different learning rates were trained and tuned to find the best model for T1-to-T2 mapping. Every model used the same CNN architecture and was trained for 500 epochs. In each epoch, each model was given 1000 images to translate and calculate prediction error for. Because of Google Colaboratory's GPU usage limits, each model took around twelve days to train.

Figure 14. Sample Output During Training

Periodically during training, the model returned performance updates such as the one shown in Figure 14. The loss metrics printed at the bottom of the figure give a quantitative measure of model performance. For qualitative analysis, the figure also shows four images:

- the T1-weighted input image;
- the model's predicted T2-weighted image (synthesised image/prediction);
- the true T2-weighted image (ground truth);
- an error map of the difference between the prediction and the ground truth.

Model performance was monitored both qualitatively and quantitatively throughout training, and the model's parameters were adjusted based on these measures to optimise it.
Figure 15 shows how the model "learnt" over time: its predictions improved in quality and approached the desired output through iterative training.
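An error map like the one in Figure 14 can be computed directly from the prediction and the ground truth. This sketch assumes the map shows the per-pixel absolute difference, since the study does not give the exact formula. It also computes the mean absolute and mean squared errors that a training update might print.

```python
import numpy as np


def error_map(prediction, ground_truth):
    """Per-pixel absolute difference; bright pixels mark large errors (assumed definition)."""
    return np.abs(prediction - ground_truth)


def summarize(prediction, ground_truth):
    """Scalar summaries of the same difference, as a training update might report."""
    diff = prediction - ground_truth
    return {"MAE": float(np.mean(np.abs(diff))), "MSE": float(np.mean(diff ** 2))}


# Tiny example: the prediction misses one bright pixel of the ground truth.
ground_truth = np.array([[0.0, 0.0], [0.0, 1.0]])
prediction = np.zeros((2, 2))
print(error_map(prediction, ground_truth))
print(summarize(prediction, ground_truth))
```

Displaying the map as a grayscale image makes regions of poor translation, such as tumour boundaries, easy to spot.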

Figure 15. Model Improvement through Training

V. Validation & Results

As stated in the Machine Learning Implementation section, phase three of a supervised learning approach consists of passing validation data through the fully trained model to obtain performance statistics. To replicate real-world conditions, an entire 3D T1-weighted MRI volume was used as the validation test. The trained model translated each 2D T1-weighted slice of the volume into a T2-weighted slice, and the model's final accuracy was calculated from these translations. The results of this validation test are shown in Table 1. Loss values for the entire 3D volume, and the average loss per 2D image in the volume, were calculated using the formulae described in Figures 8, 11, 12, and 13.
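The slice-by-slice validation can be sketched as follows. The formulae in Figures 8 and 11 – 13 are not reproduced here, so this is a hypothetical reconstruction. It assumes that the "total" error is the sum of the per-slice errors and the "average" error is their mean. The argument `translate_slice` stands in for the trained model's prediction function.

```python
import numpy as np


def validate_volume(translate_slice, t1_volume, t2_volume):
    """Translate a T1 volume slice by slice and score it against the real T2 volume.

    `translate_slice` is any callable mapping a 2D T1 slice to a 2D T2 slice
    (for example, a trained model's predict function). The total/average
    definitions below are assumptions.
    """
    per_slice_mae, per_slice_mse = [], []
    for z in range(t1_volume.shape[-1]):  # assumes the axial axis is last
        predicted = translate_slice(t1_volume[..., z])
        diff = predicted - t2_volume[..., z]
        per_slice_mae.append(np.mean(np.abs(diff)))
        per_slice_mse.append(np.mean(diff ** 2))
    return {
        "total_mae": float(np.sum(per_slice_mae)),
        "total_mse": float(np.sum(per_slice_mse)),
        "avg_mae": float(np.mean(per_slice_mae)),
        "avg_mse": float(np.mean(per_slice_mse)),
    }
```

Passing a trivial function such as `lambda s: s` checks the bookkeeping before a real model is plugged in.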

Metric                    MSE       MAE
Total Volume Error        0.81000   4.39369
Avg. Error per 2D Image   0.00854   0.05142

Table 1. Validation Error

Selected images from the validation test are shown below. Figures 16 – 18 qualitatively demonstrate the model's ability to translate input T1-weighted images into images that closely resemble the ground truth.

Figure 16. Selected Image from Validation Test

Figure 17. Selected Image from Validation Test

Figure 18. Selected Image from Validation Test


VI. Discussion

The model created in this study synthesised T2-weighted images from T1-weighted images with:

- an average MAE of 0.05142;
- an average MSE of 0.00854;
- a total MAE of 4.39369;
- a total MSE of 0.81000.

In comparison, the more sophisticated dense convolutional neural networks tested in [24] achieved a lower average MAE of 0.033 when reconstructing T2-images from T1-images. However, the model created in this study yielded a lower total MAE than certain models trained on the BraTS dataset in [25]. Overall, the model outperforms some T1-to-T2 translation models, but not all.
Although this model is not yet accurate enough for clinical use, additional optimisation and training may yield a better model. As the qualitative training and testing figures show, the model struggled with tumour contrast but accurately replicated the anatomy and ventricular structures of the brain. To address this, the loss metric equations and weights could be adjusted to emphasise contrast during training. Learning rates and training length could also be altered to optimise the model further.

VII. Conclusions

Currently, glioma treatment relies on information obtained from several MRI sequences of a patient's brain. This provides the most comprehensive information for diagnosis and treatment, but it is cumbersome and costly for patients. With a supervised machine learning approach based on convolutional neural networks, however, models can be created that translate one MRI sequence into others, shortening the diagnosis and treatment process for radiotherapy. In this study, a CNN was created and optimized to convert 2D T1-weighted MRI images of the brain in glioma patients into T2-weighted images.
Image translation was successfully achieved in this study: T1-weighted MRIs were translated into T2-weighted MRIs with an average absolute error of 0.05142. Using distance-based loss metrics such as Euclidean distance and mean absolute error, the model accurately reconstructed the brain's anatomy in the synthesised T2-images. However, it struggled to translate contrast. Much of the error visible in Figures 16 – 18 results from the model's inability to reconstruct the contrast and luminance features of a T2-weighted image.
In the future, learning rates, loss metric weights, and training lengths can be altered to optimise the model further. To broaden this study's relevance, additional models will be trained to translate between the various combinations of MRI sequences available in the BraTS dataset. Finally, a Graphical User Interface (GUI) will be created to combine these models into a toolkit-like application. Users will be able to translate images and datasets of one MRI sequence into any desired sequence.

VIII. Acknowledgments

I would like to acknowledge Dr. Weston, Dr. Conte, and Dr. Philbrick of Dr. Erickson’s Radiology Informatics Lab at the Mayo Clinic for providing vital technology, resources, and suggestions throughout this research process. This work was made possible thanks to the Mayo Clinic High School Mentorship and Summer Externship Programs.

IX. References

  1. Pichardo, Gabriela. “Brain Cancer and Gliomas.” WebMD. WebMD, January 26, 2020.
  2. Lloyd-Jones, G. (n.d.). MRI interpretation: T1 v T2 images. Retrieved August 15, 2020, from
  3. MRI. (n.d.). Retrieved August 14, 2020, from
  4. Long, J., Shelhamer, E., & Darrell, T. (2015, March 08). Fully Convolutional Networks for Semantic Segmentation. Retrieved August 14, 2020, from
  5. Hao, Yongfu. “Image-to-Image Translation.” Medium. Towards Data Science, March 19, 2019.
  6. Claus, Elizabeth B., et al. “Survival and Low-Grade Glioma: the Emergence of Genetic Information.” Neurosurgical Focus 38, no. 1 (2015).
  7. B. H. Menze et al. “The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)”, IEEE Transactions on Medical Imaging 34(10), 1993-2024 (2015) DOI: 10.1109/TMI.2014.2377694
  8. S. Bakas et al. “Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features”, Nature Scientific Data, 4:170117 (2017) DOI: 10.1038/sdata.2017.117
  9. S. Bakas et al. “Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge”, arXiv preprint arXiv:1811.02629 (2018)
  10. “A Beginner’s Guide to Neural Networks and Deep Learning.” Pathmind.
  11. Nelson, Daniel. “What Are Convolutional Neural Networks?” Unite.AI, May 24, 2020.
  12. “Machine Learning Explained: Understanding Supervised, Unsupervised, and Reinforcement Learning.” Datafloq.
  13. Pradhan, S. (2018, April 10). Closing the feedback loop in your product. Retrieved August 18, 2020, from
  14. Parmar, R. (2018, September 02). Common Loss functions in machine learning. Retrieved August 18, 2020, from
  15. “Colaboratory.” Google. Google, n.d.
  16. “Welcome to”
  17. “Why TensorFlow.” TensorFlow.
  18. Team, Keras. “Simple. Flexible. Powerful.” Keras.
  19. Osadebey, M., et al. Standardized quality metric system for structural brain magnetic resonance images in multi-center neuroimaging study. BMC Med Imaging 18, 31 (2018).
  20. Stephanie. “Absolute Error & Mean Absolute Error (MAE).” Statistics How To, October 14, 2018.
  21. Bogomolny, Alexander. “The Distance Formula.” Interactive Mathematics Miscellany and Puzzles.
  22. “SSIM: Structural Similarity Index.” imatest.
  23. Sharma, N. (2019, January 15). Importance of Distance Metrics in Machine Learning Modelling. Retrieved August 13, 2020, from
  24. Xiang, L., et al. (2018). Ultra-Fast T2-Weighted MR Reconstruction Using Complementary T1-Weighted Information. Medical image computing and computer-assisted intervention : MICCAI … International Conference on Medical Image Computing and Computer-Assisted Intervention, 11070, 215–223.
  25. Yang, Q., Li, N., Zhao, Z., Fan, X., Chang, E. I., & Xu, Y. (2020). MRI Cross-Modality Image-to-Image Translation. Scientific Reports, 10(1). DOI:10.1038/s41598-020-60520-6

About the author

Ashwin Kumarasamy Avula is an 18-year-old rising freshman at the University of Wisconsin-Madison. He is interested in Artificial Intelligence and Computer Science, and hopes to complete a bachelor’s in computer engineering.
