PU-NET DEEP LEARNING ARCHITECTURE FOR GLIOMAS BRAIN TUMOR SEGMENTATION IN MAGNETIC RESONANCE IMAGES

Automatic medical image segmentation is a key task in delineating organs and pathological structures. It is also crucial for the subsequent clinical management of brain tumors, such as planning radiotherapy or tumor resection. Various image segmentation techniques have been proposed and applied to different image types. Recently, deep learning approaches have been shown to segment images accurately, and their implementation is usually straightforward. In this paper, we propose a novel approach, called PU-NET, for automatic brain tumor segmentation in multi-modal magnetic resonance images (MRI). We introduce an input processing block to a customized fully convolutional network derived from U-Net to handle the multi-modal inputs. We performed experiments on the 2018 Brain Tumor Segmentation (BRATS) dataset and achieved Dice scores of 90.5%, 82.7%, and 80.3% for the whole tumor, tumor core, and enhancing tumor classes, respectively. These results are promising compared with other deep learning methods used in this context.


INTRODUCTION
Gliomas are among the most life-threatening forms of cancer affecting the human brain or spinal cord. They are a common type of primary brain tumor that mainly grows in the glial cells, which surround neurons and provide nourishment and protection, unlike metastatic cancers, which originate elsewhere in the body and spread to the brain.
Gliomas are typically classified into two categories: low-grade gliomas and high-grade gliomas. Low-grade gliomas are benign tumors characterized by a slow growth rate and regular morphological shapes. They can be treated surgically if diagnosed early, and survival may reach up to 10 years. However, they can recur if regular examinations and safety precautions are not applied, and they may then spread and become life-threatening. High-grade gliomas, in contrast, are more aggressive and are still considered untreatable: they grow fast, and complete resection is impossible because their irregular contours mean that tumor cells quickly invade nearby healthy tissue. Treatment options such as radiotherapy, chemotherapy, or combined approaches are often available after surgery. These tumors have a high probability of recurrence and a poor survival rate of 2 to 5 years (Bush and Chang, 2016; Menze et al., 2015; Moini and Piran, 2020; Recht and Bernstein, 1995).
Early detection and accurate delineation are crucial for any cancer type, and for brain tumors in particular, to apply treatments correctly, reduce mortality rates, and ensure good patient quality of life. Different imaging modalities, including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), are used to detect these abnormal masses. MRI is currently the best non-invasive diagnostic tool for gliomas because of its soft-tissue imaging capability, based on the magnetization of hydrogen atoms, and its variety of imaging modalities, which provide complementary information about tumor segments. It also provides safe, three-dimensional (3D), high-resolution, detailed anatomical images without any radiation-related concerns (Brandal G, 2018; Sharma et al., 2010).
In the past, specialists and radiologists performed glioma segmentation manually, which is tedious and time-consuming and depends greatly on human experience and skill. These drawbacks motivate automatic segmentation. However, automatic segmentation is challenging: patient anatomy varies widely, noise from artefacts that occur during acquisition may degrade image resolution, and the low contrast between tissues leads to the absence of well-defined boundaries between healthy brain tissue and abnormalities (Işın et al., 2016; Withey and Koles, 2007).
Over the years, several automatic medical image segmentation techniques have been proposed and used. However, they are not accurate enough compared to manual segmentation, so developing more robust segmentation approaches remains an active research topic. Recently, deep learning with convolutional neural networks has gained significant popularity for general image segmentation (Minaee et al., 2021) and is widely used for brain glioma segmentation. Fully convolutional networks localize pathological structures with high accuracy and outperform traditional segmentation techniques.
In the last few years, brain tumor segmentation has received great interest (Ghaffari et al., 2020), and a series of challenges has been organized under the title BRATS, an acronym for Brain Tumor Segmentation. The majority of contributions used deep learning techniques and achieved the top ranks. For example, in BRATS 2017, Kamnitsas et al. (2018) proposed an approach that trains multiple deep network architectures separately and ensembles their prediction maps to construct the final segmentation; it achieved first place in the challenge. Second place went to the cascaded neural networks with anisotropic and dilated convolutions introduced by Wang et al. (2017), where the cascaded design first segments the whole tumor, whose bounding box is fed to the next model to segment the tumor core, and finally the enhancing tumor is segmented by the same process.
In BRATS 2018, Myronenko (2019) won first place with a 3D encoder-decoder architecture accompanied by a variational autoencoder branch that starts from the encoder endpoint and reconstructs the original input image to regularize the shared encoder. Second place went to Isensee et al. (2019), who used a 3D baseline U-Net with minor modifications and focused on region-based prediction and a co-training process that uses additional data. McKinley et al. (2019) took third place with a shallow network similar to the well-known U-Net architecture that uses a densely connected block of dilated convolutions and introduces a new loss function, based on binary cross-entropy, to estimate label uncertainty. Wang et al. (2019) proposed test-time augmentation for brain glioma segmentation: training and test images are augmented with 3D rotation, flipping, scaling, and added noise, using different underlying 3D network structures, including multi-class segmentation networks and cascaded networks. A. Albiol et al. (2019) took inspiration from well-known two-dimensional (2D) deep learning models such as VGG, inception2, inception3, and dense-like models and extended them to 3D versions. All these models were used to segment brain gliomas separately, in addition to a final ensemble result. Rui Hua et al. (2019) developed a cascaded 3D V-Net framework: the whole tumor is segmented first by three ensembled V-Nets, and the detected tumor is then segmented into the other tumor parts (necrosis, edema, and enhancing tumor) using two ensembled V-Nets. Kermi et al. (2019) proposed a four-channel 2D U-Net architecture to segment gliomas, with residual blocks instead of the plain blocks of the original U-Net; to address the class imbalance problem, weighted cross-entropy (WCE) and generalized Dice loss were used. Marcinkiewicz et al. (2019) segmented brain gliomas in 2D in two cascaded stages based on a convolutional neural network inspired by U-Net: the first stage detects regions of interest, and the second performs multi-class classification.
In the following BRATS 2019 challenge, the top-ranked approaches (Jiang et al., 2020; McKinley et al., 2020; Zhao et al., 2020) were also based on deep learning architectures. Many additional methods based on 2D and 3D networks can be found in (Crimi et al., 2018; 2019). Outside the BRATS challenge, numerous studies have explored deep learning-based approaches (Akbar et al., 2022; Noori et al., 2019; Zhang et al., 2020; Yogananda et al., 2020), which have also achieved favourable results. These works often use a 2D or 3D U-Net architecture as a foundation and add customized enhancements.
The main limitations of these existing works are their high computational complexity, large memory requirements, and long running times, which stem from the 3D format and from complex operations such as dilated convolutions and residual blocks that entail many parameters.
This paper presents a simple and efficient deep learning approach, called PU-NET, for segmenting brain tumors in multi-modal MRI images. It is based on a two-dimensional (2D) convolutional neural network that uses an updated version of U-Net together with multi-view analysis and fusion. The results are promising compared with the baselines and existing techniques. The rest of the paper is organized as follows. Section 2 describes the proposed PU-NET method. Experimental results and an evaluation study are reported in Section 3 and Section 4, respectively. Section 5 discusses the findings. Finally, we conclude and provide future directions in Section 6.

METHODOLOGY
In this work, we propose a deep learning architecture for brain tumor segmentation that follows the encoder-decoder design and is inspired by U-Net (Ronneberger et al., 2015), the well-known network for biomedical image segmentation, with significant changes. The architecture is called PU-NET, a name that refers to its two main parts: an input processing part and a customized U-Net part. Both are built from a central unit called the plain block. It comprises two consecutive convolution layers with a (3 × 3) kernel and stride (1, 1), each followed by a batch normalization layer, which normalizes the network's intermediate data and improves training convergence and speed, and then by a LeakyReLU activation with slope α set to 0.01 (Eq. 1).
Unlike the original U-Net, all convolutions are padded to preserve the spatial size and limit the loss of information at the image borders. This matters because the pooling operations applied later already discard information.
We also use the LeakyReLU activation instead of the ReLU function (Eq. 2) of the original U-Net to prevent the dying ReLU problem (Lu et al., 2019; Mastromichalakis, 2020), in which most ReLU neurons output only zero because of negative inputs during learning. This can inactivate a significant part of the network's neurons and degrade the results.
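For reference, the standard definitions of the two activation functions are given below. They are presumably what Eq. 1 and Eq. 2 refer to, but the equation numbering here is an assumption, since the displayed equations are not present in this text; α = 0.01 is the slope stated above.

```latex
\begin{align}
\mathrm{LeakyReLU}(x) &=
  \begin{cases}
    x,        & x > 0 \\
    \alpha x, & x \le 0
  \end{cases}
  \qquad \alpha = 0.01 \tag{1} \\
\mathrm{ReLU}(x) &= \max(0, x) \tag{2}
\end{align}
```

For negative inputs, ReLU has both zero output and zero gradient, whereas LeakyReLU keeps a small gradient α, so a neuron that receives negative inputs can still be updated during training.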
The input processing part handles the multi-modal MRI scans quickly and efficiently. It consists of four input layers, one per modality 2D slice (T1ce, T1, T2, FLAIR), each followed by a plain block that first extracts relevant features from its modality separately. A concatenation layer then merges the four outputs and passes the result to the U-Net part as its primary input.
The U-Net part keeps the same divisions as the original proposal (Ronneberger et al., 2015). It comprises three parts: an encoder path (contracting path), a decoder path (expansive path), and a simple bridge between them. The encoder path resembles the feed-forward pass of a convolutional neural network (CNN). It contains three levels, against four in the original U-Net, to reduce the number of trainable parameters and thus memory consumption, complexity, and running time. Each level consists of a plain convolution block (except the first level, where the concatenation output takes its place), followed by a max-pooling layer with stride two and a dropout layer that regularizes the network to avoid over-fitting. This path extracts relevant features and captures the context of the input images to enable the segmentation task.
The expansive path is symmetric to the contracting path in the number of levels, each comprising a stack of layers. A level starts with a padded transposed convolution with a (3 × 3) kernel and stride (2, 2), which upsamples the feature map and recovers some of the information lost in the previous layers. A concatenation layer then merges the output of the corresponding plain block on the contracting side with the output of the transposed convolution to recover spatial detail. A plain convolution block is then applied, followed by a dropout layer. The two paths are connected by a bridge, which is a plain block.
Finally, a (1 × 1) convolution with sigmoid activation generates the output probabilities for the segmentation map. Figure 1 shows the detailed layers of the PU-NET architecture.
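The layer arrangement described above can be checked with a small shape trace. The following is only a sketch under stated assumptions, not the authors' implementation: the function name `trace_pu_net`, the 240 × 240 slice size, the initial filter count of 16, the 2 × 2 pooling window, and the exact channel counts after each concatenation are all hypothetical choices made for illustration.

```python
def trace_pu_net(size, n_filters, levels=3, modalities=4):
    """List (stage, spatial size, channels) for a square 2D input slice.

    Assumptions (not taken from the paper): 2x2 max-pooling, filter counts
    that double per encoder block and halve per decoder level, and
    concatenation that simply sums the channel counts.
    """
    trace = []
    channels = modalities * n_filters  # four per-modality plain blocks, concatenated
    trace.append(("input processing + concat", size, channels))
    skips = []
    filters = n_filters
    for level in range(1, levels + 1):
        if level > 1:  # level 1 reuses the concatenation output instead of a block
            filters *= 2
            channels = filters
            trace.append((f"encoder block {level}", size, channels))
        skips.append((size, channels))
        size //= 2  # max-pooling with stride 2 halves the spatial size
        trace.append((f"max-pool {level}", size, channels))
    filters *= 2
    trace.append(("bridge block", size, filters))
    for level in range(levels, 0, -1):
        filters //= 2
        size *= 2  # padded transposed convolution with stride 2 doubles the size
        skip_size, skip_channels = skips[level - 1]
        if size != skip_size:
            raise ValueError("input size must be divisible by 2**levels")
        trace.append((f"up-conv + concat {level}", size, filters + skip_channels))
        trace.append((f"decoder block {level}", size, filters))
    trace.append(("1x1 conv + sigmoid", size, 1))
    return trace


for stage in trace_pu_net(size=240, n_filters=16):
    print(stage)
```

Because all convolutions are padded, only the pooling and transposed-convolution layers change the spatial size, so the output map has the same size as the input slice whenever that size is divisible by 2 raised to the number of levels.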

DATASET PRE-PROCESSING
In this study, we use the 2018 version of the Brain Tumor Segmentation (BRATS) dataset (Bakas et al., 2017; 2018; Menze et al., 2015), which includes two main sets named Training and Validation data.
The training data contains MRI images of patients diagnosed with high-grade gliomas (glioblastoma) and low-grade gliomas. Each patient folder contains four co-registered MRI modalities: native pre-contrast (T1), post-contrast T1-weighted (T1ce), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR), in addition to the manual segmentation file, in which every voxel is assigned one of four labels (0, 1, 2, 4), summarized in Table 1. The data was acquired with different clinical protocols and scanners at 19 institutions. All BRATS multi-modal scans are available as NIfTI files (.nii.gz).
Normalizing the data is an essential step in machine learning tasks to prevent some features from dominating others and to extract the correct information from all relevant features. In our case, we use z-score normalization (Pal and Sudeep, 2016; Reinhold et al., 2019) to bring the voxels of each image modality to a common scale (zero mean and unit variance): the mean and standard deviation are computed only over the brain region (the nonzero part) (Weninger et al., 2019), and each voxel value is replaced by a new value computed by Equation 3:
$$ v' = \frac{v - \mu}{\sigma} \qquad (3) $$
where v is the voxel value, v' is its normalized value, and µ and σ are the mean and standard deviation of the voxels, respectively.
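The normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `zscore_brain_region` is hypothetical, a flat Python list stands in for an image volume, and two details are assumptions because the text does not specify them, namely that background (zero) voxels are left at zero and that the population standard deviation is used.

```python
from statistics import mean, pstdev


def zscore_brain_region(voxels):
    """Z-score a flat list of intensities using nonzero (brain) voxels only."""
    brain = [v for v in voxels if v != 0]
    if not brain:
        return [0.0 for _ in voxels]  # empty volume: nothing to normalize
    mu = mean(brain)
    sigma = pstdev(brain)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in voxels]  # constant brain region maps to zero
    # Background voxels stay at zero (assumption); brain voxels are standardized.
    return [(v - mu) / sigma if v != 0 else 0.0 for v in voxels]


normalized = zscore_brain_region([0, 0, 10, 20, 30])
print(normalized)
```

Restricting µ and σ to the nonzero region keeps the large black background from pulling the mean towards zero, which would otherwise compress the intensity range of the brain tissue.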

IMPLEMENTATION DETAILS
Our proposed deep learning approach was implemented in Python 3 with the TensorFlow Keras library and executed on a GPU node of the UB2-HPC (University of BATNA 2), which contains 4 GPUs configured with CUDA 10.0 and CUDNN 5.6.7. The official BRATS training data (MICCAI BRATS 2018 Data Training) was randomly split into a training set (80%) and a validation set (20%), with 42 as the random state, to build the model. The model was later tested on the official validation set (MICCAI BRATS 2018 Validation Data), which the organizers provide for 66 patients without ground truth and which serves as our test set. To obtain more meaningful results efficiently, we treated the problem as binary segmentation rather than multi-class segmentation: we converted the original label classes into the classes proposed in the challenge, trained the model for each class separately, and fused the results. The new classes are shown in Table 3.
The network parameters (Table 4) are defined as follows. The initial number of filters is the number of convolution kernels, which doubles for each encoder plain block and halves for each decoder plain block and transposed convolution layer. Batch size is the number of training samples used in one iteration. Epochs is the number of times the algorithm trains over all samples. The dropout rate is the proportion of randomly selected nodes set to zero. Early stopping halts training if the validation Dice score does not improve within a patience of ten epochs, and a model checkpoint saves the model with the best validation Dice score. Dropout, early stopping, and checkpointing are all used to prevent over-fitting. Finally, the Adam optimizer with default parameters was used to update the network weights.
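The conversion from the original labels to per-class binary masks can be sketched as below. The region definitions are the standard BRATS ones (whole tumor, tumor core, enhancing tumor) and are assumed to match Table 3, whose contents are not reproduced in this text; the names `REGIONS` and `to_binary_mask` are hypothetical.

```python
# Standard BRATS region definitions, assumed to correspond to Table 3.
REGIONS = {
    "WT": {1, 2, 4},  # whole tumor: all tumor labels
    "TC": {1, 4},     # tumor core: labels 1 and 4
    "ET": {4},        # enhancing tumor: label 4 only
}


def to_binary_mask(label_slice, region):
    """Convert a 2D label slice (values 0, 1, 2, 4) into a 0/1 mask for one region."""
    members = REGIONS[region]
    return [[1 if value in members else 0 for value in row] for row in label_slice]


labels = [
    [0, 2, 2],
    [1, 4, 2],
    [0, 1, 0],
]
masks = {name: to_binary_mask(labels, name) for name in REGIONS}
for name, mask in masks.items():
    print(name, mask)
```

With one binary mask per region, the same network can be trained three times, once per class, with a single sigmoid output channel each time.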

EVALUATION METRICS
All evaluation metrics used are those proposed by the challenge organizers (Menze et al., 2015): Dice score, sensitivity, specificity, and the 95th-percentile Hausdorff distance (Hausdorff95). These metrics are highly recommended for medical image segmentation (Taha and Hanbury, 2015; Huttenlocher et al., 1993). Healthy tissue pixels usually far outnumber tumor pixels, so the data suffers from class imbalance, which hinders learning and leads the model to learn mainly the frequent class, whereas the segmentation problem focuses on the infrequent class (Small and Ventura, 2017). These metrics are well suited to such class imbalance: Dice, sensitivity, and Hausdorff95 consider only the segmentation class and not the background class (Jadon, 2020).
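The overlap-based metrics can be illustrated with a short sketch on flat binary masks. This is not the challenge's evaluation code: the function names are hypothetical, and the convention of returning 1.0 when a denominator is zero (for example, two empty masks) is an assumption. Hausdorff95 is omitted because it requires surface distances rather than voxel counts.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice_score(pred, truth):
    """Dice = 2TP / (2TP + FP + FN); true negatives are not used."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


def sensitivity(pred, truth):
    """Sensitivity = TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(pred, truth):
    """Specificity = TN / (TN + FP)."""
    _, fp, _, tn = confusion_counts(pred, truth)
    return tn / (tn + fp) if (tn + fp) else 1.0


prediction = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 0, 1, 1]
print(dice_score(prediction, ground_truth))
print(sensitivity(prediction, ground_truth))
print(specificity(prediction, ground_truth))
```

Since the Dice score ignores true negatives, a large healthy background cannot inflate it, which is the property that makes it robust to class imbalance.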

EXPERIMENTAL RESULTS
Table 5 reports the results of our PU-NET approach on the training, validation, and test data (Table 2) using the evaluation metrics described in Section 3.3. The results are reported for each MRI view separately, together with the validation and test ensemble (Ens) results obtained by multi-view label fusion using majority voting, which show that multi-view exploration and fusion improve the results. The ensembled 5-fold validation result is also reported. Note that all test results were collected from the CBICA online submission system, https://ipp.cbica.upenn.edu/, an Image Processing Portal through which authorized users access the computing cluster and imaging analytics of the Center for Biomedical Image Computing and Analytics.
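Majority voting across the three views can be sketched as follows. The sketch assumes that the per-view predictions have already been thresholded to 0/1 and resampled into the same voxel order; the function name `majority_vote` and the flat-list representation are hypothetical simplifications.

```python
def majority_vote(axial, coronal, sagittal):
    """Fuse three binary predictions voxel by voxel: 1 if at least two views agree."""
    if not (len(axial) == len(coronal) == len(sagittal)):
        raise ValueError("all views must cover the same voxels")
    return [1 if a + c + s >= 2 else 0 for a, c, s in zip(axial, coronal, sagittal)]


fused = majority_vote(
    axial=[1, 1, 0, 0],
    coronal=[1, 0, 0, 1],
    sagittal=[0, 1, 0, 1],
)
print(fused)
```

A voxel is kept as tumor only when at least two of the three views label it so, which suppresses errors made by a single view.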
Fig. 2 shows some examples from the test set (Table 2) segmented by our PU-NET. Colours in the figure refer to the labels in Table 1: red for label 2, green for label 1, and blue for label 4.

PU-NET AGAINST BASELINE TECHNIQUES
In this section, our primary goal is to evaluate the effectiveness of our 2D PU-NET model by specifically examining the impact of the input processing block. We compare it against two baseline techniques that process the multi-modal MRI data differently.
The first baseline technique involves stacking each corresponding slice from each modality (T1ce, T1, T2, and Flair) as RGBA color images.These stacked images are provided directly to the U-Net part of our proposed model as input without utilizing the input processing block.
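The stacking used by this first baseline can be sketched as below: the four co-registered slices become the four channels of a single image, so the network sees one four-channel input instead of four separate inputs. Nested Python lists stand in for image arrays, and the function name `stack_modalities` is hypothetical.

```python
def stack_modalities(t1ce, t1, t2, flair):
    """Stack four co-registered H x W slices into one H x W x 4 image (RGBA-like)."""
    return [
        [list(pixel) for pixel in zip(*rows)]  # one 4-value pixel per position
        for rows in zip(t1ce, t1, t2, flair)   # matching rows from each modality
    ]


stacked = stack_modalities(
    t1ce=[[1, 2], [3, 4]],
    t1=[[5, 6], [7, 8]],
    t2=[[9, 10], [11, 12]],
    flair=[[13, 14], [15, 16]],
)
print(stacked[0][0])
```

In this baseline the first convolution mixes all modalities immediately, whereas the input processing block of PU-NET extracts features from each modality before they are merged.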
The second baseline technique provides the multi-modal MRI data to the PU-NET architecture in 3D format. The data is thus preserved in its volumetric structure instead of being processed as 2D slices, and the architecture's operations are modified to accommodate the 3D format. The results in Table 6 show that our PU-NET model performs markedly better than stacking slices as RGBA color images, which supports the effectiveness of the input processing block.
However, the 3D version of our model has some drawbacks. It requires a significantly longer training time, exceeding 20 hours on the GPU for each class label, whereas our 2D PU-NET model takes at most 1 hour. Furthermore, the results of the 3D version differ markedly from those of our 2D PU-NET model, particularly for the tumor core (TC) and enhancing tumor (ET) classes.

PU-NET AGAINST EXISTING METHODS
In this section, Table 7 summarizes the results of state-of-the-art techniques, that is, existing methods that have achieved high performance in the field, and compares them against our PU-NET results to analyze the strengths and weaknesses of each and to determine whether our approach outperforms the state of the art or provides any notable improvement.

DISCUSSION
Table 7 shows that our proposed PU-NET outperforms the 2D approaches in terms of both Dice and Hausdorff95. It is also not far from the top-ranked approaches in the BRATS 2018 challenge, owing to fine parameter tuning. It is important to consider, too, that Myronenko (2019), Isensee et al. (2019), and McKinley et al. (2019) used data augmentation techniques and additional data in a co-training process, whereas our work did not use any additional or augmented data. Furthermore, the p-values computed between their Dice scores and ours were 0.61, 0.72, and 0.66, respectively, suggesting that the differences from our results are not statistically significant.
Regarding Wang et al. (2019), our 2D approach outperformed all of their proposed 3D architectures with data augmentation, except for ([...] et al., 2019). However, the p-values of 0.87 and 0.90 suggest that these differences are not statistically significant. Moreover, our PU-NET approach demonstrated greater efficiency than the extended architectures proposed by A. Albiol et al. (2019), which are among the popular deep learning architectures used for image segmentation. Both Akbar et al. (2022) and Yogananda et al. (2020), which are outside the BRATS 2018 challenge, proposed 3D-based architectures with complex U-Net designs. Still, our results surpass theirs and support the idea that complex architectures and the 3D format can increase computational complexity without necessarily improving results, so it is justifiable to shift the research focus towards simple 2D architectures.
In our view, the strength of the PU-NET model lies in the input processing block, which acts as an initial, separate feature extractor for each MRI modality. It serves as a robust mechanism for integrating the complementary multi-modal information and improving tumor segmentation accuracy. In addition, the simplicity of the U-Net part and the use of 2D slices address major limitations of deep learning-based approaches, namely time and memory requirements, which is advantageous when working with large medical datasets.
As with any approach, our proposed method has limitations. The input processing block is specifically designed for datasets with multi-modal images; while this benefits such datasets, it may not be suitable or optimal for single-modality datasets. As with other deep learning architectures, the interpretability of our approach is also limited: deep learning models, including PU-NET, often function as complex black boxes, which makes the underlying decision-making process hard to understand. Interpretability is an ongoing area of research in deep learning, and further efforts are needed to enhance the transparency and explainability of models like ours. Despite these limitations, the performance of our approach remains competitive. The results obtained with the 2D PU-NET architecture are promising and suggest avenues for further improvement in future research.

CONCLUSION
This study investigated a novel approach, called PU-NET and inspired by the well-known U-Net architecture, to tackle brain tumor segmentation in multi-modal MRI images. We introduced an input processing block to an updated U-Net to handle multi-input 2D images collected from the different modalities. We explored this task in the three MRI views (coronal, sagittal, and axial) and aggregated the per-view predictions to generate the final segmentation and benefit from 3D contextual information. The results were compared against the RGBA and 3D versions and against existing works, where our method achieved competitive results. Using data augmentation and additional training data are the next steps to further improve our method's results.

Fig. 1: The PU-NET model architecture used to segment brain tumor structures.

Fig. 2: Segmentation results of the brain tumor structures by the proposed PU-NET on three different patients.

Table 1: The MICCAI BRATS 2018 Data Training information

Table 2: The datasets used by the PU-NET model

Table 3: The segmentation classes

Table 4: PU-NET network parameters

Table 5: PU-NET training, validation, and test results

Table 6: PU-NET versus baseline techniques

Table 7: Comparison of PU-NET with state-of-the-art techniques