Multivariate mathematical morphology for DCE-MRI image analysis in angiogenesis studies

We propose a new computer aided detection framework for tumours acquired on DCE-MRI (Dynamic Contrast Enhanced Magnetic Resonance Imaging) series on small animals. In this approach we consider DCE-MRI series as multivariate images. A full multivariate segmentation method based on dimensionality reduction, noise filtering, supervised classification and stochastic watershed is explained and tested on several data sets. The two main key-points introduced in this paper are noise reduction preserving contours and spatio temporal segmentation by stochastic watershed. Noise reduction is performed in a special way that selects factorial axes of Factor Correspondence Analysis in order to preserves contours. Then a spatio-temporal approach based on stochastic watershed is used to segment tumours. The results obtained are in accordance with the diagnosis of the medical doctors.

INTRODUCTION DCE-MRI (Dynamic Contrast Enhanced MRI) time series is a medical imaging modality useful to characterise the process of tissue vascularisation. As tumours correspond to zones of angiogenesis, where the vascularisation is increased, DCE-MRI series are a convenient way of identifying or characterising potential tumours. Hence, DCE-MRI provides an additive and functional information to more current morphological images. Due to the increasing amounts of images, the creation of tools to assist medical doctors is of great interest for the analysis of these images and the detection of candidate tumour regions. In this paper, our goal is to supply an automatic tool to help users to localise tumours, based on this vascular information. An automatic process is required to limit operator variability during tumour delineation. In current morphological images such question is very challenging because, currently, gray level differences between tumour and adjacent tissues are not sufficient for a confident and stable separation between tissues. Conversely DCE imaging provides potential richer information to determine this difference. However this information is distributed among all the sequence of images, requiring mathematical development. Here we show the results on DCE-MRI images of recent developments in mathematical morphology segmentation for multivariate images.
Our developments are particularly useful for a visual evaluation of the tumour extension, but also as a pre-processing step before pharmacokinetic modeling, evaluated then on images with less noise. The corresponding evaluation of functional parameters such as blood flow, blood volume, permeably-surface of capillaries, etc. (Sourbron and Buckley, 2012) should be improved accordingly.
The considered images are DCE-MRI series of L = 512 channels of size N × N, N = 128, pixels acquired at a regular step of t = 1 second, in time, on mice presenting tumours (Balvay et al., 2005). In DCE-MRI imagery, tumours are regions corresponding to the accumulation of the contrast product. This accumulation is characterised by an increasing kinetics of the temporal signal for each pixel of the tumour. Our aim is to show the potential of a new method for segmentation purpose in medical imagery. The tests, made on 25 different series, were used to develop the new method based on hyperspectral mathematical morphology. The objective is to achieve computed aided diagnosis of tumours.
From an image processing point of view, DCE-MRI images are time series which fulfil a fundamental temporal coherence hypothesis: at each time step, the MRI image is acquired on the same object of interest and the images are registered, i.e., any pixel has the same spatial position for every time step (i.e., for every image channel). In our case, in each experiment the channels are registered.
Due to these assumptions, the sequences of DCE-MRI images may be interpreted as multivariate, i.e., hyperspectral images, of the time evolution of the scene observed. Some images of the sequence, interpreted as channels of a series, are shown in Fig. 1. The tumours (PC3 -human prostatic) are characterised by an hyper-vascularised ring and an hypoxic centre or even a necrotic centre which is due to the distance of the afferent vessels from the centre. This vessels are repulsed from the centre by tumoural proliferation. More details about acquisition conditions are given on the appendix section.
In order to get a better understanding of the images, several portions are labeled in Fig. 2 : i) a portion of the tumour is in green, ii) a portion of the heart cavities is in blue, iii) a portion of the background is in red and iv) a portion of the lungs is in white. Hyperspectral images are multivariate discrete functions with several tens or even hundreds of spectral bands. In a formal way, for each pixel of a 2D (or a 3D), hyperspectral image is viewed as a vector, with values associated with a given wavelength, time or any index j. Each wavelength, time or index has a corresponding image, named channel, in two (or three) dimensions. In the text below, we use the term of spectrum and spectral channel to describe temporal phenomena. The segmentation of hyperspectral images by hyperspectral mathematical morphology has been used for remote sensing images (Benediktsson et al., 2005;Noyel et al., 2011). However, we show here the generality of our approach since the methods are also useful for DCE-MRI series analysis.

STATE-OF-THE-ART ON DCE-MRI SERIES ANALYSIS
Dynamic Contrast Enhanced Imaging (DCE-Imaging) is an increasingly used non-invasive imaging strategy to analyse tissue micro-vascularisation and perfusion. This technique is based on 1) an injection of a bolus of contrast agent, 2) an imaging modality (CT, MRI or Ultrasound imaging) in a sequential mode (Ivancevic et al., 2001), and 3) a method to analyse the kinetics of tissue enhancement over time (Brix et al., 2012;Sourbron and Buckley, 2012). DCE-Imaging has been widely demonstrated to be useful in detection and characterisation of lesions such as ischemia and tumours, in a variety of tissues such as brain, heart, breast and liver (Van Dijke et al., 1996), as well as in prediction and evaluation of the effects of therapies (Zahra et al., 2007;O'Connor et al., 2008;Leach et al., 2012). DCE-Imaging is especially useful in analysing tumour angiogenesis, i.e., the fast growth of a new and chaotic capillary network in tumours induced by growing factors secreted by the fast multiplying tumour cells (Brasch et al., 2000). This functional imaging field is therefore growing rapidly an constitutes a valuable addition to traditional morphological imaging, signal intensity, shape and size of the lesions. However, several meta-analyses (Zahra et al., 2007;O'Connor et al., 2008) have underlined the difficulties in comparing DCE imaging results between different centres due to differences in data acquisition and analysis of this technique, as well as to signal to noise limits. Signal to noise limits are due to fast dynamic acquisitions and due to motion artefacts induced by breathing, heart beating, bowel movements as well as involuntary patient movements. To improve reproducibility, the challenge consists in minimising the influence of the local experimental conditions on data.
In Ding et al. (2009), a method based on Karhunen Loeve Transform has been presented in order to reduce noise on cardiac cine MRI. The factorial axes are selected according to the autocorrelation function of each eigenimage. The axes are retained in a contiguous way by an automatic criterion based on half-maximum height of the autocorrelation peak. In Ding et al. (2010), an extension of this method for spatially variant noise has been presented. It is based on the eigenvalue distribution of random matrices.
In Balvay et al. (2011), an improvement of Signal to Noise Ratio for Dynamic Contrast-Enhanced Computed Tomography and Magnetic Resonance Imaging with PCA is explained. A new criterion, the fraction of residual information, is proposed to automatically select the factor axes. It takes into account the temporal order of the images in the series.
In this paper we present the most interesting results on DCE-MRI series of a method that reduces the noise by Factor Correspondence Analysis, followed by a classification and a segmentation stage. Readers interested in more details and proofs on our method are invited to consult (Noyel, 2008c). A short overview of some of the results of this study has been presented in (Noyel et al., 2008b).

PAPER ORGANISATION
The first part of this paper introduces some pre-requisites on hyper-spectral image processing. The second part is focused on filtering and data reduction of hyperspectral images. In the third part the classification step is explained and in the fourth part the segmentation by standard and stochastic watershed of DCE-MRI series is presented. The full image analysis process is illustrated and validated by an application to the automatic detection of tumours on animals.

PRE-REQUISITES NOTATIONS
In order to analyse DCE-MRI series with methods based on multivariate image processing, we use a specific notation for hyperspectral images. By using this generic notation, we consider the temporal dimension of the image as the "spectral dimension". Let : (1) be an hyperspectral image, where: Several approaches exist to analyse DCE-MRI: 1. spatial analysis: channel f λ j by channel f λ j 2. spectral analysis: vector 3. spatio-spectral analysis: simultaneous use of both approaches f λ .
The methods are illustrated in Fig. 3. Generally, the spatio-spectral approach gives the best results.

NECESSITY OF DATA REDUCTION
DCE-MRI images are time series composed of several hundreds of time channels. As channels are not statistically independent, the reduction of spectral dimension is necessary: 1. to reduce "Hughes phenomenon" (Hughes, 1968) also called the "curse of dimension" 2. to reduce the amount of data and therefore to reduce the computational time.
Hughes phenomenon has been studied in the case of hyperspectral images, among others in Landgrebe (2002) and Lennon (2002). To tackle this problem several data reduction methods exist, e.g., Correspondence Analysis, Principal Component Analysis, Independent Component Analysis, etc. Modeling the spectrum is also useful for reducing the spectral dimension as the example shown in this paper.

METHODOLOGY TO SEGMENT DCE-MRI TIME SERIES
The overall proposed methodology to segment DCE-MRI time series is composed of three steps (Fig. 4): 1. a filtering of the images and a reduction of their spectral dimension 2. a classification of the vector pixels to a given number of classes 3. a segmentation step based on a function to flood by a watershed transform. This function combines the spatial and spectral information. It consists of a probability density function of contours The first two steps are included in the preprocessing step, which in this paper is based exclusively on the spectral information. The spectral approach is actually a global analysis on the image, because we compare all the vector pixels between them. On the other hand spatial approach is more local, because a given vector pixel is mainly compared to its nearest neighbours.

FILTERING AND DATA REDUCTION
Multivariate image denoising and dimensionality reduction is addressed in this study with Factor Correspondence Analysis (FCA; Benzécri, 1973). A factorial space of reduced dimension is obtained with respect to the spectral dimension of the original image space. This reduced space is composed of factor pixels c f α of the hyperspectral image f λ .
Factor Correspondence Analysis is useful to reduce spectrum dimension (Noyel et al., 2007c). Similar results can be obtained with other methods such as Principal Component Analysis (PCA) or Independent Component Analysis (ICA), etc. We have given priority to FCA because it is efficient in segmenting multivariate images with positive pixels values.
The original image f λ can be reconstructed from a limited number of factors leading to a good approximation f λ of this original image f λ . The reconstructed image contains, under certain conditions, a spectral noise that is smaller than the noise on the original image. Therefore, a method which reduces the noise on the factor pixels c f α will be explained in this section.
Additionally, the reduction of the spectral dimension of the image f λ by fitting a spectrum model on the filtered image f λ will be presented. This is to take advantage of the prior knowledge of the spectrum. By modeling the spectra, some maps of the parameters of the model are obtained. These maps constitute a reduced space which is useful for further classification and segmentation.

Introduction to FCA
Data analysis is a transformation ζ of the space of the original image f λ , with a dimension L, into a space of another hyperspectral image c f α , of reduced dimension K < L, and a set of parameters: ζ : (2) with: -c f α the factor pixels of the hyperspectral image. It is the coordinates of the vector pixels on the factorial axes.
-d f αλ the factors of the channels. They are the coordinates of the channels on the factorial axes.
µ α is the inertia of factorial axis α.
ν i. the marginal frequency of the vector pixel ν . j the marginal frequency of the channel f λ j : For data analysis, a limited number K of factors is usually selected. Therefore, data analysis is a projection of the pixels of the original image f λ into a space of smaller dimension K < L and often K L.
The reconstruction ζ −1 of the image f λ is a pseudo-inverse transform. It is an exact transform if all the axes are kept (K = L − 1). It consists in partially reconstructing the image f λ from the pixels factors c f α and some other parameters. The reconstructed image f λ , with a limited number of factors, is an approximation of the original image:

Selection of the factor axes
The number of factorial axes to be kept needs to be chosen. It depends of: the part of inertia (or variance) that they explained in the data cloud. Several tests, based on inertia, exist that allow one to choose the correct number of axes such as the "Kaiser criterion" (Kaiser, 1960) or the "scree test" (Cattell, 1966).
the amount of information contained in the factor pixels. We have introduced a new criterion based on the signal to noise ratio of the factor pixels.  19% Fig. 6. The factor pixels on the 9 firsts axes and their inertias. The factors pixels typeset upright are kept; those italicized are rejected due to a signal to noise ratio which is below a given threshold.
In Fig. 5 one can notice that the first 5 axes contains the main part of the total variance or inertia (about 30%). However, by observing ( Fig. 6) the images of the factor axes c f α k we notice that the factor pixels c f α 7 and c f α 8 contain mainly noise while the factor pixel c f α 9 contains mainly signal. In order to quantify the amount of information contained in the factor pixels, their signal to noise ratio is estimated by the method proposed in Noyel et al. (2008a).
The channel c f α k (or f λ j ) is considered as a realisation of a random function. For each factorial axis, the centred spatial covariance is estimated by assuming that c f α k (x) is a stationary function: with c f α k the centred channel α k : It is known that the covariance of a noisy random function shows a discontinuity for h → 0. This discontinuity, equal to the variance of the noise, is called the "nugget" effect (Matheron, 1970;1975). This is illustrated in Fig. 7, where the covariance is plotted for a channel without much noise and for a noisy channel. In the latter case, a peak at the origin of the covariance can be noticed. The variance of the signal can be estimated by the difference between g α k (0) and the nugget effect. It can be computed by an automatic extraction of the peak at the origin of the covariance image after a morphological opening γ. The structuring element is chosen as small as possible. It is a square of size 3 × 3 pixels. From this extraction, a signal to noise ratio is estimated, according to the following definition: with g the centred covariance and γ the morphological opening.
By observing the signal to noise ratio (SNR) of the different factors in Fig. 7, it appears that the channels c f α 7 and c f α 8 are noisier than others. In what follows, we propose channels with a SNR greater than 0.3 are retained for reconstruction. Covariance before (g) and after a morphological opening (γg) on the factor pixels channels c f α 1 (without much noise) and c f α 100 (noisy channel).  The pixels factors typeset upright in Table 1 and in Fig. 6 are kept for data analysis, while those italicized are rejected. We notice that the selected factorial axes, are not contiguous in terms of inertia, as only components with a SNR > 0.3 are kept.

Reconstruction
In Fig. 9, some channels of the image f λ are displayed before and after reconstruction. On channel f λ 12 , we notice that one ventricle of the heart appears dark while the other appears bright. This opposition is preserved after the reconstruction of the channel f λ 12 using the selection of axes by SNR. One can also notice, that the image f λ is a good reconstruction of the image f λ and that a part of the noise, in the original image f λ , is removed by FCA reconstruction. Therefore, the denoising has been made by a spectral filtering of FCA, which preserves the spatial structures of the image. This is a crucial improvement before segmenting the images by mathematical morphology.
. Five channels of the hyperspectral image (i.e., the DCE-MRI series) before ( f λ i ) and after reconstruction ( f λ i ) from 16 axes of a FCA.

Noise reduction
Like Principal Component Analysis, FCA is useful to perform a spectral filtering of the data. In Benzécri (1964; and in Orfeuil (1973), FCA is used to filter arrays of "Euclidean data marred with errors". The use of PCA to filter hyperspectral data was primarily illustrated in Berman (1985) and in Green et al. (1988).
A study about noise reduction by FCA on hyperspectral images is available in Noyel (2008c). In the following text, only the interesting results obtained in this previous study are used.
In Noyel (2008c), we have shown two sequences of FCA and reconstruction steps reduce the noise more than just a single sequence.
A sequence of one FCA and a reconstruction f (1) λ is equal to the reconstructed image. It is written: Two sequences of FCA-reconstruction are defined as: with t ε a spectral translation of the data. Actually we add the constant ε to the value of the image f λ in order to recover positive values, as required for FCA: Therefore the second sequence of FCAreconstruction is not identical to the first one because the data have been modified by the translation t ε .
For ε we propose to use the minimum of all the data in the first reconstructed image f More details about the two sequences of FCAreconstruction are given in the appendix.
Notice that the inertia is evaluated on the axes of the original, i.e., first, FCA.
In order to compare the original DCE-MRI series f λ with the series after 2 sequences of FCAreconstruction with 16 axes f λ = f (2) λ , the residues are computed channel by channel: An hyperspectral image of residues r λ may be defined: of which the channels r λ j are equal to: The centered spatial covariance g r λ j is also measured on the residues of the channels: Fig. 10, some channels after 2 FCA reconstructions and their residues are presented. Notice that some noise has been removed. This noise is due to the acquisition process and is mainly located at the centre of the image. The centered covariances g r λ j contains a peak at their origin, which means that the residues correspond to noise.
Fig. 10. Three channels of the DCE-MRI series before, f λ i , and after 2 FCA-reconstructions with 16 axes f (2) λ , their residues r λ and the covariances on the residues g r λ . The histogram of the residues has been normalised for visualisation. Fig. 11. 5 spectra before and after 2 FCAreconstructions (with 16 axes) and the corresponding residues.
Some spectra of the reconstructed image after 2 FCA are plotted in Fig. 11. The filtered spectra by 2 sequences of FCA-reconstruction f λ (x i ) have a smaller variability than the original spectra. Moreover after the filtering stage the general trend of the spectrum is enhanced. For example, the signal of the spectra corresponding to the tumour signal is increasing. This corresponds to the accumulation of the product of contrast inside the tumour.
In conclusion we propose a noise reduction method which consists in applying two series of FCAreconstructions. This approach reduces the temporal (spectral) noise while preserving the contours. This is an important point for a further segmentation.

STRONG DIMENSIONALITY REDUCTION BY SPECTRUM MODELING
Following the noise reduction which preserves the spatial information by data analysis, an additional dimensionality reduction by spectrum modeling is proposed.
In this approach, a parametric model is fitted to each spectrum and consequently, the images of parameters can be seen as maps. These maps are useful for classification and morphological segmentation. We notice that the set of these maps of parameters (p 1 (x), . . . , p M (x)) constitutes a multivariate image with a reduced dimension: The fitted model may take into account the physical phenomena, such as the physiological pharmacokinetic model used in Brochot et al. (2006). Their model is based on differential equations on six compartments: arterial and venous plasma, tumour (split into capillaries and interstitium), and the rest of the body (also split into capillaries and interstitium). However, in the current study we adopt a simpler linear model. More precisely, for each spectrum (time series) of the filtered image f λ , a line model is fitted after removing the first 20 values which correspond to a transitory phenomenon (Fig. 12): with j 1 the first value after the peak ( j 1 = 21 for our images).
This model is made with two parameters: the slope p 1 = a and the intercept p 2 = b.
In order to take into account some information contained in the transitory part of the time series, a third parameter is used. It corresponds to the amplitude of the signal over the twenty first values of the spectrum. This transitory part is characteristic of the injection of the contrast agent used in this imaging modality. This parameter is called the rise p 3 = m and it is defined by: Hence, by fitting the model for each spectrum f λ (x i ), three maps of parameters are obtained (Fig. 13). We remark that the dimensionality reduction is very important because the original multivariate image with 512 channels is transformed into another multivariate image with only 3 channels. Even with this reduced amount of information, the main morphological structures of the mouse (the heart cavities, the tumour and the lungs) are clearly apparent.
To conclude, spectrum modeling is very efficient in reducing the number of channels for tumour segmentation. It could be interesting in further studies to fit a a model taking into account physical properties of the injection of the contrast agent.

TEMPORAL CLASSIFICATION OF DCE-MRI TIME SERIES
After performing a denoising step and a dimensionality reduction, the classification of pixels is made in the temporal dimension. Supervised and unsupervised methods are considered.
UNSUPERVISED APPROACH: REGIONAL IMPROVED K -MEANS For the unsupervised approach, a k-means classification and a model classification are compared.
A k-means classification of pixels is based on the Euclidean metric (Diday, 1971;Hartigan and Wong, 1979). This classifier must be used in a space in which the associated metric is Euclidean. It is the case for the factor space of FCA or of PCA. However, using k-means in the image space would overweight channels with large dynamics. So classifying is done in the factor space instead. For DCE-MRI series, the kmeans classification is performed with 5 classes in the factor image space c f α of the second sequence of FCAreconstruction. In Fig. 14 some anatomical parts of the mouse have been classified into 5 classes. The number of classes has been empirically defined. The classes are as follows: (1) the green class corresponds largely to the tumour -see top right corner of the image, (2) the red class corresponds mainly to the background, (3) the blue class corresponds to the heart cavities, (4) the black class corresponds to the lung and (5) the cyan class is an intermediate class. The results have not been completely satisfactory, so we have introduced an alternative approach of classification, which clusters each spectrum in comparison to some reference spectra obtained on the k-means classification. For each class obtained by k-means, a mean filtered spectrum is computed sp kmeans f λ . On these mean spectra, For each spectrum f λ (x i ) of the filtered image by two sequences of FCA-reconstruction the line model is   Channels mean spectrum As one can notice in Figs. 14 and 24, the classification κ mod,5 f λ by the model approach seems to be more robust than the k-means classification because a statistical noise filtering has been made when fitting a model with 3 parameters. Then, the k-means classification has been computed on 19 channels while the model classification has been computed on 3 channels with few noise. Therefore, the model decreases the entropy of the image by introducing a prior information present in the shape of the spectra. Nevertheless, the k-means classification is necessary as a first step, in order to estimate the mean spectra δ Each class of the training set, T = train = (t 1 ,t 2 ,t 3 ,t 4 ), is made of 80 vector-pixels , f λ (x i ) with 512 components, selected by an operator. By measuring the mean spectra of the filtered image f λ on these classes sp train f λ , we notice that the kinetics of train classes are different (Fig. 18). The LDA is performed into three different spaces: the PCA factor space of the spectra of the training set: Fig. 19, the classifications to three different spaces are very similar. The LDA in the filtered image space or the LDA in the PCA factor space of the training set are a bit better than the LDA in the parameters space. The training classification error and the test errors are computed on 80 pixels vector of the training set by a 5-fold cross validation (Hastie et al., 2003). Both classification errors are equal to zero.
reference LDA on f λ LDA on ζ train ( f λ (x)) LDA on p Fig. 19. Semi-supervised classification LDA in 4 classes in 3 different spaces: the filtered image space, the PCA factor space of the spectra of the training set, the parameters space.

CLASSIFICATION ON SIMILAR DCE-MRI SERIES
As we want to classify several DCE-MRI series of large image databases, it is necessary to develop classification methods which are robust in all series, without the needs for a training set for each image series.
By testing our methods on another series called "serim460", we can notice in Fig. 20 that the unsupervised classifications (k-means and model) are correct compared to the given reference. However, the LDA classification on the new series is not correct whatever the image space. In this case the training set is from another series ("serim447").
In the series "serim447" and "serim460", some mean spectra are measured into similar zones (tumour, heart cavities, background and lung). In Fig. 21 we notice that the range of the spectra are not the same for both images. This is the origin of the problem of the classification for a supervised method such as LDA. . The training set of the LDA is from the series "serim447".  Fig. 21. Selected areas and associated mean spectra for the series "serim447" and "serim460" after a sequence of 2 FCA-reconstructions (a zoom is made on the series "serim447").
In order to use LDA on other image series, using the initial training pixels selected on the first image, the range of the grey levels of the images must be similar. Otherwise the projection of other series in the classification space of the pixels would be incoherent. Consequently, we introduced a range normalisation method based on histogram anamorphosis. To get more robust results, the multivariate image of parameters p is used. For each parameter, the cumulative distribution function (cdf) of the values is estimated. The cdf is the primitive of the density function estimated by an histogram. It is composed of 255 classes defined on each map of parameters of the initial series. The cdf of each image is transformed by a numerical anamorphosis on the grey-tone values in order to be similar to the reference cdf of the initial series "serim447" (Fig. 22).
The LDA classification κ LDA,4 p on series "serim460" after cdf normalisation (Fig. 23) gives much better results than the same classification without cdf normalisation (Fig. 20). p on the maps of parameters of the series "serim460" after cdf normalisation. The training has been made on the series "serim447". For validation purposes, we compare the classifications by k-means, model and LDA on other series (Fig. 24). The classifications based on the model approach are more robust than those obtained with kmeans. LDA classifications also give good results. The heart cavities are correctly classified and the tumours are characterised by extended classes in green.
In the case of tumours starting to die in their centre, which would make them potentially smaller than their real size, our method identifies viable tissues (i.e., functional tissues). The central zones which are not included inside the tumours are zones of severe ischemia or necrosis. Currently, more and more DCE maps are analysed in the following way: i) identification of the "necrosis" ratio (volume of the necrosis divided by total volume); ii) characterisation of viable tissues of the tumour. Without both analysis, all the circulatory parameters are underestimated during the growth of the tumour which leaves in its centre more and more necrosis (or fibrosis).

SPATIO-TEMPORAL SEGMENTATION BY PROBABILISTIC WATERSHED
After having filtered the temporal noise and having reduced the temporal dimension, the classification produced a partition of the image into non connected classes only based on temporal information. Our aim is now to segment the series with smoother contours and regular classes by combining the spatial and the spectral dimension in the segmentation process.
In order to segment hyperspectral images, we introduced a general method based on deterministic watershed (WS) in Noyel et al. (2007c) and another one based on stochastic WS in Noyel et al. (2007a;. Based on previous work we want to present the potential of application of our methods on DCE-MRI series in order to segment regions to be tumour candidates.

PRINCIPLE OF THE SEGMENTATION OF MULTIVARIATE IMAGES BY WS
The watershed transformation (WS) is one of the most powerful tools for segmenting images and was introduced in Beucher and Lantuéjoul (1979). According to the flooding paradigm, the watershed lines associate a catchment basin to each minimum of the landscape to flood (i.e., a scalar or greyscale image; Beucher and Meyer, 1992). Typically, the landscape to flood is a gradient function which defines the transitions between the regions. Using the watershed on a scalar image without any preparation leads to a strong over-segmentation (due to a large number of minima). There are two alternatives in order to get rid of the over-segmentation. The first one consists in first determining markers for each region of interest. Then, using the homotopy modification, only the local minima of the gradient function are imposed by the markers of the regions. The extraction of the markers, especially for generic images, is a difficult task. The second alternative involves hierarchical approaches either based on non-parametric merging of catchment basins (waterfall algorithm) or based on the selection of the most significant minima. These minima are selected according to different criteria such as dynamics, area or volume extinction values. Extinction functions (Meyer, 2001) are used to remove the non selected minima.
The general paradigm of WS-based segmentation of multivariate images (Fig. 25) requires two different inputs: (1) some markers for the regions of interest mrk and (2) a landscape to flood g which describes the "likelihood" of the frontiers between the regions. The markers can be chosen interactively by a user, or automatically by means of a morphological criterion ξ N (Meyer, 2001), or with the classes of a previous spectral classification. The landscape to flood is a scalar function (i.e., a greyscale image). For the deterministic WS, it is usually a gradient (actually its norm), or a distance function. For the stochastic WS, the function to flood is a probability density function (pdf) of the contours appearing in the image. The extracted markers are imposed as sources of the landscape to flood and the WS is computed. The results denoted W S(g, mrk) or W S(g, ξ N ).

SPECTRAL DISTANCES AND GRADIENT ON MULTIVARIATE IMAGES
A gradient image, actually its norm, is usually chosen as a function to flood. After normalisation, the norm of a gradient image is a scalar function with values in the reduced interval [0, 1], i.e., ρ(x) : E → [0, 1]. In order to define a gradient, two approaches are considered: the standard symmetric morphological gradient on each marginal channel and a metric-based vectorial gradient on all channels (Noyel et al., 2007c).
The morphological gradient is defined for scalar images f as the difference between a dilation and an erosion by a unit structuring element B, i.e., The morphological gradient can be generalised to multivariate functions (Hanbury and Serra, 2001) with the following metric-based gradient: Various metric distances d(f λ (x), f λ (y)) between two vector pixels, useful for multispectral images, are available for this gradient such as: the Euclidean distance: where Σ is the covariance matrix between variables (channels) of f λ . If channels are uncorrelated, the covariance matrix is diagonal. The diagonal values are equal to the channels variance σ 2 λ j \ j ∈ {1, 2, . . . , L}. Therefore, the Mahalanobis distance becomes the distance inverse of variances: An important point is to choose an appropriate distance depending on the space used for image representation: Chi-squared distance d χ 2 and distance of inverse variances d 1/σ 2 are adapted to the image space and Euclidean distance to factorial space. More details on multivariate gradients are given in (Noyel et al., 2007c;. Another example of a multivariate gradient is given in Scheunders (2002).

INTRODUCTION TO STOCHASTIC WS
In a classical watershed, small regions strongly depend on the position of the markers, or on the volume (i.e., the integral of the grey levels) of the catchment basins, associated with their minima. In order to improve segmentation results, stochastic watershed aims at enhancing the contours of significant regions which are relatively independent of the position of the markers. Stochastic WS method is described in Angulo and Jeulin (2007) and Noyel et al. (2007a;. Starting from a series of M realisations of N uniform or regionalized random germs (or markers) are made on a landscape to flood (for example a gradient). With these M segmentations, the probability density function of contours pd f (x) is estimated by the Parzen window method (Duda and Hart, 1973) with a gaussian kernel (typically with a 3 pixels standard deviation working on contours of one pixel width). Due to the smoothing effect of the method, the WS lines with a very low probability, which correspond to non significant boundaries, are removed.
To obtain closed contours, the pdf image is segmented by a watershed segmentation into R regions. The stochastic WS needs two parameters: 1. M realisations of germs. The method is almost independent on M if it is large enough (between 20-50).
2. N germs (or markers): if N is small, a segmentation in large regions is privileged; if N is too large, the over-segmentation of sg mrk i leads to a very smooth pdf, which looses its properties to select the R regions.
As shown in Angulo and Jeulin (2007), it is straightforward to use N > R.
The originality of our approach of stochastic WS for DCE-MRI series is to condition the germs used to build the pdf by a previous classification.

Pre-processing of the temporal classification
As the classification is based on temporal information we want to introduce it by conditioning the random markers used to generate the probability density function of contours.
In order to do this, a pre-processing stage is necessary for two reasons: to reduce the "segmentation noise" appearing as the smallest connected components of the classification κ.
to introduce the necessary degrees of freedom to perform each WS used to build the pdf.
Therefore an anti-extensive transform is performed on each class of κ by a morphological erosion with a structuring element of size 3 × 3 pixels. An alternative is to make an area opening (Soille, 1999). Then an extensive transformation such as a closing by reconstruction (Soille, 1999) is performed to fill the holes inside the largest connected components. As the classes of the transformed classification are not anymore a partition of the image, a "void" class is introduced. It corresponds to the class appearing in place of the transformed connected components.
For the DCE-MRI sequences the complete transform ϒ is processed on the LDA classification on the parameters space κ LDA,4 p (Fig. 26).

Extension of stochastic WS to multivariate images
The extension of stochastic WS to multivariate images was introduced in Noyel et al. Segmentation of DCE-MRI series by a standard WS on a distance-based gradient. Before introducing the way to extend stochastic WS to multivariate images, let us show that the segmentation by a standard WS on a distance based gradient has limited efficiency.
For the gradient based segmentations, a distance adapted to the image space is used: the Euclidean distance for the image of the factor pixels c f α the distance of inverse variances for the image of parameters p the Euclidean distance for the image of the PCA factor space obtained from the spectra of the training set c train β .
Comments: The distance-based gradient is computed after a morphological leveling on each channel of the considered image in order to get a smoother gradient. A morphological leveling is a morphological transformation that reduces the positive and negative "peaks" according to a reference while preserving the transitions of the objects. See Meyer (2004) for more details. The reference for the leveling is obtained by a gaussian filter of size 11 × 11 pixels.
A WS segmentation with a volume criterion is performed in Fig. 27. A marker-controlled WS is performed in Fig. 28. The markers are made by a morphological opening on each connected component of the classification with an hexagonal structuring element of size 5.
We notice that the segmentation is not perfect, especially for the tumour. However, the marker based segmentation in the PCA factor space of the spectra of the training set seems to be the best segmentation. . Probability density function for multivariate images In Noyel et al. (2007a), we studied two ways to extend the probability density function of contours to multispectral images: 1. the first one is a marginal approach (i.e., channel by channel) called marginal pdf mpdf (algorithm in Table 2) 2. the second one is a vectorial approach (i.e., vector pixel by vector pixel) called vectorial pdf vpdf (algorithm in Table 3).
with w j = 1/L, j ∈ [1, . . . , L] in the image space and w j equal to the inertia axes in the factorial space.
A probabilistic gradient was also defined in Angulo and Jeulin (2007) to ponder the enhancement of the largest regions by the introduction of smallest regions. It is defined as ρ prob = mpdf + ρ d : after normalisation in [0, 1] of the weighted marginal pdf mpdf and the metric-based gradient ρ d .
In order to obtain a partition from the mpdf, the vpdf or the gradient ρ prob , these probabilistic functions are segmented, for instance by a hierarchical WS with a volume criterion, as studied in Noyel et al. (2007a). In such a case, the goal is not to find all the regions. The stochastic WS addresses the problem of image segmentation in few pertinent regions according to a combined criterion of contrast and size. In the present study, as we discuss below, the segmentation of the pdf is obtained from the WS with a volume criterion.

CONDITIONING OF THE GERMS OF THE PDF BY A PREVIOUS CLASSIFICATION
The pdf of contours with uniform random germs contains only spatial information. By conditioning the random germs by the spectral classification, we introduce a spatio-spectral pdf. These germs are going to be regionalized by a pre-segmentation obtained by a pre-processing of the spectral classification. Several kinds of germs have been tested: The pdf of contours with uniform random germs is constructed without any prior information about the spatial/spectral distribution of the image. Spectral information is introduced in the pdf by conditioning the germs by the previous transformed classification κ. To do this, it is possible to use point germs or random ball germs whose location is conditioned by the classification. An exhaustive study of the germs is presented in Noyel (2008c) and Noyel et al. (2010). Below, we present random ball germs regionalized by a classification where each connected class may be hit one time, mrk κ−b i (x). For the detection of tumours in DCE-MRI series we prefer to use the last kind of regionalized random balls germs mrk κ−∪b−conn i (x).
The procedure is as follows: the transformed classification κ is composed of connected classes, κ = ∪ k C k with C k ∩C k = / 0, for k = k . The new void class, which appears after the transformation of κ, is written C 0 . Then random germs are drawn conditionally to the connected components C k of the filtered classification κ. To do this, the following rejection method is used: random point germs are uniformly distributed. If a point germ m falls inside a connected component C k of minimal area S and not yet marked, then it is kept, otherwise it is rejected. Therefore not all the germs are kept. These point germs are called random point germs regionalized by the classification κ. However, these regionalized point germs are sampling all the classes, independently of their prior estimate of class size/shape. In order to address this limitation, we propose to use random balls as germs.
The centres of the balls are the random point germs and the radii r are uniformly distributed between 0 and a maximum radius Rmax: U [1, Rmax]. At each step, only the intersection, B(m, r) ∩ C k , between the ball B(m, r) and the connected component C k is kept as a germ. Then the union is made with the previous germs. At the end of the "fall" of the random germs, the connected classes are considered as markers for the watershed used to build the pdf of contours.
These balls are called random balls germs regionalized by the classification κ and noted mrk κ−∪b−conn i (x).
The algorithm in table 4 sketches the process. Note that if N is the number of random germs to be generated, the effective number of implanted germs is lower than N.  if C k , such as m ∈ C k , is not marked then 7: end if 10: end for 11: Label each connected regions in the image of markers After computing the marginal pdf of contours mpdf with random balls germs mrk κ−∪b−conn i (x), from the representation of pixels in the parameters space, the pdf are segmented by a hierarchical WS with a volume criterion (Fig. 29). The classification used to build the pdf is the LDA classification in the PCA space of the training set of parameters c train β .
In Noyel et al. (2010), the pdf built with these germs seemed to give better results than others. Therefore, the results shown use these germs.
In order to better understand the process to build the regionalized random balls-germs mrk κ−∪b−conn

VALIDATION OF THE METHOD: APPLICATION TO COMPUTER AIDED DETECTION OF TUMOURS
After presenting the way to compute the stochastic WS with regionalized random balls-germs, we are going to apply it to computer aided detection of tumours on several DCE-MRI series.
In order to detect potentially tumourous areas, the DCE-MRI series are first segmented by stochastic WS. Then the regions of the segmentation are classified in potentially tumourous (or not tumourous) areas. The whole analysis flowchart is presented in Fig. 31. It combines the different parts introduced earlier in this paper: pre-processing stage: a noise reduction by FCA and model fitting training stage of the classifier: the LDA classifier is trained on some reference pixels selected on a reference image of the parameters.
classification stage: a LDA after normalising the histograms of the maps of the parameters which model the spectra. The histograms are normalised in order to match the histogram of the reference image.
segmentation stage: a stochastic WS with regionalized random-balls germs mrk κ−∪b−conn i (x) conditioned by the classification. Starting from the segmentation by stochastic watershed, the detection of potential tumours depends on two criteria which have been empirically determined according to a prior knowledge on DCE-MRI series: 1. a positive mean slope parameter a because the contrast agent tends to accumulate in these areas during the acquisition 2. a mean intercept b higher than given a threshold (800) after histogram normalisation. With this parameter, the areas of the background with a small positive slope are removed from the detection.
So that medical doctors can evaluate the pertinence of the detected zones, confidence maps on the parameters were built. For each zone, some coefficients of variation were computed. These coefficients β are defined as the ratio between the standard deviation, σ , and the mean, mean, of the parameters for the considered region: Then, the confidence maps were thresholded: for β a at 5 and for β b at 1. For coefficients close to zero, the considered region is more likely classified as cancerous. A look up table is applied on the confidence maps: in blue is the highest risk (β = 0) and in red is the lowest risk (β a ≥ 5 or β b ≥ 1).
The detection det(x) and the confidence maps for the series "serim447" are in Fig. 32. We notice that for Fig. 32. Detection of potentially cancerous areas. LDA classification in the parameter space κ LDA,4 p , morphological transform of the classification used to condition the germs of the pdf mrk κ , mpdf with random balls germs regionalized by the classification mpd f (p, mrk κ ) (N = 100 points, M = 100 realisations, area S = 10 pixels, Rmax = 30 pixels), segmentation by volumic WS in R = 20 regions sg R−vol (mpd f ), reference re f , detection det(x) of potentially cancerous zones (mean(a) > 0 and mean(b) > 800), confidence maps for the slope β a and for the intercept β b . the largest potentially cancerous zone, the risk is high (in blue). This corresponds to the tumour specified by the medical doctors. On the other hand, the smallest zone in red, for which the risk is low, is not a tumour. Therefore, our detection method works.
The same approach has been applied to 25 series of images. In the Fig. 33 some results are presented for 6 series. The potentially cancerous detected zones correspond to the references given by the doctors. In the series "serim450" and "serim457" some potentially cancerous zones with a higher risk are even detected while they were not selected in the reference. In the others series, similar results have been obtained. All the tumours marked by clinicians have been detected.

CONCLUSION
In this paper, an automatic method of detection of potentially cancerous zones on DCE-MRI series is presented. The results have been tested on a limited number of images. They are very promising and in agreement with the references given by medical doctors.
Our method is composed of four stages. In the pre-processing stage, a dimensionality reduction and a noise reduction are first performed with the Factor Correspondence Analysis and the pixel-based spectrum modeling. These operations preserve the spatial contours of the image. Then in the classification stage Linear Discriminant Analysis is performed on a subset of training pixels. In the third stage, the image is segmented by stochastic watershed with random-balls markers regionalized by the previous classification. The originality of this approach is the combination of the spatial and temporal information to produce a "multivariate gradient" representing the probability density function of contours. These probability maps are segmented by stochastic watershed which is very useful when segmenting the low contrasted regions corresponding to tumours, since it regularises the contours. The last stage is a detection of potentially tumourous zones by statistical criteria. A confidence maps is associated to the selected zones. These maps show the risk of the regions to be cancerous.
A method of computer aided detection of potentially cancerous zones on DCE MRI sequences of small animal has been detailed. It seems to be very promising for low contrasted data sets. Further systematic tests should be performed, in order to validate the method on a larger data sets. In the future, some physical models could be fitted on the temporal series in place of the actual model.

APPENDIX EXPERIMENTAL CONDITIONS
As explained in Brochot et al. (2006), here are the details about experimental conditions: animals used and MRI examination.
Animals. Experiments were performed on nude nu/nu male mice (Laboratoire Iffa Credo, L'Arbresle, France), in full compliance with the National Institutes of Health recommendations for animal care. Approximately 1.5 × 10 6 PC-3 human tumour cells were implanted subcutaneously into the flank of each mouse, as described in Pradel et al. (2003). For imaging, the animals were anesthetised by a peritoneal injection of ketamine (Rompun, Bayer, Leverkusen, Germany) and xylazine (Imalgène, Mérial, Lyon, France).
MRI examination. MRI examination was performed using a 1.5-T system (Sigma, General Electrics, Milwaukee, WI, USA) and a custom smallanimal dedicated coil. A sagittal 2D T1-weighted spin echo sequence (TE 11 ms, TR 400 ms, FOV 8 × 8 cm, 256 × 128 matrix, 1 NEX) was used to check adequate positioning of the animal and to select the axial plane level containing the left ventricle cavity and one or two flank tumours. The dynamic acquisition was performed using a single-slice T1-weighted 2D fast spoiled gradient recalled (FSPGR) sequence: TR 15 ms, TE 2.2 ms, flip angle of 608, bandwidth 31.25 kHz, 256×76 matrix for the asymmetric FOV of 7 × 3 cm, 5 mm slice. The single slice was positioned at the level selected by the previous sagittal sequence, and dynamic acquisition was performed with 10 baseline images and after a caudal vein bolus injection of 0.045 mmol Gd/kg of a macromolecular contrast agent (Vistarem, Guerbet, Aulnay-Sous-Bois, France).

NOISE REDUCTION
In this section, more details are given about our method of noise reduction based on two sequences of FCA and reconstruction (see section "Noise reduction" on page 7). This approach needs the subtraction of a constant ε as shown in Eqs. 7, 8 and 9.
The idea of subtracting a constant from the data is based on the fact that after one sequence of FCAreconstruction some values of the reconstructed image f (1) λ turn out negative. This is due to the fact that only a limited number of the factorial axes are kept for the reconstruction. These negative values, however, have no physical meaning. They are corrected in the first reconstructed image. Then a second sequence of FCA-reconstruction is applied because the data set has been modified. In order to be consistent, the constant is added after the second sequence of FCAreconstruction also.
The data are classified by k-means on the factor pixels of the first FCA c f (1) α and on the factor pixels of the second FCA c f (2) α . One can notice, in Fig. 34, that the classification is better with two FCA than with one FCA. α . Starting from the sixteen retained factorial axes of the first FCA (see section 3.1.2), the image is partially reconstructed and a second sequence of FCAreconstruction is applied with a spectral translation (equation 7).
In order to verify the importance of two sequences of FCA-reconstruction, the SNR are estimated: on the channels of the original image f λ , on the channels of the first reconstructed image f (1) λ and on the channels of the second reconstructed image f (2) λ . The SNR are also estimated on the factor pixels of the first FCA c f (1) α and of the second FCA c f (2) α ( fig. 35). After performing the first sequence of FCA-reconstruction, an improvement of the SNR is noticed in the image space. However, the second sequence of FCA does not improve the SNR in the image space.
In the factor space, the SNR is improved after the second FCA in comparison with the SNR, in the factor space, after the first FCA.
Why is it necessary to apply two FCA to filter the noise in the factor space, while only one FCA is necessary in the image space? During the reconstruction stage the factor pixels are weighted by their inertia. Therefore the weight of the noise is reduced because it appears on the factor pixels with a small inertia. Moreover, the image reconstruction is based on the product of the marginal frequencies, ν i. ν . j , which corresponds to the reconstruction of the barycentre. This barycentre gives the general appearance of the image. However, in order to remove the noise on the factor pixels two FCA are necessary. A hyperspectral signal to noise ratio between the original image (with noise) and the reconstructed image (filtered) may be defined as the ratio between the sum of the variance of the signal for each channel and the sum of the variance of the noise for each channel: