AUTOMATIC SEGMENTATION AND CLASSIFICATION OF CELLS FROM BRONCHO ALVEOLAR LAVAGE

Broncho alveolar lavage is the most commonly used diagnostic tool for confirming alveolar hemorrhage. Golde has introduced a ranking score, based on the hemosiderin content of macrophages which enables ranking cells from 0 to 4 based on the degree of Prussian blue stain. We propose a complete image analysis scheme to automatically perform both the extraction of the cellular objects and the ranking of each cell according to the Golde score. The image analysis techniques used mainly involve clustering and mathematical morphology. A 2D histogram is clustered to extract the main cellular components, a color watershed is used to determine and refine the regions. Finally, the cellular components of interest are firstly classified according to their hue and secondly according to their staining repartition. The proposed image analysis technique is very fast and produces reliable and accurate results.


INTRODUCTION
Pathologists today need a high degree of precision and reproducibility in their analysis results.The key to this is objectivity and standardization in their laboratory practice.Static laboratory imaging is now well established since the first attempts to incorporate digital imaging in pathology.Many industrial companies provide such imaging workstations whose investigation fields are not reduced to pathology but more generally to biology.In particular, broncho alveolar lavage (BAL) is a simple investigation method carried out during a bronchial endoscopy by injection of sterile physiological salt solution (250 ml) in a pulmonary segment.This is followed by a soft aspiration recovery.BAL is the most commonly used diagnostic tool for confirming alveolar hemorrhage (AH).The diagnosis of alveolar hemorrhage is made by progressively bloodier BAL returns when hemorrhage is recent or by an increased number of hemosiderin-laden macrophages using Prussian blue staining (Lassence et al., 1995).To establish the diagnosis of alveolar hemorrhage in cells recovered by broncho alveolar Lavage, Golde and colleagues created a score (Golden et al., 1975) based on the hemosiderin content of alveolar macrophages stained with Prussian blue.Each cell is graded for hemosiderin content on a scale of 0 to 4 (0 = no stain, 4 = dense staining).The ranking for hemosiderin content uses the following scale: 0, no stain; 1, faint blue in one portion of the cytoplasm; 2, deep blue in a minor portion of the cell; 3, deep blue in most areas of the cytoplasm; and 4, deep blue throughout the cell.The total score for an average of 200 cells is divided by two to obtain the Golde score (GS) (Moumouni et al., 1993).As reported by Kahn et al. (1987), AH is defined by a Golde score greater than or equal to 20 and is considered mild and severe when the Golde score is 20 to 100 and higher than 100, respectively.To automate the computation of the Golde score on microscopical color images, we propose a complete image analysis scheme to automatically perform both the extraction of the cellular objects and the ranking of each cell according to the Golde score.The section "Materials and Methods" describes the imaging system and its calibration.The section "Image Analysis Scheme" presents the proposed segmentation and classification scheme.The section "Results and Discussion" presents experimental results.The last section presents conclusions.

MATERIALS AND METHODS
The imaging workstation we worked with consists in an Olympus BX 50 microscope with a Märzhäuser motorized scanning and a 3CCD JVC KY-F75 camera connected to a computer by IEEE 1394.Before attempting to acquire images, the system needs to be allowed sufficient time to warm up.To determine this thermal equilibrium, a flat field image of a slide has been acquired every 5 minutes during three hours.By computing the difference image between two successive flat field images and taking the mean gray value of the whole difference image, we can plot the time course of color values after the system has been switched on.The data indicates that the thermal stabilization of the system has been located after 90 minutes (only 0.1 mean gray level of difference).The 90 min warming-up period was therefore used in our experiments.The lighting level of the microscope can be modified by the user, but we have fixed it to a constant voltage of 9 V which correspond to a color temperature of 5500 K (D55 illuminant) representing the recommended one for microscopy.Once the lighting level has been fixed, the user can adjust the microscope condenser aperture which levels the amount of light passing through the optical lens.To determine the best suited aperture level of the condenser, a calibration slide with a set of neutral density filters with known transmissions (5, 10, 30, 50 and 80%) was used.Theoretically, photodiodes have a linear response with respect to incident light intensity (Puech and Giroud, 1999).We used this assumption to determine the correct aperture.However, some aperture levels of the condenser are not appropriated because too much light passing through the lens results in a saturation of the 3CCD camera sensors.Therefore, we first plot the mean luminance of a flat field image according to the condenser aperture for several neutral density filters and this allows us to reduce the range of the aperture to [0.2-0.5].For the evaluation of linearity, the mean luminance of a flat field image has been determined according to the transmission of neutral density filters for several condenser apertures within its allowed range.According to medical standards, the coefficient of determination of the regression line plotted over a set of measurements with different neutral density filters should be close to 0.99 (Puech and Giroud, 1999).We have then chosen the regression line having the closest determination factor which corresponds to an aperture of 0.25.This aperture is appropriated only for a given objective lens (20× magnification) and has to be determined for each objective (corresponds to 0.5 for 40× magnification).Once the microscope settings are fixed, images can be acquired in a reproducible way.Fig. 1 presents an example of a typical Prussian blue stained color image digitized by our capture system.All the digitized images are color images of size 752 × 576 at a 20× magnification.

IMAGE ANALYSIS SCHEME
To segment images stained by Prussian blue, we consider the following scheme (resumed in Fig. 2 In the sequel, we present each steps of the proposed segmentation and classification scheme.

2D HISTOGRAM CLUSTERING
To perform the clustering of a color image, several strategies can be used.They generally differ on the dimension of the data used to cluster the image but they share the same assumptions: homogeneous regions in the image plane give rise to explicit clusters in the colorimetric projection representation.A colorimetric projection representation can range from 1D to 3D (or higher) dimensions.Usually, the spatial repartition of the colors in a color space is used to cluster an image assuming that the colors of the objects (describing homogeneous regions in the image plane) are grouped around dominant color clusters in a colorimetric projection (the center of the colorimetric projection classes).A cluster therefore corresponds to a class of pixels which have similar color properties.Clusters are peaks for the 1D case (1D histogram projections) and clouds of colors for the 3D case (3D histogram projection).In the 1D case, the histograms of each band are considered separately (marginal approach) and the method is reduced to finding thresholds dissecting the band histograms (Celenk, 1990;Lezoray and Cardot, 2002;Busin et al., 2004).Since one has one 1D histogram per color band, several clusterings are obtained.The resulting clustering can be combined with different methods, such as intersection, majority vote, Demspter-Shafer or Bayesian theory.This method is known as multi thresholding (Kurugollu et al., 2001).In the 3D case, the vectorial aspect of color is taken into account (vectorial approach) and the clustering is considered as the classification of multi spectral data (Postaire et al., 1993;Soille, 1996;Park et al., 1998;Géraud et al., 2001).One has to note that a clustering is not itself a segmentation.A clustering produces a clustering map where to each pixel is associated a class.Since a color cluster in a color space is not necessarily associated to connected colored pixels in the image plane, the clustering can assign the same class to pixels that are not connected.It is therefore necessary to perform a labeling of the clustering map to obtain a real segmentation.The 1D clustering method suffers from the fact that a color cluster is not always present in each band and the combination of the different segmentations cannot always catch this spatial property of the colors.The 3D clustering method is handicapped with data sparseness on the one hand and with the complexity of the search algorithm on the other hand.
An interesting alternative to 1D and 3D clustering methods relies on the use of pairwise projections and especially bivariate histograms (2D histograms) which This can bring several advantages (Matas, 1995;Kurugollu et al., 2001;Xue et al., 2003;Lezoray and Cardot, 2003;Lezoray, 2004).The paucity of the data encountered in the 3D case is partially overcame and the search complexity is drastically reduced.Moreover it partially uses the spatial repartition of colors and offers an intermediate method to the 1D and 3D ones.Another advantage to be considered is the fact that a 2D histogram is nothing more than a gray-level image, therefore classical and fast gray-level image processing algorithms can be used to cluster a 2D histogram.A 2D histogram is the projection of a 3D histogram onto band-pair planes.Fig. 3 presents the 3D histogram (Fig. 3a) in the RGB color space and the three 2D histograms (Figs.3b-d) obtained by bandpair projections for the image of Fig. 1.One can see that most of the information is present in a single 2D projection.The RG projection has a distribution which is sufficient to encode the color content of the image and we will further consider only this band-pair projection.We choose this projection not only because this projection seems accurate but because it is the one which provides the better clusterings.One has to note that Figs.3b-d  From the clustered 2D histogram (which has to be up sampled to to its original resolution by replication of the labels), a clustering map is obtained.Indeed, each region in the clustered histogram image corresponds to a set of colors in the original image: to each (RGB) vector in the image plane is associated the corresponding label in the clustered 2D histogram.Fig. 5a present the clustering map obtained after establishing the class to which each pixel belongs.A clustering map not being a segmentation since unconnected pixels in the image plane can have the same label, this clustering map is labeled to obtain a segmentation map (a partition of the original image in regions) and not only a clustering.

SPATIAL REFINEMENT
The above clustering method provides relatively coarse segmentations since the only information that is used to construct the segmentation comes from the way colors do organize in the color space.This is a strong limitation since colors close one to each other in the color feature space do not always correspond to adjacent pixels in the image plane.The spatial information in the image has therefore to be taken into account to refine the segmentation.To do this we use a mathematical morphology way of proceeding.A label erosion is performed: in the segmentation map (Fig. 5b), if, for a pixel of label j, any of its neighbors has a label k = j, then the pixel is set to unassigned (see Fig. 6a).This corresponds to impose the boundaries of the regions to be unassigned.Unassigned regions are extended in the image by performing a morphological erosion with a square structuring element of size 5 × 5 (see Fig. 6b).The effect one attends with this processing is to remove small objects and to keep only the core part of each object to further refine its boundaries.The remaining connected components are then considered as seeds and a color watershed algorithm (Lezoray, 2003) is used to fill the unassigned areas (see Fig. 6c).
However, when spatially refining the segmentation, one looses the initial clustering information which was assigned to each pixel in the clustering map assessing whether a pixel belongs to the following classes: background, mucus elements and cells.We can recover this information by assigning to each spatially refined region its mean class in terms of number of pixels: to each region is assigned its majority class according to the initial clustering map.Performing this, one obtains a spatially refined clustering map (Fig. 6d).With this information, we can once again simplify the segmentation map by giving the same label to all regions which belong to the background class (see Fig. 6e).

CELL SEPARATION
In order to enable individual analysis of each cell, cells that are clustered together must be separated one from each other.The separation of clustered or overlapping objects has been extensively studied (Cong and Parvin, 2000), but no method has proved to be superior in all situations.We choose to use a classical but very efficient morphological process to separate clustered elements.First, the Euclidean distance to the objects boundaries is computed, its ultimate erode set is extracted which provides a seed for each individual cellular component to be extracted.A watershed is then applied on the inverse of the distance to obtain a correct segmentation map where clustered cells have been separated.Fig. 7 illustrates all the steps of the cell separation.

CELL CLASSIFICATION
Once an accurate segmentation is obtained, we perform a classification of the cellular objects.Cellular objects are regions of the segmentation map which correspond to the class of cells in the refined clustering map.The number of cells to be classified is therefore lower than the whole number of regions in the image since the regions which correspond to background and mucus are not considered.The classification proceeds in two steps.Since Prussian Blue staining colors iron in blue, the first thing to take into account to classify a cell is the presence of blue pixels within its area.To this aim, each region is analyzed and classified among three classes according to their mean hue (Fig. 8).If the mean hue is close to red (h ∈ [0; π/3] ∪ [5π/3; 2π],    i.e., from magenta to yellow) then the cell belongs to the first class (Golde score of 0: no blue pixels).If the mean hue is close to blue (h ∈ [π; 3 × π/2], i.e., from cyan to deep blue), the cell belongs to the third class (Golde score higher than 1: substantial presence of blue pixels).If the mean hue does not correspond to one of the two precedent configurations, i.e., the hue is between magenta and deep blue, the cell belongs to the second class (Golde score of 1: very few blue pixels).For this first classification based on hue, we assume that the hue of any pixel in an image colored by Prussian blue never belongs to [π/3; π] which is coherent with the staining process.9a, where the label LUT is provided by Fig. 9c.Cells having a score higher than one have to be distributed among the possible scores ranging from one to four.The main information that can be used to quantify the score of cells, which increases with the degree of staining, are color and staining intensity.Therefore, for all the candidate cells, their colors and staining intensity are analyzed.In terms of color two attributes are considered: -The mean hue of the cell h -The standard deviation of the hue of the cell σ h , In terms of staining intensity (which is close to extracting the texture), the pixels of each cell are classified among three classes: Intense staining (I ), Middle staining (M ), Low staining (L ) using the rules defined by Wolf et al. (1995) to partition the texture of cells in three classes.Fig. 10a presents the extraction of the different classes of staining intensity for cells of interest (the ones for which the Golde score is assumed to be higher than one, see Fig. 9b).In Fig. 10a, gray level corresponds to a class of staining intensity: I in dark gray, M in white and L in light gray.For a cell, three staining degree indicators are considered, one by class of cell pixels.For all the pixels of a cell belonging to a given staining class C among the three above mentioned, we consider the following quantity: (1) where p denotes a pixel in a cell, St(p) the staining class of a pixel (St(p) ∈ {I , M , L }), h(p) the hue of the pixel p, s(p) the saturation of the pixel p, q is a 8neighbor of p and n C the number of pixels of class C in the cell.To a cell are therefore attached three staining degree indicators, one by staining class of pixels.T C quantifies, for a given class of pixels, the mean color dispersion (considered as the product of hue and saturation) around the pixels of this class.Hue is an information the relevance of which is variable (Carron and Lambert, 1994).When color is highly saturated (chromatic pixels), hue is very relevant, and when saturation is low (achromatic pixels), hue is irrelevant.Therefore we blend hue and saturation together for a better accuracy of color dispersion measure.
All the cells are then classified among the remaining four possible Golde scores (from 1 to 4) with a k-means classifier.Each cell is described by a vector of five features (h, σ h , T I , T M , T L ).

RESULTS AND DISCUSSION
We have applied the proposed segmentation and classification scheme on 50 images of BAL stained by Prussian blue.These images were taken at random for a given slide sample.Once the whole scheme is performed, 5 sets of cells are obtained which corresponds to the different scores cells can have.Figs.11-15 present these sets on the pool of tested images.For Cells of scores 0 and 1, only subsets of the whole set of cells are provided.A visual evaluation of the different categories of cells enables us to see that they effectively correspond to an increasing hemosiderin content (the blue coloration becomes more intense and distributed in cells).At the end of the whole segmentation and classification scheme we propose, the computation of the Golde score is however still not possible.Indeed, the Golde score has to be computed on macrophages and the cellular component that have been extracted and classified by our scheme are not necessarily macrophages: one can for instance encounter lymphocytes.This will be done in future works.However, since each extracted cell has to be qualified as macrophage or non macrophage, a reference database of macrophages and non macrophages has to be constructed.The problem of designing a cell as a macrophage then becomes a supervised machine learning problem as opposed to all the classification techniques used in our scheme which are unsupervised.Therefore, to enable to compute the Golde score, we are currently working to the building of a learned model of macrophages.

CONCLUSION
An automated robust approach for the segmentation and classification of cells from broncho alveolar Lavage stained by Prussian blue has been presented.Segmentation is based on an hybrid combination of bivariate histogram clustering which is spatially refined by a watershed.Classification is based on a priori information on cells to classify.A first coarse classification uses mean hue to extract initial classes of cells.For cells having a blue hue, the staining intensity is analyzed according to its repartition in the cell.The whole segmentation and classification scheme is unsupervised since no parameter is used in the segmentation and unsupervised classifiers are used in the classification.The preliminary experimental results allow us to conclude that the algorithms perform well for the images acquired by our imaging system.But anyway, the system must be extensively tested on a more important number of images.On the other hand, some ways of improvement are currently under investigation.The first one we are currently investigating is to categorize extracted cells as macrophages or non macrophages in order to be able to compute the Golde score.

Fig. 1 .
Fig. 1.A sample color image stained by Prussian blue.
): A) Clustering, which extracts the main components of the image according to their color, B) Spatial refinement, which refines the boundaries of objects obtained from the clustering, C) Cell separation, which disaggregates touching or clustered cells, D) Cell classification, which classifies cells according to their hue and staining repartition.

Fig. 3 .
Fig. 3. 3D RGB color space projection and corresponding projections on band-pair planes (2D histograms) for the image of Fig. 1.
, which represents the projections, are in color for understanding purposes but a 2D histogram is a gray-level image of reduced size (256 × 256).Fig.4ashows such an image which has been down sampled by a factor 4 for speed computation reasons (the histogram has therefore a size of 64 × 64).This downsampling has few influence on the final clustering and simplifies it.To cluster this 2D histogram, we assume that the different objects of the image are present in the histogram around the dominant clusters (homogeneous regions in this gray-level image).Obviously these clusters also correspond to clusters in the 3D case.The main clusters of a 2D histogram (the main connected regions) are considered as the cluster centroids.The clustering of a 2D histogram H proceeds in several steps.Since the histograms are generally noisy (this is mainly due to the sparseness of the colors in the images), the latter is smoothed by mean curvature non linear diffusion named ϕ.4b presents the result of this smoothing.The result ϕ(H ) is reconstructed (by a morphological reconstruction process ψ) in the original histogram image H to obtain a noiseless and regularized version of the histogram H (2) = ψ(ϕ(H ), H ). Fig.4cpresents the result of this reconstruction.From this regularized histogram image we seek the main clusters with a k-means clustering.k = 3 since we want to extract three classes of pixels the color of which corresponds to background, mucus elements and cells.To ensure reproducibility, cluster centers are not initialized at random but spreaded out over the luminance axis in the band-pair plane projection.Fig.4dpresents the result of the clustering.

Fig. 4 .
Fig. 4. Illustration of the steps of the 2D clustering of the RG histogram of Fig. 1.

Fig. 5 .
Fig. 5. Clustering map of the clustered RG histogram of Fig. 4d, and the associated segmentation map after labelling connected regions.
Spatially refined clustering map.(e) Simplified segmentation map.

Fig. 6 .
Fig. 6.Spatial refinement of the clustering and segmentation maps.

Fig
Fig.9bpresents the corresponding classification on the cells depicted in Fig.9a, where the label LUT Fig. 10.Second classif ication based on color and staining intensity of cells.

Table 1 .
Number of cells in each class.