INTELLIGENT DETECTION AND CLASSIFICATION OF MICRO-CALCIFICATION IN COMPRESSED MAMMOGRAM IMAGE

The main contribution of this article is an intelligent classifier that distinguishes between benign and malignant areas of micro-calcification in companded mammogram images, a problem that has not been addressed elsewhere. The method requires no manual processing for classification, so benign and malignant areas can be identified automatically, and it gives good classification responses for compressed mammogram images. The goal of the proposed method is twofold: first, to preserve the details in the Region of Interest (ROI) at a low bit rate without affecting the diagnostic information; second, to classify and segment the micro-calcification area in the reconstructed mammogram image with high accuracy. Details of the ROI and Non-ROI regions extracted using the multi-wavelet transform are coded at variable bit rates using the proposed Region Based Set Partitioning in Hierarchical Trees (RBSPIHT) before the image is stored or transmitted. The image reconstructed during retrieval or at the receiving end is preprocessed to remove channel noise and to enhance the diagnostic contrast information. The preprocessed image is then classified as normal or abnormal (benign or malignant) using a Probabilistic Neural Network (PNN). Segmentation of the cancerous region is done using the Fuzzy C-means Clustering (FCC) algorithm, and the cancerous area is computed. The experimental results show that the proposed model achieves a sensitivity of 97.27% and a specificity of 94.38% at an average compression rate of 0.5 bpp and an average Peak Signal to Noise Ratio (PSNR) of 58 dB.


INTRODUCTION
The motivation behind the proposed work is a report of the World Health Organisation (WHO) stating that breast cancer was among the ten leading causes of death among women in high and middle income countries during the last decade (Kamangar et al., 2006). Mammographic screening of women can reduce breast cancer mortality, but it generates a large volume of mammograms that requires huge storage space and efficient display devices (Koning et al., 1995; Sickles, 1997). To optimize storage space and bandwidth, the size of the mammogram image has to be reduced, and compression techniques are the natural solution. In this work, to preserve the diagnostic information, lossless compression is applied to the Region of Interest (ROI) and lossy compression to the Non-ROI.
Aiming at early diagnosis of breast cancer, computerized schemes have been developed for the detection of cancerous areas in digital mammograms (Chan et al., 1987; Davies and Dance, 1992; Yoshida et al., 1996; Lee et al., 2000; Yu and Guan, 2000; Verma and Zakos, 2001). The mammographic appearance of the normal breast can vary depending on age and genetic factors. The significant features indicating whether a mass is benign or malignant are its shape and the characteristics of its margin (Michael and Torosian, 2002). An early indication of breast cancer is micro-calcification (Dhawan and Royer, 1988; Olson et al., 1988). The automated detection of micro-calcification is helpful in early treatment of breast cancer, but its detection is difficult due to its fuzzy nature. A micro-calcification is very small, about a millimeter (mm) in size, whereas its shape and size vary periodically. In an image, micro-calcification lies in regions of low contrast and high frequency. In this work, classification of micro-calcification as benign or malignant is done based on multi-wavelet features using a PNN (Probabilistic Neural Network).
Earlier work related to the proposed methodology is presented here. Perlmutter et al. (1997) proposed region of interest coding using an embedded wavelet coding scheme in which ROI and Non-ROI regions in mammogram images are selected and coded at different rates. This raises the compression ratio but suffers from loss of diagnostic information. The effect of JPEG 2000 on the classification of images was investigated by Penedo et al. (2006); the results show that classification accuracy is affected when the compression ratio exceeds 40:1. Rapesta et al. (2011) proposed that the use of multiple ROIs in series increases the diagnostic performance. The work takes advantage of ROIs defined by radiologists or by a Computer Aided Detection (CAD) system for different devices. This technique maintains the coding performance but leads to high computational complexity.
Various ROI based compression techniques using variable encoding have been proposed in the literature (Duchowski and McCormick, 1995; Said and Pearlman, 1996; Tasdoken and Cuhadar, 2003; Xie and Shen, 2004; Bao et al., 2006; Liu and Pearlman, 2006; Yushin and Pearlman, 2007), where the quality of the image is varied according to the diagnostic requirements. These techniques reduce memory storage and access time but suffer from quality and computational issues. The work proposed by Hsu (2012) used the watershed algorithm and vector quantization for various regions of the digital mammogram, which reduces the size of the mammogram image with good picture quality. This technique is an amalgamation of various other techniques and requires more time and computational overhead. From the literature it is evident that transform based mammogram image compression techniques prove to be efficient in terms of compression ratio and image quality. Various transforms such as the wavelet families, curvelet, contourlet and multi-wavelet have been used in the literature for compressing mammogram images; among these, the multi-wavelet seems promising for preserving the diagnostic information at a low bit rate.
A mammography system is made up of preprocessing, detection and classification phases. The presence of micro-calcifications is an important sign for early detection of breast cancer. Hence the micro-calcification region must be separated from the other regions for diagnosis. This work proposes an automated multilevel classification of companded mammogram images to detect the nature of the cancer and the affected area.
Wavelet transform based computer aided detection of micro-calcification in mammogram images was proposed by Boccignone et al. (2000). In this technique the mammogram image is transformed using the wavelet transform, and the calcification and background regions are then classified using region information at the different decomposition levels. This system raises accuracy, but high frequency information is not preserved, which reduces the diagnostic efficiency. A CAD mammography system for automatic detection and classification of micro-calcification was proposed by Lee et al. (2000). The various modules and techniques of CAD systems for mammogram image classification are discussed with their results, merits and demerits in the article by Cheng et al. (2003). Kestener et al. (2001) proposed a wavelet transform to perform a multifractal analysis of digitized mammograms. The texture discriminatory power of the wavelet transform leads to significant improvement in computer assisted diagnosis in digitized mammograms. Fuzzy and scale space based approaches for detection of micro-calcification were proposed by Cheng et al. (2004). Based on fuzzy entropy the image is fuzzified, and the image is then enhanced and classified using scale space and a Gaussian filter. This approach detects micro-calcification even in dense breasts, but it suffers from high computational time. Multi-wavelet, wavelet, Haralick and shape based features are compared in the work of Zadeh et al. (2004), and the results show that the multi-wavelet transform outperforms the other approaches in classification of micro-calcification.
Diagnosis of breast cancer as normal or abnormal using wavelet and fuzzy approaches was proposed by Mousa et al. (2005); it produces high accuracy in classification but suffers from reduced memorization ability. To increase the memorization ability without reducing the ability of the network, a novel network architecture and learning algorithm for classification of mass abnormalities was proposed by Verma (2008). This architecture uses an additional neuron in the hidden layer to increase the memorization of training data and the accuracy. A survey of automatic mass detection and segmentation is provided by Oliver et al. (2010), where various approaches are analyzed in terms of receiver operating characteristic and free-response receiver operating characteristic. Classification of benign and malignant masses based on Zernike moments was proposed by Tahmasbi et al. (2011). In that work Zernike moments are extracted from the preprocessed image, and further features are extracted to classify the most effective moments using a multi-layer perceptron. This technique reduces the false negatives and optimizes the false positives.
Automated segmentation and classification based on breast density and asymmetry was done by Tzikopoulos et al. (2011); it proves to be efficient in terms of accuracy but suffers from computational and time overheads. Wavelet domain and polynomial classifier based classification of masses, proposed by Nascimento et al. (2013), proves to be efficient in mass classification but leads to higher access time. The automatic classification in the system comprises four distinct modules: a preprocessing module which separates the breast region from the background region, a finder module which locates the region of interest, a detection module which detects the calcified areas and a classification module which classifies the calcified areas. This automated system is highly flexible and reliable but suffers from computational complexity.
All the work discussed above is found to have high computational overhead due to the large volume of the images and the single level of classification. These issues could be resolved by the proposed novel approach, which classifies the companded mammogram image at various levels. The computational overhead in classification could be reduced by preserving the high frequency information in companded mammogram images.
The captured mammogram images are huge in size and range, which requires enormous storage space and transmission time. Reducing the size to suit the display device will affect the diagnostic information. The proposed work resolves the problem by compressing the image and then reconstructing it without affecting the diagnostic information. This process of compression and reconstruction could be done during storage and retrieval or during transmission and reception. The quality of the reconstructed images is validated by classification using multi-wavelet based feature extraction and PNN. To assist the existing digital mammography system, the proposed methodology consists of the following sub components:
• Multi-wavelet based image compression scheme combining the idea of coefficient reorganization and Region Based SPIHT (Set Partitioning in Hierarchical Trees).
• A multi-wavelet based micro-calcification feature detection scheme.
• PNN for classification of the breast image as normal or abnormal. Abnormality could be defined either as benign or malignant.
• Fuzzy C-means clustering for segmentation of the cancerous region.

METHODS AND MATERIALS
The proposed architecture in Fig. 1

COMPANDING USING MULTIWAVELET TRANSFORM AND REGION BASED SPIHT
This component aims to preserve the diagnostic details of the image using the multi-wavelet transform and the Region Based SPIHT algorithm. The original image is transformed using the multi-wavelet transform, and the ROI and Non-ROI regions are then coded separately at the bit rates R1 and R2 respectively using the SPIHT encoder.

MULTI-WAVELET TRANSFORM
Representing images at multiple scales is beneficial to various image processing applications. Wavelets are used extensively in multi-scale representation, which is essential in compression and classification of images, where the images are decomposed into detail and approximation sub images (Mallat, 1989). Beyond the wavelet, the multi-wavelet is a powerful multiscale analysis tool which uses several scaling functions and mother wavelets (Strela et al., 1999; Shen et al., 2000). Orthogonality, short support, symmetry and the number of simultaneous vanishing moments are the important properties of the multi-wavelet transform, which are beneficial in various image processing applications.
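A multi-wavelet representation has k scaling functions φ1, …, φk, collected in the vector Φ(t) = [φ1(t), …, φk(t)]^T, which satisfies the following equation

$$\Phi(t) = \sqrt{2}\sum_{n} L_n\,\Phi(2t - n)$$

where L represents a low pass Quadrature Mirror Filter (QMF) with k × k matrix coefficients L_n, and √2 is the normalization of the scaling function at scale 2. For the scaling functions there exists a multi-wavelet vector Ψ which satisfies the following equation

$$\Psi(t) = \sqrt{2}\sum_{n} H_n\,\Phi(2t - n)$$

where H represents a high pass QMF with k × k matrix coefficients H_n.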
Multi-wavelet decomposition of mammogram images is done using the method proposed by Geronimo et al. (1994), in which filtering can be done using either critically sampled or oversampled methodologies; in the proposed work the critical sampling method is used. From the decomposed image, high frequency edges and low contrast information are preserved to improve the diagnostic efficiency.

REGION BASED SPIHT ALGORITHM
The SPIHT algorithm is an enhanced version of Embedded Zerotree Wavelet (EZW) coding, in which coding is done by exploiting the self-similarity of the transformed image coefficients across bands. In the proposed work, the ROI and Non-ROI are identified from the transformed image and encoded at different bit rates R1 and R2 respectively using the maximum shift method (Christopoulos et al., 2000).
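The scaling idea behind the maximum shift method can be sketched as follows. This is a minimal illustration, not the authors' RBSPIHT coder: it assumes integer transform coefficients in a flat list, and the names `maxshift_encode`, `maxshift_decode` and `roi_mask` are hypothetical. ROI coefficients are shifted up by s bit-planes, with 2^s larger than every background magnitude, so an embedded coder reaches the ROI bit-planes first and the decoder can tell ROI coefficients apart by magnitude alone.

```python
def maxshift_encode(coeffs, roi_mask):
    """Shift ROI coefficients above every background coefficient.

    coeffs   : list of integer transform coefficients (assumption: already quantised)
    roi_mask : list of booleans, True where the coefficient belongs to the ROI
    """
    background = [abs(c) for c, in_roi in zip(coeffs, roi_mask) if not in_roi]
    max_bg = max(background, default=0)
    # Smallest shift s such that 2**s is strictly greater than max_bg.
    shift = max_bg.bit_length()
    shifted = [c << shift if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
    return shifted, shift


def maxshift_decode(shifted, shift):
    """Undo the shift: any magnitude >= 2**shift must be an ROI coefficient."""
    threshold = 1 << shift
    restored = []
    for c in shifted:
        if abs(c) >= threshold:
            magnitude = abs(c) >> shift
            restored.append(magnitude if c > 0 else -magnitude)
        else:
            restored.append(c)
    return restored


coeffs = [5, -3, 12, 7, -9, 0]
roi_mask = [False, True, True, False, False, True]
shifted, shift = maxshift_encode(coeffs, roi_mask)
print(shift, shifted, maxshift_decode(shifted, shift))
```

No ROI mask has to be transmitted in this scheme; only the shift value is needed at the decoder.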

PREPROCESSING AND FEATURE EXTRACTION
The reconstructed images are fed as input to the preprocessing and feature extraction phase. The input images are compressed at various bit rates; hence there is less need to concentrate on preprocessing of images compressed at lower bit rates.

PREPROCESSING
The preprocessing phase of the proposed system focuses on removing channel noise, enhancing the contrast and removing the background of mammogram images. The ROI containing abnormalities is separated from the background, and features are then computed from the ROI. Channel noise is modelled as salt and pepper noise, which is removed using a median filter (Jae, 1990), whereas the histogram equalization technique (Nunes et al., 1999) is used to enhance the contrast and the Otsu global threshold (Otsu, 1979) is used to separate the ROI from the background. The Otsu threshold minimizes the intra-class variance of the two pixel classes, assuming a bimodal distribution of gray level values. The preprocessed output images are shown in Fig. 2.
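Two of the preprocessing steps described above, median filtering and Otsu thresholding, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the image is a list of rows of integer gray levels in 0..255, a 3×3 window is used (the window size is not given in the text), and histogram equalization is omitted for brevity. The function names are hypothetical.

```python
def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


def otsu_threshold(img, levels=256):
    """Gray level t maximising between-class variance of {<= t} and {> t}.

    Maximising between-class variance is equivalent to minimising
    intra-class variance.
    """
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255  # a single "salt" pixel
clean = median_filter3(noisy)
bimodal = [[10, 10, 200], [10, 200, 200], [10, 10, 200]]
t = otsu_threshold(bimodal)
roi_mask = [[v > t for v in row] for row in bimodal]
print(clean[2][2], t, roi_mask)
```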

GLCM FEATURE EXTRACTION USING MULTI-WAVELET TRANSFORM
Features are extracted from the preprocessed images using the Gray Level Co-occurrence Matrix (GLCM) (Haralick et al., 1973) obtained from the spectral domain.
Transformation of the ROI from the spatial to the spectral domain is done using the multi-wavelet transform as discussed above. Texture features required for classification are extracted from the transformed coefficients using the GLCM. Let P[i,j] define a position vector and let A be an n×n matrix.
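As an illustration of how a co-occurrence matrix and some texture features can be obtained, the sketch below builds a normalised, symmetric GLCM for one pixel offset and evaluates four Haralick-style features. It is a minimal sketch under assumptions: the input is a small array of integer levels (the transformed coefficients would first have to be quantised to such levels), the offset is one pixel to the right, and the names `glcm` and `glcm_features` are hypothetical.

```python
import math


def glcm(img, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = img[y][x], img[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions for symmetry
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and entropy of a normalised GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Small 4-level test image.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(img, levels=4)
print(glcm_features(p))
```

In practice several offsets and directions are usually combined; a single horizontal offset is used here only to keep the example short.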

CLASSIFICATION AND SEGMENTATION OF CANCEROUS AREA
Artificial Neural Networks (ANN) have proved to be efficient for detecting micro-calcification in mammogram images with higher accuracy and in less time. In the literature there are various neural networks, such as the multi-layer perceptron (MLP), Radial Basis Function (RBF) network, Self Organizing Map (SOM) and Probabilistic Neural Network (PNN), which are used in classification of calcification. Of these, RBF and PNN have proved to be efficient in terms of accuracy and time (Sarvestan et al., 2010).

PNN ALGORITHM FOR TRAINING PHASE
Step 1: From the MIAS Database, extract the GLCM feature vectors of the transformed coefficients obtained through the multi-wavelet transform, then assign classes and class numbers. In this work the class number k = 3.
Step 2: Sort the feature vectors into k different sets such that each set belongs to a single class.
Step 3: For each k, define the Gaussian function corresponding to the class and find the sum of the gamma output function.

PNN ALGORITHM FOR TESTING PHASE
Step 1: From the MIAS Database, extract the test input GLCM feature vectors of the transformed coefficients obtained through the multi-wavelet transform, which are given as input to the input nodes of the PNN.
Step 2: Estimate the Gaussian value for each group at the hidden nodes.
Step 3: The Gaussian values are given as input to the single output node.
Step 4: Sum all the inputs to the output node and multiply by an optimal constant.
Step 5: Find the maximum over the classes; assign 1 for the maximum and 0 for the other classes.
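The testing steps above can be sketched as a Parzen-window style classifier. This is a minimal sketch, not the authors' implementation: it assumes an isotropic Gaussian kernel with a single smoothing parameter `sigma`, takes the constant in Step 4 to be 1 divided by the number of training patterns in the class, and uses made-up two-dimensional feature vectors. The function name `pnn_classify` is hypothetical.

```python
import math


def pnn_classify(x, training_sets, sigma):
    """Classify feature vector x with a probabilistic neural network.

    training_sets : dict mapping class label -> list of training feature vectors
    sigma         : smoothing parameter of the Gaussian kernel
    Returns the winning label and a 1/0 output for every class.
    """
    scores = {}
    for label, patterns in training_sets.items():
        total = 0.0
        for w in patterns:  # one hidden node per training pattern
            dist2 = sum((a - b) ** 2 for a, b in zip(x, w))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        # Summation node: average kernel response of the class (assumed constant).
        scores[label] = total / len(patterns)
    winner = max(scores, key=scores.get)
    return winner, {label: int(label == winner) for label in scores}


# Illustrative feature vectors only; not taken from the MIAS database.
training_sets = {
    "normal": [[0.1, 0.2], [0.2, 0.1]],
    "benign": [[0.5, 0.5], [0.6, 0.4]],
    "malignant": [[0.9, 0.8], [0.8, 0.9]],
}
print(pnn_classify([0.85, 0.85], training_sets, sigma=0.1))
```

The smoothing parameter has a strong effect on the decision boundaries, which is why its selection is treated separately in the performance analysis.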

FUZZY C-MEANS ALGORITHM FOR SEGMENTATION OF CANCEROUS AREA
To segment the cancerous area from other regions, the Fuzzy C-means clustering algorithm (Dunn, 1973; Bezdek, 1981) is used. It minimizes the objective function

$$J_m(U, V) = \sum_{i=1}^{n}\sum_{k=1}^{C} U_{ki}^{\,m}\,\lVert X_i - V_k\rVert^{2}$$

where ‖·‖ represents the norm, m (m > 1) is a weighting exponent that determines the degree of fuzziness of each partition, V_k represents the centroid of the k-th cluster, U_ki denotes the degree of membership of X_i in the k-th cluster, U is a matrix of order C × n and V is a matrix of order S × C. Minimizing the objective function (Eq. 5) using Lagrange multipliers yields the membership function U_ki and the cluster centre V_k, given as

$$U_{ki} = \left[\sum_{j=1}^{C}\left(\frac{\lVert X_i - V_k\rVert}{\lVert X_i - V_j\rVert}\right)^{2/(m-1)}\right]^{-1}$$

$$V_k = \frac{\sum_{i=1}^{n} U_{ki}^{\,m}\,X_i}{\sum_{i=1}^{n} U_{ki}^{\,m}}$$
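The alternating update of memberships and cluster centres in Fuzzy C-means can be sketched for scalar gray levels as follows. This is an illustrative sketch under assumptions: the data are one-dimensional intensities, m = 2, the initial centres are supplied by the caller, and convergence is tested on the largest change of any centre (the text does not state which quantity is compared with the threshold). The function name is hypothetical.

```python
def fuzzy_c_means(data, centres, m=2.0, eps=1e-5, max_iter=100):
    """Fuzzy C-means on scalar values.

    Returns (centres, u) where u[k][i] is the membership of data[i] in cluster k,
    computed from the centres of the last membership update.
    """
    centres = list(centres)
    u = []
    for _ in range(max_iter):
        # Membership update for every data point.
        u = [[0.0] * len(data) for _ in centres]
        for i, x in enumerate(data):
            d = [abs(x - v) for v in centres]
            if 0.0 in d:
                # The point coincides with a centre: full membership there.
                u[d.index(0.0)][i] = 1.0
                continue
            for k in range(len(centres)):
                u[k][i] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # Centre update: membership-weighted mean of the data.
        new_centres = []
        for k in range(len(centres)):
            weights = [u[k][i] ** m for i in range(len(data))]
            new_centres.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))
        change = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if change < eps:  # assumed stopping rule
            break
    return centres, u


# Three well separated intensity groups, C = 3 as in the text.
data = [10, 12, 11, 100, 102, 98, 200, 205, 195]
centres, u = fuzzy_c_means(data, centres=[0, 90, 255])
labels = [max(range(3), key=lambda k: u[k][i]) for i in range(len(data))]
print(centres, labels)
```

For a full image, each pixel intensity would be one entry of `data`, and the label map gives the segmented regions.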

ALGORITHM FOR SEGMENTATION USING FUZZY C-MEANS ALGORITHM:
Step 1: Read the input mammogram image and decide the number of clusters C. In this work C = 3.
Step 2: Assign the value of ε (threshold) and the number of iterations T.
Step 3: Assign the cluster centres.
Step 4: Evaluate the degree of membership function using Eq. 3.
Step 5: Evaluate the centres of the clusters using Eq. 4.
Step 6: If the convergence criterion is below ε or the number of iterations q > T, write the output as the clustering output; otherwise set q = q + 1 and go to Step 4.
Step 7: Extract the cancerous area from the clustered output; perform morphological operations to calculate the area of the cancerous region.

PERFORMANCE OF THE COMPRESSION SYSTEM
The impact of the proposed compression technique on mammogram images is numerically computed using the Peak Signal to Noise Ratio (PSNR), which is given as

$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{I_{\max}^{2}}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{N}\sum_{i,k}\bigl(f(i,k) - \hat{f}(i,k)\bigr)^{2}$$

where f(i,k) and \hat{f}(i,k) are the original and reconstructed images respectively, N is the number of pixels in the input mammogram image and I_max is the peak pixel value (255 for 8-bit images). The original, transformed, compressed and reconstructed images are shown in Fig. 4.
Efficiency of the compression system is numerically computed using the Bit Rate (BR), defined as

$$\mathrm{BR} = \frac{B_c}{N}$$

where B_c is the number of bits required to represent the compressed image. The entropy before and after transformation is computed to indicate the performance of the multi-wavelet transform. The numerical values of PSNR, MSE, Bit Rate and Entropy are computed for the set of normal, benign and malignant images and are shown in Table 1. The closely related work of Somasundaram and Palaniappan (2011) is used for comparison. In that work facial features are compressed in two bit streams using wavelet and RBSPIHT. To compare the proposed work with existing works, a set of images from the database is used to validate each algorithm, and the comparison is shown in Table 2. For varying bit rates, PSNR is computed for normal, benign and malignant images and the results are shown in Figs. 5-7.
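The quality and rate measures used in this section can be computed along the following lines. This is a minimal sketch that assumes 8-bit images (peak value 255) stored as lists of rows; the function names and the tiny example images are illustrative and not taken from the experiments.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two images of equal size."""
    n = len(original) * len(original[0])
    squared = sum(
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    )
    return squared / n


def psnr(original, reconstructed, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def bit_rate(compressed_bits, n_pixels):
    """Bit rate in bits per pixel (bpp)."""
    return compressed_bits / n_pixels


original = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]
print(round(psnr(original, reconstructed), 2))
print(bit_rate(compressed_bits=524288, n_pixels=1024 * 1024))
```

A 1024 × 1024 image stored in 524288 bits corresponds to 0.5 bpp, the average rate quoted for the proposed system.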

GLCM FEATURE EXTRACTION
GLCM feature vectors used in the PNN classification are stated below, and the features extracted from sample images in the MIAS database are shown in Table 3 and Table 4. In the definitions, p is the probability of occurrence of a particular pixel value, I(x, y) is the intensity of the pixel at (x, y), N represents the number of samples in circle lines, σ represents the standard deviation and m_k represents the mean of the subbands.
Energy: The ability to detect and visualize micro-calcification can be improved using energy vector computation. The energy of the mammogram image is computed by squaring and summing the pixels in the transformed image.
Homogeneity: The closeness of the distribution of pixel elements of the ROI in the mammogram image is measured by homogeneity.
Kurtosis: The estimated kurtosis value can distinguish between benign and malignant micro-calcification through the peakedness or flatness of the probability distribution. For benign micro-calcification the distribution appears flat, and for malignant micro-calcification it appears peaked.
Contrast: The extracted contrast features are used in classification to locate micro-calcification.
Entropy: Entropy is the statistical measure of randomness which characterizes the texture features in the mammogram image.
Image Anal Stereol 2015;34:183-198
Variance: The pixel intensities vary depending on the mammogram image characteristics. This variation can be used for classification of micro-calcification.
Standard deviation: The square root of the variance.
Correlation: Nearby pixels of the mammogram image are highly correlated, which helps in identifying similar regions.
Skewness: This parameter indicates the lack of symmetry in the distribution of pixels. It gives an idea about symmetry and lack of symmetry within the ROI in the mammogram image.
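For reference, the conventional textbook forms of these features can be written as follows. This is a sketch under the assumption that the usual GLCM (Haralick) and statistical moment definitions apply; the normalisations, the joint probability notation p(i, j), the mean m and the marginal symbols μ_i, μ_j, σ_i, σ_j are assumptions and are not taken from the text. In particular, the energy is shown in its GLCM form, whereas the description above sums squared pixel values.

```latex
\begin{align*}
\text{Contrast}    &= \sum_{i,j} (i-j)^{2}\, p(i,j) \\
\text{Correlation} &= \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i\,\sigma_j} \\
\text{Energy}      &= \sum_{i,j} p(i,j)^{2} \\
\text{Homogeneity} &= \sum_{i,j} \frac{p(i,j)}{1+|i-j|} \\
\text{Entropy}     &= -\sum_{i,j} p(i,j)\,\log_{2} p(i,j) \\
\text{Variance: } \sigma^{2} &= \frac{1}{N}\sum_{x,y} \bigl(I(x,y)-m\bigr)^{2} \\
\text{Skewness}    &= \frac{1}{N}\sum_{x,y} \left(\frac{I(x,y)-m}{\sigma}\right)^{3} \\
\text{Kurtosis}    &= \frac{1}{N}\sum_{x,y} \left(\frac{I(x,y)-m}{\sigma}\right)^{4}
\end{align*}
```

With these forms a flat distribution gives a low kurtosis and a sharply peaked one a high kurtosis, which matches the benign and malignant behaviour described above.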

PERFORMANCE ANALYSIS OF PNN CLASSIFIER
Optimal selection of the sigma value is essential to train the PNN, as it controls the spread of the Radial Basis Function (RBF). In the proposed work, the conjugate gradient algorithm (Sherrod, 2012) is used to select the sigma value.
Matthews Correlation Coefficient (MCC) (Yua et al., 2007): MCC is estimated to check the performance of the classifier. MCC ranges from -1 to 1; a larger MCC indicates better classifier performance. MCC is given as

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.
F-measure: The F-measure is the weighted harmonic mean of precision and recall. The weight given to precision is varied using the variable β, which ranges from 0 to infinity. The F-measure is given as

$$F_{\beta} = \frac{(1+\beta^{2})\cdot \mathrm{precision}\cdot \mathrm{recall}}{\beta^{2}\cdot \mathrm{precision} + \mathrm{recall}}$$

The confusion matrices are generated for the various cases and the above parameters are calculated as shown in Tables 5-7. Further, a comparison between the Multi Layer Perceptron (MLP), Radial Basis Function (RBF) network and PNN is given in Table 8. The performance index in Table 8 shows that PNN performs best when compared with RBF and MLP.
The Receiver Operating Characteristic (ROC) curve is a graphical plot used to indicate the performance of a classifier. In this work the ROC curve is used to analyze the characteristics of the classifier and is drawn by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR). If the curve passes through the upper left corner of the plot, the sensitivity and specificity are 100%, indicating that the performance of the classifier is good. The further the curve deviates from the upper left corner, the lower the sensitivity and specificity, indicating poor classifier performance. In this work, the ROC curve shown in Fig. 8 lies just below the upper left corner, indicating good performance.
The images which are identified as malignant or benign are passed to the segmentation phase. Using the Fuzzy C-means clustering algorithm, malignant and benign regions are extracted (Fig. 9). The area of the extracted region is computed for benign and malignant images and is found to be, on average, greater than 0.41 mm for malignant and less than 0.41 mm for benign images (Table 9).
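The measures discussed above can all be derived from the four entries of a binary confusion matrix, as in the sketch below. The counts used in the example are made up for illustration and are not results from Tables 5-7; the function name is hypothetical, and the inputs are assumed to make every denominator non-zero apart from the MCC guard.

```python
import math


def classifier_metrics(tp, tn, fp, fn, beta=1.0):
    """Sensitivity, specificity, accuracy, MCC and F-measure from a confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f_beta = ((1 + beta ** 2) * precision * sensitivity
              / (beta ** 2 * precision + sensitivity))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "f_measure": f_beta,
    }


# Hypothetical counts, for illustration only.
metrics = classifier_metrics(tp=45, tn=47, fp=3, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

One point on the ROC curve is the pair (1 - specificity, sensitivity); sweeping the decision threshold of the classifier produces the rest of the curve.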

DISCUSSION
The size of the mammogram image poses a major challenge for storage and transmission systems. Even though various standards have been developed for compressing mammogram images, diagnostic information is not preserved. It is found that region based compression does not affect the diagnostic efficiency, as the features of the image are preserved. Hence region based encoding of mammogram images using the multi-wavelet transform and region based SPIHT is proposed, and the quality of the reconstructed image is validated using a classifier.
From the MIAS database, 50 normal images, 50 benign images and 50 malignant images are used for the experiment. The PSNR value remains almost the same for the proposed work with SPIHT at and above a bit rate of 0.6 bpp and shows more variation below 0.6 bpp. It is sufficient to encode the image at 0.5 bpp to preserve the diagnostic information. Using the multi-wavelet transform in place of the wavelet transform has increased the PSNR compared with the existing work (Somasundaram and Palaniappan, 2011), which was used for facial feature compression. In this work the images are reconstructed at 0.6 bpp and then further processed. The reconstructed images are preprocessed and classified as either benign or malignant using the PNN classifier. The PNN classifier performs well in classification in comparison with MLP and RBF. The ROC curve lies just below the upper left corner, indicating the efficiency of the PNN classifier, as shown in Fig. 8. The image is segmented using the Fuzzy C-means clustering algorithm to compute the area of benign and malignant images. The average area is computed as greater than 0.41 mm for malignant images and less than 0.41 mm for benign images. The proposed combinatorial algorithm proves to be efficient in compression, feature extraction, classification and segmentation of mammogram images. The primary advantage of the proposed work is that high frequency details are preserved in mammogram images due to region based companding. The use of the PNN classifier for classification and Fuzzy C-means clustering for segmentation provides the secondary advantage of high accuracy in diagnosing benign and malignant regions. The proposed work is currently tested on the MIAS database and could also be tested using other databases. Further, the work can be extended by using various mathematical transforms and mapped encoding techniques.

Fig. 2. Phases of preprocessing (red circle indicating ROI regions).

PNN ARCHITECTURE FOR CLASSIFICATION OF MICRO-CALCIFICATION
Probabilistic Neural Network (PNN) consists of three layers (Fig. 3): input layer, hidden layer and output layer. PNN can recognize k different classes; in this work three different classes are used. N nodes are present in the input layer; each node represents a feature vector of the mammogram. The output of each input node is mapped to all nodes in the k classes of the hidden layer, hence all nodes in the hidden layer receives

Fig. 6. Average PSNR for benign images against bit rate in bpp.

Fig. 7. Average PSNR for malignant images against bit rate in bpp.
Accuracy determines the efficiency of the classifier in terms of true positives and true negatives, indicating the proportion of true results.

Table 1. Entropy, compression rate, MSE and PSNR for sample normal and abnormal images.

Fig. 5. Average PSNR for normal images against bit rate in bpp.

Table 3. Contrast, correlation, energy and homogeneity features estimated from sample mammogram images.

Table 4. Variance, standard deviation, kurtosis, entropy and skewness features extracted from sample mammogram images.

Table 5. Classification results of PNN classifier (benign vs. normal).

Table 6. Classification results of PNN classifier (malignant vs. normal).

Fig. 8. ROC for detecting benign and malignant tumor using PNN classifier.

Table 9. Area computed for sample benign and malignant images in MIAS database.