MULTI-FEATURE MUTUAL INFORMATION IMAGE REGISTRATION



INTRODUCTION
Geometric alignment or registration of images is a fundamental task in numerous computer-assisted applications in medical imaging (Hawkes, 1998; Maintz and Viergever, 1998). Finding spatial relations between two or more images of the same modality and of the same subject is useful for observing and quantifying changes that anatomical structures undergo over time, whereas registration of multimodal images is important for combining the complementary information of different imaging modalities. Furthermore, registration of images of different subjects allows comparative studies, which are needed to capture how a structure and its function vary in large populations, across age and gender, and in different disease states. Registration algorithms can be classified according to the image features and the feature correspondence criterion used to compute geometric transformations between images. According to image features, registration algorithms can be divided into fiducial-based, segmentation-based, and image-based methods. Image-based registration methods optimize a similarity measure between image features, which are usually intensities of image elements, i.e., pixels or voxels. The comparison and evaluation of retrospective intermodality brain registration techniques has shown that the best registration results were obtained by the image-based registration technique using mutual information as the similarity measure (West et al., 1997).
Mutual information was first proposed for the purpose of image registration by Collignon et al. (1995) and Viola and Wells (1995), independently. The idea was further developed in their later publications (Wells et al., 1996; Maes et al., 1997). Since then, the mutual information measure, as well as its derivatives, has become the most popular and most studied similarity measure in the medical image registration community. The original idea of Collignon et al. (1995) and Viola and Wells (1995) has been implemented and modified by various researchers in their registration studies, resulting in numerous publications, a survey of which can be found in Pluim et al. (2003) and Maes et al. (2003). The main reason for the frequent use of the mutual information measure is that it assumes only that image features are probabilistically related, which is the loosest assumption that can be made. However, mutual information based registration can fail when the statistical dependence between image features is weak or as a result of interpolation artifacts. Weak dependence between image intensities can be due to fundamentally different imaging modalities, such as MR and ultrasound (Roche et al., 2000), to noise (Tsao, 2003), to low image resolution (Pluim et al., 2000b), or to small image overlap (Studholme et al., 1999). Interpolation artifacts are often enhanced by image noise and low image resolution (Pluim et al., 2000b; Tsao, 2003), but can be reduced by higher-order interpolation schemes (Thevenaz and Unser, 2000). Because the traditional mutual information similarity measure ignores the dependency between feature values of neighboring image elements, some researchers argued that the problem of weak statistical dependence of intensities could be resolved by adding explicit spatial information to the registration process (Pluim et al., 2000a; Rueckert et al., 2000). Rodríguez-Carranza and Loew (2000) proposed to use the Jumarie S-entropy, which considers the differences of gray values of neighboring image elements. Pluim et al. (2000a) incorporated spatial information by multiplying the mutual information similarity measure with a similarity measure of the gradients at corresponding points. Their results show that, without loss of accuracy, the combined criterion is more robust than standard mutual information. However, ad hoc multiplication of two or more different criteria opens the problem of controlling the contribution of each criterion. Rueckert et al. (2000) introduced an extension of mutual information called second-order mutual information, which was estimated from the co-occurrence of gray values of neighboring image elements within an image and gray values of corresponding image elements of the images to be registered. The approach requires the estimation of joint probability distributions from a four-dimensional joint feature histogram. A potential disadvantage of this approach is that the high dimensionality of the joint histogram decreases the power of the histogram statistics and makes the estimation of the similarity measure less reliable. Estimation of high-dimensional probability distributions can be avoided if entropy can be estimated directly from feature samples. Hero et al. (2002) showed that joint entropy can be efficiently estimated from the length of the minimum spanning tree (MST) that connects all samples in the feature space. For multi-feature registration, the MST approach was implemented in the work of Sabuncu and Ramadge (2003), Neemuchwala et al. (2005), Neemuchwala et al. (2006) and Staring et al. (2009). A potential disadvantage of MST registration as introduced by Hero et al. (2002) is the time-consuming optimization that is required to obtain a minimum spanning tree for each evaluation of the similarity measure. Sabuncu and Ramadge (2008) have shown how the alignment measure and a descent direction with respect to the alignment parameters can be determined directly from the MST, which makes MST-based registration more efficient. A different approach to more efficient joint entropy estimation was introduced by Špiclin et al. (2012), where the authors propose gradient-based joint entropy minimization through Hilbert kernel joint density estimation.
In this paper we propose an extension of the standard single-feature mutual information similarity measure to a multi-feature mutual information measure, and solve the problem of efficiently estimating the feature joint probability distribution in a high-dimensional feature space. The proposed measure incorporates spatial information by combining the commonly used image intensity feature with additional features. Additional features, such as intensity gradients or second-order derivatives, may be derived from the basic features or obtained by multi-channel imaging. Furthermore, the proposed multi-feature mutual information measure can be computed for an arbitrary number of additional features, as long as the probability distributions of these features are approximately normal.

MATERIALS AND METHODS
Image registration is concerned with finding the geometric transformation T that brings a floating image A into the best possible spatial correspondence with a reference image B. Let images A and B be represented by image features z_a(x) and z_b(x), respectively, where z(x) is the value of image feature z at position x in image space. Let SM denote the criterion function or similarity measure between two corresponding feature sets z_a(x) and z_b(T(x)) for a given transformation T. Registration is then concerned with finding the spatial transformation T^ that maximizes the similarity measure:

T^ = arg max_T SM(z_a(x), z_b(T(x))).   (1)

The mutual information (MI) similarity measure assumes that image features z_a(x) and z_b(x) are observed values of random variables Z_a and Z_b, respectively. In terms of entropy, MI is defined as (Cover and Thomas, 1991)

MI(Z_a, Z_b) = H(Z_a) + H(Z_b) - H(Z_a, Z_b),   (2)

where H(Z_a) and H(Z_b) are the corresponding entropies of random variables Z_a and Z_b. The entropy H(Z) of a random variable Z (Z_a or Z_b) is defined as

H(Z) = -∫ p(z) log p(z) dz,   (3)

where p(z) (p(z_a) or p(z_b)) is the probability density function describing the probability of feature z (z_a or z_b). The third term in Eq. 2, H(Z_a, Z_b), is the joint entropy of random variables Z_a and Z_b and is defined as

H(Z_a, Z_b) = -∫∫ p(z_a, z_b) log p(z_a, z_b) dz_a dz_b,   (4)

where p(z_a, z_b) is the joint probability density function, defining the probability of feature values z_a and z_b appearing at corresponding positions x and T(x) in images A and B, respectively. In case of discrete image features, calculation of the entropy and the joint entropy can be simplified by calculating Shannon's entropy of discrete random variables:

H(Z) = -∑_z p(z) log p(z).   (5)

In general, entropy is a measure of the amount of uncertainty, variability, or complexity of a random variable, whereas mutual information of two random variables is a measure of the reduction in uncertainty of one random variable given knowledge of the other.
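As an illustration of Eqs. 2-5, mutual information can be estimated from a normalized joint histogram of two feature arrays. The sketch below is our illustration of the plug-in estimate, not the implementation used in this work; probabilities are approximated by normalized bin counts and Shannon entropies are summed over non-empty bins.

```python
import numpy as np

def mutual_information(za, zb, bins=64):
    """Plug-in estimate of MI(Z_a, Z_b) from a joint histogram.

    Illustrative sketch of Eqs. 2-5: p(z_a, z_b) is approximated by
    normalized histogram counts and MI = H(Z_a) + H(Z_b) - H(Z_a, Z_b).
    """
    # Joint histogram of corresponding feature values -> joint probability
    h, _, _ = np.histogram2d(za.ravel(), zb.ravel(), bins=bins)
    p_ab = h / h.sum()
    p_a = p_ab.sum(axis=1)          # marginal p(z_a)
    p_b = p_ab.sum(axis=0)          # marginal p(z_b)

    def entropy(p):
        p = p[p > 0]                # empty bins contribute 0 (0 log 0 = 0)
        return -np.sum(p * np.log(p))

    return entropy(p_a) + entropy(p_b) - entropy(p_ab)
```

Strongly dependent feature pairs yield a much larger estimate than independent ones, which is what the registration criterion exploits.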

NEW MULTI-FEATURE MUTUAL INFORMATION MEASURE
Let us extend the image representation from a single feature z(x) to a multi-feature representation z(x), where z(x) is a vector function z(x) = (z_1(x), …, z_K(x)) defining the values of features z_k(x), k = 1…K, at position x in image space. The values of features z_k(x) are observed values of random variables Z_k, comprising the vector of random variables Z. Multi-feature mutual information of corresponding vectors of random variables Z_a and Z_b is defined as

MI(Z_a, Z_b) = H(Z_a) + H(Z_b) - H(Z_a, Z_b),   (6)

where H(Z_a) and H(Z_b) are the entropies of Z_a and Z_b, respectively, and H(Z_a, Z_b) is their joint entropy. In general, the entropy of a vector of random variables Z = (Z_1, …, Z_K) is defined as

H(Z) = -∫ p(z) log p(z) dz,   (7)

while in case (Z_1, …, Z_K) are discrete random variables, the entropy H(Z) may be obtained as

H(Z) = -∑_z p(z) log p(z).   (8)

As with single-feature mutual information, the probabilities p(z_a), p(z_b) and p(z_a, z_b) can be estimated from a multi-feature joint histogram h(z_a, z_b). Unfortunately, even in the case of two features per image, the four-dimensional joint histogram h(z_a, z_b) would be so sparse that a correct estimation of joint probabilities would become practically impossible. Fortunately, by making certain assumptions on the feature probability distributions, the estimation of joint probability in a high-dimensional feature space can be solved efficiently.
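The sparsity argument can be checked numerically: even a modest number of levels per feature yields a four-dimensional histogram with about as many bins as there are voxels in a volume, so most bins stay empty and the bin counts are unreliable probability estimates. A small sketch with illustrative parameters (not the numbers used in the experiments):

```python
import numpy as np

# Four features (two per image), 32 levels each: 32**4 ~ 1.05 million bins.
# Half a million samples -- roughly a volume's worth of voxels -- can occupy
# at most ~48% of them, and in practice far fewer, since samples cluster.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500_000, 4))          # stand-in for feature vectors
hist, _ = np.histogramdd(samples, bins=32)       # 4-D joint histogram
occupied_fraction = np.count_nonzero(hist) / hist.size
```

With more features or more levels the occupancy collapses further, which is why the paper replaces high-dimensional histograms with parametric (normal) conditional distributions.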
Let the features in the image feature vector z(x) be divided into a feature i(x), which we call the basic feature, and a vector v(x) of the remaining features, which we call additional features, i.e., z(x) = (i(x), v(x)) and v(x) = (v_1(x), …, v_{K-1}(x)). Using the known property of entropy (Cover and Thomas, 1991)

H(I, V) = H(I) + H(V|I),   (9)

the multi-feature mutual information defined by Eq. 6 may be decomposed into the mutual information MI(I_a, I_b) of basic features i_a(x) and i_b(x), and additional information AI(I_a, I_b, V_a, V_b):

MI(Z_a, Z_b) = MI(I_a, I_b) + AI(I_a, I_b, V_a, V_b),   (10)

where

AI(I_a, I_b, V_a, V_b) = H(V_a|I_a) + H(V_b|I_b) - H(V_a, V_b|I_a, I_b).   (11)

If basic features i_a(x) and i_b(x) are observed values of discrete random variables I_a and I_b, then the conditional entropies can be obtained as (Cover and Thomas, 1991)

H(V|I) = ∑_i p(i) H(V|i),   (12)

where p(i) stands for probabilities p(i_a), p(i_b) or p(i_a, i_b), and

H(V|i) = -∫ p(v|i) log p(v|i) dv.   (13)

Suppose that the conditional distributions V_a|i_a, V_b|i_b and V_a,V_b|i_a,i_b are approximately normal. If the random variables in a feature vector V are distributed normally, then their joint probability p(v) is defined by the vector of mean values μ_v and the covariance matrix Σ_v. Furthermore, the entropy of V can be easily assessed (Cover and Thomas, 1991) from the determinant of the covariance matrix Σ_v as

H(V) = 1/2 log((2πe)^n det Σ_v),   (14)

where n is the dimension of Σ_v. The conditional entropies H(V_a|i_a), H(V_b|i_b) and H(V_a, V_b|i_a, i_b) may thus be defined by the covariance matrices Σ_va|ia, Σ_vb|ib and Σ_vavb|iaib of the conditional distributions V_a|i_a, V_b|i_b and V_a,V_b|i_a,i_b, respectively. In conclusion, estimating the conditional entropies H(V|i) for each intensity value i (Eq. 13) through covariance matrices requires far fewer samples than estimation through high-dimensional histograms, which makes the proposed approach more effective. The conditional entropies H(V|i) of all intensity values i are used to estimate the conditional entropies H(V|I) (Eq. 12), defining the additional information AI(I_a, I_b, V_a, V_b).
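The estimation scheme of Eqs. 12-14 can be sketched in a few lines: for each intensity level i, fit a covariance matrix to the additional-feature vectors falling in that bin, convert it to a Gaussian entropy, and form the p(i)-weighted sum. This is an illustrative sketch, not the authors' code; the `min_samples` guard against unstable covariances in sparsely populated bins is our own choice.

```python
import numpy as np

def gaussian_entropy(samples):
    """Entropy of an n-D normal fitted to `samples` (one sample per row):
    H = 0.5 * log((2*pi*e)^n * det(Sigma)), as in Eq. 14."""
    n = samples.shape[1]
    sigma = np.atleast_2d(np.cov(samples, rowvar=False))
    # slogdet is numerically safer than log(det(...)) for small determinants
    _, logdet = np.linalg.slogdet(sigma)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def conditional_entropy(i_vals, v_vals, n_levels, min_samples=10):
    """H(V|I) = sum_i p(i) H(V|i), Eq. 12, with each H(V|i) computed from
    the covariance of the feature vectors whose intensity falls in bin i.
    `i_vals`: discretized intensities; `v_vals`: feature vectors (rows)."""
    h, total = 0.0, len(i_vals)
    for i in range(n_levels):
        mask = i_vals == i
        if mask.sum() < min_samples:   # too few samples for a covariance
            continue
        h += (mask.sum() / total) * gaussian_entropy(v_vals[mask])
    return h
```

Only one small covariance matrix per intensity bin is needed, instead of a histogram whose size grows exponentially with the number of features.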

EXPERIMENTAL DATA SETS
Two different image data sets were used to evaluate the proposed multi-feature mutual information similarity measure. The first data set consisted of MR-T1 and CT images of vertebra L3 of a lumbar spine phantom (Fig. 1), which was originally designed to evaluate 3D/2D registrations (Tomaževič et al., 2003). The MR and CT images were acquired with intra-slice resolutions of 0.39 x 0.39 mm and 0.27 x 0.27 mm and inter-slice resolutions of 1.9 mm and 1 mm, respectively. A sub-volume containing the whole vertebra L3 and parts of the two neighboring vertebrae was manually defined in each MR and CT image. The sizes of the MR and CT sub-volumes were 238 x 238 x 22 and 341 x 340 x 43 voxels, respectively. The "gold standard" rigid registration was established by minimizing the distance between the centers of six fiducial markers rigidly attached to the spine phantom and visible in both MR and CT images. The accuracy of the "gold standard" registration was obtained by estimating the target registration error (TRE) from the residual of the fiducial marker registration (Fitzpatrick et al., 1998). For vertebra L3, the expected TRE of the "gold standard" MR to CT registration was estimated to be 0.31 mm. A detailed description of the image acquisition and "gold standard" registration can be found in Tomaževič et al. (2004). To evaluate the performance of the proposed registration algorithm in case of poor image quality, the MR image was artificially corrupted by adding intensity inhomogeneity. Intensity inhomogeneity was simulated by changing the original intensities i(x) to i'(x) with an inhomogeneity model in which x_0 is the spatial position at which the inhomogeneity is centered and ε is the inhomogeneity scale factor. The position x_0 was set at the image corner, x_0 = (0, 0, 0), while four different scale factors ε were used (ε = 5⋅10^-5, 15⋅10^-5, 25⋅10^-5, 35⋅10^-5) to obtain MR images with different levels of inhomogeneity (Fig. 2). The second image data set consisted of PET images of a human head and corresponding proton density (PD), T1 relaxation time (T1), and T2 relaxation time (T2) magnetic resonance (MR) images (Fig. 3).

Fig. 3. Cross-sections of corresponding PET (top left), PD (top right), T1 (bottom left) and T2 (bottom right) images from the RIRE project.
The images of the heads of seven different patients were acquired as part of the Retrospective Image Registration Evaluation (RIRE) study (West et al., 1997). Six subsets of MR images were available, with four to seven images in each subset. Three subsets consisted of raw images of each MR modality (PD, T1, T2) and three subsets (PDr, T1r, T2r) consisted of corresponding images corrected (rectified) for scanner-dependent geometric distortions. The size of the PET images was 128 x 128 x 15 voxels. The inter-slice distance of the PET images was 8 mm and the intra-slice resolution was 2.59 x 2.59 mm, except for one image, which had an intra-slice resolution of 1.94 x 1.94 mm. The sizes of the MR images were 256 x 256 x 26 or 256 x 256 x 24 voxels. The intra-slice resolution of all MR images was 1.25 x 1.25 mm, with an inter-slice distance of 4 mm. Gold standard registrations between the PET and MR images, not revealed to the participants of RIRE, were obtained by means of fiducial markers implanted into the patients' skulls.

FEATURE EXTRACTION AND SIMILARITY MEASURE OPTIMIZATION
For calculating the proposed multi-feature mutual information, image intensities of the floating and reference images were used as basic features, i_a(x) and i_b(x), while gradients of image intensities were used as additional features, v_a(x) and v_b(x), each containing gradient components in the x, y, and z directions. The original images used in our experiments had highly anisotropic voxel sizes. The frequency of the intensity signal in the intra-slice direction was much higher than the frequency in the inter-slice direction. To correct for the anisotropic voxel sizes and different frequencies, each image was first isotropically resampled by linear interpolation. For MR-CT registrations the MR and CT images were resampled to 0.954 mm and 0.526 mm voxel sizes, respectively, while for PET-MR registrations the images were resampled to voxel sizes equal to the smallest voxel dimension before resampling. The basic features i_a(x) and i_b(x) used for calculating the single-feature and multi-feature mutual information measures were obtained directly from the intensities of the resampled images. To reduce image noise and obtain similar magnitudes of intra-slice and inter-slice intensity gradients, the gradients were obtained by applying a simple symmetric gradient operator after convolving the intensities with a Gaussian kernel of scale σ. The choice of scale σ should ensure that inter-slice gradients are suppressed while intra-slice gradients are not affected by the filtering. The sizes of the convolution kernels were chosen to be 1/3-1/4 of the average intra-slice resolution of the two images involved in a registration, and were 0.5 mm and 1.5 mm for the MR-CT and PET-MR experiments, respectively. Image intensities were discretized into M intensity levels, M = 256, 64 and 16. In both registration experiments, Powell's optimization method (Press et al., 1992) was used to optimize the chosen similarity measure over the six rigid-body transformation parameters (t_x, t_y, t_z, ω_x, ω_y, ω_z).
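The feature extraction pipeline described above can be sketched as follows. This is a minimal illustration under our own assumptions (parameter names, the use of `scipy.ndimage` and central differences via `np.gradient` are ours), not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def extract_features(volume, spacing, sigma_mm, n_levels=64):
    """Basic and additional features as described in the text:
    isotropic resampling (linear interpolation), Gaussian smoothing at
    physical scale sigma_mm, symmetric central-difference gradients, and
    discretization of intensities into n_levels levels."""
    # Resample to the smallest voxel dimension (order=1 -> linear)
    iso = min(spacing)
    vol = zoom(volume.astype(float), [s / iso for s in spacing], order=1)

    # Basic feature: intensities discretized into n_levels levels
    lo, hi = vol.min(), vol.max()
    i_basic = np.clip(((vol - lo) / (hi - lo) * n_levels).astype(int),
                      0, n_levels - 1)

    # Additional features: gradients of the Gaussian-smoothed volume
    smooth = gaussian_filter(vol, sigma=sigma_mm / iso)  # sigma in voxels
    gx, gy, gz = np.gradient(smooth, iso)                # symmetric operator
    v_extra = np.stack([gx, gy, gz], axis=-1)
    return i_basic, v_extra
```

`i_basic` then feeds the intensity histograms, while `v_extra` provides the three gradient components per voxel used as additional features.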

MR-CT REGISTRATION
To test the accuracy and reliability of the multi-feature mutual information similarity measure and compare it to the single-feature mutual information, registrations were performed from twenty different starting positions around the "gold standard" registration for each MR-CT data set and each similarity measure. The starting positions of the floating MR image with respect to the registered reference CT image were randomly generated by altering the three translation parameters (by 0-8 mm) and the three rotation parameters (by 0-22.9°) in such a way that the distance from the "gold standard" position in the normalized space of rigid transformations was uniformly distributed on the interval 0-8 mm (0-22.9°). The following assumption was used to normalize the parametric space of rigid transformations: rotating a volume containing a single vertebra of size 80 mm around its center by 0.1 radians (5.7°) causes a mean translation of the vertebra points of 2 mm (Tomaževič et al., 2003). For each individual registration, the registration error was calculated as the root-mean-square (rms) target registration error (TRE). The positions of CT image voxels belonging to bone, obtained by bone thresholding, were chosen as target points. A registration was treated as successful if the obtained rmsTRE was below 1.9 mm, which was the largest voxel size of the images involved in the registrations. The mean rmsTRE over all twenty starting positions, none of which was smaller than 1.9 mm, was 4.3 mm with a standard deviation of 1.62 mm. The results for MR-CT registrations, using the two similarity measures, five different intensity inhomogeneities, and M = 256, 64 and 16 intensity levels, are presented in Table 1. Table 1 also gives the percentage of misregistrations and the mean and standard deviation of the registration errors of all successful registrations. In addition, Table 1 gives the significance (P = 0.01) of a two-tailed hypothesis test on the differences of the mean registration errors of the two similarity measures. The results in Table 1 indicate that with increasing intensity inhomogeneity the registration error and the percentage of misregistrations increase much more when the single-feature mutual information is used. This is true regardless of the number of intensity levels.
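The rmsTRE criterion used above can be expressed compactly: apply the estimated and gold-standard rigid transforms to the target points (here, bone voxel positions) and take the root-mean-square of the displacement between the two results. A sketch assuming 4x4 homogeneous transform matrices (our representation, not specified in the text):

```python
import numpy as np

def rms_tre(points, T_est, T_gold):
    """Root-mean-square target registration error over target points.
    `points`: (N, 3) target positions; T_est, T_gold: 4x4 homogeneous
    rigid transforms (estimated and gold-standard registration)."""
    pts = np.c_[points, np.ones(len(points))]        # homogeneous coordinates
    diff = (pts @ T_est.T - pts @ T_gold.T)[:, :3]   # displacement per point
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```

For a pure translation error t, every target point is displaced by t, so rmsTRE equals the length of t; rotational errors contribute more at points far from the rotation center.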

RIRE PET-MR REGISTRATION
In this experiment 35 combinations of PET-MR images were rigidly registered using the standard single-feature mutual information of image intensities and the proposed multi-feature mutual information of image intensities and intensity gradients as similarity measures. For both similarity measures, the starting position for each individual PET-MR registration was obtained by aligning the centers of the floating PET image and the reference MR image, while the number of intensity levels was set to 64. The obtained transformations were submitted to the authors of the RIRE project, who in return provided the registration errors for each registration. A PET-MR registration was considered successful if the average TRE over the volumes of interest was smaller than 8 mm, which was the largest voxel dimension of the MR or PET images. A summary of the registration errors for both mutual information measures and for each PET-MR subset appears in Table 2. All median and maximum registration errors, except for PET-T1r, were smaller for the multi-feature mutual information than for the single-feature mutual information. Multi-feature mutual information also outperformed the classical mutual information when the number of misregistrations was taken into account.

COMPARISON WITH OTHER REGISTRATION METHODS IN THE RIRE PROJECT
Fig. 4 summarizes the results of various PET to MR registration methods tested in the RIRE project and compares them with the results of the proposed multi-feature mutual information similarity measure based on image intensities and intensity gradients. Fig. 4 comprises the registration results of those methods published on the RIRE web page that contain complete results, i.e., results for all patients in a data subset. Fig. 4 shows results for six PET-MR data subsets. Each dot in a graph represents the median and maximum registration errors of a particular method. The results of the proposed multi-feature mutual information registration method are illustrated by dotted lines. Three different implementations of mutual information based registration, i.e., the methods of Collignon et al. (1995), Studholme et al. (1996), and Thevenaz and Unser (2000), are specially marked as CO, HI and TH, respectively. In Fig. 4, the absence of the TH result for PET-T1 registration is due to the TH maximum error exceeding the graph's boundaries, while the results of CO are excluded from the PET-T1r results because the authors did not report registration results for the complete PET-T1r data set.

DISCUSSION
In the last decade, information-theoretic similarity measures, especially mutual information and its derivatives, have been extensively used for rigid and nonrigid, mono- and multi-modal image registration. However, in cases of low global intensity feature correspondence these measures are not robust enough. The robustness can be improved by adding spatial information, in the form of local intensity feature changes, to the global intensity correspondence. In this paper we have shown how spatial information in the form of intensity gradients may be included in information-theoretic similarity measures. The proposed measure was evaluated by registering intensity-inhomogeneity-corrupted MR images with CT images. In addition, the measure was tested on the registration of PET to MR images from the widely used Retrospective Image Registration Evaluation project image database.
The aim of the MR to CT registration experiments was to establish and compare the performances of the standard single-feature mutual information and the proposed multi-feature mutual information similarity measure, which can accommodate an arbitrary number of additional features as long as they are approximately normally distributed. As presented, the multi-feature mutual information registration approach is general. The approach is feasible when the additional features provide additional information on the correspondence between the two images being registered. If features from one image correspond strongly to features in the other image, then their contribution to the multi-feature mutual information will be larger than that of features with lower correspondence. If features between the images have no correspondence, their contribution to the similarity measure will be insignificant. Furthermore, the contribution of features that correspond strongly to other features within a single image will be irrelevant, since those features are redundant. The rigid registration experiments indicate that the multi-feature mutual information outperforms the single-feature mutual information measure. The contribution of additional image features is especially significant in cases of weak intensity dependences.

Fig. 4. Performance of the proposed multi-feature mutual information (dotted lines) and other registration methods (dots) in the RIRE study. For a PET-MR data subset, each dot represents the median and maximum errors of a particular registration method.

Table 1.
Results of MR to CT image registrations with the single-feature mutual information (SF) and multi-feature mutual information (MF) similarity measures for different numbers of intensity levels and different levels ε of intensity inhomogeneity in the MR image. (-) indicates that none of the registrations was successful.

Table 2.
Registration results for single-feature (SF) and multi-feature (MF) mutual information registration of PET and MR images.