IMAGE REGISTRATION BASED ON MAXIMIZATION OF GRADIENT CODE MUTUAL INFORMATION

Herein one proposes a mutual information-based registration method using pixel gradient information rather than pixel intensity information. Special care is paid to finding the global maximum of the registration function. In particular, one uses simulated annealing method speeded up by including a statistical analysis to reduce the next search space across the cooling schedule. An additional speed up is obtained by combining this numerical strategy with hill-climbing method. Experimental results obtained on a limited database of biological images illustrate that the proposed method for image registration is relatively fast, and performs well as the overlap between the floating and reference images is decreased and/or the image resolution is coarsened.


INTRODUCTION
Image registration is the process of aligning two or more images.A variety of approaches have been published.Good surveys are available in Maintz and Viergever (1998) and in Hajnal et al. (2001).Over the last few years, mutual information (MI) has become one of the most popular methods of image registration because it is more flexible and accurate than any other method based on global information content.However, MI method has its own limitations.For example, when the images are of low resolution, when the images contain little information, or when the region of overlap is small, MI may result in misregistration (Pluim et al., 2000a).
Since the initial work of Wells et al. (1996) and Maes et al. (1997), several groups have modified the optimization objective MI itself.It is worth mentioning Studholme et al. (1999) who introduced normalized MI to rigidly register multi-modal images with different fields of view, Pluim et al. (2000a) who added a gradient-based term to the MI function in order to decrease the number of local maxima, and Rueckert et al. (2000) who applied second order entropy estimation to model the dependency of a voxel gray value on the intensities of a local neighborhood around that voxel.Other groups discussed registration with image features.Rangarajan et al. (1999) discussed feature point registration with MI.Hellier and Barillot (2003) coupled dense and landmark-based approaches for non-rigid registration.Johnson and Christensen (2002) used landmark and intensity information together to produce consistent correspondences between images.Hartkens et al. (2002) introduced feature information into voxel-based registration algorithms in order to incorporate higher-level information about the expected deformation.Butz and Thiran (2001;2002) recalled the very general definition of MI and chose the feature of edgeness to perform image registration.These methods combining directly image features with MI have many advantageous properties of both feature-based and intensity-based methods.Our own work continues along this line.
In this paper, we propose a technique for image registration, called Gradient Code Mutual Information (GCMI).It estimates MI function based on the gradient code, rather than on the widely used intensity.Another problem is the presence of local maxima that may cause local optimization algorithms to fail.Therefore we improve the global optimal algorithm of simulated annealing (SA), and then combine it with the local optimization of hill-climbing properly to reduce the number of iterations.
The remainder of this paper is organized as follows.Firstly, the method of GCMI is introduced.Then, the advantages of GCMI method are experimentally illustrated through applications to multimodal medical image registration.Finally, conclusions are provided.

METHOD GRADIENT CODE
Take two 2D discrete images f(x,y) and r(x,y) to be registered.From f and r, we construct gradient code images, f g and r g , such that each pixel of the images represents a gradient code that is, the quantification of gradient vector at the corresponding pixel position in the original gray-level images.Take f(x,y) as an example, the horizontal and vertical derivatives of f(x,y) are represented respectively as / .Partial differentials are calculated using central differences (Rosefeld and Kak, 1982).To avoid a reduction in image size, forward / backward differences are used for border pixels.Then the module ( j i, ρ ) and the phase ( j i, θ ) of the gradient vector ) , ( at a pixel (i, j) are calculated respectively as: Herein, for convenience purposes, we make the range j i, . Then, the gradient code is obtained by dividingθ into N ( ) levelintervals of constant width θ ∆ , and ρ into M ( level-intervals of constant length ρ ∆ .We have to choose appropriate interval-sizes for precise registration of different modal images.This issue should be considered in relation to inherent information amount and spatial resolution.The gradient code (f g and r g images) is ultimately defined as: [x] represents the integer which is equal or less than x, Γ is a pre-specified threshold level for ignoring low contrast pixels coded as L (L is a large value).The purpose for using Γ is to prevent uniform or semiuniform regions from influencing the MI evaluation as pixels with low contrast neighborhoods are more sensitive to noise.Using a too large value for Γ can cause the suppression of information in low contrast images.An example of the 2D gradient code is depicted in Fig. 1  ).
The method can be easily extended to 3D discrete images.Similarly, we represent the gradient vector ) , , ( ) intervals of width ϕ ∆ .The gradient code is defined as:

GRADIENT CODE MI (GCMI)
MI is a widely used information theoretical distance measure between probability densities.The MI between two images f and r is: Where H( f ) and H( r ) stands for the entropies of images f and r, H( f,r ) for their joint entropy, and for the conditional entropies.Usually we use the generalized distance between joint probability distribution p fr ( f,r) and marginal probability distributions p f ( f ) and ) (r p r to estimate the MI: Registration of the images f and r can be defined as finding the geometrical transformation T (mapping the floating image f to the reference one r) which maximizes the MI registration measure.In Wells et al. (1996), the measured features were simply the voxel intensities.In Rueckert et al. (2000), the sampling space was defined by the intensities of two neighbor voxels (i.e., a two-dimensional feature space).In Butz and Thiran (2002), edgeness was defined as the feature space to estimate the MI.In this paper, we restrict the discussion to a feature space represented by gradient code information.Because MI is evaluated based on gradient code images, we call our approach Gradient Code Mutual Information (GCMI).
In practice, MI cannot be computed exactly due to the sampled nature of the floating and reference images.One obvious problem is that the transformed position of a voxel will generally not coincide with a grid point of the reference image, such that the corresponding intensity is unknown.Partial volume (PV) interpolation, originally proposed by Collignon et al. (1995;1998) and Maes et al. (1997), is specifically designed to update the joint histogram of two images.It has been found PV interpolation generally outperforms linear interpolation in terms of smoothness of the MI registration criterion.This study uses PV interpolation for both MI and GCMI.

OPTIMIZATION STRATEGY
An optimal optimization strategy should converge to the global extreme of an optimized function and also be fast.Several strategies available in the literature including genetic algorithm (Parker, 1997), Powel's method (Press et al., 1992), amoeba (Press et al., 1992) and SA (Goffe et al., 1994) are compared.The genetic algorithm is a relatively robust strategy, but its convergence to the global extremum is slow; amoeba and Powel's method are faster optimizers, but they frequently terminate in local extremes.SA is a global optimization method that distinguishes between different local optima.It is fast and is more robust (Čapek et al., 2001).Therefore, we apply here the Metropolis SA strategy (Metropolis et al., 1953), and modify it to adaptively adjust the search space for the transformation parameters of optimization.That is, before updating the annealing temperature, we statistically analyze the whole results (including the transformation parameter sets and their corresponding MI values) performed at current temperature to obtain the maximum (MI max ) and minimum (MI min ) of the objective function, and then calculate the middle value (MI mid ): Among all the parameter vectors (geometrical transformation mapping the floating image to the reference image) at this temperature, only those of the corresponding MI value being above the middle MI mid are retained.Within all these retained vectors, we get the maximum and minimum of each parameter, and then assigned them respectively as upper bound (ub) and lower bound (lb) of the optimization parameters for the next search space.Thus the optimization process is greatly speeded up for avoiding many useless evaluations of the objective function.Since the algorithm makes very few assumptions about the function to be optimized, it is robust with respect to non-quadratic surfaces.
However, the time consumption of the above strategy for image registration is still too high to be allowed.We have found that when MI value is not changed or the change is very small within several successive iterations, the transformation parameters are close to the optimal values.If one continued to perform SA search, the process would be very slow.So we quit SA at this moment, and then use a local optimization of hill-climbing (Russell and Norvig, 1995) for optimal search.The idea behind hill climbing is as follows: 1. Pick a random point in the search space.
2. Consider all the neighbors of the current state.
3. Choose the neighbor with the best quality and move to that state.
4. Repeat 2 thru 4 until all the neighboring states are of lower quality.
5. Return the current state as the solution state.
The scheme does not only prevent the optimizing course from trapping in local extrema, but also saves a large number of iterations.The whole algorithm is implemented as follows: Step 1: Transform the two original images to be registered into gradient code images.
Step 2: Initialize the cooling schedule (initial temperature T = 5.0, temperature reduction factor rt = 0.85) and the parameter values (initial transformation parameter set x, upper and lower bounds on search space lb and ub), where the temperature is initially set at a relatively high value, which allows a large range of possible perturbations.Compute ) (x MI . Step 3: Pick a new set of transformation parameter set x', and compute MI(x').
Step 4: Calculate the difference between MI(x) and MI(x').If MI(x') > MI(x) the new parameter is accepted; otherwise, acceptance or rejection is determined by the Boltzmann probability given by where T is the temperature parameter of the SA.
Step 5: If the number n of evaluations of the objective function is more than a large predefined number or if the current MI value is not changed within four successive iterations, then go to step 6.Otherwise, adjust the upper (ub) and lower bounds (lb) as mentioned above and accordingly the next search range vm = ub -lb, decrease the temperature according to the cooling schedule T = T 0 • rt, and go to step 3.
Step 6: Perform hill-climbing algorithm for optimal search.

RESULTS
To illustrate the performance of our method, we consider hereafter two different 2D registration problems: T1 magnetic resonance (T1-MR) to T2 magnetic resonance (T2-MR), and T1-MR to computed tomography (CT) and compare the results obtained by MI and GCMI methods (with 8 / π θ = ∆ and 8 / 1 = ∆ ρ ).Our images are real data from Beijing Tiantan hospital and consist of pairs of CT and MR (T1 and T2) images of 6 patients.Each studied pair of images relates to the same patient.
Then, we evaluate the efficiency of our method for 2D and 3D registration by duplicating the original MR images.
The study was approved by the local ethic committee and consented by the patients after explaining the purpose of the study.All the experiments below are based on our medical image processing platform 3Dmed (Tian et al., 2003).

ACCURACY
• MR-T1 and T2 T1-and T2-MR images are multimodal in that different tissue characteristics are imaged.Yet, the structure in both images remains similar.Each pair of images comes from the same patient and the same scan, and we assume accordingly that no transformation is required to align the images.Fig. 2 displays the registration function of the T1-MR image to the T2-MR image using GCMI and MI for translation along the O x axis.In the left column of Fig. 2 (images with full resolution), both methods perform well.In the mid column (images sub-sampled by a factor of eight in each dimension) and in the right column (image-overlap changed to 60%), the MI registration does not perform well, and the optimum is stretched and shifted.In contrast, the GCMI registration performs well and the optimum is accurately identified.The results illustrate that GCMI function is invariant to the reduced overlap between the floating and reference images and less sensitive to the sub-sampling rate.
• MR-T2 and CT Although MR images depict different anatomical details than CT images, there are generally corresponding structures in both images.Each studied pair of images comes from the same patient; the images have been manually registered by clinical experts so that no transformation is assumed to be needed for aligning the images.Fig. 3 shows examples of MR-CT registration using GCMI and MI.We can observe that when the image-overlap is reduced to 60% (mid-column in Fig. 3) and when the images are in addition sub-sampled (by a factor of four in each dimension) (right column), the MI functions deteriorate, while the GCMI functions are much better.Some results for MR-CT registration are illustrated in Fig. 4. On these examples, we clearly observe that when the images are of low resolution or when the image-overlap is reduced, the MI results in misregistration, whereas, the GCMI performs better.

TIME REQUIREMENT
To illustrate the speed-up (time-wise) allowed by our numerical strategy, we duplicate a 2D MR image of size 256×256 and then register it to its original.Repeated experiments of this type illustrate that the registration using standard SA optimization (Table 1; 1 st line) requires about 1650 seconds.The improved SA optimization (Table 1, 2 nd line) requires only about 455 seconds with the accuracy significantly improved.When combining the improved SA optimization with hill-climbing method (Table 1, 3 rd line), an accuracy of the same order is still observed, and the time is significantly reduced.To 3D volume registration, we duplicated a 3D MR image of size 256 × 256 × 24 and registered it to its original.The results are listed in Table 2.All the experiments were performed using MATLAB on a PIII800, 128M.

CONCLUSIONS
In this paper, we present a Gradient Code Mutual Information (GCMI) based registration technique.The first step consists in transforming intensity images into gradient code images.MI is then calculated directly based on the gradient code images.Relying on a limited database of multimodal images relating to 6 patients, it is preliminarily observed that the proposed GCMI method yields a more accurate registration function than does standard MI when the image overlap is reduced and/or the sampling resolution is coarsened.In this respect, and at least for the studied database, gradient code information seems more appropriate than intensity information for MI based image alignment.
As in Pluim et al. (2000b), we can observe the local extrema in both MI and GCMI functions (Figs. 2, 3) brought with the PV interpolation.According to the optimization strategy mentioned above, sub-pixel registration accuracy can be obtained (Table 1, 2).Several aspects of the method shall be improved or further investigated: mention in the first place the robustness of GCMI.Also, experiments with images of lower intrinsic resolution such as PET and SPECT should be carried out.This will be dealt with in the future work.

Fig. 2 .
Fig. 2. T1-to T2-MR registration using GCMI and MI for translation along O x axis.From left to right: images with full resolution, images sub-sampled by a factor of 8 in each dimension, and image-overlap changed to 60%.

Fig. 3 .
Fig. 3. MR-CT registration using GCMI and MI.From left to right: images with full resolution for translation along O x axis, image-overlap changed to 60% for rotation around O z axis, and in addition images sub-sampled by a factor of 4 in each dimension for translation along O x axis.

Fig. 4 .
Fig. 4. Illustration of MR-CT registration results based on GCMI and MI.(a) MR floating image.(b) CT reference image.(c) Image containing 60% of the information in (b).From (d1) to (d4), GCMI registration results with full resolution images (d1), with images sub-sampled by a factor of 8 in each dimension (d2), with images sub-sampled by a factor of 4 in each dimension and overlap changed to 60% (d3), and with images subsampled by a factor of 8 in each dimension and overlap changed to 60% (d4).(e1) to (e4): Corresponding results using MI.

Table 1 .
Mean time requirement of three optimization methods for the registration of a pair of 2D images.

Table 2 .
Mean time requirement of three optimization methods for the registration of a pair of 3D images.