BAYESIAN IMAGE RESTORATION, USING CONFIGURATIONS

In this paper, we develop a Bayesian procedure for removing noise from images that can be viewed as noisy realisations of random sets in the plane. The procedure utilises recent advances in configuration theory for noise-free random sets, where the probabilities of observing the different boundary configurations are expressed in terms of the mean normal measure of the random set. These probabilities are used as prior probabilities in a Bayesian image restoration approach. Estimation of the remaining parameters in the model is outlined for salt and pepper noise. The inference in the model is discussed in detail for 3 × 3 and 5 × 5 configurations, and examples of the performance of the procedure are given.


INTRODUCTION
The comparison of neighbouring grid points in a discrete realisation of a random closed set Z in R² has been used for decades to make inference on various characteristics of the random set. A classical result, cf. Serra (1982), states that the information obtained by comparing pairs of neighbouring grid points can be used to estimate the mean length of the total projection of the boundary of the random set in directions associated with the digitisation. This, in turn, yields certain information about the directional properties of the boundary. Larger configurations, such as grid squares of size 2 × 2 or 3 × 3, were used in Ohser et al. (1998) and Ohser and Mücklich (2000) to estimate the area density, the length density, and the density of the Euler number of Z.
In Jensen and Kiderlen (2003) and Kiderlen and Jensen (2003), the authors use grid squares of size n × n, n ≥ 2, to estimate the mean normal measure of the random set Z. This knowledge can then be used to quantify the anisotropy of Z. Events of the type tB ⊂ Z, tW ⊂ R² \ Z are observed, where tB and tW are finite subsets of the scaled standard lattice tZ². The probability of such events can effectively be estimated by filtering the discrete image. In digitised images, B usually stands for "black" points and W for "white" points. Here, we use the notion of a point for the mid-point of a pixel in the digitised image; we will not distinguish between a pixel and its mid-point and use both notions in the following.
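As a concrete illustration of the filtering step, the empirical frequency of one event {tB ⊂ Z, tW ⊂ R² \ Z} can be computed by sliding the configuration over a binary array; a minimal sketch (function and argument names are our own, not from the papers cited above):

```python
import numpy as np

def config_frequency(img, black, white):
    """Estimate P(tB subset of Z, tW subset of Z^c) for one configuration
    by scanning a binary image.  `img` is a 2-D 0/1 array; `black` and
    `white` are lists of non-negative (row, col) offsets inside the n x n
    window, describing tB and tW respectively."""
    n_rows, n_cols = img.shape
    offsets = black + white
    max_r = max(r for r, c in offsets)
    max_c = max(c for r, c in offsets)
    hits = 0
    total = 0
    # Slide the window over every position where it fits entirely.
    for r in range(n_rows - max_r):
        for c in range(n_cols - max_c):
            total += 1
            if all(img[r + dr, c + dc] == 1 for dr, dc in black) and \
               all(img[r + dr, c + dc] == 0 for dr, dc in white):
                hits += 1
    return hits / total
```

For example, on an image whose left half is black, the horizontal black–white pair occurs exactly at the boundary column.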
Another interesting aspect in the analysis of discrete planar random sets is the restoration of the random set from a noisy image. If the mean normal measure of the random set Z is known, the method in Kiderlen and Jensen (2003) and Jensen and Kiderlen (2003) can be reversed to obtain the prior probabilities for a Bayesian restoration procedure. The foundations of Bayesian image analysis were developed by Grenander (1981), while the method itself was developed and popularised mainly by Geman and Geman (1984). For further reading on the subject, see e.g. Chalmond (2003) and Winkler (2003). Hartvig and Jensen (2000) introduce a spatial mixture modelling approach to Bayesian image restoration. They consider n × n neighbourhoods around each pixel in the image, where n ≥ 3 is an odd number. The prior probability of observing a certain configuration or pattern in the neighbourhood then depends only on the number of black points in the given configuration; in other words, any two configurations with an equal number of black points have the same prior probability. If, however, the restored image represents a random closed set Z that fulfils some regularity conditions and the resolution of the image is "good enough", configurations with the same number of black points need not have equal prior probabilities. This is shown in Kiderlen and Jensen (2003) where, asymptotically, a black and white configuration has a positive prior probability if and only if there exists a line going through the centres of at least two pixels that separates the black and the white points and hits only points of one colour. We use this theory to specify new prior probabilities for the spatial mixture model of Hartvig and Jensen (2000).

PRELIMINARIES
A compact convex subset of R² is called a convex body and we denote by K the family of convex bodies in R². The convex ring, denoted by R, is the family of finite unions of convex bodies, while the extended convex ring is the family of all closed subsets F ⊂ R² such that F ∩ K ∈ R for all K ∈ K. Further, we denote by L(K, ·) the normal measure of K ∈ R on the unit circle S¹. For a Borel set A ∈ B(S¹), L(K, A) is the length of the part of the boundary of K with outer normal in A. L is thus a Borel measure on S¹ and the total mass L(K, S¹) is just the boundary length L(K) of K. The normal measure is sometimes called the first surface area measure and then denoted by S₁(K, ·), cf. Schneider (1993). Now, let Z be a stationary random set in R² with values in the extended convex ring. We assume in the following that Z satisfies the integrability condition

E 2^{N(Z ∩ K)} < ∞ ,  (1)

for all K ∈ K. Here, N(U) is the minimal number k ∈ N such that U = K₁ ∪ … ∪ K_k with K_i ∈ K if U ≠ ∅, and N(∅) = 0. This condition is stricter than most standard integrability conditions, but it guarantees that the realisations of Z do not become too complex in structure. The mean normal measure of Z is then defined, for A ∈ B(S¹), as the limit

L(Z, A) = lim_{r→∞} E L(Z ∩ rK, A) / ν₂(rK) ,

where K ∈ K with ν₂(K) > 0 and ν₂ is the Lebesgue measure on R². See e.g. Schneider and Weil (2000) for more details.
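For a convex polygon, the normal measure is particularly simple: it is atomic, placing at each outer normal direction a mass equal to the length of the corresponding edge, so that the total mass is the boundary length. A small sketch under the assumption that the vertices are listed counter-clockwise:

```python
import math

def normal_measure(vertices):
    """Normal measure L(K, .) of a convex polygon K given by its
    vertices in counter-clockwise order.  Returns a list of
    (outer-normal angle in [0, 2*pi), edge length) pairs; the total
    mass is the boundary length L(K)."""
    atoms = []
    m = len(vertices)
    for i in range(m):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % m]
        ex, ey = x1 - x0, y1 - y0          # edge vector
        length = math.hypot(ex, ey)
        # Outer normal of a CCW polygon: rotate the edge by -90 degrees.
        angle = math.atan2(-ex, ey) % (2 * math.pi)
        atoms.append((angle, length))
    return atoms
```

For the unit square this yields mass 1 at each of the four axis directions, with total mass 4, the perimeter.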
A digitisation (or discretisation) of Z is the intersection of Z with a scaled lattice. For a fixed scaling factor t > 0, we consider Z ∩ tL, where L = Z² is the usual lattice of points with integer coordinates. The lattice square of size n, n odd, is the set L_n = {−(n−1)/2, …, (n−1)/2}² of n × n lattice points. Here, we follow the notation in Hartvig and Jensen (2000) and place the lattice square around a centre pixel. As we only consider lattice squares with an odd number of points, this should not cause any conflicts in the notation. A line passing through at least two points of L_n will be called an n-lattice line.
Let X ⊂ tZ² be a finite set and t > 0. A binary image on X is a function f : X → {0, 1}. Here, f is given by the indicator f(x) = 1{x ∈ Z ∩ X}, so that f is a random function due to the randomness of the set Z. We call a certain pattern of the values of f on an n × n grid a configuration and denote it by C_t^n, where t > 0 is the resolution of the grid, as in the definition of a lattice above. The elements of the configuration are numbered to match the numbering of the elements in the lattice square L_n, for n = 3 as well as for the other allowed values of n. If the size of the configuration is clear from the context, we will omit the index n. In the examples of 3 × 3 configurations below, the sign • means that f(x) = 1, or equivalently z ∩ {x} ≠ ∅, while ◦ means that f(x) = 0, or equivalently z ∩ {x} = ∅. Here, z is the realisation of the random set Z observed in the image f.

CONFIGURATION PROBABILITIES
Let f : X → {0, 1} be an image as before and let Z be a stationary random set that fulfils Eq. 1. In Kiderlen and Jensen (2003), the authors show that for n > 0, a given x ∈ X, and a given configuration C_t,

lim_{t→0+} (1/t) P(Z ∩ (tL_n + x) = C_t) = ∫_{S¹} h(u) L(Z, du) .  (2)

The function h is given by

h(u) = ( min_{w ∈ W} ⟨w, u⟩ − max_{b ∈ B} ⟨b, u⟩ )⁺ ,

where (tB, tW) = C_t is the partitioning of the configuration C_t into "black" and "white" points, that is, tB ⊂ Z and tW ⊂ R² \ Z. Here, g⁺ := max{g, 0} denotes the positive part of the function g and ⟨x, y⟩ denotes the usual inner product of the vectors x and y. A configuration C_t with non-identically zero h is called an informative configuration. C_t is informative if and only if there exists an n-lattice line separating tB and tW that does not hit both of them. More precisely, C_t = (tB, tW) is informative if and only if there exists an n-lattice line g such that tB is on one side of g, tW is on the other side of g, and the lattice points on g are either all black or all white. Furthermore, it is shown in Jensen and Kiderlen (2003) that for a given informative configuration C_t, there exist vectors a, b ∈ R² such that

h(v) = ⟨a, v⟩⁺ − ⟨b, v⟩⁺

for all v ∈ S¹. These results are then used to obtain estimators for the mean normal measure L(Z, ·) based on observed frequencies of the different types of configurations. If, on the other hand, we assume that we have a discrete noisy image in R², where the underlying image is a realisation of a stationary random closed set Z with a known mean normal measure L(Z, ·), Eq. 2 provides prior probabilities in a Bayesian restoration procedure.
As an example, let us assume that Z is isotropic. Then the mean normal measure L(Z, ·) is, up to a positive constant of proportionality, the Lebesgue measure on [0, 2π). Eq. 2 thus becomes

lim_{t→0+} (1/t) P(Z ∩ (tL_n + x) = C_t) = k ∫₀^{2π} h(y(θ)) dθ ,  (3)

where y(θ) = (cos θ, sin θ) and k > 0 is a constant. For t > 0 small enough, such that only informative, all black, and all white configurations have positive probability, this gives the marginal probability of each informative configuration up to a constant of proportionality.
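With h(v) = ⟨a, v⟩⁺ − ⟨b, v⟩⁺, the integral in Eq. 3 evaluates in closed form to 2(|a| − |b|), since ∫₀^{2π} ⟨a, y(θ)⟩⁺ dθ = 2|a|. A small numerical check of this identity (the vectors in the example are hypothetical illustrations, not taken from the paper's tables):

```python
import math

def iso_weight(a, b, steps=100000):
    """Isotropic prior weight of an informative configuration with
    vectors a and b: the integral over [0, 2*pi) of
    <a, y(t)>_+  -  <b, y(t)>_+  with y(t) = (cos t, sin t),
    evaluated by the midpoint rule.  Analytically this is 2(|a| - |b|)."""
    total = 0.0
    for k in range(steps):
        t = 2 * math.pi * (k + 0.5) / steps
        y = (math.cos(t), math.sin(t))
        total += max(a[0] * y[0] + a[1] * y[1], 0.0) \
               - max(b[0] * y[0] + b[1] * y[1], 0.0)
    return total * 2 * math.pi / steps
```

For instance, vectors of lengths √5 and 2 give the weight 2√5 − 4 = 5 sin(arctan 2) − 4, the expression appearing in p₂ below.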
For n = 3, the vectors a and b are given in Jensen and Kiderlen (2003), so we can insert those values without further effort into the right hand side of Eq. 3. For x ∈ X, this gives the prior probability of the event A that Z ∩ (tL₃ + x) = C_t for each group R(C_t) of configurations, where R(·) denotes the set of all possible rotations and reflections of a configuration. The probabilities p₂, …, p₅ of the groups of informative 3 × 3 configurations are determined from Eq. 3 up to a multiplicative constant c; for instance, p₂ = c (5 sin(arctan 2) − 4). As the total probability is 1, we can express c in terms of the remaining unknown probabilities p₀ and p₁. For n = 5, we have used the methods described in Jensen and Kiderlen (2003) to determine the informative 5 × 5 configurations and the vectors a and b for each configuration. We have then calculated the prior probabilities in the same manner as described above for the 3 × 3 configurations. The results can be found in Appendix A.
Knowledge of the mean normal measure of Z will not give us information about the probability of observing all white and all black configurations, as the mean normal measure is a property of the boundary of the set. The remaining parameters, p₀ and p₁, must thus be estimated from the data. This problem is treated in the next section.

RESTORATION OF A NOISY IMAGE
Let F : X → {0, 1} be a binary image on a finite set X ⊂ tZ² for t > 0, such that F can be viewed as a realisation of an isotropic stationary random set Z with noise. Note that the randomness in the image F is two-fold. First, the noise-free image is random due to the randomness of the set Z. Second, random noise is added to the image. By Bayes' rule we have, for x ∈ X and a given configuration C_t,

P(Z ∩ (tL_n + x) = C_t | F) ∝ P(F | Z ∩ (tL_n + x) = C_t) P(Z ∩ (tL_n + x) = C_t) .

We assume that F(x_i) and F(x_j) are conditionally independent given Z for all x_i, x_j ∈ X, and that the conditional distribution of F(x) given Z only depends on Z ∩ {x} for all x ∈ X. Under these conditions, we get

P(F | Z ∩ (tL_n + x) = C_t) = ∏_{k=1}^{n²} p_{F(y_k) | Z ∩ {y_k}}(F(y_k) | c_k) ,

where {y_k}_{k=1}^{n²} = tL_n + x and {c_k}_{k=1}^{n²} = C_t. By summing over the neighbouring states, we obtain a quantity proportional to the posterior probability of Z hitting the single point x ∈ X,

S₁(x) = Σ_{C_t : centre point black} P(F | Z ∩ (tL_n + x) = C_t) P(Z ∩ (tL_n + x) = C_t) .  (4)

The corresponding quantity for Z not hitting the point x ∈ X is obtained in a similar way, by summing over the configurations with a white centre point,

S₂(x) = Σ_{C_t : centre point white} P(F | Z ∩ (tL_n + x) = C_t) P(Z ∩ (tL_n + x) = C_t) .  (5)

As the posterior probabilities corresponding to Eq. 4 and Eq. 5 sum to one, we only need to compare S₁(x) and S₂(x) to determine the restored value of the image at a pixel x. The restored value is 1 if S₁(x) > S₂(x) and 0 otherwise.
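A minimal sketch of this pixelwise decision rule (not the authors' code): the prior is passed in as a dictionary over configurations, and salt-and-pepper likelihoods, introduced formally just below, are assumed.

```python
import numpy as np

def restore_pixel(window, prior, q):
    """Pixelwise restoration for an n x n window of a noisy binary image.
    `window` is the observed n x n 0/1 array around pixel x, `prior` maps
    each configuration (an n x n tuple-of-tuples of 0/1) to its prior
    probability, and q is the probability that a pixel value is flipped.
    Returns the restored value of the centre pixel: 1 if S1 > S2, else 0."""
    n = window.shape[0]
    centre = n // 2
    s_black = s_white = 0.0
    for config, p in prior.items():
        c = np.array(config)
        flips = np.sum(c != window)          # pixels disagreeing with C_t
        lik = (q ** flips) * ((1 - q) ** (n * n - flips))
        if c[centre, centre] == 1:
            s_black += lik * p               # contributes to S1(x)
        else:
            s_white += lik * p               # contributes to S2(x)
    return 1 if s_black > s_white else 0
```

With a toy prior containing only the all-black and all-white configurations, a mostly black window is restored to black, as expected.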
To compare S₁(x) and S₂(x), we need to determine the densities p_{F(x) | Z ∩ {x}}, which depend on the distribution of the noise. As an example, we consider salt and pepper noise; that is, a black point is replaced by a white point with probability q, and vice versa. More precisely, the observed value F(x) differs from the true value 1{x ∈ Z} with probability q and agrees with it with probability 1 − q, for some 0 ≤ q ≤ 1. This noise model has one unknown parameter, q, which must be estimated from the data.
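The noise model itself is straightforward to simulate; a sketch:

```python
import numpy as np

def add_salt_and_pepper(img, q, rng=None):
    """Flip each pixel of a binary 0/1 image independently with
    probability q (salt and pepper noise)."""
    rng = np.random.default_rng(rng)
    flips = rng.random(img.shape) < q
    return np.where(flips, 1 - img, img)
```

Setting q = 0 returns the image unchanged, while q = 1 inverts it; intermediate values degrade the image as in the examples below.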
Further, we need to determine the marginal probability P(Z ∩ (tL_n + x) = C_t) of observing a given configuration C_t. A method to obtain the prior probabilities of observing the different types of boundary configurations, that is, configurations that contain both black and white points, is given in the previous section. We still lack information about the prior probabilities p₀ and p₁ of observing configurations that are all white or all black, respectively, for n = 3 and similarly for larger n.
We use the parameter estimation approach introduced in Hartvig and Jensen (2000), which is related to maximum likelihood estimation. Within the model, we can calculate the marginal density of an n × n neighbourhood, in which the constant B(C_t) is given by the integral on the right hand side of Eq. 3. We have A(3) = 16 and A(5) = 32.
A possibility for estimating the parameters p₀, p₁, and q is to maximise the corresponding contrast function over all n × n neighbourhoods. This is, however, computationally a very demanding task, and we have therefore used a simplified version of the approach. As exactly half of the boundary configurations have a black mid-point, the probability that a single point x ∈ X lies in the set Z is

P(Z ∩ {x} ≠ ∅) = p₁ + (1 − p₀ − p₁)/2 ,

where p₁ is the probability of the all-black configuration. The marginal density of a single point, p_{F(x)}(f; p₀, p₁, q), follows by combining this probability with the noise distribution. The corresponding contrast function γ_m can easily be differentiated with respect to the parameters p₀, p₁, and q. The differentiation yields that the maximum of γ_m is attained when the model probability of observing a black pixel matches the observed fraction of black pixels, (1/|X|) Σ_{x∈X} f(x), where |X| denotes the number of points in X.
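The single-pixel marginal can be sketched as follows (with the labelling assumption, made explicit here, that p₀ is the all-white and p₁ the all-black probability): the function returns the expected fraction of black pixels in the noisy image, which the estimated parameters should match to the observed fraction.

```python
def pixel_moment(p0, p1, q):
    """Expected fraction of black pixels in the noisy image under the
    single-pixel marginal model.  pi = p1 + (1 - p0 - p1)/2 is the
    probability that a point lies in Z (half of the boundary
    configurations have a black mid-point), and each pixel is then
    flipped independently with probability q."""
    pi = p1 + (1 - p0 - p1) / 2
    return pi * (1 - q) + (1 - pi) * q
```

Note that at q = 1/2 the expected fraction is 1/2 regardless of p₀ and p₁, which is one way to see why the parameters become hard to estimate at high noise levels.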

EXAMPLES
We illustrate the method by applying it to two synthetic data sets and one real data set. We use the salt and pepper noise model and isotropic priors for the configuration probabilities in all three examples. The method cannot be used directly to restore the values on the edge of an image. In the examples below, we have therefore left a one-pixel-wide edge of white (background) pixels in each restored image for n = 3 and a two-pixel-wide edge of white pixels in each restored image for n = 5. Another possibility would be to add either a one-pixel-wide boundary of white pixels for n = 3, or a two-pixel-wide boundary of white pixels for n = 5, around the noisy image before restoration. This will, however, lead to a slight underestimate of black pixels on the edge. We quantify the results by the classification error, estimated as the percentage of misclassified pixels (either type I or type II errors). The results given for the classification error are based on those pixels from the interior of each image where there are no edge effects.
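A sketch of this error measure, excluding a margin of pixels on every side as described above:

```python
import numpy as np

def classification_error(true_img, restored, margin):
    """Percentage of misclassified pixels, evaluated on the interior of
    the image only: a margin of `margin` pixels is excluded on every
    side to avoid edge effects."""
    t = true_img[margin:-margin, margin:-margin]
    r = restored[margin:-margin, margin:-margin]
    return 100.0 * np.mean(t != r)
```

For a 3 × 3 neighbourhood one would use margin = 1, and for 5 × 5 margin = 2, matching the restored edges above.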

Example 1 (Boolean model with isotropic grains).
The first example is based on the digitisation of a simulated Boolean model, see Schneider and Weil (2000). Boolean models are widely used as simple geometric models for random sets. The simulation of a Boolean model is a two-step procedure. First, independent uniform points are simulated in a sampling window. Second, a random grain is attached to each point. The grains are independent of one another and of the points. In order to avoid edge effects, the sampling window must be larger than the target window. Here, the target window is the unit square and the grains are circular discs with random radii. The radius of each grain is a uniform number from the interval [0.0375, 0.15]. Fig. 1 (left) shows a realisation of this model. We have then digitised the image with t = 0.01, which gives a resolution of 100 × 100. The digitised image is shown in Fig. 1 (right). This digitised realisation is now our original image, and we have added salt and pepper noise to it for three different values of the noise parameter q. The noisy images are shown in Fig. 2 (top row). In the leftmost image we have q = 0.25, in the middle image q = 0.33, and in the rightmost image q = 0.4. We have restored the original image from the noisy images using both 3 × 3 configurations and 5 × 5 configurations as described in the previous section. The resulting images for 3 × 3 configurations are shown in the middle row of Fig. 2 and the resulting images for 5 × 5 configurations are shown in the bottom row of Fig. 2. The parameter estimates and the classification errors for the restoration are given in Table 1.

Example 2 (Boolean model with non-isotropic grains). The grains in the Boolean model are here the right half of circular discs with random radii. The radius of each grain is a uniform number from the interval [0.0375, 0.15] and the target window is again the unit square. A realisation of this model is shown in Fig. 3 (left). As before, we have digitised the image with t = 0.01, which gives a resolution of 100 × 100; the digitised image is shown in Fig. 3 (right). We have then proceeded exactly as in the previous example. The noisy images are shown in Fig. 4 (top row). In the leftmost image we have q = 0.25, in the middle image q = 0.33, and in the rightmost image q = 0.4. The restored images for 3 × 3 configurations are shown in the middle row of Fig. 4 and the restored images for 5 × 5 configurations are shown in the bottom row of Fig. 4. Further, Table 2 shows the parameter estimates and the classification errors for the restoration.

Example 3 (Image from steel data). Our last example is an image showing the micro-structure of steel. The image is from Ohser and Mücklich (2000), where it has been analysed to estimate the mean normal measure, see also Jensen and Kiderlen (2003). The thresholded, binary image of the data is shown in Fig. 5. We have used Otsu's method for the thresholding; this method minimises the intra-class variance of the black and the white pixels, see Otsu (1979). The resolution of the image is 896 × 1280 pixels. We have added salt and pepper noise to the binary image for q = 0.25 and q = 0.33. The noisy images are shown in Fig. 6 (left column). We have used the method described in the previous section for the restoration of the noisy images, using isotropic priors for the informative configurations. The resulting images can be seen in Fig. 6 (middle column) for 3 × 3 configurations and in Fig. 6 (right column) for 5 × 5 configurations. Further, the parameter estimates with classification errors are given in Table 3.

Fig. 5. Binary image of rolled stainless steel in a longitudinal section. The light phase is ferrite, the black phase is austenite. From Ohser and Mücklich (2000).

COMPARISON WITH OTHER METHODS
Other models for this type of Bayesian image restoration include Markov random field models. In Besag (1986), the author presented an iterative method for this type of image restoration where the local characteristics of the underlying true image are represented by a non-degenerate Markov random field. The author calls his method iterated conditional modes (ICM). As the name indicates, the estimation under the model is performed iteratively, where the possible parameters of the model and the reconstructed point pattern are updated in turn. This method is very flexible, though computationally quite intensive, and it depends on a smoothing parameter that cannot be directly estimated from the data. An extension of this method is the maximum a posteriori (MAP) method proposed in Greig et al. (1989). The MAP method is a special case of the ICM method where the estimated image is the one that maximises the posterior density. In this case there exists an efficient algorithm to calculate the estimate. There remains, though, the problem of a smoothness parameter that cannot be directly estimated from data. Further discussion of models of this type can be found in Chalmond (2003) and Winkler (2003).
In Hartvig and Jensen (2000), the authors proposed three prior models of different complexity for a restoration method similar to the one described in this paper. These models also reflect the idea that pixels of one colour tend to cluster rather than appear as single isolated pixels. They do not, however, take the actual spatial pattern of the neighbourhood into account.
In order to compare the method presented here to the methods mentioned above, we have used our method to reconstruct noisy versions of the image of an "A" from Greig et al. (1989), see Fig. 7(a). The same image was used in Greig et al. (1989) and Hartvig and Jensen (2000) to show the performance of the methods mentioned above. Fig. 7. The 64 × 64 binary image of an "A" from Greig et al. (1989) (a), the same image corrupted with salt and pepper noise with parameter q = 0.25 (b), the estimated true image using the method described in this paper with n = 3 (c), and the estimate using the same method, but with n = 5 (d).
Table 4. Estimated classification errors for the models described above, based on five independent reconstructions of noisy versions of the image in Fig. 7(a). The results are given in percentage; standard errors are given in parentheses. The different models presented in Hartvig and Jensen (2000) (denoted HJ in the table) are denoted as in the original paper. All results except those for the model presented in this paper are reproduced from Hartvig and Jensen (2000) and Greig et al. (1989). In Fig. 7, estimates under our method with noise parameter q = 0.25 are shown. This is the same amount of noise as was used in the other papers.
For comparability, we have added a two-pixel-wide boundary of black pixels to the image before adding the noise. This way, we can restore the whole original image using both 3 × 3 and 5 × 5 configurations. The resulting classification error based on five independent repetitions of the procedure is given in Table 4. For comparison, we have also reproduced the results on this image from Greig et al. (1989) and Hartvig and Jensen (2000). In the case of the ICM and the MAP methods, we only show the classification error for the value of the unknown smoothing parameter which gave the best results.

DISCUSSION
In the first two examples we have images of a similar type; the only difference is the mean normal measure of the boundary of the objects. In Example 1, the grains have isotropic boundaries, which means that the model is using the correct prior probabilities for the configurations. In Example 2, on the other hand, some configurations have a much higher probability of occurring in the image than others, whereas the isotropic prior assigns all of these configurations equal probability. If we compare the results in Table 1 and Table 2, we see that the classification error in Example 2 is very similar to the classification error in Example 1 for the same amount of noise and the same type of model. This suggests that it is not necessary to know the mean normal measure of the boundary of the object precisely for our model to perform in a close to optimal way. Further, it can be seen from Table 4 that our method performs as well as or better than other similar methods, even though the "random" set in the image in Fig. 7 is far from being isotropic, as is assumed in the prior.
It is also clear from the results in Tables 1-4 that the model using 5 × 5 configurations is superior to the model using 3 × 3 configurations in all our examples. This is not surprising, since the true images are quite regular, with large patches of either black or white pixels. One might suspect that the model using 3 × 3 configurations would be more appropriate for images where the object Z consists of relatively small, disconnected components, as in the example of simulated fMRI data in Hartvig and Jensen (2000). In that example, the 3 × 3 configurations give better results than the 5 × 5 configurations for all three prior models discussed in the paper. In our prior model, only informative configurations have a positive probability. That is, a boundary configuration has a positive prior probability if and only if there exists a line going through the centres of at least two pixels that separates the black and the white points and hits only points of one colour. If, however, the components of Z are very small compared to the resolution of the image, only very few boundary configurations will be of this type for large configurations. We would therefore expect results similar to those in Hartvig and Jensen (2000).
Another consideration is whether it is of interest to consider configurations larger than 5 × 5. As one can see from Appendix A, the model is already quite complicated for 5 × 5 configurations, and we therefore consider larger configurations computationally infeasible. Further, and perhaps more importantly, very large configurations will tend to remove the finer details in the original image.
The model presented in this paper is very local in nature: the estimated restored value at a given pixel only depends on the image values in a small neighbourhood around that pixel. For this reason, there is no obvious way to derive the joint posterior distribution over the entire image from the posterior distributions of the marginals in the small neighbourhoods, and it is the former that is needed for estimating the unknown global parameters in the model. We have chosen to use the contrast function from Hartvig and Jensen (2000), as this seems a sensible choice with a close relation to maximum likelihood estimation. As noted in Woolrich et al. (2005), the difference between the parameter estimates using this contrast function and those that could be obtained if the joint posterior were available is not known. Our method, however, does not seem very sensitive to small changes in the parameter estimates. We can also see from Tables 1-3 that we get fairly good parameter estimates by maximising the contrast function if the noise in the image is moderate, especially for the larger image in Example 3. For higher levels of noise, the accuracy of the parameter estimates seems to depend on the accuracy of the prior for the informative configurations. Furthermore, the accuracy of the parameter estimates depends heavily on the resolution of the grid on which the contrast function γ is maximised. It might therefore be of interest to use a finer grid of possible parameter values for the maximisation of γ. As can be seen from Tables 1-3, it might especially be of interest to use a finer grid for the noise parameter q, as the remaining parameters p₀ and p₁ are estimated with less accuracy if the estimate for q is inaccurate.

APPENDIX A
Using the methods described in Jensen and Kiderlen (2003), we have constructed all informative 5 × 5 configurations and calculated the vectors a and b which are needed for the calculation of the prior probabilities of the informative configurations. The results are given in Table 5. We have omitted both the index for the resolution of the grid and the index for the size of the configuration to save space in the table.
In the examples presented above, we have used an isotropic prior for the boundary configurations. For x ∈ X, the prior probabilities for the 5 × 5 configurations are in this case obtained by inserting the vectors a and b from Table 5 into the right hand side of Eq. 3.

Table 5. The 92 groups of informative 5 × 5 configurations, with the associated vectors a and b for each group.

Fig. 1. Boolean model with circular grains. Left: a realisation of the model on the unit square. Right: a digitised image of the realisation with resolution 100 × 100.

Fig. 2. Restoration of the digitised realisation of the Boolean model with isotropic grains. Top row: the original image disturbed with salt and pepper noise for q equal to 0.25, 0.33, and 0.4. Middle row: estimates of the true image using 3 × 3 configurations. Bottom row: estimates of the true image using 5 × 5 configurations.

Fig. 3. Boolean model with non-isotropic grains. Left: a realisation of the model on the unit square. Right: a digitised image of the realisation with resolution 100 × 100.

Fig. 4. Restoration of the digitised realisation of the non-isotropic Boolean model. Top row: the original image disturbed with salt and pepper noise for q equal to 0.25, 0.33, and 0.4. Middle row: estimates of the true image using 3 × 3 configurations. Bottom row: estimates of the true image using 5 × 5 configurations.
Fig. 6. Restoration of the steel data image. Left column: the original binary image disturbed with salt and pepper noise for q = 0.25 (upper image) and q = 0.33 (lower image). Middle column: estimates of the true image using 3 × 3 configurations. Right column: estimates of the true image using 5 × 5 configurations.

Table 1. Parameter estimates, true parameter values, and classification errors for the restoration of the Boolean model with isotropic grains. The parameter estimates are based on five independent simulations of the degraded image. The standard errors of the estimates are given in parentheses. The classification errors are given in percentage.

Table 2. Parameter estimates, true parameter values, and classification errors for the restoration of the non-isotropic Boolean model. The parameter estimates are based on five independent simulations of the degraded image. The standard errors of the estimates are given in parentheses. The classification errors are given in percentage.