ON THE ESTIMATION OF DISTANCE DISTRIBUTION FUNCTIONS FOR POINT PROCESSES AND RANDOM SETS

This paper discusses various estimators for the nearest neighbour distance distribution function D of a stationary point process and for the quadratic contact distribution function H q of a stationary random closed set. It recommends the use of Hanisch’s estimator of D, which is of Horvitz-Thompson type, and the minussampling estimator of Hq. This recommendation is based on simulations for Poisson processes and Boolean models.


INTRODUCTION
Random sets are successful models for various spatial structures such as porous media, phases in two-or multi-phase materials or biological tissues.They are studied in many stereological studies.In their statistical analysis, contact distributions play an important role, see Serra (1982), Stoyan, Kendall and Mecke (1995), and Ohser and Mücklich (2000).Of particular interest are the linear, spherical and quadratic contact distribution function (cdf) H l H s and H q .For a stationary random closed set X, the spherical cdf H s is the distribution function of the random distance from an arbitrary point outside of X to its nearest neighbour in X.The quadratic cdf H q is the distribution function of an analogous distance but measured in a Minkowski metric where the unit sphere is the unit cube.It is of particular value in the statistical analysis of pixel images, where the squarebased metric is natural.The cdf's characterize in some sense the size of the complement X c of X, as introduced by Delfiner (1972).If, in the case of a porous medium, X is a model for the matrix, then X c is the union of all pores and the cdf's characterize the size of the pores.
The spherical cdf is also used in point process statistics, see Diggle (1983), where often the character F is used.Perhaps still more important for point processes is the nearest neighbour distance distribution function D (or G is Diggle's notation), which does not have a counterpart for general random closed sets.The cdf's and D play an important role in the characterization of the variablity of spatial structures.They are sometimes considered in secondorder stereology though they are not second-order characteristics in the classical use of the term 'secondorder' in probability and statistics.
For the statistical estimation of D and of cdf's there exist various methods, see Stoyan, Kendall and Mecke (1995).The classical estimators are minus-sampling or border estimators.Following Hanisch (1984) and Chiu and Stoyan (1998), the approach of Horvitz-Thompson (see Overton and Strehman, 1995) can be used, what leads to refined estimators.Finally Kaplan-Meier-like estimation is possible, see Baddeley and Gill (1997).
All these estimators are ratio estimators, which contain in the denominator an unbiased estimator of area fraction p or intensity λ .In the classical estimation procedure, p and λ is estimated from the whole window of observation, while the numerator is obtained only from a subwindow or by some form of edge correction.Consequently, numerator and denominator may show little correlation and are estimated with different precision.Thus it seems to be natural to ask for estimators where numerator and denumerator are (more) positively correlated and their precision is closer, even if this leads to a loss of precision of the denominator.A possible approach is to use adapted estimators of p and λ .Such a modification of classical ratio estimators has been shown to be very successful in the estimation of second-order characteristics such as the pair correlation function of random sets (Mattfeldt and Stoyan, 2000) and of point processes (Landy andSzalay, 1993, andStoyan andStoyan, 2000).
It is an open question which effect is possible if adapted estimators of p and λ are used in the estimation of D and cdf's.The present paper discusses such estimators for D and H q and compares their behaviour with that of the classical minus-sampling estimators.Since it is obviously very complicated to do rigorous calculations, which are very difficult even in the particular case of a Poisson process, the behaviour of these estimators is investigated by Monte Carlo simulations.
Simulations for Poisson processes, a cluster process and Boolean models lead to a clear result: For D the Horvitz-Thompson estimator introduced by Hanisch (1984) should be used, while for H q all the more sophisticated estimators are not better than the classical minus-sampling estimator if the criterion is the mean squared error.

ESTIMATORS OF THE NEAREST NEIGHBOUR DISTANCE DISTRIBUTION FUNCTION D VARIOUS D ESTIMATORS
The function D is the distribution function of the distance from a typical point of the analysed point process Φ in Ê d to its nearest neighbour, see Stoyan, Kendall and Mecke (1995).Φ is assumed to be stationary and to have intensity Before starting with the explanation of estimators of D, it is helpful to give all points of Φ in W two realvalued marks s and c.For a fixed point x s´xµ denotes the distance from x to its nearest neighbour in W and c´xµ is the distance from x to the edge of W .
The classical and perhaps most natural estimator of D is the minus-sampling or border-method estimator Dm , Dm ´rµ ∑ x;s 1 W ©b´o rµ ´xµ1 ´o r ´sµ Φ´W ©b´o rµµ (1) for r 0 where the sum in the numerator yields the number of points in the reduced window W ©b´o rµ with nearest neighbour closer than r and the denominator is the total number of points in W ©b´o rµ.
The summation goes here and elsewhere in this section over all marked point pairs x; s of Φ.The structure of this estimator may be clarified by writing the denominator as The estimator Dm is frequently used and yields for samples not too small acceptable or good results, see also below.As a function of r Dm ´rµ is not necessarily monotonous, see Fig. 4.14 in Stoyan, Kendall & Mecke (1995) and Fig. 9 in Baddeley et al. (1993).
Formula (1) can be rewritten as with Obviously, λm ´rµ is an unbiased estimator of λ , which could be called the minus-weighted estimator of intensity λ , and D m ´rµ is an unbiased estimator of λ D´rµ.Thus, Dm ´rµ is a ratio-unbiased estimator of D´rµ of the type described in the introduction, with adapted intensity estimator.One can expect positive correlation, between D m ´rµ and λm ´rµ, i.e., large values of D m ´rµ are connected with large values of λ m ´rµ.This relationship reduces fluctuations of Dm ´rµ and explains the good experience with the border-method estimator.Note that it does not help if the true value of λ would be known; replacing λm ´rµ by λ leads to much larger squared deviations in the estimation of D´rµ.
The Hanisch estimator of D´rµ uses all points in W with nearest neighbour in W and is defined as with ´λ H is pracically the same as D 4 ´Rµ on page 140 of Stoyan, Kendall & Mecke, 1995).D H ´rµ is an unbiased estimator of λ D´rµ, and λH is an adapted unbiased estimator of λ .D H ´rµ counts all points x with s´xµ c´xµ weighted by the volume ν d ´W © b´o s´xµµµ; it is so organized that it can be really determined using the information in the sampling window W .While DH ´rµ appeared in Hanisch (1984) as an ad hoc estimator, Baddeley (1998) showed that it is a Horvitz-Thompson estimator.
The intensity estimator λH is independent of r.This guarantees that DH ´rµ is monotonous in r.The authors do not know whether there is an estimator of λ which is better adapted to D H ´rµ and produces better estimates of D´rµ.
Unfortunately, Hanisch (1984) had presented (perhaps following a wrong recommendation by the first author D.S.) together with DH ´rµ (his formula (4)) also other D-estimators, for example Just this estimator appeared later in Cressie (1991), p. 638, and was also used in Baddeley et al. (1993).It is not an unbiased estimator and also not a ratiounbiased estimator.As simulations showed (see below), it has a large squared deviation and it should be forgotten.

COMPARISON OF D ESTIMATORS
In order to compare and to evaluate the various estimators ´D m DN and DH µ, they were applied to each 1000 simulated point patterns in the unit square and cube of Ê 2 and Ê 3 , respectively.
For Poisson processes of intensities λ = 50, 100 and 200 the following results were obtained.The biases of Dm and DH in the planar and spatial cases are small, typically negative, usually in the order of 0.001 . . .0.005.They are smaller for d = 2 than for d = 3 and smaller for DH than for Dm .Fig. 1 shows the estimation standard deviation s´rµ for the Hanisch estimator DH in dependence on r for λ = 50.The behaviour for λ = 100 and λ = 200 and for the spatial case is similar, the values decrease (for fixed window) with growing intensity λ or point number probably in proportionality to 1 Ô λ .The estimation standard deviations for Dm are similar.The form of the s´rµ-curve shown in Fig. 1 is quite natural: For r 0, where D´rµ and all its estimators vanish, and for large r, where D´rµ and its estimators are close to one, there is no much room for fluctuations, which appear for medium values of r.
As an example of a non-Poisson point process, a planar Gauss-Poisson process (as in Stoyan, Kendall & Mecke, 1995, p. 161) with parameters λ p λ p 1 p 2 p 3 1 3 and inter-pair distance 0.15 was analysed.This process belongs to the few number of processes which are not Poisson processes but for which there are known formulas for D´rµ.It is a Neyman-Scott cluster process with empty clusters, 'clusters' consisting of a single point and two-point clusters with constant distance between the points.
For this process, the biases turned out to be a bit larger than for the Poisson process, but the standard deviations s´rµ for the Hanisch estimator are quite similar to those for the planar Poisson process of equal intensity.The star in Fig. 1 marks the maximum of s´rµ for DH ´rµ in the Gauss-Poisson process case and λ = 50.Also here the biases for DN are much larger than for Dm and DH .Concluding, we recommend the use of the Hanisch estimator DH ´rµ in the form (3). It produces monotonous estimates with small biases and estimation variances.It is easy to implement, see Stoyan & Stoyan (1994), p. 296 (replace β there by β i ).

ESTIMATORS OF CONTACT DISTRIBUTION FUNCTIONS VARIOUS CONTACT DISTRIBUTION ESTIMATORS
Contact distribution functions (cdf's) are frequently used in the statistical analysis of point processes as well as random closed sets.In this section we concentrate on the case of random closed sets with positive volume fraction.As a practically important particular case the quadratic cdf H q is considered, which is of special interest in statistical analyses of pixel images.
Let X be a stationary random closed set in Ê d .Its volume fraction p satisfies p P´o ¾ Xµ, where o denotes the origin.It is assumed that p 0. The quadratic cdf is defined as where q´rµ is the cube of side length r with one vertex in o and sides emanating in o along the positive coordinate axes; it is q´rµ q´rµ.
For the case of a Boolean model the formulas in Stoyan, Kendall and Mecke (1995), p 79-81, lead to explicit expressions for H q ´rµ.In particular, if in the planar case the primary grains are isotropic squares of side length a (for this case the simulations were carried out), then where λ denotes the intensity of the germ process.
The classical minus-sampling or border-method estimator of H q ´rµ is Ĥq Note that formula (6.3.6) in Stoyan, Kendall and Mecke (1995) is corrected here.The term ν d ´W X c µ ν d ´W µ is the usual estimator of 1 p.
The estimator Ĥq ´rµ will be compared with another estimator of H q ´rµ, namely Ĥa q ´rµ.It differs from Ĥq ´rµ by different handling with volume fraction p: Ĥa ´rµ is defined as ĤHT q ´rµ but with the term p´rµ from above.Here, d´xµ denotes the distance from x to X measured in the metric corresponding to the unit cube.For a given pixel image, the integrals are replaced by sums in a natural way.
Four of the estimators introduced above were compared for a long series of simulated stationary and isotropic planar Boolean models.For each case the primary grains were isotropic congruent rectangles; the same rectangular primary grain was combined with a series of germ process intensities λ ; see Mattfeldt and Stoyan (2000) for more details.In total, 200 series with each 100 replications were simulated in a 512 ¢ 512 square.The statistical analysis was then carried out for the central 128 ¢ 128 square.
λ .It is observed in a sampling window W , which is a compact convex set of positive volume ν d ´W µ.In the case of a Poisson process of intensity λ D´rµ has the form D´rµ 1 exp´ λ b d r d µ for r 0 where b d denotes the volume of the unit sphere of Ê d .

Fig. 1 :
Fig. 1: Estimation standard deviation s´rµ for the Hanisch estimator DH in the case of a Poisson process in dependence on r for λ = 50.Star: maximum of s´rµ for Gauss-Poisson.