CHARACTERIZATION OF MAMMARY GLAND TISSUE USING JOINT ESTIMATORS OF MINKOWSKI FUNCTIONALS

A theoretical approach to estimate the Minkowski functionals, i.e., area fraction, specific boundary length and specific Euler number in 2D, and their asymptotic covariance matrix proposed by [17], [11] and [12], is applied to real image data. These two-dimensional images show mammary gland tissue and should be classified automatically as tumor-free or mammary cancer, respectively. The estimation procedure is illustrated step-by-step and the calculations are described in detail. To reduce dependencies from chosen parameters, a least-squares approach is considered as recommended by [3]. Emphasis is placed on the detailed description of the estimation procedure and the application of the theory to real image data.


INTRODUCTION
Breast cancer is the most frequent malignant tumor in women.In routine diagnostics, it is usual to perform a histopathological grading, which is based on a three-tiered scheme with grades I, II, and III (Ellston and Ellis, 1991;Mattfeldt et al., 2004).As the reproducibility of tumor grading is unknown for individual cases, many attempts have been made to arrive at an objective and quantitative grading of tumor structure.Let us consider here the tumor texture, which reflects the degree of differentiation of the tumor.The tissue may be conceived as a random set with different phases, which all possess a positive volume fraction.This means consideration of the tumor tissue as a volume process (Mattfeldt and Fleischer, 2005;Mattfeldt et al., 2006).It consists of three phases: tumor cells, stroma and lumina, which altogether account for 100% of the tissue.
In diagnostic pathology we deal with histological sections, i.e., very thin slices, onto which windows, usually of rectangular or quadratic shape, are placed for evaluation under microscopical view.Hence we are faced in practice with random closed sets in 2D, which may be quantified in terms of the three Minkowski functionals: A A , the mean area of the interesting phase per unit reference area (area fraction); B A , the mean boundary length of the interesting phase per unit reference area; and χ A , the mean Euler number of the interesting phase per unit reference area.Notably all these quantities have a stereological interpretation, hence they can be used for the estimation of stereological model parameters: (1) (2) where V V is the volume fraction, S V is the mean surface area per unit reference volume, and M V is the mean curvature density (Stoyan et al., 1995).Eqs.1-3 are all fundamental stereological formulae and hold for random closed sets under the conditions of isotropy and stationarity for arbitrary sections.Recently a new approach has been developed which allows a joint estimation of all three Minkowski functionals for a given image (Schmidt and Spodarev, 2005;Spodarev and Schmidt, 2005).It provides not only point estimates of V V , S V , and M V , but also estimates of their asymptotic variances and covariances.Up to now the aforementioned estimator has only been applied to simulated images, but not yet to real image material.As a first application to real images in a simple situation, we decided to compare mammary cancer tissue to normal (tumor-free) mammary tissue, see also earlier publications of our group (Mattfeldt et al., 1993;1996;2000;Mattfeldt and Stoyan, 2000).
The paper is organized as follows.In Section 'Mathematical Methods' the notation used throughout MATTFELDT T ET AL: Characterization of mammary gland tissue the paper is introduced.The specific intrinsic volumes are defined by means of Steiner's formula.The method given in Spodarev and Schmidt (2005) and Pantle et al. (2006a;b) to estimate these quantities and their asymptotic covariance matrix is described where some technical details are omitted.Section 'Application to image data' deals with the estimation of specific intrinsic volumes from real image data showing mammary tissue.The procedure for the twodimensional case is described in detail.With the estimated quantities a statistical test is considered to classify an image as tumor-free or as mammary cancer, respectively.The paper ends with a discussion and an outlook to further projects.

MATHEMATICAL METHODS
First we introduce some notation.For some fixed d ≥ 2, denote the family of convex bodies, i.e., compact convex sets, in For convex bodies K ∈ K it can be proven that there exist d + 1 functionals V j : K → [0, ∞) for j = 0, . . ., d, such that the volume of the so called parallel body K ⊕ B r (o) for r > 0 is given by Steiner's formula A proof of this formula can be found, e.g., in chapter 2 of Schneider and Weil, 1992.The functionals V j are called intrinsic volumes.They are related to the Minkowski functionals for all j = 0, . . ., d.The intrinsic volumes are not restricted to convex bodies.There is a unique additive extension to the convex ring R given by the inclusion-exclusionformula.For any polyconvex set K ∈ R, any n ∈ N and any convex bodies K 1 , . . ., for j = 0, . . ., d.Notice that the value of V j (K) does not depend on the particular representation of K as the union of convex sets K i .The proof of existence and uniqueness of this extension can be found in Schneider, 1993.The formula itself can be shown by induction using the fact that the intrinsic volumes are additive, i.e., V j ( / 0) = 0 and for all . Some of the intrinsic volumes have a nice geometric interpretation: V d (K) is the usual volume of K, dV d−1 (K) is the surface area of K and V 0 (K) is the Euler-Poincaré characteristic of K.
In the following let Ξ be a stationary random closed set in R d with values in the extended convex ring S almost surely.Let {W n } be a monotonically increasing sequence of compact convex observation windows Under appropriate assumptions, the expectation EV j (Ξ ∩W n ) is well defined and the limit exists for all j = 0, . . ., d, see, e.g., Schneider and Weil, 2000.The functionals V j (Ξ) are called specific intrinsic volumes of Ξ.In the two-dimensional case they are well-known under the notation A A = V 2 , B A = 2V 1 and χ A = V 0 and are connected to the stereological model parameters V V , S V and M V by Eqs.1-3.
To estimate the specific intrinsic volumes from a binary image we use a method developed in Spodarev and Schmidt (2005).It makes use of the local Euler-Poincaré characteristic, which is defined as the expected Euler number of Ξ in a neighborhood of a point x, i.e., as EV 0 (Ξ ∩ B r (x)).It can be shown that for any r > 0 and for any x ∈ R d it holds A proof of Eq. 9 can be found in chapter 5 of Schneider and Weil (2000), see also Spodarev and Schmidt (2005).Since this formula holds for any r > 0 we can plug in d + 1 pairwise different radii 0 < r 0 < . . .< r d , where we have to take care that the edgecorrected observation window W B r j (o) has positive volume for j = 0, . .Eq. 9 we get the following system of d + 1 linear equations Av = y, ( where . The matrix A is regular because the radii r 0 , . . ., r d are pairwise different and it can be computed without problems.With an appropriate estimator y of y we now get an estimator v of v by Since the local Euler characteristic, i.e., the vector y, can be estimated from one single image we also can estimate the vector of specific intrinsic volumes, i.e., the vector v, from one single image.To estimate the vector y of local Euler characteristics for different radii we consider the stationary random field for j = 0, . . ., d.The stationarity of Y j follows directly from the stationarity of Ξ.An unbiased estimator y j of y j = EY j (o) is given by where µ is an arbitrary probability measure concentrated on the reduced observation window the estimator y j is given by To study the variance of the estimator v of the specific intrinsic volumes we consider a sequence of observation windows {W n }, which satisfies condiditions Eq. 6 and Eq. 7.For each j = 0, . . ., d and for each n ∈ N we can estimate y j on W n by Under appropriate assumptions, the covariances exist for all i, j = 0, . . ., d, see, e.g., Schmidt and Spodarev (2005).If the covariances are absolutely integrable and some further assumptions are fulfilled (cf.Pantle et al., 2006a), then the random vector ) is asymptotically normally distributed with mean vector o and covariance matrix Σ = (σ i j ) d i, j=0 with σ i j = R d Cov i j (x)dx.Therefore the random vector is also asymptotically normally distributed with zero mean vector and covariance matrix The values of these estimators v j of the specific intrinsic volumes depend heavily on the choice of the radii r 0 , . . ., r d , see also the discussion in Klenk et al. (2006).To reduce this dependence a least-squares approach is considered.Let 0 < r 0 < . . .< r k−1 be k > d + 1 pairwise different radii.Similar to Eq. 10 we get a system of k linear equations with the difference that the vector y is k-dimensional and that A is not a squared matrix any more because it has k rows and d + 1 columns.Anyhow, the minimization problem has a unique solution given by The estimator v * of the vector of specific intrinsic volumes does not depend on the choice of radii as much as the estimator v in Eq. 12. Furthermore, the random vector ) is asymptotically normally distributed with zero mean vector and covariance matrix The asymptotic covariance matrix Σ can be estimated from the observation of the stationary random fields Y j defined in Eq. 13.For each n ∈ N let W n0 , . . .,W nk−1 ⊂ W n .Let {U ni j } be a monotonously increasing sequence of bounded sets with U ni j ⊂ W ni ⊕ Wn j and |U ni j | > 0 for all n ∈ N and for all i, j = 0, . . ., k − 1. Additionally let lim n→∞ U ni j = supp(Cov i j ) and let the sets U ni j grow smaller in comparison to W ni j , i.e., Furthermore, assume that For each n ∈ N and i, j = 0, . . ., k − 1 we consider the estimator (24) This sequence Σ n = (σ ni j ) of estimators of Σ is asymptotically unbiased, i.e., lim n→∞ E Σ n − Σ = 0, see, e.g., Schmidt and Spodarev (2005); Pantle et al. (2006b), where Σ = ∑ d i, j=0 σ 2 i j denotes the matrix norm.Under additional integrability conditions, Σ n is L 2 consistent for Σ, i.e., it holds lim n→∞ E Σ n − Σ 2 = 0.If Ξ is the Boolean model with primary grain M 0 for example, the integrability conditions are fulfilled if From the estimators Σ n described in Eq. 23 and Eq.24 we get a sequence of estimators Σ * n of the asymptotic covariance matrix Σ v * by APPLICATION TO IMAGE DATA Now we are ready to apply the statistical approach explained in the previous section to estimate the specific intrinsic volumes of real image data showing mammary tissue.There were ten cases of ductal mammary cancer tissue and ten cases showing normal, i.e., cancer-free, mammary tissue.From each case, a sample of 3 × 3 = 9 contiguous quadratic images was evaluated, where the first image was selected at random.Each image had a size of 510 × 510 pixels.The concatenation led to a large quadratic image with 1530 × 1530 pixels, which is needed for the estimation of the asymptotic variances.This means that the final observation window is given by the rectangle  [0, 1529] × [0, 1529].Fig. 1a shows tumour-free mammary tissue and Fig. 2a shows invasive ductal mammary carcinoma.The edgelength of 510 pixels corresponds to 0.4 mm at the scale of the specimen at this magnification, i.e., Figs.1a and 2a show only one of the nine contiguous images.The same images are shown in Figs.1b and 2b, respectively, after interactive segmentation of stroma, epithelium and lumina.The stroma is represented by the black phase, the grey phase forms the lumina and the white phase stands for the epithelium without the lumina, i.e., the tumor cells.These three phase images were converted into three binary images by combining two phases.The foreground of the resulting images may consist of one of the three grey phases (e.g., the white phase) or of the union of two of the three phases (e.g., white and grey).We understand these binary images as realizations of stationary and isotropic random closed sets (cf.Mattfeldt and Stoyan, 2000b).Since these images are two-dimensional the specific intrinsic volumes V 0 , 2V 1 and V 2 represent the mean Euler number per unit area, the mean boundary length per unit area, and the area fraction, respectively.
In the following, the procedure to calculate the least-squares estimator given in Eq. 20 is described step by step.
1. Choose the number k > 3 and the values of the radii r 0 < . . .< r k−1 .We put k = 15 and r i = 4.2 + 1.3i, i = 0, . . ., 14, following the recommendation in Klenk et al. (2006).In this particular case, the matrix A defined in Eq. 11 is given by  23,1506] for all radii r 0 , . . ., r 14 by computing V 0 (Ξ ∩ B r i (x)) for all pixels x ∈ W Br 14 (o) and averaging over all pixels.That means the estimator y from equation Eq. 16 is given in discretized form by An algorithm to estimate the local Euler characteristic for all radii simultaneously is given in Klenk et al. (2006).

Now, the estimation of the specific intrinsic
volumes is straightforward by computing the leastsquares estimator v * = (A A) −1 A y.
4. To estimate the asymptotic covariance matrix the theory says we need an unboundedly increasing sequence of observation windows, cf.Eq. 6.In practice we have only one concatenated quadratic image with fixed size, and we assume it is large enough.So we choose the averaging set U = B 300 (o) and estimate the variances and covariances using a discretized version of the estimator given in Eq. 23, Eq. 24 and Eq. 25 where the integrals are replaced by sums.
In the present study, the application of the joint estimator described above to the white phase of the images led to the following mean values for the stereological model parameters Eqs.1-3.In Table 1, the terms Std (V V ), Std (S V ) and Std (M V ) denote the means of the asymptotic standard deviations of the white phase per concatenated large image.Thus, they are not identical with the standard deviations of these model parameters 'between images within cases' and also not identical with the ordinary standard deviations of V V , S V and M V 'between cases within groups' .The latter may be computed using standard statistical formulae even with a table calculator; however, this does not apply to the asymptotic standard deviations.In addition, the covariances between V V , S V and M V were computed, but these are not reiterated here.In order to see which parameter discriminated best between the groups, the results were visualized graphically.For example, Figs. 3 and 4 show the estimated area fraction of the white phase and the estimated mean curvature density of the white phase per unit area, respectively.From Fig. 4 one can see that it is not possible to base the decision whether an image shows mammary carcinoma or not only on the mean curvature density.Also a statistical test of the mean surface area per unit volume alone does not lead to a sufficient discrimination of the groups.However, the area fraction of the white phase (Figure 3) seems to be a better parameter to categorize the images into two groups.Since this paper is focused on explaining the theory and the algorithm of the new estimation approach we will only test the area fraction which yields acceptable results.Anyhow, it is clear that the two groups in Fig. 4 belong to two different settings.Therefore one could combine, e.g., area fraction and Euler number and consider vectorial tests to strengthen the results of a one-dimensional test.With the approach described in the last section this is possible and will be done in a further paper.In the following, we write just 'area fraction' and omit the words 'white phase' for convenience.From the last section we know that the least-squares estimator is asymptotically normally distributed so we can construct a statistical test for the area fraction.Since the images have fixed size we don't have an unboundedly increasing sequence of observation windows W n .Instead we claim that the estimators are approximately Gaussian because our images are large enough.The null hypothesis states that 'the expected Image Anal Stereol 2007;26:13-22 area fraction in image j corresponds to the mean area fraction in images showing normal mammary tissue,' where the images 1-10 show normal mammary tissue and the images 11-20 show mammary carcinoma tissue.So if we write p j for the expected area fraction in image j and p 0 denotes the mean area fraction in images showing normal mammary tissue, the null hypothesis reads as H 0 : p j = p 0 .As we can see from Fig. 3, the estimated values of the area fraction in images showing mammary carcinoma tissue are greater than in images showing tumorfree mammary tissue.That's why we consider the one-sided alternative hypothesis H 1 : p j > p 0 .The significance level is α = 5 %.The unknown expected area fraction p j in image j is estimated by the leastsquares estimator given in Eq. 20 or, to be precise, by its third entry, and denoted by p j .When calculating the mean area fraction p 0 one has to distinguish, if the image whose area fraction we want to test shows mammary cancer or normal tissue.In the first case we define p 0 as the arithmetic mean of the estimated area fractions of the ten images showing tumor-free tissue, i.e., But if the considered image shows tumor-free tissue, we have to exclude it from the calculation of the mean, so we define p 0 by Now we can calculate the test statistic in image j, where σ 2 j denotes the estimated variance of |W |( p j − p 0 ), see equations Eq. 23 and Eq.24 and subsequent lines.In fact, it holds that |W | = 1530 in the considered case, because all images are squares with sidelength 1530 pixel.Slutsky's theorem yields that T j is approximately standard Gaussian.With the 95 %-quantile z 0.95 = 1.64 the critical range is (1.64, ∞), so the null hypothesis for image j is rejected if the value T j is greater than 1.64.The results are shown in Table 2.The null hypothesis is not rejected for the images showing normal mammary tissue, but it is rejected for all ten images showing mammary carcinoma tissue, which means they are classified correctly.
Of course, we can, in a certain sense, exchange null and alternative hypothesis and test the null hypothesis that 'the expected area fraction in image j corresponds to the mean area fraction in images showing mammary carcinoma tissue' or, shortly, H 0 : p j = p 0 .Here, p 0 denotes the mean area fraction of images showing mammary cancer tissue and as above there are two definitions for p 0 depending on what type of tissue the considered image shows, cf.Eq. 28 and Eq.29.Again, the alternative is one-sided H 1 : p j < p 0 and the test statistic is defined by The results of this test are shown in Table 3.  3 shows that the null hypothesis is rejected for all images showing normal tissue.But unfortunately there are three images out of the ten showing mammary carcinoma tissue for which the null hypothesis is rejected although it should not be.

DISCUSSION
Eqs. 1-3 are well-known fundamental stereological formulae, valid under the conditions of isotropy and stationarity in a model-based approach, and under the condition of IUR sampling from arbitrary structures in a design-based approach.However, there is a practical difference: the estimators of the model parameters in Eqs. 1 and 2 are very easy to implement with an image analyzer, but the estimation of the Euler number is non-elementary even in 2D.While estimation of V V and S V is already taught in basic courses on stereology, this does not apply for the Euler number.Nevertheless the Euler number is of interest for a quantitative characterization of carcinoma tissue of glandular origin, because the Euler number is directly linked to fundamental pathological tumor properties such as solid architecture where ideally χ > 0, tubular architecture where ideally χ = 0, and cribriform architecture where ideally χ < 0. These textures may arise in all types of adenocarcinomas.Sometimes a cribriform texture can be found in mammary carcinomas; in practice it is most important to recognize this texture component in prostatic carcinomas, where it is known to be associated with a poorer prognosis as compared to tubular differentiation.Furthermore, the usual stereological approach, even if it encompasses the Euler number, leads merely to point estimates of the model parameters, but does not provide an insight into the covariance matrix, i.e., the asymptotic variances and covariances of V V , S V , and M V of the three phases remain unknown.Point estimates for V V and S V of the epithelial phase of tumour-free mammary tissue and mammary carcinomas obtained from conventional stereological methods were previously published (Mattfeldt et al., 2000, see Table 1 therein).The results were very similar to the present study.This shows the good reproducibility (robustness) of the method, in which the images were segmented interactively.Fully automatic segmentation of mammary tissue into epithelium, lumen, and tumour cells by image analysis would be desirable, but this aim is difficult to achieve at the moment.The plausibility of the results was checked by using a theorem of Tomkeieff, which states that the mean length of intercepts through particle profiles, l 1 , is related to the Minkowski functionals of the particles by the equation l 1 = 4V V /S V (see Baddeley and Jensen, 2005, p. 33).According to this relation, one would expect values for l 1 ≈ 0.025 mm in the tumour-free group and l 1 ≈ 0.042 mm in the carcinoma group, which roughly corresponds to the visual impression, see Figs. 1b and 2b.In contrast to many other methods to estimate the specific intrinsic volumes the approach given in Spodarev and Schmidt (2005) yields not only the estimates, but also the (asymptotic) variances and covariances.All values can be computed from one large concatenated image.The classical approach to estimate the sample variance in each group is inappropriate here because we want to test each image separately.This might be interesting if there is only one image available in a practical application, which may be divided into subwindows.
In the special case of mammary tissue it turned out that the area fraction of the white phase, i.e., the part of the image that shows epithelial cells, is a good criterion to detect if an image shows mammary cancer tissue or not.The area fraction has the useful property of being independent of the magnification of the image.Although this was not important in our study, it may be relevant for the analysis of images where the scaling factor is unknown.A practical example for this situation has emerged more and more in the last decade.There are published datasets available in the internet consisting of images of various tumor types, e.g., also mammary and prostatic carcinomas, where expert groups have performed the histopathological grading for reference purposes.Usually, these images are given without any information on the final magnification, and often it will not be possible to retrieve the magnification factor any more.It would be attractive to perform a quantitative meta-analysis of these images by means of spatial statistics.Due to the aforementioned reasons, one will then be restricted to methods which are independent of the magnification.This holds for the V V component of the joint estimator described here.In contrast to the usual routine method, e.g., point counting, our method will also provide an estimate of the asymptotic variance, and thus yield valuable additional information.As one can conclude from Tables 2-3, the test on tumor-free tissue yields better results than the test on mammary carcinoma tissue.
The most common tumor types of the female breast are invasive ductal and lobular carcinoma.These designations indicate a tumor differentiation more similar to the ducts or to the lobules of the mammary parenchyma, respectively.For both types of breast cancers, an attempt is made towards grading of malignancy in routine diagnostics.This is important for prognosis prediction and therapy planning.It is intended to apply our method for the characterization of mammary carcinomas of different degrees of malignancy, and eventually to use it for the prediction of the grade of malignancy from spatial data, i.e., for the purpose of pattern recognition.Also it will be interesting to differentiate by this technique between ductal and lobular mammary carcinomas, which may be difficult in some cases (Mattfeldt and Fleischer, 2005).However, before these two more ambitious projects are put into practice, we thought it advisable to implement the methodology first in a simpler setting comparing tumor-free tissue to carcinoma tissue, where the differences between the classes of specimens are more pronounced.For the advanced applications, it may become useful to consider vectorial tests.With the described method it is possible to characterize the tissue high-dimensionally.If only one phase is considered, one obtains 9 instead of 3 numerical values per image (the three point estimates, the three asymptotic variances and the three asymptotic covariances of the Minkowski functionals).This rises to a whole bunch of characteristics if also the other two phases are taken into account.This will be subject of a further paper.
be the closed ball in R d with radius r > 0 centered at x and let o ∈ R d be the origin.Further, k j denotes the volume of the j-dimensional unit ball for j = 0, . . ., d.For two sets A, B ⊂ R d the Minkowski sum A ⊕ B and the Minkowski difference A B are defined by A ⊕ B = {a + b : a ∈ A, b ∈ B} and A B = {x ∈ R d : B+x ⊂ A}, respectively, where B = {x ∈ R d : −x ∈ B} denotes the set B reflected at the origin.
(a) Original image of tumour-free mammary tissue.(b) Segmentation of Fig. 1(a) leads to this image, which contains three phases: white-epithelial cells, gray-lumen, black-stroma.

Fig. 1 .
Fig. 1.Tumour-free mammary tissue.Haematoxylin-Eosin stain (a) and segmented image (b), respectively.The edgelength of the quadrat corresponds to 0.4 mm at the scale of the specimen at this magnification.

Fig. 3 .
Fig. 3.Estimated area fraction of the white phase.

Fig. 4 .
Fig. 4. Estimated mean curvature density M V (in mm −2 ) of the white phase per unit area.
. , d.Since the radii are numbered in ascending order this holds if |W B r d (o)| > 0. From

Table 1 .
Means of the estimated values and the means of their asymptotic standard deviations.The latter means are not equal to the usual standard deviations between the cases within the groups.