SEGMENTATION OF ANATOMICAL STRUCTURES BY CONNECTED STATISTICAL MODELS

This paper presents a framework for the segmentation of anatomical structures in medical imagery by connected statistical models. The framework is based on three types of models: first, generic models which operate directly on image intensities, second, connecting models that impose restrictions on the spatial relationship of generic models, and third, a supervising model that represents an arbitrary number of generic and connecting models. In this paper, the statistical model of appearance is used as the generic model, whiles the statistical model of topology, obtained by applying principal component analysis (PCA) on aligned pose and shape parameters of the generic model, is used as the connecting model. The performance of such connected statistical model is demonstrated on anterior-posterior (AP) X-ray images of the hips and pelvis and compared to the modelling by one and six unconnected generic models. The most accurate and robust results were obtained by two-level hierarchical modelling, wherein connected statistical models were used first, followed by unconnected statistical models.


INTRODUCTION
In the past two decades the role of medical imaging has expanded beyond simple visualization and inspection of anatomical structures for diagnostic purposes.Nowadays, medical imaging has become a fundamental tool for monitoring the progress of disease and effects of treatment, surgery planning and simulation, intraoperative navigation, radiotherapy planning and quantitative analysis.In many medical imaging applications image segmentation plays a crucial role, by automating or facilitating the delineation of anatomical structures and other regions of interest (ROI) (Bankman, 2000;Pham et al., 2000;Sonka and Fitzpatrick, 2000).For example, ascertaining the detailed shape and organization of anatomical structures enables a surgeon to preoperatively plan an optimal approach to operate on the target structure while avoiding vital healthy structures.In radiotherapy, segmentation allows planning the delivery of necrotic dose of radiation to tumors with minimal collateral damage to healthy tissue.Automated segmentation of anatomical structures, by which accurate, repeatable and quantitative data are efficiently extracted, remains a difficult task due to the complexity and tremendous natural variability of shapes of anatomical structures.Moreover, the position of the patient during image acquisition, the imaging device itself, and the imaging protocol induce additional variations in shape and appearance.To address these difficulties, deformable models of anatomical structures have been extensively studied and widely used in medical image segmentation with promising results (McInerney and Terzopoulos, 2000;Xu et al., 2000).Deformable models maintain the essential characteristics of a class of structures they represent but can also deform to fit a range of examples.Particular representatives of deformable models which incorporate a priori knowledge on pose, shape and appearance are the statistical models of shape (Cootes et al., 1994) and appearance (Cootes et al., 2001).A statistical model is a deformable model which in a compact way describes the information contained in a training dataset.These models were successfully applied to the segmentation of bony structures, e.g., vertebrae (Hill et al., 1996), spine (Smyth et al., 1997), knee joint (Cootes et al., 1998), hand (Mahmoodi et al., 2000), rib cage (van Ginneken and Haar Romeny, 2000) and hip/pelvis (Bernard and Pernuš, 2001).
Organizations of anatomical structures exhibit two kinds of shape variations, i.e., variations in shapes of individual structures and variations in spatial relationships of the structures.Such combined variations cannot be optimally described by a single statistical model unless variations of spatial relationships are sufficiently small and a sufficiently large training dataset is used (Smyth et al., 1997).Therefore, alternative approaches are required to describe the nonlinear shape variations of structures organizations.One approach is to apply piecewise linear models (Heap and Hogg, 1997) and the other is to separately model variations of shapes of individual parts and the variations of their spatial relationships (Mahmoodi et al., 2000).The problem with piecewise linearization as described by Heap et al. (1997) is that it only approximates the non-linear shape variations because it is not using a priori knowledge on the organization of structures.However even though Mahmoodi et al. (2000) did use the a priori knowledge, it was used only to initialize a model and not throughout the segmentation process.Bernard et al. (2001) introduced the statistical model of topology to model the spatial relationships of anatomical structures.They applied the principal component analysis on pose and shape parameters of all structures to extract the most significant eigen-topologies that describe anatomically plausible topological variations of structures.Their strategy can be viewed upon as a hierarchical PCA.The statistical model of topology that describes plausible topological variations of connected anatomical structures at the upper level of hierarchy is constructed from the sets of shape and pose parameters describing plausible variations of shapes and poses of individual structures at the lower level of hierarchy.Pham et al. (2006) used a Gaussian model on translation invariant shape descriptions to describe spatial relationships between object parts.They also proposed to model objects appearance by learning long-distances dependences between pixel values using Bayesian networks.De Bruijne et al. (2005) used the normal probability distribution to model spatial relationships between shape particle set and its expected neighbour.Additionally, multi-scale probabilistic Markov models have also been employed to characterize inter-object relationships (Lu et al., 2007).Tsai et al. (2004) applied PCA to the collection of multiple signed distance functions, wherein zero level sets are implicit representations of shapes, to obtain a coupling between the multiple shape models.Another type of spatial modelling was used in level-set segmentation (Litvin and Karl, 2005), where relative inter-object distances were used in an energy functional for a curve evolution.The coupling constraint dependent on all contours is introduced as an overlap penalty in energy functional (Zimmer and Olivo-Marin, 2005).The principal geodesic analysis (PGA) was applied to object pose and medial m-rep shape description (Styner et al., 2006).The problem of splitting image data into submodels for a given set of training images was addressed by Langs et al. (2007), where minimum description length (MDL) principle was proposed as a criterion function.Modelling of variations in spatial relationships of the anatomical structures depends on the type of the model used (linear, nonlinear, probabilistic model, etc.) and on information (pose, shape) that is being used to model spatial relationships.
In this paper, we present a general framework for simultaneous segmentation of a number of anatomical structures by connected statistical models.In contrast to Bernard et al. (2001), vectors of pose and shape parameters are aligned before they are submitted to PCA to generate the statistical model of topology.In this way, the spatial relationships of connected structures are better modelled.We demonstrate the feasibility of the proposed segmentation method by connected statistical models on anterior-posterior Xray images of the hips and pelvis.We show that better segmentation results can be obtained with a two-level hierarchical segmentation procedure using statistical models of shape, appearance and topology than with unconnected models.

METHODS GENERIC, CONNECTING AND SUPERVISING MODELS
The framework for image segmentation by connected statistical models consists of three types of models, i.e., a generic, a connecting, and a supervising model.A generic model (G) is a statistical model, which operates directly on image intensities and models an object of interest.A connecting model (C) imposes restrictions on spatial relationships of generic models.A supervising model (S) represents an arbitrary number of generic and connecting models.An arbitrary hierarchy of models may be built by connecting supervising models.

Generic model
A generic model G, G(u, f G , O A ), is defined by vector u, containing all parameters of the generic model, criterion function f G , which defines how well the model matches the underlying image I and an optimization procedure O A , which optimizes the vector Image Anal Stereol 2011;30:77-88 u in order to optimize the criterion function f G .Optimization O A starts with a set u 0 of initial parameters and when f G reaches the optimum, ends with a set of optimized parameters u OPT .If generic models are not connected, each generic model is optimized independently of other generic models (Fig. 1).
As a generic model of an anatomical structure we use the statistical model of appearance (Cootes and Taylor, 2001) which models both the shape and texture of an anatomical structure.The statistical model of appearance consists of two sub-modelsthe statistical model of shape maintaining the structure's shape variation behaviour, and the statistical model of texture maintaining the structure's texture variation behaviour.The statistical model of shape of an anatomical structure is constructed by first establishing N corresponding sets x i , of coordinates of n landmark points representing the shape of the structure in the i-th training image.In each of the N training images, points should be placed in the same way on the object's boundary.Next, the sets x 1 , x 2 , …, x N are aligned with respect to translation, rotation, and scaling (Goodall, 1991).The result of the alignment are aligned point sets After the alignment, there is still a substantial amount of variability in the coordinates of corresponding points, which is due to differences in shape of an anatomical structure across different training images.Given N aligned shapes, described by shape vectors X 1 , X 2 , …, X N , the mean shape X , is defined to be: Singular value decomposition (SVD) (Press et al., 1992) is next used to compute the eigenvalues and corresponding eigenvectors of the covariance matrix: (2) The eigenvectors corresponding to the largest eigenvalues of the covariance matrix represent the most significant modes of shape variation.It is sufficient to keep only some of the first t eigenvectors to sufficiently describe the variability of shapes in the set of training images.Any aligned shape X of a structure in the training set can be approximated by: where x Θ is the matrix of the first t eigenvectors and 1 ( ) , is a vector of shape parameters, i.e., projections of ( ) i − X X to the first t eigenvectors of x Θ .Let the vector u of a generic model contain the pose and shape parameters u, u = [v T , p x T ] T , corresponding to one image.To be comparable to the shape parameters, the pose parameters v are normalized.Usually the number of the first t eigenvectors is selected in such a way to describe some predetermined amount of variability.However, this is not the case in the connecting model, where the number of shape parameters (t) is fixed according to all generic models in order to achieve the same dimensionality of the pose and shape vector (u).The next step in building a statistical model of appearance is to warp, for instance with the thin-plate spline interpolation method (Bookstein, 1989), each training image so that its landmarks match those of the mean shape.In this way, "shape-free" images are obtained.The images are intensity normalized to eliminate global intensity variations between the images and raster scanned into texture vectors Y 1 , Y 2 , …, Y N .Next, PCA is applied to the shape-free and intensity-normalized texture vectors, yielding a linear model that characterizes the intensity variations: where Y is the mean normalized texture vector, y Θ , a matrix of significant modes of intensity variations of covariance matrix y Ψ , and, p y a vector of texture parameters.As a connecting model we use the statistical model of topology which is obtained by the same principle as the statistical model of shape.The construction of the statistical model of topology thus consists of two steps: topology alignment and principal component analysis.Let T ] T , i = 1,…,N, be a vector containing all or just some of the pose and shape parameters of M generic models extracted from each of the N images.A vector u i,j corresponding to image i and model j may contain all parameters of pose and shape u i,j = [v i,j T , p i,j T ] T , only some pose and shape parameters, or only parameters of pose u i,j = v i,j .The sets z 1 , z 2 , …, z N are first aligned (Fig. 2) which results in aligned topo- , and mean topology Z .Topology alignment clusters the sets of pose-shape vectors u before they enter PCA.Using the PCA, any aligned topology Z in the training set can be approximated by: where p z is a vector of topology parameters.
Besides applying PCA to the aligned topologies, PCA may also be performed on the set W of parameters of the geometrical transformation T -1 : This results in a reduced set of parameters p w that approximates the transformation T -1 , which may prove useful during segmentation, because segmentation process involves optimization of parameters.

Supervising model
A supervising model S, S(p sm , f SM , G, C, O A ) is a union of generic models G and connecting models C (Fig. 3).It is composed of a vector p sm , criterion function f SM= f SM (p sm ), vector of generic models G, vector of connecting models C, and optimization algorithm O A .The vector p sm , p sm =[p z T , p w T ] T contains the topology parameters p z and parameters p w of the transformation T -1 that are used to generate plausible topologies in the image segmentation process (Fig. 2).The supervising model represents geometrical (pose and shape vectors) and textural representation of the image (Fig. 3).Using the statistical connection model, whole collection of generic models is steered in the optimization process.At the level of supervising model, other segmentation strategies can be applied.For example, one could apply sequential generic model fitting, e.g.fit one or few models and then use connection model to predict initial poses and shapes for subsequent generic models.Another approach could be to connect generic models between two or more images of the same anatomical structures but with different modalities.

IMAGE SEGMENTATION BY UNCONNECTED AND CONNECTED STATISTICAL MODELS
The goal of segmentation is to find the landmarks of each of the M anatomical structures present in a previously unseen image I. Segmentation of an image I with statistical models is in effect an optimization procedure.We have explored two different statistical model-based segmentation strategies: one strategy with unconnected statistical models, and another strategy with connected statistical models.

Image segmentation by unconnected statistical models
By unconnected statistical models, each anatomical structure is segmented independently of other structures.We have adopted the segmentation approach of Bernard et al. (2001), which does not require optimization of texture parameters p y (Fig. 1).Given the image I to be segmented and the initial pose and shape parameters u 0 , u 0 = [v T , p x T ] T , of a statistical model describing one anatomical structure, a shapefree and intensity normalized image Ĩ and corresponding texture vector Y % are created and used to obtain the vector of texture parameters: ( ) Next, the texture vector Y % is transformed to texture vector Y T by the texture model with only t eigentextures: T ( ) the optimal parameters u OPT , u OPT = [v T , p x T ] T of pose (v = [v x v y v φ v s ] T ) and shape (p x ) are found.The internal energy f INT measures the extent of deformation of the model: while the external energy f EXT measures the similarity between images Ĩ and I T : ( ) where Ĩ is a shape-free and intensity normalized image, I T is a best representation of an image Ĩ obtained by texture model, and MI denotes mutual information between two images.However, different similarity measures, such as normalized mutual information or correlation coefficient, may be used.The external energy function f EXT (Eq.9) is value always positive and suitable for minimization.

Image segmentation by connected statistical models
Segmentation by connected statistical models is illustrated in Fig. 3. Training the statistical models of appearance and topology on a sufficiently large image database results in matrices Θ x , Θ y , Θ z and Θ w of eigenvectors and corresponding mean vectors of X , Y , Z and W of shape, appearance, topology and geometrical transformation, respectively.The segmentation starts with some initial values of topology p z and transformation p w parameters.Using the statistical models of topology (Eq.5) and geometrical transformation (Eq.6), the aligned topology Z and the set W of parameters of the geometrical transformation are first obtained.The parameters W are next used to derive the unaligned topology z which is further decomposed into pose and shape parameters The pose and shape parameters together with the image I are used to derived the energy functions [f G1 , …, f GM ] and landmark positions of the M generic models according to the scheme shown in Fig. 1, where each generic model's optimization loop is omitted.The optimization procedure O A (Fig. 3) optimizes a criterion function f SM to achieve the best fit of all generic models while keeping the deformation of the topology small.The function f SM of a supervising model is defined as the weighted sum of internal f INT and external f EXT energies: ( ) The external energy of the supervising model is a weighted sum of all M generic models' energies f Gi : The internal energy f INT corresponds to the deformation of the statistical model of topology: where Z and W contain the parameters of the aligned topology and geometrical transformation, respectively.

EXPERIMENTS AND RESULTS
We have applied the above segmentation strategies, i.e., strategies using unconnected and connected statistical models, to segment anterior-posterior X-ray images of hips and pelvis.

IMAGE DATABASE AND LANDMARKS
The statistical models were trained on the training image database consisting of 42 anterior-posterior Xray images of hips and pelvis.The resolution of an image was 1700x1200 pixels.To construct statistical models in total 68 landmarks were defined on each image; 16 on left hip, 16 on the right hip, and 36 on pelvis, by three experts using a purpose-specific software tool.The position of each of the 68 landmarks was selected as the position of that landmark out of three which was closest to the average of three.An example of an annotated X-ray image is shown in Fig. 4.

GENERIC AND CONNECTING MODELS OF THE HIPS AND PELVIS
Six generic models G 1 -G 6 were chosen to model the hips and pelvis and the 68 landmarks were assigned to these models (Fig. 4, left).To derive the statistical model of appearance, rectangular regions of interest, each covering all corresponding landmarks, were selected (Fig. 4, right).The statistical model of topology was used as the model that connected the 6 generic models.Fig. 5 shows unaligned and aligned topology vectors for 6 models and 41 images.The affine transformation was used for topology alignment.It is obvious that by topology alignment much more compact clusters are obtained than without alignment.Besides, if more pose and shape parameters are used to describe the topology more compact classes are obtained.However, as the number of pose and shape parameters (R) and the number of parameters of the affine transformation (R (R+1)) are related, a more complex topology model will result in optimization of more parameters.To reduce the number of parameters of the affine transformations we applied PCA on the parameters W, which resulted in vector p w .In special case, when M R equals R (R+1), the affine transformation maps all of the M pose and shape vectors of generic models to the average topology with zero variance (Fig. 5, right) and, therefore, optimization of topological parameters can be omitted.In this case (M=R+1), the number of unknowns (R (R+1)) equals the number of equations (M R).In any other case (Fig. 5, middle), geometrical information stored in landmarks positions is divided between topological and transformation parameters (Fig. 3), where former describes residual information after transformation/alignment and the latter the information about transformation that brings structures close to the average arrangement.Figure (Fig. 6) illustrates the effect of changing the first 4 PCA parameters of affine transformation (p w ) on the landmark position for the case shown in figure (Fig. 5, right), where topology parameters (p z ) have no effect.The non-linear effects of PCA parameters of the affine transformation (p w ) on the landmarks positions can be observed, e.g.first parameter correlates with generic models rotation parameter (Fig. 7).

G6 G1 G3
G2 G4 G5 Fig. 4. AP X-ray image of the hips and pelvis with superimposed landmarks belonging to 6 generic models (G 1 -G 6 ) (left) and the 6 regions of interest used to obtain the statistical models of texture (right).Different symbols represent landmarks belonging to different generic models. -

EXPERIMENTS
Four statistical model-based segmentation methods have been applied to the 42 AP X-ray images of the hips and pelvis.The first method (M1) modelled the whole image by only one statistical model of appearance.The number of shape parameters (p x ) was set to 15 to allow the single model to be flexible since it had to model 68 landmarks.The second method (M2) modelled the anatomical structures by 6 unconnected statistical models of appearance (Fig. 4).Each of them was independently optimized using 4 pose (v) and 5 shape parameters (p x ) and the strategy shown in Fig. 1.In the third (M3) method, the strategy, illustrated in Fig. 3, was applied.In method M3, the topology vector consisted of 4 pose parameters v and only the most significant shape parameter p x,1 , yielding u=[v x v y v φ v s p x,1 ] T .Because in M3 the topology parameters p z had no effect on landmark positions (see Fig. 5, right), the relations between generic models' poseshape vectors were modelled by the first twenty parameters p w of the PCA affine transformation.The fourth method (M4) used a two-level hierarchical strategy.First, the method M3 was applied, which was then followed by the segmentation strategy shown in Fig. 1 using 4 pose (v) and 5 shape parameters (p x ).The goal of the two-level strategy was to initialize the generic models' parameters within their capturing ranges by first optimizing the parameters of the supervising model and then let the unconnected generic models to fine-tune their parameters.All experiments were conducted on 42 AP X-ray images with a leave-one-out test.In all experiments, the number of texture parameters, used to represent a shape-free and intensity normalized image Ĩ, was set to 3. For the supervising and generic models we used the downhill simplex optimization method (Press et al., 1992) with 300 iterations.The texture vectors of generic models were sub-sampled in horizontal and vertical directions by 5 pixels for G 1 , G 3 , G 5 , by 4 pixels for G 2 , G 4 , and by 8 pixels for G 6 .In the first method M1, the texture was sub-sampled by 5 pixels.In all methods, we used the mutual information to estimate the external energy.Segmentation results obtained by each method were expressed as the average Euclidean distance (AED) in pixels between automatically and manually derived landmarks positions.Fitting of supervising model takes approximately 10 sec on Intel i7-860 processor, where main computation burden represents computation of generic models cost function (image interpolation, computation of image representation with texture model, cost function computation).The same amount of time is spend on sequential fitting of unconnected models.However, computation of generic models cost function could be parallelized in both cases (connected and unconnected) to speed up the optimization.

RESULTS
Fig. 8 illustrates the segmentation results in the form of box-whiskers diagrams, showing the minimum, maximum, median, 1st and 3rd quartile of the distances between automatically and manually derived landmarks positions.Segmentation using only one model for the hips and pelvis (method M1) performed the worst.Using individual, unconnected, models (method M2) resulted in more accurate landmark positions.Even better results were, however, obtained by the optimization of the supervising model (method M3) using the statistical model of topology.Finally, the results of the two-level optimization (method M4) in which individual generic models were independently optimized after they had been initialized at the first level, show that the landmarks have been most accurately segmented.Figure (Fig. 9) shows the manually defined landmarks and the landmark positions obtained by methods M1, M2 and M4 for 5 X-ray images.Table 1 shows the segmentation performance (median of AED distances in pixels) as a function of the number of PCA components of the affine transformation used in the statistical connection model.The results of this sensitivity analysis show that the number of PCA components in statistical connection model affects the segmentation performance, which was to be expected.Method M4 shows the best performance and was always better then M3 method.However, the differences in the performances were larger at small number of PCA components, while the differences at larger number of PCA components were much smaller.Both methods (M3 and M4), which are based on statistical connection model, yielded much better results (Fig. 8) than the method with one model (M1) or the method with unconnected models (M2).The two-layer scheme (Fig. 3) composed of statistical connection model with statistical shape model yields more accurate positions of the land-marks and makes more compact models with fewer parameters, which consequently yields better and more robust segmentation of anatomical structures.

CONCLUSION
We have presented a general framework for segmentation of anatomical structures by connected parametrical models.The framework with the statistical model of topology enables modelling of large variations of spatial non-linearity, which are due to complexity of anatomical structures and inter-patient and inter-acquisition variations in shape, appearance, and spatial arrangement of anatomical structures.The framework, incorporating the statistical model of topology has been tested on 42 AP X-ray images of hips and pelvis.It was shown that the two-level modelling of landmark positions with the statistical model of topology improved accuracy and robustness of the whole model.The results of the presented framework indicate that the framework and statistical modelling of topology is a promising and general tool for constructing robust supervising models, by which a number of generic models can be connected.

Fig. 1 .
Fig. 1.Generic model and steps involved in image segmentation by unconnected statistical model.

Fig. 2 .
Fig. 2. Topology alignment (left to right) of three connected models in training phase and plausible topology generation (right to left) during image segmentation.
p and the image I T corres- ponding to the Y T is obtained.By optimizing the energy function f G :

Fig. 3 .
Fig. 3. Steps involved in segmentation of anatomical structures by connected statistical models.A detailed scheme of a generic model used in this strategy is same as shown in Fig. 1 without the optimization loop of the generic model.

Fig. 5 .
Fig. 5. Unaligned topology z consisting of 4 pose and one shape parameter (left), aligned topology Z consisting of 4 pose parameters (middle) and aligned topology Z consisting of 4 pose and one shape parameter (right) for 6 models and 41 images.Only three dimensions are used to visualize the vectors.

Fig. 6 .Fig. 7 .
Fig. 6.Traces of landmark positions when changing the first four PCA parameters of the affine transformation (left to right) from -3 to 3 by a step of 1/3.

Fig. 8 .
showing the minimum, maximum, median, 1st and 3rd  quartile of distances in pixels between automatically (initial and four methods, M1-M4) and manually derived landmarks positions (left) and box-whiskers diagrams of distances relative to the initial displacement (right).

Fig. 9 .
Fig. 9. Results in the form of landmarks (o) obtained by methods M1 (left), M2 (middle), and M4 (right) and their displacements from the manually (x) defined positions shown for 5 X-ray images (rows).

Table 1 .
The median of AED distances in pixels between automatically and manually derived landmarks positions for the supervising model (method M3) and the two-level optimization (method M4), given as a function of the number of PCA components of the affine transformation used in the statistical connection model.