GEOMETRIC ANALYSIS OF PLANAR SHAPES WITH APPLICATIONS TO CELL DEFORMATIONS X

Shape analysis is of great importance in many fields such as computer vision, medical imaging, and computational biology. In this paper we focus on a shape space in which shapes are represented by means of planar closed curves. In this shape space a new metric was recently introduced with the result that this shape space has the property of being isometric to an infinite-dimensional Grassmann manifold of 2-dimensional subspaces. Using this isometry it is possible, from Younes et al. (2008), to explicitly describe geodesics, a task that previously was not at all easy. Our aim is twofold, namely: to use this general theory in order to show some applications to the study of erythrocytes, using digital images of peripheral blood smears, in the treatment of sickle cell disease; and, since normal erythrocytes are almost circular and many Sickle cells have elliptical shape, to particularize the computation of geodesics and distances between shapes using this metric to planar objects considered as deformations of a template (circle or ellipse). The applications considered include: shape interpolation, shape classification, and shape clustering.


INTRODUCTION
Shapes play an important role in understanding objects and shape analysis has impact in many areas such as computer vision, medical imaging, and computational topology.Historically, there have been different geometrical characterizations of planar shapes and, in particular, different approaches to represent the continuous boundaries as curves, and then study their shapes.
In this paper we consider the space of planar shapes represented by simple closed plane curves with the metric introduced in Younes et al. (2008).This space has the property of being isometric to an infinitedimensional Grassmann manifold of 2-dimensional subspaces.Therefore, explicit geodesics and distances between shapes can be computed using Jordan angles (Neretin, 2001).In fact, the representation of shapes as elements of infinite-dimensional spaces with a given metric is of interest at this time and has important applications (Klassen, 2004;Huckemann, 2011;Srivastava, 2011).Furthermore, erythrocytes shape deformations are related to different illnesses, e.g., sickle cell disease (SCD), that cause the hardening or polymerization of the hemoglobin that contains the erythrocytes.The cells are deformed and this results in a risk of various complications, for example vaso-occlusive crisis.The study of this deformation process using digital images of peripheral blood smears offers useful results in the clinical diagnosis of these illnesses.
The first aim of this paper is to apply the distance and geodesics derived from the metric introduced in Younes et al. (2008) in order to study different applications related to the morphological analysis of shape deformation of erythrocytes (which are represented as planar curves).In particular, three applications are presented here: interpolation between shapes; supervised classification and unsupervised clustering.
On the other hand, curve evolution of a simple closed curve, whose points move in the direction of the normal with a prescribed velocity, has been applied to a wide variety of problems such as smoothing of shapes, shape analysis and shape recovery.The second aim of the paper consists on considering geodesics between closed-template curves and boundaries (curves) obtained as deformations of these templates.The idea of describing objects -such as cells or leaves -as deformations of a template also appears in the literature.For instance, in Granader and Manbeck (1993), the authors use an ellipse with fixed eccentricity as template in an application in the detection of defects in potatoes; and in Hobolth et al. (2002), the planar objects considered are deformations of a star-shaped template.Since normal cells are almost circular and many elongated cells have elliptical shape, we will represent the shapes of cells as deformations of ellipses and circles (radius vector function).In fact, Fourier expansion of the radius-vector function has already been used in many applications; see for instance Lestrel (1997) for a review of biological applications, and Loncaric (1998) for a survey of the engineering literature.A statistical application can be found in Hobolth et al. (2003), and a review of several contour functions, including the radius-vector function, appears in Kindratenko (2003).
The paper is organized as follows: In the following section we give a basic résumé of the results obtained in Younes et al. (2008).Next, we define planar shape changes from a deformation of a template and we apply the results cited in the previous section to these particular curves.We consider the particular case where the template curve is an ellipse or a circle.Finally, we use the general results in Younes et al. (2008) and the results of section of the previous section, to solve different problems related to the morphological study of shape deformation of erythrocytes: interpolation between normal and sickle cells; supervised classification and unsupervised clustering.We also compare the advantages of using the general theory or this theory, particularized to deformation curves, in the study of erythrocytes.

THE SPACE OF PLANE SHAPES
In this paper we consider closed curves which are boundaries of planar shapes.That is, we consider the space of smooth planar immersed curves given by where S 1 is the unit circle and α (t) is the usual parametric derivative of α.M denotes the space of where ḣ(s) means derivative with respect to arc length, ḣ(s) • k(s) is the usual product in R 2 , and l(α) is the length of α.
The space of planar shapes will be the set M modulo translations, rotations, scalings, and reparameterizations of the curve (Di f f (S 1 )).However we consider first the pre-shape space M d , where the division by the group of diffeomorphisms Di f f (S 1 ) has not been considered.The group generated by translations, scalings and rotations is called the group of similitudes, abbreviated as sim; then, the pre-shape space is defined as and we associate to M d the restriction of the metric G α .
We prefer to use this apparently complex metric rather than the much simpler metric Klassen (2004)) because the latter presents some problems -for example, a geodesic starting at a point can degenerate to a curve of vanishing length and geodesics cannot be computed explicitly.Moreover, as we will see next, M d with this metric has the property of being isometric to a Grassmannian, and then distances and geodesics using this metric can be computed efficiently.

THE BASIC MAPPING
Let V be the vector space of all C ∞ mappings f : S 1 −→ R, with the norm Then, given two functions e, f ∈ V and assuming that our plane curves are curves in the complex plane C, the basic mapping is defined as From this definition it follows that a curve α(t) = Φ(e, f ) will be a closed curve (α(0) = α(2π)) if and only if ||e|| = || f || and e, f = 2π 0 e(x) f (x) dx = 0. Moreover the length of α is given by Let us now consider the Grassmannian Gr(2,V ) of unoriented 2-dimensional subspaces of V defined by an orthonormal pair (e, f ) ∈ V 2 with ||e|| 2 + || f || 2 = 2 (that is, l(α) = 1); and let Gr 0 (2,V ) be the subset of Gr(2,V ) defined by e, f with {t / e(t) = f (t) = 0} = / 0 (α need not be an immersion if e and f vanish simultaneously).f Although the mapping Φ has been defined on V × V , it can be restricted to orthonormal pairs in V ×V ; and then modified so as to divide out by rotations to get a new map (which is still denoted Φ) defined on Gr 0 (2,V ) (Younes et al., 2008).Then the magic of the map Φ is shown in the following result: Theorem 1 Φ defines an isometry Φ : Gr 0 (2,V ) −→ M d when Gr 0 (2,V ) is given with its natural metric and M d is given with the metric G α .
Given two functions e and f which define a point in the Grassmann manifold Gr 0 (2,V ), from the basic mapping Φ, we obtain a closed curve that is taken as the representative of its equivalence class in M d .On the other hand, given a curve α ∈ M we obtain the functions e and f , which correspond to the representative of α in M d , from the following result: Proposition 1 Let α ∈ M and let θ α (t) be the tangent angle function of α; it follows that (2) Proof 1 From the definition of tangent angle function . (3) On the other hand, from Eq. 1, we have Finally using the Euler formula and comparing Eqs. 3 and 4 we obtain the result.
As was said before, the curve given by Φ(e, f )(t) will be the representative curve of α ∈ M in M d .Since Φ(e, f )(0) = 0, the representative elements of M d satisfy that α(0) = 0 and l(α) = 1.

COMPUTATION OF DISTANCES AND GEODESICS IN M d
In order to compute distances and geodesic lines between any pair of shapes (closed planar curves) in the pre-shape space, we will consider the basic mapping and the distances and explicit geodesics in the Grassmannian given in Neretin (2001) and Younes et al. (2008).

THE SHAPE SPACE
We are interested in geometric curves, i.e., curves considered up to reparameterizations.The shape space is then defined as the quotient space .
The map Φ can be converted in an isometry if we consider freely immersed curves and we make both quotients into Riemannian submersions in Theorem 1.However, we will consider only diffeomorphisms that are translations.We will consider digital curves which correspond to cell boundaries with a fixed orientation and approximately equally spaced discrete points have been considered in these curves, we suppose that the curves Therefore, instead of working in the shape space S we will consider distances in the pre-shape space M d and we will define the distance between two closed immersions α and β , each of length 1, representing two shapes, as where φ ∈ Di f f + (S 1 ) is given by φ (t) = t + t 0 , where t 0 ∈ S 1 is a constant, and With the present approach (Younes et al. (2008)) and the use of constant speed parameterizations, the task of finding distances and geodesics in the shape space (that is, factoring out diffeomorphisms on S 1 ) is direct and does not require of sophisticated numerical methods, as in the shape space of Klassen (2004, Section 4), or in the shape space of Younes et al. (2008, Section 6) for general parameterized closed curves.

GEOMETRIC REPRESENTATION OF THE DEFORMATION OF A TEMPLATE
Curve evolution of a simple closed curve, whose points move in the direction of the normal with a prescribed velocity, has been applied to a wide variety of problems.In this section we will consider geodesic paths between a template curve (circle or ellipse) and a curve obtained as a normal deformation of the template.In the applications we will consider two approaches; the use of the general theory (Section "The space of plane shapes") and the results presented in this section, particularized to deformation curves.Therefore, in this section we consider two shapes: a template curve α and a curve β obtained as a normal deformation of α.We plan to obtain suitable expressions for the functions {e 1 (t), f 1 (t)} and {e 2 (t), f 2 (t)} such that α = Φ(e 1 , f 1 ) and β = Φ(e 2 , f 2 ).
Let α be a closed curve of length 1 parameterized by arc length and let β be a curve defined as where n(t) is the inner unit normal vector to α(t) and f (t) is a differentiable function which represents the Euclidean signed distance between the points α(t) and β (t).
Proof 2 Let t(t) = α (t) be the unit tangent vector.
Then, from Eq. 7, we obtain the geodesic in the shape space joining α to β .

ELLIPSES AND CIRCLES AS TEMPLATES AND THE RADIUS VECTOR FUNCTION
Since normal cells are almost circular and most sickle cells are almost elliptical, we study in this section, as a particular case, shapes of objects obtained as normal deformations of ellipses and shapes of objects obtained as normal deformations of circles.Since ellipses and circles are boundaries of convex sets we will parameterize them using their support functions.In the second case because deformations of a circle are boundaries of star shaped objects, we will also express them in terms of their radius-vector functions.
Let p(t) be the support function of a bounded convex set K; then the boundary of K, ∂ K; can be parameterized as α(t) = (x(t), y(t)) where (Santaló, 1976), Proposition 3 The two functions associated to the curve α by the basic mapping are: ). ( Proof 3 Under the identification of R 2 and C, we have Then, proceeding as in Proposition 2 and, as we want the length of the curve α to be equal 1, we obtain the result. When the curve α is an ellipse α(t) = {a cos(t), b sin(t)}, t ∈ [0, 2π], the support function is and the corresponding functions e 1 , f 1 to α are known from the above proposition, and we do not have to compute them numerically.On the other hand, and Proposition 2 can be used to obtain the functions {e 2 (t), f 2 (t)} corresponding to a curve β , obtained as a deformation of an ellipse.
For a = b = 1 the curve α(t) = {cos(t), sin(t)} is a circle and n(t) = −α(t); then, Therefore, r(t) = 1 − f (t) is the radius vector function of the curve β with respect to the origin.Now we present a way to obtain the functions {e 2 (t), f 2 (t)} corresponding to a curve β (t) = r(t) {cos(t), sin(t)}.
Proposition 4 Let θ β (t) be the tangent angle function of β ; then, Proof 4 Since κ α = 2π and from Proposition 2 we obtain the result.
In the second approach considered in the applications, we use two templates α: a unit circle and an ellipse, whose associated functions are given by Proposition 3. To compute the distance and geodesic from a cell to a circular template α using Proposition 4, we consider as radius vector r(t) the distance from the centroid of the cell to its boundary; and to compute the distance and geodesic from a cell to an ellipse α as a template using Proposition 2, we consider that α is the ellipse that best fit the boundary of the cell.As expected, in general, the distance of a normal cell to the circle is almost zero and the distance from a sickle cell to the ellipse is almost zero.

APPLICATIONS
There are many interesting applications of the geometric representation of planar shapes proposed here.In this section we show some of these applications and we use them to analyze digital images of peripheral blood smears in the morphological study of erythrocytes.
All these applications represent a novelty for two reasons.Firstly, the novelty lies in the use of the general theory (Prop.1, Eqs. 6 and 12) which has never been used for these specific applications.On the other hand, in some applications, additionally, we will make use of the computation of representatives for a normal deformation of an ellipse or a circle (Prop.4).
Erythrocytes shape deformations are related to different illnesses, e.g., SCD, that cause the hardening or polymerization of the hemoglobin that contains the erythrocytes.The cells are deformed and this results in a risk of various complications, for example vasoocclusive crisis.In Fig. 1 we see images of normal and deformed cells, they will be used in the following applications.The study of this process using digital images of peripheral blood smears offers useful results in the clinical diagnosis of these illnesses.
The images were obtained of blood specimens from patients with SCD.They were of 500 × 375 pixels, with 480 ppi resolution.They were taken in the Clinical Laboratory of the Special Hematology Department of the "Dr.Juan Bruno Zayas Alfonso" General Hospital, Santiago de Cuba.In order to accomplish segmentation we used methods that are based on contour evolution and that appear in the literature under the name of active contours, deformable models, etc.Three applications are presented here: interpolation between shapes; supervised classification and unsupervised clustering.

INTERPOLATION BETWEEN SHAPES
Geodesic paths between shapes cited in section "The space of plane shapes" can be used to interpolate between shapes.Interpolation between shapes can be useful to estimate intermediate shapes of cells which vary their shapes as time goes by, for example those affected by SCD.For example, given a normal and a sickle cell like those shown in Fig. 2, one can build the geodesic path between both shapes, as it is shown in this figure.In Fig. 3 some of the intermediate shapes in this process are shown.
On the other hand, taking into account that a normal cell should be 'circular', we can make profit of Propositions 3 and 4 (where r(t) is the distance from the centroid of the cell to its boundary).Using this option, calculations are considerably reduced.In Fig. 4, intermediate shapes between a "completely circular" shape and the same sickle cell of Fig. 3

SUPERVISED CLASSIFICATION
One of the most important issues in the study of SCD is the search for an efficient automatic classification method to quantify the number of deformed cells that a patient has and so gauge the severity of the illness.Nowadays, automatic classification of erythrocytes using digital images is a very active field of research (Horiuchi et al., 1990;Wheeless et al., 1994;Asakura et al., 1996;Kavitha and Ramakrishnan, 2005;Jayavanth et al., 2010;Frejlichowski, 2010).All these studies are based on the extraction of shape descriptors (either boundary-based or region-based) and the use of Euclidean distance between these descriptors.
In this section, we begin by proposing the use of the distance between planar shapes cited previously as the metric in a supervised classification algorithm and demonstrated the excellent results obtained in the automatic classification of normal erythrocytes, those with sickle or those with other deformations.
Blood specimens were obtained from patients with SCD and 45 images different fields were taken for this study.A specialist selected the cells to be studied, differentiating between normal ones, elongated ones and those with other deformations present in the images (Fig. 1).
To carry out the task of the supervised classification, we propose again two possibilities, the first one is based on the general approximation to distances between shapes (Proposition 1 and Eqs.6 and 12).The second approach is based on considering the cells as deformations of known templates.
In the first approximation all the distances between pairs of cells were calculated and the k-nearest neighbor algorithm for supervised classification (k-NN) was used (Cover and Hart, 1967).In order to validate the results, a 5 × 1 cross-validation process was carried out.The confusion matrix of the process is showed in Table 1, and the sensibility, precision and specificity measures in Table 2 In Table 2 we see that the elongated class had high values of sensibility and specificity, with 96.19 % and 92.01 % respectively, and the normal class had 100.0 % of sensibility.No normal cell was classified as elongated nor vice versa.In the case of other deformations the sensibility decreased to 84.36 % due to several cells with other deformations having shapes very close to circular ones and in addition these elements differ a lot among themselves.This drop in accuracy is paralleled -though to a lesser extentin the values given for precision and specificity in the normal class.Nevertheless, a very few elements from the "other deformations" class were classified as elongated, the majority of misclassifications being to consider them as normal.The process had a general performance of 93.53 %.
In order to asses the goodness of our results, they were compared with those obtained using other approaches previously considered in the literature on analysis of erythrocytes images.In particular with descriptors used in Wheeless et al. (1994), Asakura et al. (1996) and Frejlichowski (2010).The first one is named the UNL-Fourier method and it is based on applying two transformations, firstly, the transformation of boundary points to polar coordinates (more precisely, UNL-transform) and secondly 2D Fourier transformation.The second one is based on two shape factors: circular-shape factor (CSF) (Wheeless et al., 1994) and elliptical-shape factor (ESF) (Asakura et al., 1996).Results can be found in Table 3).In all the cases the proposed descriptors have similar or higher classification sensibility, specificity and precision.
Although the results obtained using the above methodology have been excellent, and the distances between all the pairs of shapes are almost explicitly available (this is, for instance, a key advantage over the shape space of Klassen (2004)); in order to reduce computational cost, we are going to consider now that normal and sickle cells correspond to known templates.
As we said before, normal cells are almost circular and all deformed cells correspond to deformations of a circle.However sickle cells are almost elliptical and we could assume that elliptical forms appear in their geodesic path from the circle.Considering this fact we propose to consider distances from each cell to a circle and to an ellipse in the classification method.The e and f functions of a circle and of an ellipse were obtained analytically, by using Eq. 15. Results given in section "Geometric representation of the deformation of a template" were used to calculate the representative of each cell and then the distances to the circle and to the ellipse.The axes of the ellipse used as a template Linear discriminant analysis algorithm for classification with Leave-one-out cross-validation was used (Rao, 2009).The confusion matrix of the process is showed in Table 4, and the sensibility, precision and specificity measures in Table 5.In Table 5 we see that a high overall performance is also obtained (92.39 %), although it is slightly worse than that obtained in the previous case.The sensibility of normal and elongated classes is also high: 98.02 % and 98.57 %, respectively, and once again no normal cell was classified as elongated nor vice versa.

UNSUPERVISED CLUSTERING
Another interesting problem in the study of SCD is that -in classifying cells using a more complex system -the simplicity of SCD diagnosis is made more complicated.This more complex classification system is of interest in the diagnosis of other diseases and their own particular cell deformations, a study that merits its own importance.
Several studies consider cellular deformations produced by various diseases and have been the theme of study and interest for some years now.Morphological classification in 6 classes (Bacus et al., 1977), 12 classes (Frejlichowski, 2010) and 14 classes (Bacus et al., 1976) have been used.Other studies considering different shapes of cells depending on varying flow conditions in different blood vessels were carried out (Jayavanth et al., 2010).In all these studies homogeneous classes of deformations characterized by the morphology of the cells are defined.
In this section we propose to use distance between planar shapes cited previously as a dissimilarity measure in an unsupervised classification algorithm in order to define homogeneous classes of deformations.
As cluster procedure we will use a partitioning method called Partitioning Around Medoids (PAM).This method is a generalization of the well-known kmeans algorithm, which can be used with all types of data and dissimilarity measurement between objects.The PAM algorithm (in a similar way to k-means) is based on finding k representative objects (also known as medoids (Kaufman and Rousseeuw, 1990)) from the data set in such a way that the total of the dissimilarities within any given cluster is minimized.Medoids are representative objects in the clusters that always exist and we just have to compute the dissimilarities between cells.Unlike K-means, there is no need to calculate cluster Frechet means (Pennec, 2006) that would be a very complex task in our shape space.A gradient search algorithm in a infinite dimensional Riemannian manifold is required to find the Frechet mean (Woods, 2003).
PAM algorithm can be found in the Cluster package of the free software, R (R Development Core Team, 2009), a language and environment for statistical computing and graphics.
In most applications of clustering procedures and in particular in the problem that concerns us here, the number of groups k is not known in advance.In order to select the appropriate number of groups, we will run the clustering algorithm with different numbers of groups and we will choose (as suggested in Kaufman and Rousseeuw, 1990) the result with the largest average silhouette width.
We have to note that we want k ≥ 3 because at least we have three classes: normal, sickle and other deformations, but previously (in order to validate the clustering procedure) we apply the method to a supervised case where k = 3.In Table 6 the proportions of correctly classified objects are shown and we can see that results are excellent, with a sensibility value of 97.00 % in the detection of sickle cells, 100.00 % in the case of normal cells, and a general performance of the process of 87.67 %.We then used our methodology for k > 3. The optimal clustering has been achieved with k = 5.For this number of groups, the average silhouette width is 0.1592418.Results are shown in Table 7.In Fig. 5 we can see the medoids of each class.The medoids objects detected in normal and elongated classes were the same as in the process using 3 groups.The new study (with k = 5) verified that the detection in these classes was stable.
The cells in the generated new groups mostly would have belong to the class of other deformations in our earlier (k = 3) study.Thus we can infer that metrics allows additional grouping of several types of other deformations that show up in a study that is adequately accurate.No normal cell was classified as sickle and vice versa, and in the case of other deformation classes, the misclassification occurring was in great majority of cases of cells detected as normal.Only one cell in the case of five groups was erroneously detected as sickle.Thus we can say that the results were good.

PSEUDOCODES AND NOTATION PSEUDOCODES
The aim of this section is to present a description of both algorithms used to compute distances and geodesics between planar curves; from their general representation (Eq.2), and between an ellipse (or circle) and a deformation of this template (Propositions 2-4).Since most of the points of both algorithms are identical, we will list the steps of the first algorithm, and then we discuss the differences with the second one.

Algorithm 1 (Distances and geodesics between general digital curves)
Given c 1 and c 2 two digital closed curves, each of them given by an ordered set of N points approximately equally spaced: c i = c i (t) = {x i (t), y i (t)} t=1,••• ,N , i = 1, 2.
2. Compute the approximations of the functions given in Eq. 2, for t = 1, • • • , N. The tangent angle functions θ i (t) are derived from the cumulative angular function (Zhan et al., 1972) and |c i (t)|, although it should be constant, is approached from a second grade polynomial approximation using five points (Sauer, 2011).

CONCLUSIONS
In this paper we consider the metric introduced in Younes et al. (2008) and we apply it to calculate distances and geodesics between all pairs of a sample of closed planar curves, each one representing the boundary of an erythrocyte.From these distances and geodesics we show some practical applications in the morphological study of erythrocytes using digital images of peripheral blood smears of SCD: 1. Shape interpolation between erythrocytes.

Supervised classification of erythrocytes in
normal cells, sickle cells, and cells with other deformations, with a general effectiveness of the process of 93.53 % 3. Unsupervised classification of erythrocytes where the detection of the groups of normal and sickle cells in relation to the supervised classification remained stable while three new class of deformations were differentiated.
On the other hand, since normal erythrocytes are almost circular and many sickle cells have elliptical shape, we have adapted the general theory of Younes et al. (2008) to the particular case where we have a known convex template (in particular, circle or ellipse) and a normal deformation of it.Now, we only have to calculate the distance and geodesic of each shape in the sample, to a unit circle and to an ellipse, which is obtained from the mean value of the axes of the ellipses that best fit each of the sickle cells in the sample.
For the same sample of cells as in the preceding case, we have performed a second experiment, using templates (a circle and an ellipse) with the goal of reducing computational cost.In this case, the general effectiveness of the supervised classification process of erythrocytes in normal cells, sickle cells, and cells with other deformations was of 92.39 %.
With these results it can be affirmed that the method is feasible for morphologic study applications and can be used as support to the clinical diagnosis of the state of a patient with SCD.
The methodologies introduced in this paper could be also extended to other similar clinical applications.

Fig. 4 .
Fig. 4. Evolution from a circle to a sickle erythrocyte.

Fig. 5 .
Fig. 5. Medoids of each class obtained: Normal and Sickle classes above, the three classes of Other Deformations below.

Table 1 .
. Confusion matrix of supervised classification.

Table 2 .
Measures of supervised classification using the proposed distance.

Table 3 .
Sensibility, precision and specificity values for UNL-Fourier function and the elementary coefficients CSF and ESF.

Table 4 .
Confusion matrix of classification considering the distances of each cell to a circle and an ellipse.

Table 5 .
Measures of classification considering considering the distances of each cell to a circle and an ellipse.

Table 6 .
Unsupervised classification using the proposed distance: 3 groups generated.

Table 7 .
Unsupervised classification using the proposed distance: 5 groups generated.