Shape from Texture using Locally Scaled Point Processes

Shape from texture refers to the extraction of 3D information from 2D images with irregular texture. This paper introduces a statistical framework for learning shape from texture in which convex texture elements in a 2D image are represented through a point process. In a first step, the 2D image is preprocessed to generate a probability map corresponding to an estimate of the unnormalized intensity of the latent point process underlying the texture elements. The latent point process is subsequently inferred from the probability map in a non-parametric, model-free manner. Finally, the 3D information is extracted from the point pattern by applying a locally scaled point process model in which the local scaling function represents the deformation caused by the projection of a 3D surface onto the 2D image plane.


Introduction
Natural images contain a variety of perceptual information enabling the viewer to infer the three-dimensional shapes of objects and surfaces (Tuceryan and Jain, 1998). Stevens (1980) observed that surface geometry mainly has three effects on the appearance of texture in images: foreshortening and scaling of texture elements, and a change in their density. Gibson (1950) proposed the slant, the angle between a normal to the surface and a normal to the image plane, as a measure for surface orientation. Stevens amended this by introducing the tilt, the angle between the surface normal's projection onto the image plane and a fixed coordinate axis in the image plane. In this paper, we will directly infer the surface normal from a single image taken under standard perspective projection.
Statistical procedures for estimating surface orientation often make strong assumptions on the regularity of texture. Witkin (1981) assumes observed edge directions provide the necessary information, while Blostein and Ahuja (1989) consider circular texture elements with uniform intensity. Blake and Marinos (1990) consider the bias of the orientation of line elements isotropically oriented on a 3D plane, induced by the plane's orientation under orthographic projection, along with a computational approach related to Kanatani's texture moments (Kanatani, 1989). Malik and Rosenholtz (1997) locally estimate "texture distortion" in terms of an affine transformation of adjacent image patches. The strong homogeneity assumption underlying this approach has been relaxed by Clerc and Mallat (2002), albeit to a condition that is difficult to verify in practice. Forsyth (2006) eliminates assumptions on the non-local structure of textures (like homogeneity) altogether and aims to estimate shape from the deformation of individual texture elements. Loh and Hartley (2005) criticize prior work for its restrictive assumptions related to homogeneity, isotropy, stationarity or orthographic projection, and claim to devise a shape-from-texture approach in the most general form. Their work, however, also relies on estimating the deformation of single texture elements, similar to Forsyth (2006).
We propose a general framework for inferring shape from near regular textures, as defined by Liu et al. (2009), by applying the locally scaled point process model of Hahn et al. (2003). This framework enables the simultaneous representation of local variability and global regularity in the spatial arrangement of texture elements, which are thought of as a marked point process. We preprocess the image to obtain a probability map representing an unnormalized intensity estimate for the underlying point process, subsequently apply a non-parametric framework to infer the point locations and, based on the resulting point pattern, learn the parameters of a locally scaled point process model to obtain a compact description of the 3D image attributes.
Point process models have previously been applied in image analysis applications where the goal is the detection of texture elements, see e.g. Lafarge et al. (2010) and references therein. These approaches usually apply a marked point process framework, with marks describing the texture elements. Such set-ups rely on a good geometric description of individual texture elements, limiting the class of feasible textures. As our goal is not the detection of individual texture elements but the extraction of 3D information, we omit the modeling of each texture element and infer the latent point locations in a model-free manner. Thus, our sole assumption regarding texture element shape is approximate convexity, which offers considerable flexibility.
The remainder of the paper is organized as follows. The next section contains preliminaries on image geometry, followed by the method section describing the image preprocessing, the point pattern detection and the point process inference framework. We then present results for both simulated and real images with near regular textures. Finally, the paper closes with a short discussion section.

Preliminaries
Let P = {X ∈ ℝ³ : ⟨δ, X⟩ = −h}, (1) with ‖δ‖ = 1 and ⟨δ, X⟩ < 0, denote a 3D plane with unknown unit normal δ and distance h from the origin. We assume δ to be oriented towards the camera, forming obtuse angles, ⟨δ, X⟩ < 0, with the projection rays X. The world coordinates X = (X₁, X₂, X₃) and image coordinates x = (x₁, x₂) are aligned as shown in Fig. 1. Here, we denote the image domain by D and assume the image to be scaled to have fixed area, |D| = a.
We consider the basic pinhole camera (Hartley and Zisserman, 2000) and, among the internal parameters, we only consider the focal length f > 0, which depends on the field of view, see Fig. 1. As usual, we identify image points and rays of the projective plane through X = (x₁, x₂, −f), x = (x₁, x₂) ∈ D. (2) The ray X given by (2) meets P in λX with λ = −h/⟨δ, X⟩ > 0. (3)
It follows that a point X_P in P is related to the image point x through X_P = −(h/⟨δ, X⟩) X. (4) A homogeneous texture covering P induces an inhomogeneous texture on the two-dimensional image plane with density given by the surface element dP(x) = (h²f/|⟨δ, X⟩|³) λ₂(dx), (5) where λ₂ denotes the two-dimensional Lebesgue measure. Taking, for instance, the frontoparallel plane δ = (0, 0, 1) results by (2) merely in the constant scale factor (h/f)², i.e. the homogeneous density (h/f)² λ₂(dx). However, for an arbitrary orientation δ, this factor depends on X, as illustrated in Fig. 2. Eqn. (5) then quantifies the perspective foreshortening and inhomogeneity of the texture, respectively, as observed in the image, and mathematically represents the visually apparent texture gradient.
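To make the projection geometry concrete, the ray–plane intersection and the surface-element density above can be evaluated numerically. The following is a minimal sketch; the function names are ours, and the frontoparallel check reproduces the constant factor (h/f)² mentioned in the text.

```python
import numpy as np

def backproject(x, delta, h, f):
    """Intersect the viewing ray of image point x = (x1, x2) with the
    plane P = {X : <delta, X> = -h}.  The ray is X = (x1, x2, -f) and
    meets P in lambda * X with lambda = -h / <delta, X> > 0."""
    X = np.array([x[0], x[1], -f])
    return (-h / np.dot(delta, X)) * X

def density_factor(x, delta, h, f):
    """Density h^2 f / |<delta, X>|^3 of the surface element induced on
    the image plane by a homogeneous texture covering P."""
    X = np.array([x[0], x[1], -f])
    return h**2 * f / abs(np.dot(delta, X))**3

# Frontoparallel plane: the density reduces to the constant (h/f)^2.
delta0 = np.array([0.0, 0.0, 1.0])
print(np.isclose(density_factor((0.3, -0.2), delta0, h=2.0, f=0.98),
                 (2.0 / 0.98)**2))
```

For a slanted plane the factor varies over the image domain, which is exactly the texture gradient the scaling function will model later.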

Methods
In a first step, we apply image preprocessing that generates a probability map Y = {Y(x) : x ∈ D, 0 ≤ Y(x) ≤ 1} representing the spatial arrangement of texture elements in the image. To this end, two elementary techniques are applied locally: boundary detection and the corresponding distance transform. The former step entails either gradient magnitude computation using small-scale derivative-of-Gaussian filters (Canny, 1986) or, for texture elements with less regular appearance, the earth-mover's distance (Pele and Werman, 2009) between local histograms. Inspecting the histogram of the resulting soft boundary-indicator function then allows one to determine a threshold, after which the distance transform is applied.
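A minimal sketch of the gradient-based variant of this preprocessing, using SciPy; the quantile-based default threshold is our stand-in for the histogram inspection described above, and the function name is ours:

```python
import numpy as np
from scipy import ndimage

def probability_map(image, sigma=1.0, threshold=None):
    """Boundary detection via a derivative-of-Gaussian gradient
    magnitude, followed by a distance transform of the sub-threshold
    (non-boundary) pixels.  The distance map is rescaled to [0, 1] to
    serve as a probability map Y."""
    grad = ndimage.gaussian_gradient_magnitude(image.astype(float), sigma)
    if threshold is None:
        threshold = np.quantile(grad, 0.75)   # assumed heuristic
    interior = grad < threshold               # pixels away from boundaries
    dist = ndimage.distance_transform_edt(interior)
    return dist / dist.max() if dist.max() > 0 else dist
```

Pixels deep inside (approximately convex) texture elements receive high values of Y, so the local maxima of Y are candidates for the latent points.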
In our framework, the texture elements are regarded as a realization of a marked point process where the underlying point pattern is latent. The value Y(x) of the probability map at x ∈ D denotes the probability that one of the latent points is located at x. To recover the latent point pattern based on the information in Y, we first search for local maxima in Y. That is, for a given window size, we let Φ denote the set of locations that maximize Y within their local window. We then define a neighbourhood relation on Φ by declaring two points neighbours whenever the probability map stays above a threshold k₂ between them. We may now write Φ as a union of disjoint neighbourhood components, Φ = ∪_{i=1,…,n} C_i, where each x ∈ C_i is a neighbour of at least one point in C_i \ x. Under the assumption that the texture elements are close to convex, two points x₁ and x₂ in Φ are neighbours if and only if they likely fall within the same texture element. Hence, we estimate the latent point process Ψ by one representative point per component.

Formally, a point process can be described as a random counting measure N(·), where N(A) is the number of events in A for a Borel set A of the relevant state space, in our context the image domain D. The intensity measure of the point process is given by Λ(A) = E N(A) and the associated intensity function α satisfies Λ(dx) = α(x) dx. For a homogeneous point process, it holds that α(x) = β for some β > 0, while for an inhomogeneous point process where the inhomogeneity stems from local scaling (Hahn et al., 2003) we obtain α(x) = β c_η(x)^{-2}, (10) for a scaling function c_η with parameters η. For identifiability reasons, Prokešová et al. (2006) propose normalizing c_η to conserve the total area of the state space. That is, they define the normalizing constant of the scaling function such that ∫_D c_η(x)^{-2} dx = |D|. (11) The scaled distance between two points is obtained by integrating the inverse scaling function along the connecting segment, d_c(x_i, x_j) = d(x_i, x_j) ∫₀¹ c_η(x_i + t(x_j − x_i))^{-1} dt, (12) for any x_i, x_j ∈ D, where d(·, ·) denotes the Euclidean distance and d_c(·, ·) its scaled version. The exponential scaling function is particularly attractive in that these locally scaled distances can be calculated explicitly.
Examples of exponentially scaled distances are given in Fig. 3.
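The point-detection step described above, local maxima of Y merged into neighbourhood components, can be sketched as follows. The window size, the level k₁ and the merge radius r are illustrative stand-ins for the paper's definitions, and the component representative is taken to be the centroid:

```python
import numpy as np
from scipy import ndimage

def detect_points(Y, window=5, k1=0.5, r=4.0):
    """Candidate set Phi: local maxima of the probability map Y within a
    `window`-sized neighbourhood, above a level k1.  Candidates closer
    than r are assumed to lie in the same texture element and are merged
    into one neighbourhood component; the component centroid estimates a
    point of the latent process Psi."""
    maxima = (Y == ndimage.maximum_filter(Y, size=window)) & (Y >= k1)
    phi = np.argwhere(maxima).astype(float)
    # union of neighbourhood components via breadth-first search
    unvisited, components = set(range(len(phi))), []
    while unvisited:
        queue, comp = [unvisited.pop()], []
        while queue:
            i = queue.pop()
            comp.append(i)
            for j in [j for j in list(unvisited)
                      if np.linalg.norm(phi[i] - phi[j]) < r]:
                unvisited.remove(j)
                queue.append(j)
        components.append(comp)
    return np.array([phi[c].mean(axis=0) for c in components])
```

Two well-separated maxima yield two points, while maxima falling within the same element collapse to a single centroid.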
Here, we employ the density in (5) as a scaling function, c_η(x) = Q(η) |⟨δ_η, X⟩|^{3/2}, where we choose spherical coordinates δ_η = (sin η₁ cos η₂, sin η₁ sin η₂, cos η₁), with η₁ ∈ [0, u] and η₂ ∈ [0, 2π]. The upper limit u restricting the range of the scaling parameter η₁ ensures that ⟨δ_η, X⟩ < 0 and therefore depends on the focal length f as well as on the size and location of the observation window D. As suggested by Prokešová et al. (2006), we normalize the scaling function such that (11) holds. That is, we solve Q(η)^{-2} ∫_D |⟨δ_η, X⟩|^{-3} λ₂(dx) = |D|. (13) It follows that Q(η) = ( |D|^{-1} ∫_D |⟨δ_η, X⟩|^{-3} λ₂(dx) )^{1/2}. (14) Under the model in (5), the intensity function in (10) becomes α(x) = β c_η(x)^{-2} = β |D| |⟨δ_η, X⟩|^{-3} / ∫_D |⟨δ_η, X(y)⟩|^{-3} λ₂(dy), (15) with X = (x₁, x₂, −f) as in (2). As a byproduct, the unknown plane parameter h cancels.
It sets the absolute scale and cannot be inferred from a single image. Furthermore, the scaling function is computationally tractable and, as for the exponential scaling discussed above, the scaled distance function is available in closed form, provided that the basic requirement ⟨δ_η, X_i⟩ < 0 is fulfilled for all i = 1, …, n. Examples of scaled distances are given in Fig. 4. When compared with Fig. 3, we see that the perspective scaling in (15) results in a similar distance scaling as the exponential scaling, while it also provides a coherent description of the perspective foreshortening.
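The perspective scaling function and its normalization can be evaluated numerically. In the sketch below (our naming, grid resolution and quadrature), the constant Q(η) is computed so that the integral of c_η^{-2} over the observation window D recovers the area of D:

```python
import numpy as np

def perspective_scaling(eta1, eta2, f, D=(-0.69, 0.69, -0.50, 0.50), ngrid=200):
    """Return the normalized perspective scaling function
    c_eta(x) = Q(eta) |<delta_eta, X>|^{3/2} with X = (x1, x2, -f),
    where Q(eta) is chosen numerically so that the integral of
    c_eta^{-2} over D equals |D|."""
    delta = np.array([np.sin(eta1) * np.cos(eta2),
                      np.sin(eta1) * np.sin(eta2),
                      np.cos(eta1)])
    x1lo, x1hi, x2lo, x2hi = D
    area = (x1hi - x1lo) * (x2hi - x2lo)
    g1 = np.linspace(x1lo, x1hi, ngrid)
    g2 = np.linspace(x2lo, x2hi, ngrid)
    xx1, xx2 = np.meshgrid(g1, g2)
    inner = np.abs(delta[0] * xx1 + delta[1] * xx2 - delta[2] * f)
    cell = (g1[1] - g1[0]) * (g2[1] - g2[0])
    integral = np.sum(inner ** -3) * cell   # int_D |<delta, X>|^{-3} dx
    Q = np.sqrt(integral / area)

    def c(x1, x2):
        return Q * np.abs(delta[0] * x1 + delta[1] * x2 - delta[2] * f) ** 1.5

    return c
```

The unknown distance h never enters: it is absorbed by the normalization, in line with the remark above.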
For a given image, we assume that the focal length f is known. It remains to estimate the parameters (β, η₁, η₂) of the intensity function in (15) based on the estimated point pattern Ψ̂. The desired 3D image information, the slant and the tilt of the surface, may then be characterized by the scaling parameter estimates η̂₁ and η̂₂. The parameter estimation is performed by maximizing the composite likelihood, see e.g. Møller (2010), that takes the form CL(β, η) = ∏_{i=1,…,n} α(x_i) exp(−∫_D α(x) dx). (17) The maximum composite likelihood estimate for β is β̂ = n/|D|. For the remaining two parameters, the parameters of interest in our setting, we maximize the function ℓ(η₁, η₂) = −3 ∑_{i=1,…,n} log |⟨δ_η, X_i⟩| − n log( |D|^{-1} ∫_D |⟨δ_η, X⟩|^{-3} λ₂(dx) ). (18)

Results
We first present the results of a simulation study where we analyse sets of 3D point coordinates sampled from either a perfectly regular pattern or a homogeneous Poisson process and subsequently projected onto the 2D image plane, see Fig. 2 and Fig. 5.
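Projected patterns of the Poisson kind can be generated in outline as follows; the intensity, patch size and random seed are our choices:

```python
import numpy as np

def simulate_projected_poisson(delta, h, f, rho=50.0, extent=3.0, seed=1):
    """Sample a homogeneous Poisson process on a bounded patch of the
    plane P = {X : <delta, X> = -h} and project it onto the image plane
    under perspective projection with focal length f."""
    rng = np.random.default_rng(seed)
    # orthonormal basis of the plane (valid when delta is not parallel to e2)
    eb = np.array([0.0, 1.0, 0.0])
    eb = eb - np.dot(eb, delta) * delta
    eb /= np.linalg.norm(eb)
    ea = np.cross(delta, eb)
    n = rng.poisson(rho * (2 * extent) ** 2)
    s, t = rng.uniform(-extent, extent, (2, n))
    Xp = -h * delta + np.outer(s, ea) + np.outer(t, eb)
    Xp = Xp[Xp[:, 2] < 0]            # keep points in front of the camera
    return -f * Xp[:, :2] / Xp[:, [2]]   # image coordinates (x1, x2)
```

Restricting the projected points to the observation window D then yields patterns of the kind shown in Fig. 5.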
Figure 5: Simulated Poisson point patterns with 3D shape given by the outer normals in the subfigure captions. The internal parameters correspond to the settings in Fig. 2 and Fig. 4.
We estimate the scaling parameters associated with the synthetic patterns via the composite likelihood in (18). The true parameter values and the corresponding estimates are given in Table 1. While the estimation procedure is able to reconstruct the true values with reasonable accuracy, the results are slightly better for the regular patterns than for the random patterns. These results are representative of several further examples (results not shown), and we conclude that the composite likelihood is able to identify the scaling parameters of the perspective scaling function irrespective of the second-order structure of the point pattern.
Table 1: True angles and composite likelihood estimates for the surface normals of the simulated point patterns in Figures 2 and 5. Regular pattern type refers to the images in Figure 2 and Poisson type to the images in Figure 5.
Pattern type  (η₁, η₂)  (η̂₁, η̂₂)

For the analysis of real natural scenes, we apply our methodology to the set of tiling and brick images shown in Fig. 6. The original images are of size 1280 × 960 pixels and during the preprocessing they are downsized to 1066 × 846 pixels in order to eliminate boundary effects in the point detection. The probability maps and the resulting point patterns are shown in Fig. 7. We have here applied neighbourhoods of size 75 × 75 pixels for the tiling scenes and 55 × 55 pixels for the bricks scene, with a threshold of k₂ = 0.25 for the neighbourhood relation in all cases. The point detection is very robust to the choice of threshold value; threshold values from 0.15 to 0.5 have limited effects on the results. It is somewhat more sensitive to changes in the neighbourhood size: for the tiling images, neighbourhoods from 55 × 55 to 95 × 95 result in similar scaling parameter estimates, while for the bricks image, slightly smaller neighbourhoods seem to be needed.
To derive the information on camera position and angle from the point configurations in Fig. 7, we project the point process realizations onto an observation window D of dimension [−0.69, 0.69] × [−0.50, 0.50]. We further assume that the field of view corresponds to a standard wide-angle setting of φ_c = 54° and hence take f = 0.98 as a basis, the same settings as we applied in the simulation examples above. The resulting scaling parameter estimates are listed in Table 2 and the 3D orientation of the camera towards the textures is illustrated in Fig. 6.

Discussion
This paper introduces a framework for extracting 3D information from a textured 2D image, building on the recently developed locally scaled point processes (Hahn et al., 2003). The perspective scaling function quantifies the perspective foreshortening and the resulting inhomogeneity of the texture. The framework is quite flexible regarding assumptions on the texture composition in that it only requires the texture elements to be close to convex in shape, and it successfully extracts useful information related to camera orientation. The separation of image preprocessing and point detection on the one hand and the estimation procedure for the scaling parameters on the other offers great flexibility. We believe that the locally scaled point process framework can be applied in more general settings to analyse point patterns in images, for instance, as an additional inference step in the texture detection algorithms discussed in Lafarge et al. (2010) and references therein. Due to the low computational cost of our framework, it also seems feasible to combine it with image segmentation where 3D information is needed for several segments within an image, each of which might be covered with a different type of texture element.
There are considerable further avenues for development. One option is to build a larger hierarchical framework where the three inference steps, the image preprocessing, the point detection and the parameter estimation, are joined in an iterative fashion. A fully Bayesian inference framework along the lines of the work of Rajala and Penttinen (2012) could also be an alternative to the composite likelihood estimation performed here.

Appendix
In our data analysis, we assume that the image domain D is normalized such that |D| = a. A locally scaled point process is obtained from a homogeneous template model by locally rescaling distances and volumes through a scaling function c_η : ℝ² → ℝ₊ with parameters η. The scaling function c_η acts as a local deformation in that it locally affects distances and areas. More precisely, ν_c^d(A) = ∫_A c_η(x)^{-d} ν_d(dx), where ν_d denotes the d-dimensional volume measure and ν_c^d its scaled version, for d = 1, 2.

Figure 6: Original natural scenes (left) and the estimated 3D orientation towards the camera (right). The field of view is assumed to correspond to a wide-angle setting of φ_c = 54°.

Figure 7: Estimated probability maps and point configurations for the natural scenes in Fig. 6.

Table 2: Perspective scaling parameter estimates for the natural scenes in Fig. 6.