LÉVY-BASED ERROR PREDICTION IN CIRCULAR SYSTEMATIC SAMPLING

In the present paper, Lévy-based error prediction in circular systematic sampling is developed. A model-based statistical setting as in Hobolth and Jensen (2002) is used, but the assumption that the measurement function is Gaussian is relaxed. The measurement function is represented as a periodic stationary stochastic process X obtained by a kernel smoothing of a Lévy basis. The process X may have an arbitrary covariance function. The distribution of the error predictor, based on measurements in n systematic directions is derived. Statistical inference is developed for the model parameters in the case where the covariance function follows the celebrated p -order covariance model.


INTRODUCTION
A long-standing problem in stereology is variance estimation in systematic sampling.One class of problems involves estimation of an integral of the form where x(θ ) is an integrable function on [0, 2π), called the measurement function.The estimator typically considered is based on circular systematic sampling and takes the form where Θ is uniformly distributed in [0, 2π n ).For instance, if Y is a bounded convex planar set containing the origin O, examples of Eq. 1 are where r(θ ) and h(θ ) are the radial function and the support function of Y in direction θ , respectively.The geometric identity (Eq. 1) is in these cases a consequence of polar decomposition in the plane and an identity for mean width (Schneider, 1993, Eq. 5.3.12).If instead Y is a bounded convex spatial set containing O, the volume and the surface area of Y may be estimated by a two-step procedure which involves circular systematic sampling in a section through O and the use of the cubed radial function or the squared support function (Gundersen, 1988;Cruz-Orive, 2005).Yet another example is volume estimation by the so-called vertical rotator (Jensen and Gundersen, 1993).
In Gual-Arnau and Cruz-Orive (2000), a designbased procedure of approximating the variance of Q n , based on modelling the covariogram of x(θ ) by a polynomial model, is developed.Hobolth and Jensen (2002) consider a model-based procedure, where the measurement function is assumed to be a realization of a periodic stationary stochastic Gaussian process X = {X(θ ) : θ ∈ [0, 2π)}.It is shown in Hobolth and Jensen (2002) that the covariogram model considered in the paper by Gual-Arnau and Cruz-Orive (2000) is a special case of a p-order covariance model for the stochastic process X in the model-based set-up.
The p-order covariance model is given by where the model parameters satisfy p > 1/2, α, β > 0.
In Hobolth et al. (2003), this parametric covariance function has been used in the modelling of the radial function of a random star-shaped planar particle.In this case, p determines the smoothness of the particles boundary while α and β determine the 'global' and the 'local' shape of the particle, respectively.(Note that in Eq. 2, λ 1 is set to zero which ensures that the reference point of the particle is approximately the centre of mass.)The model-based counterpart of the design-based methodology provided in Gual-Arnau and Cruz-Orive (2000) was further developed in Jónsdóttir et al. (2006), where the general form of the p-order covariance model (Eq.2) was used to obtain a more accurate approximation of the prediction error In Hobolth and Jensen (2002) and Jónsdóttir et al. (2006), the process X is assumed to be Gaussian.Motivated by the fact that powers of the radial function and the support function are used in practice, we will in this paper consider non-Gaussian models, obtained as a kernel smoothing of a so-called Lévy basis.As we will show, it is possible under the Lévy-based model to derive the distribution of the error predictor Q n − Q which may be markedly non-Gaussian for the moderate sizes of n used in practice.
Lévy-based modelling has been popular in recent years, e.g. in the modelling of turbulent flows, spatio-temporal growth, spatial point processes and random fields (Barndorff-Nielsen and Schmiegel, 2004;Jónsdóttir et al., 2008;Hellmund et al., 2008;Jónsdóttir et al., 2013).More specifically, we will consider stochastic processes of the form where µ determines the mean of the process, Z is a homogeneous and factorizable Lévy basis on [0, 2π) and k is a deterministic kernel function.In principle, any covariance model, including the porder covariance model, can be induced under this modelling framework, by assuming a specific form of the kernel function (see the next section).Under the porder model, it is easy to control the local and global fluctuations of the stochastic process X.The Lévybased models with p-order covariance thus constitute a flexible and tractable model class.In particular, this model class has more structure than the non-Gaussian models considered in Hobolth et al. (2003) and this allows us to derive distributional results.
The composition of the remaining part of the paper is as follows.First, a theoretical background for stationary periodic processes with period 2π, based on kernel smoothing of a Lévy basis, is given.Then, estimation of E( Q n − Q) 2 under the general Lévy-based model is discussed.The distribution of the error predictor Q n − Q under the Lévy-based model is derived, and it is shown how to estimate this distribution.An example of random particles simulated from a Lévy-based model is given together with the distribution of the n-point area estimator of these particles.Finally, a discussion is provided.Some technical derivations are deferred to an appendix.

LÉVY-BASED STOCHASTIC PROCESSES ON THE CIRCLE
This section provides an overview of stationary periodic processes on [0, 2π) based on integration with respect to a Lévy basis.For further details on the general theory on Lévy bases, in particular, the integration with respect to a Lévy basis, the reader is referred to Barndorff-Nielsen and Schmiegel (2004) and Hellmund et al. (2008).
Let X = {X(θ ) : θ ∈ [0, 2π)} be a 2π periodic stationary stochastic process on [0, 2π), given by where µ determines the mean of the process, Z is a homogeneous and factorizable Lévy basis on [0, 2π) and k is an even periodic kernel function with period 2π and a Fourier representation The model (Eq.3) is the continuous analogue of the following discrete model where the sum is over an equally spaced set of angles and the random variables Z(φ ) are independent and identically distributed, the common distribution being infinitely divisible.The integral in Eq. 3 is formally defined as a limit in probability (Rajput and Rosinski, 1989).A spatio-temporal version of Eq. 3 has previously been considered in Jónsdóttir et al. (2008).
The Lévy basis Z is extended by Z(A + 2πm) = Z(A) for all m ∈ Z and all Borel sets A ∈ B([0, 2π)).A Lévy basis has the property that Z(A 1 ), . . ., The assumption of homogeneity implies that all the finite-dimensional distributions of Z are translation invariant.
If Z is Gaussian, the integral in Eq. 3 exists if k is L 2 -integrable with respect to the Lebesgue measure on [0, 2π).When Z is a so-called Lévy jump basis (e.g., Gamma or inverse Gaussian Lévy basis), the integral exists if k is integrable with respect to the Lebesgue measure on [0, 2π) and if R |r|V (dr) < ∞, where V is the Lévy measure associated with Z.These results follow from Hellmund et al. (2008, Lemma 1).
An important entity associated with a Lévy basis is its spot variable Z which is infinitely divisible.Without loss of generality we will in what follows assume that the spot variable Z is centered, Z = W − E(W ), where W is an infinitely divisible random variable.
The following theorem characterizes the distribution of the stochastic variable X(θ ).
Theorem 1.The stochastic variable X(θ ) has the cumulant generating function where K W (t) is the cumulant generating function of W .Moreover, the derivatives of K X (t) are given by (when they exist) W (t) denotes the r'th derivative of K W (t). Proof.The result is obtained by using that the cumulant generating function of the integral f • Z = 2π 0 f (θ )Z(dθ ) of a function f with respect to a homogeneous and factorizable Lévy basis Z is given by where Z is the spot variable associated with Z, cf.Hellmund et al. (2008, Eq. 10).
It follows that the cumulants of the stochastic variable X(θ ) are given by where κ r (W ) denotes the r-th cumulant of the stochastic variable W . Possible choices of the distribution of W are the Gaussian, Gamma and inverse Gaussian distributions.When the kernel function is proportional to an indicator function, k(θ ) = c1 A (θ ) for A ∈ B([0, 2π)), the marginal distribution of X(θ ) will be of the same type as that of W . Otherwise, the marginal distribution will not be as simple, but the process X will inherent the name of the distribution of the underlying spot variable, e.g., when W is Gamma distributed, X is called a Gamma Lévy process, irrespectively of the choice of the kernel function.
We will typically assume that κ 2 (W ) = 1, i.e., the skewness and kurtosis of W are equal to the third and fourth cumulant, respectively.In Table 1, we give the cumulant generating function, third and fourth cumulants of W for the three distributions mentioned above.Note that as W has unit variance, the Gamma and inverse Gaussian Lévy bases are only determined by a single parameter η > 0. The Lévy measures V of the Gamma and inverse Gaussian Lévy bases satisfy the condition R |r|V (dr) < ∞ for the existence of the integral in Eq. 3, cf.Hellmund et al. (2008, Example 3).
In principle, any covariance function can be modelled within this set-up.This can be seen, using the theorem below.
Theorem 2. The stochastic process X has a mean value µ and a covariance function where we easily obtain Eq. 8.
Using Theorem 2, we can construct the candidate kernel k that induces a given covariance function c.For instance, if c follows the p-order model (Eq.2), then k is of the form specified in Eq. 4 with This choice of kernel will for a Gaussian Lévy basis give a well-defined integral in Eq. 3 if p > 1/2, while for a Gamma or an inverse Gaussian basis the integral is well-defined if p > 1.
Table 1.Examples of the distribution of W , together with the corresponding cumulant generating function, and the third and fourth cumulants.
In Hobolth and Jensen (2002), the focus was on the prediction error E( Q n − Q) 2 .If the covariance function of X has the following Fourier expansion then it was shown in Hobolth and Jensen (2002) that where λ nk is the Fourier coefficient of order n • k in the Fourier expansion of the covariance function of X.Note that in Hobolth and Jensen (2002), circular systematic sampling on [0, 1) instead of [0, 2π) is considered, so Eq. 9 represents an adjusted version of Hobolth and Jensen (2002, Eq. 8).
In Hobolth et al. (2003), a procedure for estimating the prediction error under a Gaussian p-order model was developed, based on a Fourier expansion of X X(θ When X is a periodic stationary Gaussian process, the Fourier coefficients of X become independent and normally distributed, A s ∼ B s ∼ N(0, λ s ).As suggested in Hobolth et al. (2003), the parameters α and β in the p-order model can then be estimated using maximum likelihood estimation based on the first S Fourier coefficients, ) where λ s (α, β ), s = 2, . . ., S, satisfy (2) and a s and b s , s = 2, . . ., S, denote discretized Fourier coefficients of X.
In this section, we will show that this procedure can be used under the general Lévy-based model.The following theorem gives the distribution of the Fourier coefficients and their relations under the general Lévybased model.In the Gaussian case, the theorem can be found e.g., in Dufour and Roy (1976).Theorem 3. The stationary Lévy-based stochastic process X can be written in terms of its Fourier coefficients as Moreover, the Fourier coefficients are pairwise uncorrelated and the Fourier coefficients of order s have the same distribution which is characterized by the cumulant generating function K Proof.Writing the kernel function in terms of its Fourier representation and then calculate the Fourier coefficients of X gives Eq. 11.The Fourier coefficients are uncorrelated as for all r, s ≥ 1, and a similar argument shows that K B s (t) = K U (ξ s t).
The cumulant generating function of A s and B s yield simple expressions for their cumulants, which are given by for s ≥ 1 and r ≥ 1.Here and in the following, we let for a positive integer n, This means that A s and B s have mean, variance, skewness and kurtosis of the following form: where γ 2 (W ) is the kurtosis of W .Moreover, the normalized Fourier coefficients of order s = 1, 2, . . ., obtained by Ãs = A s /ξ s and Bs = B s /ξ s , will all have the same distribution characterized by the cumulant generating function K U (t).
From Theorems 2 and 3, it follows that for a non-Gaussian Lévy basis Z the Fourier coefficients will be uncorrelated with variance λ s (α, β ).Furthermore, the distribution of the Fourier coefficients in the non-Gaussian and Gaussian model only differs in even cumulants of order four and higher.Therefore, Eq. 10 can be regarded as a pseudo-likelihood function for (α, β ) also in the non-Gaussian case.
Fig. 1 shows the small difference in the saddlepoint densities of the normalized Fourier coefficients and the Gaussian density for different values of η in the case of a Gamma Lévy basis.Furthermore, a simulation study indicated that the estimates of (α, β ) are robust against deviations from Gaussianity in the underlying distribution, but the mean square error of the estimates increases somewhat as η decreases.

THE DISTRIBUTION OF Qn -Q UNDER THE LÉVY-BASED MODEL
In the previous section, we have seen that the method developed in Hobolth et al. (2003) for estimating E( Q n − Q) 2 based on a Gaussian process is robust against departures from the distributional assumption.In this section, we will derive the distribution of the error predictor Q n − Q which may be markedly non-Gaussian.
Theorem 4.Under the Lévy-based model, the error predictor is distributed as where The distribution of Q n − Q is characterized by its cumulant generating function Proof.Recall that where Θ is uniformly distributed in [0, 2π /n).Without loss of generality we can assume that Θ = 0 as the distribution of Q n does not depend on Θ.The result given in Eq. 12 is obtained by observing that the mean of the kernel functions is given by The expression for the cumulant generating function of As the cumulant generating function of Q n − Q has a simple form, its cumulants are easily available.In particular, it enables us to obtain a saddlepoint approximation of its density.An alternative is to use Theorem 4 for simulating the distribution of Q n − Q.
Example.Let us consider a Lévy-based model (Eq.3) for X with a Gamma Lévy basis Z and k chosen such that the covariance function of X follows a p-order model.Under this model the Fourier coefficients A s , and B s , s ≥ 1, of X(θ ) have mean, variance, skewness and kurtosis, .
This model may, for instance, be used to model the squared radial function of random star-shaped planar particles containing the origin.Fig. 2 shows examples of particles simulated from such a model, using different values of η.The value of p, α and β was p = 2, log α = 6 and log β = −3.We used η = 2, 4, 16; a low value of η corresponds to an underlying distribution with a heavy tail.The value of η controls the frequency and size of the irregularities of the boundary of the particles.Small values of η will produce particles with few large fluctuations on the particle boundary and less smaller fluctuations.Higher values of η will produce particles with more frequently occurring moderate fluctuations across the boundary.In applications, it is needed to estimate the parameter η of the underlying Lévy basis Z, i.e., the parameter of the distribution of W .For a given kernel function k, we have a simple expression for the cumulant generating function of X and its derivatives (when they exist), cf.Theorem 1.We suggest estimating the parameter η determining the Lévy basis by considering the saddlepoint approximation of the density of X(θ ).For more details on saddlepoint approximations, cf.Jensen (1995).The first order saddlepoint approximation of the density is given by f e K X (t(x))−t(x)x , where t(x) is the solution to the saddlepoint equation Note that the saddlepoint equation is non-linear when Z is non-Gaussian, but can be solved numerically, using a Newton method for a given kernel function k.A better approximation of the density of X(θ ) is obtained by multiplying the density with the factor Given an estimation of the kernel function k we can establish a pseudo-likelihood function based on the observations of the stochastic process X given by Here, f (x(θ i )) is calculated using the approximated kernel function where ξs = λ s ( α, β )/π is obtained using the estimates of (α, β ).Note that we need to normalize the densities f (x i ) for each value of η, when maximizing the likelihood function L(η).If the estimated likelihood function is an increasing function of η, this suggests that the underlying Lévy basis is Gaussian.
The cumulant generating function and its derivatives have simple analytic expressions, when the underlying Lévy basis is a Gamma basis or Inverse Gaussian basis.These expressions can be derived by combining Theorem 1 and Table 1.For a Gamma Lévy process the cumulant generating function of X(θ ) and its derivatives are given by For an inverse Gaussian Lévy process the cumulant generating function of X(θ ) and its derivatives are given by In both cases, the saddlepoint approximation of the density of X(θ ) are easily obtainable using numerical integration and a Newton algorithm for finding the saddlepoint.Note that the saddlepoint approximation of the density function for an arbitrary η can be written in terms of the saddlepoint approximation of the density, the cumulant generating derivatives and saddlepoint solution for η = 1.

CONCLUSION AND PERSPECTIVES
We have developed a Lévy-based error prediction in circular systematic sampling.In contrast to previous model-based methods, we consider a flexible class of non-Gaussian measurement functions based on kernel smoothing of a homogeneous Lévy basis.In particular, we have derived the distribution of the error predictor in circular systematic sampling.The modelling framework allows us to consider in principle any given covariance structure of the measurement function, in particular the popular p-order covariance model which enables controlling the local and global fluctuations of the measurement function.
Relation to generalized p-order models.Note that as the Lévy-based process X is strictly stationary, it can be shown that X has a polar expansion of the form where the random variables are independent, cf. the Appendix, and E(Z s ) = 1.
Here, the variable Z s can be expressed as This shows that the Lévy-based models are closely related to the generalized p-order models proposed in Hobolth et al. (2003), but the Lévy-based models have more structure that allows for derivation of distributional results.
Modelling particles in 2D and 3D.The model (Eq.3) can be used directly to model the shape of featureless two-dimensional particles by assuming that a particle Y is a stochastic deformation of a template particle Y 0 .If r 0 is a radial function of a template particle, we let the radial function of Y be of the form R(θ ) = r 0 (θ ) + X(θ ), where X is a zero mean Lévy-based stochastic process.The strength of this technique is two-folded.Firstly, the global and local fluctuations of the Lévy-based stochastic process are controlled by the variance of the Fourier coefficients which are determined by the kernel function.Secondly, the underlying Lévy basis determines the frequency and size of the irregularities of the process.Finally, the methodology presented can be extended to model the shape of three-dimensional featureless particles, by considering Lévy-based processes on the unit sphere S 2 .Hansen et al. (2011) consider three-dimensional Lévy particles using different covariance models.The focus is here on the Hausdorff dimension of the boundary of particles obtained using a Gaussian basis.
Approximations of densities.The saddlepoint approximation was applied here to obtain an approximation of the density of X(θ ) in the Lévy-based stochastic model.As the cumulant generating function is easily obtainable for stochastic convolutions of the type where f : R n → R and Z is a homogeneous Lévy basis on R n , the saddlepoint approximation of the density is an attractive tool for studying Lévy-based convolution models in general.For the stochastic processes considered here, other types of approximations of densities can also be considered based on approximating A s and B s by differences of variables from the same family of distributions as W .As an example one could consider approximating the density of the Fourier coefficients by a type II McKay distribution (Holm and Alouini, 2004), when the underlying Lévy basis is a Gamma Lévy basis.

Fig. 3 Fig. 2 .
Fig. 3 shows the corresponding saddlepoint densities of the estimated area of the particles for two different values of n. η = 2 η = 4 η = 16