SPATIAL EXTRAPOLATION OF ANISOTROPIC ROAD TRAFFIC DATA

A method of spatial extrapolation of traf(cid:2)c data is proposed. The traf(cid:2)c data is given by GPS signals over downtown Berlin sent by approximately 300 taxis. To reconstruct the traf(cid:2)c situation at a given time spatially, i.e. , in the form of traf(cid:2)c maps, kriging with moving neighborhood based on residuals is used. Due to signi(cid:2)cant anisotropy in directed traf(cid:2)c data, the classical kriging has to be modi(cid:2)ed in order to include additional information. To verify the extrapolation results, test examples on the basis of a well-known model of stochastic geometry, the Boolean random function are considered.


INTRODUCTION
A common difficult problem of large cities with heavy traffic is the forecasting of traffic jams.In this paper, a first step towards mathematical traffic forecasting, namely the spatial reconstruction of the present traffic situation from point measurements is done.To describe the traffic states, models of stochastic geometry and spatial statistics (or geostatistics) are used.A corresponding Java software that implements efficient algorithms of spatial extrapolation is developed.
This research is based on real traffic data originating from downtown Berlin.They were provided by the Institute of Transport Research of the German Aerospace Center (DLR).Approximately 300 test vehicles (taxis) were equipped with GPS sensors transmitting their geographic coordinates, velocity and status line (e.g., "free", "hired", "at the taxi rank", etc.) to a central station within regular time intervals from 30 sec. up to 3 min.The regularity of these signals depends on the taxi's status.Thus, a large data base of more than 13 million positions was formed since April 2001 (see Fig. 1).
In the present paper, a smaller data set (taxi positions on all working days from 30.09.2001 till 19.02.2002, 5.00-5.30p.m., moving taxis only) is considered.The observation window was reduced to downtown Berlin in order to avoid inhomogeneities in the taxi positions.To study traffic jams, the rush hour (5.00-5.30p.m.) was chosen.To produce road traffic maps, the velocities of all vehicles at time t are assumed to be induced by a realization of a spatial random field V (t) = {V (t, u)} where V (t, u) is a traffic velocity vector at position u ∈ R 2 and time t 0. The spatial structure of such random velocity fields makes the analysis of traffic-jam mechanisms possible.Thus, the spatial localization of traffic jams can be obtained by a threshold operation on the grey-scale image of the map of velocities V (t, u): a point u lies within the traffic jam region at time t if |V (t, u)| is smaller than a given threshold value, e.g., 15 kph.Since V (t, u) can be measured just pointwise at observation points u i , a spatial extrapolation of the observed data is necessary.Notice that the velocities strongly depend on the movement directions, e.g., the speed limits and consequently the mean velocities are higher on motorways than in downtown streets.Furthermore, the formation of traffic jams is also directional since a vehicle can influence only those vehicles moving behind it along the same road in the same direction.Moreover, the traffic speed at position u clearly depends on the traffic direction on the road, e.g., in directions of the city center or suburbs.
The classical extrapolation methods of geostatistics such as the ordinary kriging (see, e.g., Stoyan et al., 1997, Wackernagel, 1998) either make no use of additional information or provide measurements V (t, u + u i ) and V (t, u − u i ) with equal weights.Both these features are not relevant to the above problem setting.An extrapolation method designed for directional data, the so-called complex cokriging of velocities and their directions (see, e.g., Wackernagel, 1998) cannot be used here as well since there is no one-to-one correspondence between measurement positions u and traffic directions.An obvious counterexample is a crossroads.Thus, the standard extrapolation methods had to be adapted to our specific problem.Therefore, a modified ordinary kriging with moving neighborhood is described that allows to extrapolate directed velocity fields.First, the original data set should be split into N directionally homogeneous subsets.A data unit (u,V (t, u)) belongs to the data set i (i = 1, . . ., N) if the polar angle of the vector V (t, u) lies within the directional sector By convention, the zero polar angle corresponds to the eastward direction on the city map.Throughout this paper, we put N = 4. From the practical point of view, this is sufficient for the separation of opposite traffic directions and, simultaneously, keeps the amount of resulting data sets small.Nevertheless, in principle, any other N 4 could be used instead.
The above data sets should be extrapolated separately from each other.This yields N velocity maps corresponding to N directional sectors.
In what follows, the data from a given time interval [t 1 ,t 2 ] will be taken for extrapolation.To be precise, we put t 1 = 5.00 p.m. and t 2 = 5.30 p.m. Keeping this in mind, we shall omit the time parameter t in further notation.The observed velocities are not spatially homogeneous.Hence, the mean velocity field {m(u)} obtained by averaging the traffic velocities over all working days from 5.00 p.m. till 5.30 p.m. should be considered.As far as this mean field is subtracted from the original data, the deviations of actual velocity values are extrapolated in order to create the spatial field of velocity residuals.
This extrapolation method has been implemented in Java.Thus, a software library was developed comprising the estimation and fitting of variograms as well as the ordinary kriging with moving neighborhood.As far as it is known to the authors, it is the first complete implementation of such kriging methods in Java.An advantage of the Java programming language lies in its platform independence.Great attention was paid to the efficient implementation of fast algorithms.In contrast to classical geostatistics operating with relatively small data sets, this efficiency is of great importance for larger data sets with more than 10000 entries.For instance, the Java package for variogram fitting described in Faulkner, 2002 cannot be used for data sets with more than 1000 entries due to unacceptable runtimes.Efficient image processing and computational algorithms (see, e.g., Mayer et al., 2004) enabled us to drastically reduce the runtimes of the Java library.
The extrapolation method itself as well as the software quality are verified on the test example of a Boolean random function; see Serra, 1988.Remarkable features of this model are its simplicity of simulation and nice analytical description.For test purposes, 90 independent realizations of a Boolean model with a deterministic drift have been simulated.The quality of extrapolation is proved by means of statistical significance tests of the area fraction.It is shown that extrapolated images perfectly retain the essential structure of original test images.This justifies the application of the above method to traffic data.First, the mean velocity fields are estimated for all directional sectors.Then, the deviations from the mean of actual speed values are extrapolated for particular days and time intervals.On their basis, traffic-jam maps are created; see Figs. 19-21. Image Anal Stereol 2004;23:185-198 There are several interesting perspectives for further research.In particular, using methods recently developed in Heinrich et al., 2004, Klenk et al., 2004, and Schmidt and Spodarev, 2004, models of stochastic geometry can be statistically fitted to extrapolated traffic maps.In the next step, the fitted models can be used in order to predict future traffic states on the basis of currently incoming traffic data.Such spacetime prediction models as well as their applications to forecasting of traffic states will be discussed in a forthcoming paper.

RANDOM FIELDS
To model traffic maps, non-stationary random fields composed of a deterministic drift and an intrinsically stationary random deviation field, the so-called residual, are used.See, e.g., monographs Cressie, 1993 andWackernagel, 1998 for details.

DRIFT AND DEVIATION FIELD
Let X = {X(u), u ∈ R 2 } be a non-stationary random field with finite second moments Then X can be decomposed into a sum where m(u) = E[X(u)] is the mean field (drift) and Y (u) = X(u) − m(u) is the deviation field from the mean or residual.Clearly, it holds E[Y (u)] = 0 for all u.Assume that Y is intrinsically stationary of order two.Denote by (1) its variogram function.In practice, the field X can be observed in a compact (say, rectangular) window W ⊂ R 2 .Let x(u 1 ), . . ., x(u n ) be a sample of observed values of X, u i ∈ W for all i.The extrapolation method described in the next section yields an "optimal" estimator X(u) of the value of X(u) for any u ∈ W based on the sample random variables X(u 1 ), . . ., X(u n ).Among the variety of extrapolation techniques for non-stationary random fields (see, e.g., the universal kriging in Cressie, 1993, Kitanidis, 1997, Wackernagel, 1998), our approach is similar to the so-called kriging based on the residuals; see Cressie, 1993, p. 190.The main idea of the method is straightforward.First of all, an estimator m(u) for the drift m(u) has to be constructed.Then, the deviation field is formed and its kriging estimator Y * (u) is computed.
Finally, the estimator X(u) is given by If we suppose that the drift is known, i.e., m(u) = m(u) for all u then we know the exact values be a realization of the sample values of Y .The extrapolation of Y (u) can be performed either by simple kriging based on the covariance function or by ordinary kriging making use of the variogram γ(h); see Cressie, 1993, Kitanidis, 1997, Wackernagel, 1998.In what follows, the second method is used.

ORDINARY KRIGING WITH MOVING NEIGHBORHOOD THE KRIGING ESTIMATOR
A simpler version of the following ordinary kriging with moving neighborhood can be found in Chilès and Delfiner, 1999, pp. 201-210, Kitanidis, 1997, p. 71 and Wackernagel, 1998, pp. 101-102.Denote by 1 the usual indicator function The estimation involves only the sample random variables Y (u i ) such that u i is positioned in the "neighborhood" A(u) of u, i.e., u i ∈ A(u).Being an arbitrary set, this moving neighborhood A(u) contains a priori information about the geometric dependence structure of the random field Y .For instance, it could be designed to model the formation of traffic jams.In the case of a Boolean model, this set A(u) is influenced by the shape of the primary grain.
In general, A(u) can be a random closed set, i.e., A(u Under such general assumptions on A(u), the system of linear equations on the weights λ i looks much more complicated than (7) considered below.
In order to solve it, additional parameters such as crosscovariances of Y and Z should be estimated.Even in the case of uncorrelated fields Y and Z, it makes the extrapolation unnecessary complex.To avoid this, the present paper uses only deterministic sets A(u).
The normalizing condition on the weights λ i n ensures the unbiasedness of the estimator given in (4).
In other words, it holds even if the mean of Y is not zero.Moreover, this condition makes it possible to use variograms in (7) since variograms are negative conditionally semidefinite (see, e.g., Wackernagel, 1998, pp. 52-53).
The "optimality" of the estimator Y (u) means that its variance should be minimal, i.e., This classical minimization problem yields further conditions on λ i which can be written together with (5) in the following system of linear equations.For all i = 1, . . ., n with In order to solve this system of equations, the knowledge of the variogram function γ(h) is required.However, in most practical cases γ(h) is unknown and has to be estimated from the data y(u 1 ), . . ., y(u n ).

ESTIMATION OF THE VARIOGRAM
In applications, a variogram estimation method to be used should always be chosen in accordance with the data framework.The most simple and popular one is undoubtedly the estimator of Matheron (see, e.g., Chilès andDelfiner, 1999, Wackernagel, 1998).Its drawback is sensitivity to outliers.Among robust estimation methods, the trimmed mean estimator (see, e.g., Lehmann and Casella, 1998) as well as the estimators of Cressie-Hawkins (see Cressie, 1993) and Genton (see Genton, 1998a, Genton, 2001) should be mentioned.These methods are designed for noisy data but they are biased.
Since the traffic data seem to be not contaminated with outliers, the estimator of Matheron is used here.It is defined by where u i − u j ≈ h means that u i − u j belongs to a certain neighborhood U(h) of vector h and N(h) denotes the number of such pairs (u i , u j ) for i, j = 1, . . ., n.The choice of U(h) depends on the problem.In the present paper, the following segment of a circle is used: where (|x|, ϕ) and (|h|, ϕ 0 ) are the polar coordinates of x and h; δ , ε > 0. If γ is continuous then the estimator in (8) is asymptotically unbiased, i.e., lim δ ,ε→0 Under further assumptions on Y such as ergodicity, it is also strongly consistent, i.e., it holds lim almost surely.

VARIOGRAM MODELS
In practice, the estimated variogram γ cannot be substituted directly for γ in the system of linear equations (7).Trying this would make the numerical computation in (7) unstable because of the singularity of its coefficient matrix.Even in the case when this computation is possible its result is not correct.The reason for that is simple: γ(h) is not a valid variogram function since it Image Anal Stereol 2004;23:185-198 is not conditionally negative semidefinite.Hence, a valid parametric variogram model γ (the so-called theoretical variogram) should be fitted to the empirical estimator γ.In the following, some valid variogram models are considered.The corresponding fitting procedures are discussed later on.A popular isotropic variogram model is the exponential one (see, e.g., Cressie, 1993, pp. 61-63, Wackernagel, 1998, pp. 244-246): As shown in Fig. 17, the traffic data lead to empirical variograms that are clearly zonally anisotropic.Below, we consider zonally anisotropic variogram models constructed from isotropic ones (see Cressie, 1993, Wackernagel, 1998).Introduce where γ 1 (h) is an exponential isotropic variogram model with nugget effect a 1 > 0, sill b 1 and range c 1 .
The second term where is a rotation by the angle α around the origin and is a scaling transformation with scaling factors λ 1 , λ 2 along the coordinate axes.For a vector h = (h 1 , h 2 ), we have Level curves of γ 2 (h) are ellipses with main axes of polar angles α and α + π/2.The range values in these directions are equal to Fig. 2 shows the level curves of the variogram model ( 10

VARIOGRAM FITTING
Let γ(h) be an empirical variogram estimated from the experimental data {y(u i )} for the field Y and let γ β (h) be a theoretical parametric variogram model with parameter vector β = (β 1 , . . . ,β k ).
In the example mentioned above, we have In practice, only a finite number m of values γ(h 1 ), . . ., γ(h m ) can be computed.For two reasons, it is enough to confine computations to vectors h i of length |h i | < diam(W )/2.First, in most cases the behavior of the variogram in a small neighborhood of the origin is decisive for the adequate choice of the model.Second, for large distances |h| > diam(W )/2 the estimated values γ(h) are contaminated by noise due to edge effects.
In order to estimate the parameter vector β , the least-squares method is used.The generalized leastsquares method (see Genton, 1998b) minimizes the following function of β , where the weights can be chosen in accordance with the a priori assumptions on Y (see Cressie, 1993 for Gaussian random fields).If the distribution of Y is unknown, the classical weighting scheme can be applied: with the function to be minimized.In the case of traffic data, no a priori assumptions on the structure of Y have been made.Thus, the classical least-squares methods is used.
For isotropic random fields, one fits the onedimensional curve of a parametric variogram model γ β (|h|) to an empirical one.In the anisotropic case, γ(h) is computed for vectors h on a square grid with m points and is fitted by a two-dimensional parametric surface γ β (h), h ∈ R 2 .This can be done either by summing in (15) over all grid points h i or only over vectors h i in a certain direction of interest ϕ, i.e., Since traffic data is substantially anisotropic, the variogram model (10) has to be fitted to the data on the whole grid as well as in two directions with polar angles α and α + π/2.

DRIFT ESTIMATION
The mean field {m(u)} can be estimated from the data by various methods ranging from radial extrapolation (see, e.g., zu Castell et al., 2002 and references therein) to smoothing techniques such as moving average and edge preserving smoothing (see, e.g., Tomasi and Manduchi, 1998).In what follows, the moving average is used because of its ease and computational efficiency for large data sets.
By moving average, the value m(u) is estimated as where W (u) is the "moving" neighborhood of the point u and N u denotes the number of measurement points u i ∈ W (u). The choice of the neighborhood W (u) is arbitrary.For fast computation, we put W (u) to be a square with side length τ centered in u.
The estimator ( 16) yields arbitrarily smooth results for large moving neighborhoods W (u). Thus, an optimal side length τ should be found to fit the problem.In the traffic problem, τ must be small because edges of the surface {m(u), u ∈ W } are intrinsic to the image structure and have to be preserved by smoothing.
In all large cities, there are areas D of parks, forests, building blocks, etc. where no road-traffic data is available.By ( 16), this implies m(u) = 0 for all points u with W (u) ⊂ D. Consequently, such points u would automatically belong to traffic-jam regions and so contaminate traffic-jam maps with artefacts.To avoid this, the neighborhood W (u) of points u with N u = 0 has to be enlarged till it contains at least one observation point.In this way, meaningful average velocity maps are obtained that allow the correct analysis of traffic jams.Since X is not stationary and, consequently, m(u) is not constant the estimator ( 16) is biased.Nevertheless, in practical applications, the bias E m(u) − m(u) is small provided that the area |W (u)| is small and the net of observation points is spatially dense enough.

RESIDUALS FORMED WITH ESTIMATED DRIFT
In previous sections, it has been assumed that the drift m(u) was explicitly known.If it has to be estimated from the data, the theoretical background for the application of the kriging method breaks down.Indeed, kriging requires intrinsic stationarity of the field of residuals Y * (u) introduced in (2).This Image Anal Stereol 2004;23:185-198 requirement is clearly not satisfied even in the case of an unbiased estimator m(u) since the variogram is not equal any more to the variogram γ(h) of Y (see Chilès and Delfiner, 1999, pp. 122-125, Cressie, 1993p. 72, Wackernagel, 1998, p. 214) and depends clearly on u.
Despite these theoretical obstacles, practitioners continue to use the ordinary kriging of residuals with estimated drift based on the data y * (u i ) = x(u i ) − m(u i ), i = 1, . . ., n legitimized by its ease and satisfactory results.

ALGORITHMS AND IMPLEMENTATION IN JAVA
In the following, some efficient algorithms for spatial extrapolation are discussed.Their implementation in Java was integrated into the GeoStoch library GeoStoch, 2004 as a separate package.The software is supplied with detailed comments generated by Java-Doc complying with the Sun standards; see Niemeyer and Peck, 1996, pp. 80-81.

FAST ESTIMATION OF VARIOGRAMS AND DRIFTS
Matheron's estimator (8) requires all pairs of positions u i and u j with u i − u j ∈ U(h) to be found for each lattice vector h.For k lattice vectors and n positions, it costs k O(n 2 ) operations.By means of the binary search tree structure DTree, this complexity can be significantly reduced.
Such algorithm tessellates the searching space into rectangles and saves positions of actual measurements in a binary tree.Thus, searching k points from p costs in average k + log(p) operations; see Segewick, 1992.Since k p always holds, the average complexity of the search is O(log(p)).Additionally, the complexity of filling the tree with values is O(p log(p)).For variogram estimation, one stores p = n(n−1) 2 polar coordinates of the vectors between any two measurement points in a DTree.Thus, the overall complexity for the variogram computation is O(p log(p)) + k O(log(p)).For large square lattices with side length m > 200 (k = m 2 ), the difference in run times is significant!
The DTree structures can be used also for the fast computation of the moving average.There, measurement points lying in a certain square neighborhood should be found.The complexity of such computation can be estimated as mentioned above.

VARIOGRAM FITTING
In variogram fitting, one employs essentially known algorithms for the minimization of functions.The idea of all stochastic algorithms lies in cleverly modifying parameters of the variogram model at random till the maximal quadratic distance to the empirical curve becomes smaller than a critical value ε.This can be done for instance by means of genetic algorithms (see Goldberg, 1989) or the method of simulated annealing; see, e.g., Press et al., 2002, pp. 448-460.Genetic algorithms were implemented in Java and integrated in the GeoStoch library.The simulated annealing Java package JSimul is available from Mégnin, 2001.Additionally, one-dimensional variogram fitting by slices was implemented in Java by Faulkner, 2002.This Java package provides good GUI but poor runtime performance for large data samples.

TEST EXAMPLE: BOOLEAN MODEL
To test the performance of the above extrapolation method, one needs to generate synthetic data whose theoretical properties are known.In other words, one has to find a random field {X(u)} with known structure of distribution, variogram and shape of realizations that is easy to simulate.In the following, we construct such a random field on the basis of the so-called Boolean random field, a model that is classical in stochastic geometry.
For simulation of Poisson processes, see, e.g., Lantuéjoul, 2002, Stoyan et al., 1995.A Boolean random set Ξ with deterministic grains can be introduced as where Ξ 0 is the so-called primary grain and Ξ 0 + X i a grain translated to the germ position X i .The primary grain Ξ 0 can be an arbitrary compact set in R 2 .In the present paper, a rectangle with width a > 0 and height b > 0 is considered; see The variogram γ(h) of Y is given by For a vector h = (h 1 , h 2 ), the area To test the extrapolation quality on synthetic data, one simulates X and measures its realization x(u) at Image Anal Stereol 2004;23:185-198 a finite number of points u i .Then one extrapolates X from the data x(u i ) and compares the result with the original realization x(u).Measurement points u i are generated by an independent Poisson process Φ 1 with intensity λ 1 = 0.01; see Fig. 7.The intensity of Φ 1 is substantially higher than that of Φ since otherwise the information contained in the data is insufficient to reconstruct the original image.

SYNTHETIC DATA
Practically, the experiment described above should be repeated many times in order to reduce the randomness in the quality of results.In this paper, 90 realizations of X have been sampled.They yield 90 data sets each of them containing ca. 300 pairs (u i , x(u i )).These data sets correspond to the traffic data of a half an hour.The intensity of Φ 1 is chosen to produce in average about 300 measurement points to comply with the real traffic situation.
For simulations, we used the following parameter values: The mean area fraction is then p = 0.38121662.
In Fig. 4, a realization of Ξ with these parameter values is shown.By adding a circle in the middle of the picture and subtracting p, one obtains a realization of the random field X; see Fig. 6.

NUMERICAL RESULTS; RECONSTRUCTION OF SIMULATED IMAGES
To estimate the drift, moving average with the side length τ = 3 of the square neighborhood was used.Thus, the extrapolation method will use only those points u i that can potentially affect the value Y * (u).The extrapolation results Y * (u) and X(u) are shown in Figs. 12 and 13.The striking similarity of the images for X(u) and X(u) in Figs. 6 and 13, respectively, is a clear evidence for the high quality of the extrapolation method.

STATISTICAL TESTS FOR THE AREA FRACTION
The threshold image of Y * in Fig. 14 is a binary image that can be compared with the original image of Y in Fig. 4. Written in terms of functions, it is equal to 1{u ∈ Ξ} where Ξ = {u : Y * (u) 1/2 − p} and 1/2 − p = 0.11878339181.To quantify visual similarities in both images, statistical tests for the area fraction can be used, see Böhm et al., 2004.For each of 90 threshold images, the null hypothesis H 0 : p = p is tested vs. its alternative

Fig. 1 .
Fig. 1.Observed positions of test vehicles in downtown Berlin.
where a 0, b 0 and c > 0 are parameters with the following geometric meaning.The value of the nugget effect a measures the discontinuity of the realizations of Y at the microscopic scale.If a > 0 then the realizations of Y are not continuous.The sill b describes the variability of the data for greater distances |h|.The third parameter c is the range of correlation of Y which implies that the random variables Y (x) and Y (x + h) are almost uncorrelated for |h| > 3c.A parametric variogram model γ is called geometrically anisotropic if the range value c (and none of the other parameters) depends on the direction of h.If, in addition, the sill value b depends on the direction of h, the variogram is called zonally anisotropic.

Fig. 11 .
Fig. 11.Difference between the fitted theoretical model γ * (h) and the empirical variogram γ * (h).The knowledge of the grain shape (17) can be integrated in the indicator functions of the kriging with moving neighborhood.Put the set {u i ∈ A(u)} in (4) to be equal to{|x − x i | a, |y − y i | b}where u i = (x i , y i ) and u = (x, y) denote the Euclidean coordinates of points u i and u.
, level curves are colored in accordance with the increasing variogram values from green, blue and yellow to red, where the zonally anisotropic behavior of γ * (h) near the origin becomes clear.The variogram values are low in a narrow sector at the polar angle of approximately 170 • , i.e., traffic velocities are highly correlated in this direction.