Understanding spatial data usability
In recent geographical information science literature, a number of researchers have made passing reference to an apparently new characteristic of spatial data known as 'usability'. While this attribute is well known to professionals engaged in software engineering and computer interface design and testing, extending the concept to embrace information would seem to be a new development. Furthermore, while notions such as the use and value of spatial information, and the diffusion of spatial information systems, have been the subject of research since the late 1980s, the current references to usability clearly represent something that extends well beyond that initial research. Accordingly, the purposes of this paper are: (1) to understand what is meant by spatial data usability; (2) to identify the elements that might comprise usability; and (3) to consider what the related research questions might be.
Refining Coarse-grained Spatial Data using Auxiliary Spatial Data Sets with Various Granularities
We propose a probabilistic model for refining coarse-grained spatial data by utilizing auxiliary spatial data sets. Existing methods require that the spatial granularities of the auxiliary data sets be the same as the desired granularity of the target data. The proposed model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes. With the proposed model, a distribution for each auxiliary data set on the continuous space is modeled using a Gaussian process, where the representation of uncertainty considers the levels of granularity. The fine-grained target data are modeled by another Gaussian process that considers both the spatial correlation and the auxiliary data sets with their uncertainty. We integrate the Gaussian process with a spatial aggregation process that transforms the fine-grained target data into the coarse-grained target data, by which we can infer the fine-grained target Gaussian process from the coarse-grained data. Our model is designed such that inference of the model parameters based on the exact marginal likelihood is possible, in which the variables of the fine-grained target and auxiliary data are analytically integrated out. Our experiments on real-world spatial data sets demonstrate the effectiveness of the proposed model.
Comment: Appears in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019).
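The abstract's central mechanism, a Gaussian process observed only through a linear aggregation operator, can be sketched in closed form: if fine-grained values f follow a GP and coarse observations are region averages y = Af + noise, then f given y is again Gaussian. A minimal sketch of that idea (the kernel, grid, and noise level are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fine grid: 20 cells on [0, 1]; an assumed RBF kernel for the fine GP.
x = np.linspace(0, 1, 20)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2) + 1e-9 * np.eye(20)

# Spatial aggregation operator A: 4 coarse regions, each averaging 5 cells.
A = np.kron(np.eye(4), np.full((1, 5), 1 / 5))

# Simulate a fine-grained truth and the coarse-grained observations.
f_true = rng.multivariate_normal(np.zeros(20), K)
noise = 1e-3
y = A @ f_true + rng.normal(0, np.sqrt(noise), 4)

# Gaussian conditioning: posterior over fine cells given coarse data.
S = A @ K @ A.T + noise * np.eye(4)              # covariance of y
f_mean = K @ A.T @ np.linalg.solve(S, y)         # E[f | y]
f_cov = K - K @ A.T @ np.linalg.solve(S, A @ K)  # Cov[f | y]

print(f_mean.round(2))
```

The same conditioning step is what lets the paper's hierarchical model recover the fine-grained process from coarse observations; the hierarchy over multi-granularity auxiliary sets is layered on top of it.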
Is spatial information in ICT data reliable?
An increasing number of human activities are studied using data produced by individuals' ICT devices. In particular, when ICT data contain spatial information, they represent an invaluable source for analyzing urban dynamics. However, there have been relatively few contributions investigating the robustness of this type of result against fluctuations in data characteristics. Here, we present a stability analysis of higher-level information extracted from mobile phone data passively produced during an entire year by 9 million individuals in Senegal. We focus on two information-retrieval tasks: (a) the identification of land use in the region of Dakar from the temporal rhythms of communication activity; (b) the identification of the home and work locations of anonymized individuals, which enables the construction of Origin-Destination (OD) matrices of commuting flows. Our analysis reveals that the uncertainty of the results depends strongly on the sample size, the scale, and the period of the year at which the data were gathered. Nevertheless, the spatial distributions of land use computed for different samples are remarkably robust: on average, we observe more than 75% of shared surface area between the different spatial partitions when considering the activity of at least 100,000 users, regardless of the scale. The OD matrix is less stable and depends on the scale, with a share of at least 75% of commuters in common when considering all types of flows constructed from the home-work locations of 100,000 users. For both tasks, better results can be obtained at larger levels of aggregation or by considering more users. These results confirm that ICT data are very useful sources for the spatial analysis of urban systems, but that their reliability should in general be tested more thoroughly.
Comment: 11 pages, 9 figures + appendix. Extended version of the conference paper published in the Proceedings of the 2016 Spatial Accuracy Conference, pp. 9-17, Montpellier, France.
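Task (b) lends itself to a small sketch: build an OD matrix from inferred home and work zones, then measure stability across user samples as the share of commuters lying on flows common to both samples. The zone counts, random assignments, and exact stability measure below are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def od_matrix(home, work, n_zones):
    """Count commuters for each (home zone, work zone) pair."""
    od = np.zeros((n_zones, n_zones))
    np.add.at(od, (home, work), 1)
    return od

def shared_commuters(od_a, od_b):
    """Fraction of commuters lying on flows present in both matrices."""
    common = np.minimum(od_a, od_b).sum()
    return common / min(od_a.sum(), od_b.sum())

rng = np.random.default_rng(1)
n_zones, n_users = 10, 100_000
home = rng.integers(0, n_zones, n_users)  # toy home-zone assignments
work = rng.integers(0, n_zones, n_users)  # toy work-zone assignments

# Compare two disjoint half-samples of the same user base.
a, b = np.array_split(rng.permutation(n_users), 2)
stability = shared_commuters(od_matrix(home[a], work[a], n_zones),
                             od_matrix(home[b], work[b], n_zones))
print(f"share of commuters in common: {stability:.2%}")
```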
Spatial interpolation of high-frequency monitoring data
Climate modelers generally require meteorological information on regular grids, but monitoring stations are, in practice, sited irregularly. Thus, there is a need to produce public data records that interpolate available data to a high-density grid, which can then be used to generate meteorological maps at a broad range of spatial and temporal scales. In addition to point predictions, quantifications of uncertainty are also needed. One way to accomplish this is to provide multiple simulations of the relevant meteorological quantities conditional on the observed data, taking into account the various uncertainties in predicting a space-time process at locations with no monitoring data. Using a high-quality dataset of minute-by-minute measurements of atmospheric pressure in north-central Oklahoma, this work describes a statistical approach to carrying out these conditional simulations. Based on observations at 11 stations, conditional simulations were produced at two other sites with monitoring stations. The resulting point predictions are very accurate, and the multiple simulations produce well-calibrated prediction uncertainties for temporal changes in atmospheric pressure but are substantially overconservative for the uncertainties in the predictions of (undifferenced) pressure.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS208 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
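The conditional-simulation recipe itself is generic enough to sketch: draw an unconditional realization of the field jointly at observed and target sites, then add a kriging correction so each draw reproduces the data. A minimal Gaussian-process sketch of that recipe (the kernel, noise level, and synthetic data are assumptions; the paper's space-time model is richer):

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ell=0.2):
    """Squared-exponential covariance between 1-D location vectors."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

x_obs = np.linspace(0, 1, 11)    # 11 monitoring stations
x_new = np.array([0.33, 0.71])   # two held-out prediction sites
sigma = 0.05                     # assumed observation noise sd
y_obs = np.sin(6 * x_obs) + rng.normal(0, sigma, x_obs.size)

x_all = np.concatenate([x_obs, x_new])
L = np.linalg.cholesky(rbf(x_all, x_all) + 1e-9 * np.eye(x_all.size))
K_oo = rbf(x_obs, x_obs) + sigma ** 2 * np.eye(x_obs.size)
K_no = rbf(x_new, x_obs)

sims = []
for _ in range(200):
    # Unconditional joint draw of the field at all sites, plus noise.
    z = L @ rng.standard_normal(x_all.size)
    eps = rng.normal(0, sigma, x_obs.size)
    # Kriging correction: each draw now reproduces the observed data.
    w = np.linalg.solve(K_oo, y_obs - z[:x_obs.size] - eps)
    sims.append(z[x_obs.size:] + K_no @ w)

sims = np.array(sims)
print("means:", sims.mean(axis=0), "sds:", sims.std(axis=0))
```

The spread of the simulated values at each target site is what provides the calibrated (or, as the paper finds for undifferenced pressure, overconservative) prediction uncertainty.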
Reducing Spatial Data Complexity for Classification Models
Intelligent data analytics is gradually becoming a day-to-day reality for today's businesses. However, despite rapidly increasing storage and computational power, current state-of-the-art predictive models still cannot handle massive and noisy corporate data warehouses. What is more, an adaptive, real-time operational environment requires multiple models to be frequently retrained, which further hinders their use. Various data reduction techniques, ranging from data sampling to density retention models, attempt to address this challenge by capturing a summarised data structure, yet they either do not account for labelled data or degrade the classification performance of the model trained on the condensed dataset. In response, we propose a new general framework for reducing the complexity of labelled data by means of a controlled spatial redistribution of class densities in the input space. Using the example of the Parzen Labelled Data Compressor (PLDC), we demonstrate a simulated data condensation process directly inspired by electrostatic field interaction, in which the data are moved and merged following attracting and repelling interactions with the other labelled data. The process is controlled by the class density function built on the original data, which acts as a class-sensitive potential field ensuring preservation of the original class density distributions while allowing data points to rearrange and merge, joining together their soft class partitions. The result is a model that reduces labelled datasets much further than any competing approach, yet with maximum retention of the original class densities and hence the classification performance. PLDC leaves the reduced dataset with soft accumulative class weights, allowing for efficient online updates; as shown in a series of experiments, when coupled with a Parzen Density Classifier (PDC) it significantly outperforms competing data condensation methods in terms of classification performance at comparable compression levels.
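Since the framework is evaluated with a Parzen Density Classifier, a generic weighted Parzen classifier is easy to sketch: each class is scored by a kernel density estimate whose per-point weights play the role of the accumulated soft class weights a condensed dataset would carry. This is a generic sketch with an assumed Gaussian kernel and toy data, not the PLDC/PDC implementation:

```python
import numpy as np

def parzen_predict(X_train, y_train, weights, X_test, h=0.5):
    """Classify by weighted Gaussian-kernel class densities.

    `weights` stands in for the accumulated soft class weights a
    condensed dataset would carry (all ones for the raw data).
    """
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc, wc = X_train[y_train == c], weights[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        scores.append((wc * np.exp(-0.5 * d2 / h ** 2)).sum(axis=1))
    return classes[np.argmax(scores, axis=0)]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([-1, 0], 0.7, (200, 2)),   # class 0 blob
               rng.normal([+1, 0], 0.7, (200, 2))])  # class 1 blob
y = np.repeat([0, 1], 200)

preds = parzen_predict(X, y, np.ones(len(y)), X)
print("training accuracy:", (preds == y).mean())
```

A condensation method in this spirit would replace the 400 raw points with far fewer merged points carrying weights greater than one, leaving the class density estimates, and hence the decision boundary, largely unchanged.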
Regularized Principal Component Analysis for Spatial Data
In many atmospheric and earth sciences, it is of interest to identify dominant spatial patterns of variation based on data observed at p locations and n time points, with the possibility that p > n. While principal component analysis (PCA) is commonly applied to find the dominant patterns, the eigenimages produced by PCA may exhibit patterns that are too noisy to be physically meaningful when p is large relative to n. To obtain more precise estimates of the eigenimages, we propose a regularization approach incorporating smoothness and sparseness of the eigenimages while accounting for their orthogonality. Our method allows data taken at irregularly spaced or sparse locations. In addition, the resulting optimization problem can be solved using the alternating direction method of multipliers, which is easy to implement and applicable to large spatial datasets. Furthermore, the estimated eigenfunctions provide a natural basis for representing the underlying spatial process in a spatial random-effects model, from which spatial covariance function estimation and spatial prediction can be efficiently performed using a regularized fixed-rank kriging method. Finally, the effectiveness of the proposed method is demonstrated by several numerical examples.
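For reference, the unregularized baseline the paper improves on is plain PCA: the eigenimages are the left singular vectors of the centered p x n space-time data matrix, and they become noisy when p is much larger than n. A minimal sketch of that baseline under assumed dimensions (not the proposed regularized estimator):

```python
import numpy as np

rng = np.random.default_rng(4)

# Space-time data matrix: p locations (rows) x n time points (columns),
# with p > n, the regime where raw PCA eigenimages get noisy.
p, n = 500, 50
signal = np.outer(np.sin(np.linspace(0, 3 * np.pi, p)), rng.normal(size=n))
Y = signal + rng.normal(0, 1.0, (p, n))

# Center each location's time series, then take left singular vectors:
# column k of U is the k-th eigenimage over the p locations.
Yc = Y - Y.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(Yc, full_matrices=False)
eigenimage_1 = U[:, 0]
print("leading eigenimage shape:", eigenimage_1.shape)
```

The paper's contribution is to replace this raw SVD estimate with one penalized for roughness and non-sparsity, solved by ADMM while respecting the orthogonality of the eigenimages.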
Forecasting with Spatial Panel Data
This paper compares various forecasts using panel data with spatial error correlation. The true data generating process is assumed to be a simple error component regression model with spatial remainder disturbances of the autoregressive or moving average type. The best linear unbiased predictor (BLUP) is compared with other forecasts that ignore spatial correlation, or that ignore heterogeneity due to the individual effects, using Monte Carlo experiments. In addition, we check the performance of these forecasts under misspecification of the spatial error process, various spatial weight matrices, and heterogeneous rather than homogeneous panel data models.
Keywords: forecasting, BLUP, panel data, spatial dependence, heterogeneity
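The data generating process described here is straightforward to simulate: an error component panel model whose remainder disturbances follow, for the autoregressive case, u_t = lambda * W u_t + eps_t, so that u_t = (I - lambda * W)^{-1} eps_t. A minimal sketch with an assumed ring-contiguity weight matrix (not the paper's Monte Carlo design):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, lam, beta = 25, 10, 0.4, 2.0

# Row-standardized contiguity weights on a ring (illustrative W).
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5

B_inv = np.linalg.inv(np.eye(N) - lam * W)  # (I - lambda W)^{-1}

mu = rng.normal(0, 1, N)        # time-invariant individual effects
X = rng.normal(0, 1, (T, N))    # regressor
eps = rng.normal(0, 1, (T, N))  # innovations
u = eps @ B_inv.T               # spatial AR remainder disturbances
Y = beta * X + mu + u           # T x N panel of outcomes

print(Y.shape)
```

The BLUP exploits both the individual effects mu and the spatial structure in u when forecasting; the forecasts it is compared against drop one or both of these components.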
