Measuring Membership Privacy on Aggregate Location Time-Series
While location data is extremely valuable for various applications,
disclosing it poses serious threats to individuals' privacy. To limit such
concerns, organizations often provide analysts with aggregate time-series that
indicate, e.g., how many people are in a location at a time interval, rather
than raw individual traces. In this paper, we perform a measurement study to
understand Membership Inference Attacks (MIAs) on aggregate location
time-series, where an adversary tries to infer whether a specific user
contributed to the aggregates.
We find that the volume of contributed data, as well as the regularity and
particularity of users' mobility patterns, play a crucial role in the attack's
success. We experiment with a wide range of defenses based on generalization,
hiding, and perturbation, and evaluate their ability to thwart the attack
vis-a-vis the utility loss they introduce for various mobility analytics tasks.
Our results show that some defenses fail across the board, while others work
for specific tasks on aggregate location time-series. For instance, suppressing
small counts can be used for ranking hotspots, data generalization for
forecasting traffic, hotspot discovery, and map inference, while sampling is
effective for location labeling and anomaly detection when the dataset is
sparse. Differentially private techniques provide reasonable accuracy only in
very specific settings, e.g., discovering hotspots and forecasting their
traffic, and more so when using weaker privacy notions like crowd-blending
privacy. Overall, our measurements show that there does not exist a unique
generic defense that can preserve the utility of the analytics for arbitrary
applications, and provide useful insights regarding the disclosure of sanitized
aggregate location time-series.
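Two of the defense families named in this abstract, hiding and perturbation, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the threshold, and the per-count sensitivity of 1 are assumptions made here for illustration, and the perturbation shown is the standard Laplace mechanism.

```python
import random


def suppress_small_counts(counts, threshold):
    """Hiding defense: replace every count below `threshold` with 0."""
    return [c if c >= threshold else 0 for c in counts]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_counts(counts, epsilon, sensitivity=1.0, seed=0):
    """Perturbation defense: add Laplace noise of scale sensitivity / epsilon.

    ASSUMPTION: sensitivity=1.0 means one user changes the released counts
    by at most 1 in total. A user who appears in many time slots needs a
    larger sensitivity (or a per-user contribution cap) for the guarantee
    to hold.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    hourly_counts = [12, 3, 0, 7, 5]  # toy data: people seen in one location per hour
    print(suppress_small_counts(hourly_counts, threshold=5))
    print(perturb_counts(hourly_counts, epsilon=1.0))
```

Suppression leaves large counts exact, which is why it can suit hotspot ranking, whereas the Laplace noise affects every count and its scale grows as epsilon shrinks.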
SaSDim: Self-Adaptive Noise Scaling Diffusion Model for Spatial Time Series Imputation
Spatial time series imputation is critically important to many real-world
applications such as intelligent transportation and air quality monitoring.
Although recent transformer-based and diffusion-based approaches have achieved
significant performance gains over conventional statistical methods, spatial
time series imputation remains challenging due to the complex spatio-temporal
dependencies and the noise uncertainty of the spatial time series data. In
particular, recent diffusion-based models may introduce random noise into the
imputations and thus degrade model performance. To this end, we propose a
self-adaptive noise scaling diffusion model named SaSDim to perform spatial
time series imputation more effectively. Specifically, we propose a new loss
function that scales the noise to a similar intensity, and an across
spatial-temporal global convolution module that captures the dynamic
spatial-temporal dependencies more effectively. Extensive experiments on three
real-world datasets verify the effectiveness of SaSDim in comparison with
current state-of-the-art baselines.
Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach
Traffic speed is central to characterizing the fluidity of the road network.
Many transportation applications rely on it, such as real-time navigation,
dynamic route planning, and congestion management. Rapid advances in sensing
and communication techniques make traffic speed detection easier than ever.
However, due to sparse deployment of static sensors or low penetration of
mobile sensors, speeds detected are incomplete and far from network-wide use.
In addition, sensors are prone to errors and missing data for various reasons,
so the speeds they report can be highly noisy. These drawbacks
call for effective techniques to recover credible estimates from the incomplete
data. In this work, we first identify the issue as a spatiotemporal kriging
problem and propose a Laplacian-enhanced low-rank tensor completion (LETC)
framework featuring both low-rankness and multi-dimensional correlations for
large-scale traffic speed kriging under limited observations. To be specific,
three types of speed correlation including temporal continuity, temporal
periodicity, and spatial proximity are carefully chosen and simultaneously
modeled by three different forms of graph Laplacian, named temporal graph
Fourier transform, generalized temporal consistency regularization, and
diffusion graph regularization. We then design an efficient solution algorithm
that uses several numerical techniques to scale the proposed model up to
network-wide kriging. Experiments on two public million-level traffic speed
datasets show that the proposed LETC achieves state-of-the-art kriging
performance even under low observation rates, while cutting computing time by
more than half compared with baseline methods. Some insights into
spatiotemporal traffic data modeling and kriging at the network level are
provided as well.
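The spatial-proximity idea behind graph Laplacian regularization can be sketched in a few lines of Python. This is a generic illustration, not the LETC formulation: the adjacency matrix, the helper names, and the toy speeds are assumptions. For a symmetric weight matrix W with degree matrix D, the Laplacian L = D - W gives the penalty x^T L x, which equals the sum over edges of w_ij * (x_i - x_j)^2, so it is small when adjacent road segments have similar speeds.

```python
def graph_laplacian(weights):
    """Build L = D - W from a symmetric adjacency (weight) matrix W."""
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry is the node degree, excluding any self-loop.
        lap[i][i] = sum(weights[i]) - weights[i][i]
    return lap


def laplacian_penalty(lap, x):
    """Quadratic form x^T L x: large when neighboring nodes disagree."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical road network: three segments in a row (0 - 1 - 2).
    adjacency = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    lap = graph_laplacian(adjacency)
    print(laplacian_penalty(lap, [1, 2, 4]))  # uneven speeds, positive penalty
    print(laplacian_penalty(lap, [3, 3, 3]))  # identical speeds, zero penalty
```

Adding such a penalty to a completion objective pushes estimates for unobserved segments toward the values of their neighbors.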
Anomaly Detection on Graph Time Series
In this paper, we use a variational recurrent neural network to investigate the
anomaly detection problem on graph time series. The temporal correlation is
modeled by combining a recurrent neural network (RNN) with variational
inference (VI), while the spatial information is captured by a graph
convolutional network. To incorporate external factors, we use a feature
extractor to augment the transition of latent variables, so that the model can
learn the influence of these factors. Because the target function is the
accumulative ELBO, the model is easy to extend to an online method. An
experimental study on traffic flow data demonstrates the detection capability
of the proposed method.
Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment
Understanding the mobile traffic patterns of large-scale cellular towers in
urban environments is extremely valuable for Internet service providers, mobile
users, and government managers of modern metropolises. This paper aims at
extracting and modeling the traffic patterns of large-scale towers deployed in
a metropolitan city. To achieve this goal, we need to address several
challenges, including the lack of appropriate tools for processing large-scale
traffic measurement data, unknown traffic patterns, and the complicated factors
of urban ecology and human behavior that affect traffic patterns. Our core
contribution is a powerful model that combines three dimensions of information
(time, locations of towers, and traffic frequency spectrum) to extract and
model the traffic patterns of thousands of cellular towers. Our empirical
analysis reveals the following important observations. First, only five basic
time-domain traffic patterns exist among the 9,600 cellular towers. Second,
each extracted traffic pattern maps to one type of geographical location
related to urban ecology: residential area, business district, transport,
entertainment, or comprehensive area. Third, our frequency-domain traffic
spectrum analysis suggests that the traffic of any tower among the 9,600 can be
constructed using a linear combination of four primary components corresponding
to human activity behaviors. We believe that the proposed traffic pattern
extraction and modeling methodology, combined with the empirical analysis of
the mobile traffic, paves the way toward a deep understanding of the traffic
patterns of large-scale cellular towers in modern metropolises.
Comment: To appear at IMC 201
Forecasting Network Traffic: A Survey and Tutorial with Open-Source Comparative Evaluation
This paper presents a review of the literature on network traffic prediction, while also serving as a tutorial on the topic. We examine works based on autoregressive moving average models, such as ARMA, ARIMA, and SARIMA, as well as works based on artificial neural network approaches, such as RNN, LSTM, GRU, and CNN. In all cases, we provide a complete and self-contained presentation of the mathematical foundations of each technique, which allows the reader to fully understand how the different proposed methods operate. Further, we perform numerical experiments based on real data sets, which allow the various approaches to be compared directly in terms of fitting quality and computational cost. We make our code publicly available, so that readers can readily access a wide range of forecasting tools and possibly use them as benchmarks for more advanced solutions.
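The simplest member of the autoregressive family covered by this survey is AR(1), x[t] = c + phi * x[t-1] + noise. The sketch below fits it by ordinary least squares and rolls the recursion forward. It is not the survey's published code, and the function names and toy series are assumptions made here for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    # Slope of the regression of x[t] on x[t-1].
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion `steps` times from `last_value`."""
    forecasts = []
    x = last_value
    for _ in range(steps):
        x = c + phi * x
        forecasts.append(x)
    return forecasts


if __name__ == "__main__":
    # Toy, noise-free traffic series generated by x[t] = 2 + 0.5 * x[t-1].
    traffic = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
    c, phi = fit_ar1(traffic)
    print(c, phi)
    print(forecast_ar1(traffic[-1], c, phi, steps=2))
```

ARMA, ARIMA, and SARIMA extend this recursion with moving-average terms, differencing, and seasonal lags; a maintained statistics library is the usual choice for fitting those.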