97 research outputs found
A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data
The increased availability of large-scale trajectory data around the world
provides rich information for the study of urban dynamics. For example, the New
York City Taxi and Limousine Commission regularly releases source-destination
information about trips in the taxis it regulates. Taxi data provide
information about traffic patterns, and thus enable the study of urban flow --
what will traffic between two locations look like at a certain date and time in
the future? Existing big data methods try to outdo each other in terms of
complexity and algorithmic sophistication. In the spirit of "big data beats
algorithms", we present a very simple baseline which outperforms
state-of-the-art approaches, including Bing Maps and Baidu Maps (whose APIs
permit large scale experimentation). Such a travel time estimation baseline has
several important uses, such as navigation (fast travel time estimates can
serve as approximate heuristics for A* search variants for path finding) and
trip planning (which uses operating hours for popular destinations along with
travel time estimates to create an itinerary). Comment: 12 pages
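The abstract does not say how the baseline works, so the following is only a minimal sketch of one plausible form of a "very simple" estimator: average the durations of historical trips whose origin and destination both lie close to those of the query. The function name `estimate_travel_time`, the trip dictionary keys, the planar coordinates, and the `radius` parameter are all assumptions made for illustration, not details taken from the paper.

```python
from math import hypot


def estimate_travel_time(query, trips, radius):
    """Average duration of historical trips that start and end near the query.

    Assumption: coordinates are planar (x, y) pairs, so Euclidean distance
    is a reasonable proxy for closeness. This is an illustrative sketch,
    not the method of the paper.
    """
    (qo, qd) = query
    durations = [
        t["duration"]
        for t in trips
        if hypot(t["origin"][0] - qo[0], t["origin"][1] - qo[1]) <= radius
        and hypot(t["dest"][0] - qd[0], t["dest"][1] - qd[1]) <= radius
    ]
    if not durations:
        return None  # no comparable trips in the historical data
    return sum(durations) / len(durations)


# Hypothetical historical trips (durations in seconds).
trips = [
    {"origin": (0.0, 0.0), "dest": (10.0, 0.0), "duration": 600.0},
    {"origin": (0.5, 0.0), "dest": (10.0, 0.5), "duration": 660.0},
    {"origin": (5.0, 5.0), "dest": (10.0, 0.0), "duration": 300.0},
]
estimate = estimate_travel_time(((0.0, 0.0), (10.0, 0.0)), trips, radius=1.0)
print(estimate)  # 630.0, the mean of the two neighboring trips
```

An estimator of this kind improves as more trips are collected, which is the sense in which the abstract invokes "big data beats algorithms".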
Urban Anomaly Analytics: Description, Detection, and Prediction
Urban anomalies may result in loss of life or property if not handled properly. Automatically raising alerts about anomalies at an early stage, or even predicting them before they happen, is of great value to the public. Recently, data-driven urban anomaly analysis frameworks have emerged that use urban big data and machine learning algorithms to detect and predict urban anomalies automatically. In this survey, we comprehensively review the state-of-the-art research on urban anomaly analytics. We first give an overview of four main types of urban anomalies: traffic anomalies, unexpected crowds, environmental anomalies, and individual anomalies. Next, we summarize the types of urban datasets obtained from diverse devices, i.e., trajectories, trip records, call detail records (CDRs), urban sensors, event records, environment data, social media, and surveillance cameras. Subsequently, we present a comprehensive survey of techniques for detecting and predicting urban anomalies. Finally, research challenges and open problems are discussed. Peer reviewed
A Methodology with Distributed Algorithms for Large-Scale Human Mobility Prediction
In today’s era of big data, huge amounts of spatial-temporal data related to human mobility, e.g., vehicle trajectories, are generated daily by all kinds of city-wide infrastructure. Understanding and accurately predicting such data could benefit many real-world applications, e.g., efficient relocation of transportation resources. However, the mix of spatial and temporal patterns among these activities and the city-level scale of the data pose great challenges for accurate prediction under real-time constraints.
To bridge the gap, this dissertation proposes a methodology for predicting large-scale human mobility, in particular the city-level distribution of vehicle trajectories across the road network. The thesis has several major components: (1) a novel model for predicting spatial-temporal activities, such as people’s outflow/inflow movements, that combines latent and explicit features; (2) different models for simulating the corresponding flow trajectory distributions in the road network, from which hot road segments and their formation can be predicted and identified in advance; (3) different MapReduce-based distributed algorithms for simulating and analyzing large-scale trajectory distributions under real-time constraints.
First, our proposed methodology quantifies the latent features of spatial environments and temporal factors through tensor factorization, given existing mobility datasets. We model the relationship between spatial-temporal activities and the latent and other explicit features as a Gaussian process, which can be viewed as a distribution over the possible functions to predict human mobility.
After predicting the overall inflow/outflow, we further model the trajectory distributions of these movements in the road network, from which the corresponding hot road segments and their possible causes, among other things, can be predicted in advance. For example, our prediction may show that in the next half hour a high percentage of vehicles traveling from region A/B toward region C/D will pass through the same road segment, which indicates that a traffic jam or bottleneck could form there later. This process is computationally intensive and requires efficient algorithms for real-time response, because the scale of a city’s road network and the number of trajectories that people might take during certain time periods can be very large. Efficient distributed algorithms are proposed and validated
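The hot-segment idea in the example above can be sketched in MapReduce style: a map step emits one count per road segment that a predicted trajectory uses, and a reduce step sums the counts so that segments shared by a large fraction of trajectories stand out. This is a single-process illustration under stated assumptions, not the dissertation's distributed algorithm; the function names, the segment identifiers, and the `min_share` threshold are hypothetical.

```python
from collections import Counter
from itertools import chain


def map_trajectory(trajectory):
    """Map step: emit (segment, 1) once per distinct segment in a trajectory."""
    return [(segment, 1) for segment in set(trajectory)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each segment."""
    totals = Counter()
    for segment, count in pairs:
        totals[segment] += count
    return totals


def hot_segments(trajectories, min_share):
    """Return {segment: share} for segments used by at least min_share of trajectories."""
    if not trajectories:
        return {}
    pairs = chain.from_iterable(map_trajectory(t) for t in trajectories)
    totals = reduce_counts(pairs)
    n = len(trajectories)
    return {seg: c / n for seg, c in totals.items() if c / n >= min_share}


# Hypothetical predicted trajectories from regions A/B toward regions C/D.
predicted = [
    ["a1", "s7", "c2"],
    ["b3", "s7", "d4"],
    ["a1", "s7", "d4"],
    ["b3", "s9", "c2"],
]
hot = hot_segments(predicted, min_share=0.75)
print(hot)  # {'s7': 0.75}
```

Because the map step handles each trajectory independently and the reduce step only adds integers, the same structure could in principle be spread across many workers, which is what makes this kind of counting a natural fit for MapReduce.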
Spatiotemporal Tensor Completion for Improved Urban Traffic Imputation
Effective management of urban traffic is important for any smart city
initiative. Therefore, the quality of the sensory traffic data is of paramount
importance. However, like any sensory data, urban traffic data are prone to
imperfections leading to missing measurements. In this paper, we focus on
inter-region traffic data completion. We model the inter-region traffic as a
spatiotemporal tensor that suffers from missing measurements. To recover the
missing data, we propose an enhanced CANDECOMP/PARAFAC (CP) completion approach
that considers the urban and temporal aspects of the traffic. To derive the
urban characteristics, we divide the area of study into regions. Then, for each
region, we compute urban feature vectors inspired by biodiversity, which are
used to compute the urban similarity matrix. To mine the temporal aspect, we
first conduct an entropy analysis to determine the most regular time-series.
Then, we conduct a joint Fourier and correlation analysis to compute its
periodicity and construct the temporal matrix. Both urban and temporal matrices
are fed into a modified CP-completion objective function. To solve this
objective, we propose an alternating least squares approach that operates on the
vectorized version of the inputs. We conduct a comprehensive comparative study
with two evaluation scenarios. In the first one, we simulate random missing
values. In the second scenario, we simulate missing values at a given area and
time duration. Our results demonstrate that our approach recovers missing data
effectively, reaching a 26% improvement over state-of-the-art CP approaches and
a 35% improvement over state-of-the-art generative model-based approaches.
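The Fourier half of the periodicity analysis mentioned above can be illustrated with a plain discrete Fourier transform: remove the mean, find the frequency bin with the most power, and convert that bin to a period. This is a minimal sketch that assumes an evenly sampled series; it omits the correlation analysis and is not the paper's procedure. The function name `dominant_period` and the synthetic hourly series are hypothetical.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # drop the constant (k = 0) component
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return None if best_k is None else n / best_k


# Hypothetical hourly traffic counts over four days with a daily cycle.
hourly = [10 + 5 * math.sin(2 * math.pi * t / 24) for t in range(96)]
print(dominant_period(hourly))  # 24.0
```

The direct transform costs O(n^2) operations, which is acceptable for a short illustration; a fast Fourier transform would be the usual choice for long series.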
Predicting passenger origin-destination in online taxi-hailing systems
Because of its importance to transportation planning, traffic management, and
dispatch optimization, passenger origin-destination prediction has become one
of the most important requirements for intelligent transportation systems
management. In this paper, we propose a model to predict the origins and
destinations of trips in the next specified time window. To extract meaningful travel
flows, we use K-means clustering in four-dimensional space with maximum cluster
size limitation for origin and destination zones. Because of the large number
of clusters, we use non-negative matrix factorization to decrease the number of
travel clusters. Also, we use a stacked recurrent neural network model to
predict travel count in each cluster. Comparing our results with other existing
models shows that our proposed model has 5-7% lower mean absolute percentage
error (MAPE) for 1-hour time windows, and 14% lower MAPE for 30-minute time
windows. Comment: 25 pages, 20 figures
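For reference, the MAPE metric quoted above is the mean of the absolute errors expressed as a percentage of the actual values. The sketch below is a generic implementation, not the authors' evaluation code; skipping entries whose actual value is zero is an assumption made here to avoid division by zero, and the sample counts are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: pairs whose actual value is zero are skipped, because the
    percentage error is undefined for them.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical trip counts per cluster for one time window.
actual_counts = [100, 200, 50]
predicted_counts = [110, 180, 50]
print(round(mape(actual_counts, predicted_counts), 2))  # 6.67
```

Because each error is divided by the actual value, clusters with very few trips can dominate the metric, which is worth keeping in mind when comparing MAPE across different window lengths.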