448 research outputs found
On the imputation of missing data for road traffic forecasting: New insights and novel techniques
Vehicle flow forecasting is of crucial importance for the management of road traffic in complex urban networks, as well as a useful input for route planning algorithms. In general traffic predictive models rely on data gathered by different types of sensors placed on roads, which occasionally produce faulty readings due to several causes, such as malfunctioning hardware or transmission errors. Filling in those gaps is relevant for constructing accurate forecasting models, a task which is engaged by diverse strategies, from a simple null value imputation to complex spatio-temporal context imputation models. This work elaborates on two machine learning approaches to update missing data with no gap length restrictions: a spatial context sensing model based on the information provided by surrounding sensors, and an automated clustering analysis tool that seeks optimal pattern clusters in order to impute values. Their performance is assessed and compared to other common techniques and different missing data generation models over real data captured from the city of Madrid (Spain). The newly presented methods are found to be fairly superior when portions of missing data are large or very abundant, as occurs in most practical cases.This work has been supported by the Basque Government through the ELKARTEK program (Ref. KK-2015/0000080 and the
BID3ABI project), as well as by the H2020 programme of the European Commission (Grant No. 691735)
New Methods for Network Traffic Anomaly Detection
In this thesis we examine the efficacy of applying outlier detection techniques to understand the behaviour of anomalies in communication network traffic. We have identified several shortcomings. Our most finding is that known techniques either focus on characterizing the spatial or temporal behaviour of traffic but rarely both. For example DoS attacks are anomalies which violate temporal patterns while port scans violate the spatial equilibrium of network traffic. To address this observed weakness we have designed a new method for outlier detection based spectral decomposition of the Hankel matrix. The Hankel matrix is spatio-temporal correlation matrix and has been used in many other domains including climate data analysis and econometrics. Using our approach we can seamlessly integrate the discovery of both spatial and temporal anomalies. Comparison with other state of the art methods in the networks community confirms that our approach can discover both DoS and port scan attacks. The spectral decomposition of the Hankel matrix is closely tied to the problem of inference in Linear Dynamical Systems (LDS). We introduce a new problem, the Online Selective Anomaly Detection (OSAD) problem, to model the situation where the objective is to report new anomalies in the system and suppress know faults. For example, in the network setting an operator may be interested in triggering an alarm for malicious attacks but not on faults caused by equipment failure. In order to solve OSAD we combine techniques from machine learning and control theory in a unique fashion. Machine Learning ideas are used to learn the parameters of an underlying data generating system. Control theory techniques are used to model the feedback and modify the residual generated by the data generating state model. Experiments on synthetic and real data sets confirm that the OSAD problem captures a general scenario and tightly integrates machine learning and control theory to solve a practical problem
Ensemble deep learning: A review
Ensemble learning combines several individual models to obtain better
generalization performance. Currently, deep learning models with multilayer
processing architecture is showing better performance as compared to the
shallow or traditional classification models. Deep ensemble learning models
combine the advantages of both the deep learning models as well as the ensemble
learning such that the final model has better generalization performance. This
paper reviews the state-of-art deep ensemble models and hence serves as an
extensive summary for the researchers. The ensemble models are broadly
categorised into ensemble models like bagging, boosting and stacking, negative
correlation based deep ensemble models, explicit/implicit ensembles,
homogeneous /heterogeneous ensemble, decision fusion strategies, unsupervised,
semi-supervised, reinforcement learning and online/incremental, multilabel
based deep ensemble models. Application of deep ensemble models in different
domains is also briefly discussed. Finally, we conclude this paper with some
future recommendations and research directions
Road distance and travel time for spatial urban modelling
Interactions within and between urban environments include the price of houses, the flow of traffic and the intensity of noise pollution, which can all be restricted by various physical, regulatory and customary barriers. Examples of such restrictions include buildings, one-way systems and pedestrian crossings. These constrictive features create challenges for predictive modelling in urban space, which are not fully captured when proximity-based models rely on the typically used Euclidean (straight line) distance metric.
Over the course of this thesis, I ask three key questions in an attempt to identify how to improve spatial models in restricted urban areas. These are: (1) which distance function best models real world spatial interactions in an urban setting? (2) when, if ever, are non-Euclidean distance functions valid for urban spatial models? and (3) what is the best way to estimate the generalisation performance of urban models utilising spatial data?
This thesis answers each of these questions through three contributions supporting the interdisciplinary domain of Urban Sciences. These contributions are: (1) the provision of an improved approximation of road distance and travel time networks to model urban spatial interactions; (2) the approximation of valid distance metrics from non-Euclidean inputs for improved spatial predictions and (3) the presentation of a road distance and travel time cross-validation metric to improve the estimation of urban model generalisation. Each of these contributions provide improvements against the current state-of-the-art. Throughout, all experiments utilise real world datasets in England and Wales, such datasets contain information on restricted roads, travel times, house sales and traffic counts. With these datasets, I display a number of case studies which show up to a 32% improved model accuracy against Euclidean distances and in some cases, a 90% improvement for the estimation of model generalisation performance.
Combined, the contributions improve the way that proximity-based urban models perform and also provides a more accurate estimate of generalisation performance for predictive models in urban space. The main implication of these contributions to Urban Science is the ability to better model the challenges within a city based on how they interact with themselves and each other using an improved function of urban mobility, compared with the current state-of-the-art. Such challenges may include selecting the optimal locations for emergency services, identifying the causes of traffic incidents or estimating the density of air pollution. Additionally, the key implication of this research on geostatistics is that it provides the motivation and means of undertaking non-Euclidean based research for non-urban applications, for example predicting with alternative, non-road based, mobility patterns such as migrating animals, rivers and coast lines. Finally, the implication of my research to the real estate industry is significant, in which one can now improve the accuracy of the industry's state-of-the-art nationwide house price predictor, whilst also being able to more appropriately present their accuracy estimates for robustness
Design and validation of novel methods for long-term road traffic forecasting
132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe
Design and validation of novel methods for long-term road traffic forecasting
132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe
Physics-Guided Deep Learning for Dynamical Systems: A survey
Modeling complex physical dynamics is a fundamental task in science and
engineering. Traditional physics-based models are interpretable but rely on
rigid assumptions. And the direct numerical approximation is usually
computationally intensive, requiring significant computational resources and
expertise. While deep learning (DL) provides novel alternatives for efficiently
recognizing complex patterns and emulating nonlinear dynamics, it does not
necessarily obey the governing laws of physical systems, nor do they generalize
well across different systems. Thus, the study of physics-guided DL emerged and
has gained great progress. It aims to take the best from both physics-based
modeling and state-of-the-art DL models to better solve scientific problems. In
this paper, we provide a structured overview of existing methodologies of
integrating prior physical knowledge or physics-based modeling into DL and
discuss the emerging opportunities
Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities
Traffic prediction plays a crucial role in alleviating traffic congestion
which represents a critical problem globally, resulting in negative
consequences such as lost hours of additional travel time and increased fuel
consumption. Integrating emerging technologies into transportation systems
provides opportunities for improving traffic prediction significantly and
brings about new research problems. In order to lay the foundation for
understanding the open research challenges in traffic prediction, this survey
aims to provide a comprehensive overview of traffic prediction methodologies.
Specifically, we focus on the recent advances and emerging research
opportunities in Artificial Intelligence (AI)-based traffic prediction methods,
due to their recent success and potential in traffic prediction, with an
emphasis on multivariate traffic time series modeling. We first provide a list
and explanation of the various data types and resources used in the literature.
Next, the essential data preprocessing methods within the traffic prediction
context are categorized, and the prediction methods and applications are
subsequently summarized. Lastly, we present primary research challenges in
traffic prediction and discuss some directions for future research.Comment: Published in Transportation Research Part C: Emerging Technologies
(TR_C), Volume 145, 202
- …