448 research outputs found

    On the imputation of missing data for road traffic forecasting: New insights and novel techniques

    Vehicle flow forecasting is of crucial importance for the management of road traffic in complex urban networks, as well as a useful input for route planning algorithms. In general, traffic predictive models rely on data gathered by different types of sensors placed on roads, which occasionally produce faulty readings due to several causes, such as malfunctioning hardware or transmission errors. Filling in those gaps is relevant for constructing accurate forecasting models, a task addressed by diverse strategies, from a simple null value imputation to complex spatio-temporal context imputation models. This work elaborates on two machine learning approaches to update missing data with no gap length restrictions: a spatial context sensing model based on the information provided by surrounding sensors, and an automated clustering analysis tool that seeks optimal pattern clusters in order to impute values. Their performance is assessed and compared to other common techniques and different missing data generation models over real data captured from the city of Madrid (Spain). The newly presented methods are found to be fairly superior when portions of missing data are large or very abundant, as occurs in most practical cases. This work has been supported by the Basque Government through the ELKARTEK program (Ref. KK-2015/0000080 and the BID3ABI project), as well as by the H2020 programme of the European Commission (Grant No. 691735).
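    The idea of imputing a gap from surrounding sensors can be illustrated with a minimal sketch. This is not the model from the paper, which is a learned spatial context model; it is a much simpler stand-in that averages neighboring sensors at the same time step. The sensor names, the `neighbors` layout and the function `impute_from_neighbors` are hypothetical.

```python
from statistics import mean


def impute_from_neighbors(readings, neighbors):
    """Fill None entries with the mean of neighboring sensors at the same time step.

    readings:  dict sensor_id -> list of flow values (None marks a missing reading)
    neighbors: dict sensor_id -> list of neighboring sensor ids (hypothetical layout)
    """
    # Work on a copy so the original readings are left untouched.
    filled = {sensor: list(values) for sensor, values in readings.items()}
    for sensor, values in readings.items():
        for t, value in enumerate(values):
            if value is not None:
                continue
            # Only use neighbors that were actually observed at time t.
            observed = [
                readings[n][t]
                for n in neighbors.get(sensor, [])
                if readings[n][t] is not None
            ]
            if observed:  # leave the gap if no neighbor has data
                filled[sensor][t] = mean(observed)
    return filled


# Toy data: three mutually neighboring sensors, two missing readings.
readings = {
    "A": [100, None, 120],
    "B": [90, 110, None],
    "C": [110, 130, 140],
}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
imputed = impute_from_neighbors(readings, neighbors)
```

    Because each gap is filled from other sensors rather than from the sensor's own history, the sketch places no restriction on gap length, which is the property the abstract emphasizes.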

    New Methods for Network Traffic Anomaly Detection

    In this thesis we examine the efficacy of applying outlier detection techniques to understand the behaviour of anomalies in communication network traffic. We have identified several shortcomings. Our most important finding is that known techniques focus on characterizing either the spatial or the temporal behaviour of traffic, but rarely both. For example, DoS attacks are anomalies which violate temporal patterns, while port scans violate the spatial equilibrium of network traffic. To address this observed weakness we have designed a new method for outlier detection based on spectral decomposition of the Hankel matrix. The Hankel matrix is a spatio-temporal correlation matrix and has been used in many other domains, including climate data analysis and econometrics. Using our approach we can seamlessly integrate the discovery of both spatial and temporal anomalies. Comparison with other state-of-the-art methods in the networks community confirms that our approach can discover both DoS and port scan attacks. The spectral decomposition of the Hankel matrix is closely tied to the problem of inference in Linear Dynamical Systems (LDS). We introduce a new problem, the Online Selective Anomaly Detection (OSAD) problem, to model the situation where the objective is to report new anomalies in the system and suppress known faults. For example, in the network setting an operator may be interested in triggering an alarm for malicious attacks but not for faults caused by equipment failure. In order to solve OSAD we combine techniques from machine learning and control theory in a unique fashion. Machine learning ideas are used to learn the parameters of an underlying data generating system. Control theory techniques are used to model the feedback and modify the residual generated by the data generating state model. Experiments on synthetic and real data sets confirm that the OSAD problem captures a general scenario and tightly integrates machine learning and control theory to solve a practical problem.
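    As a small illustration of the object the thesis builds on, the sketch below constructs a Hankel (trajectory) matrix from a univariate series, so that entry (i, j) holds the sample at position i + j. It covers only the construction step; the spectral decomposition and the outlier scoring described in the abstract are not shown. The function name `hankel_matrix`, the `window` parameter and the toy series are assumptions made for illustration.

```python
def hankel_matrix(series, window):
    """Return the Hankel matrix of a series as a list of rows.

    Row i is series[i : i + cols], so every anti-diagonal is constant.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    cols = len(series) - window + 1
    return [[series[i + j] for j in range(cols)] for i in range(window)]


# Toy traffic counts with one obvious spike at index 3.
traffic = [3, 4, 5, 40, 6, 7]
H = hankel_matrix(traffic, window=3)
for row in H:
    print(row)
```

    Each row is a lagged copy of the series, so a decomposition of this matrix sees how consecutive samples co-vary over the chosen window.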

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures are showing better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning and ensemble learning, such that the final model has better generalization performance. This paper reviews the state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorised into ensemble models like bagging, boosting and stacking, negative correlation based deep ensemble models, explicit/implicit ensembles, homogeneous/heterogeneous ensembles, decision fusion strategies, unsupervised, semi-supervised, reinforcement learning and online/incremental, multilabel based deep ensemble models. Application of deep ensemble models in different domains is also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions.
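    One of the simplest decision fusion strategies, majority voting over the class labels predicted by several models, can be sketched as follows. The model outputs are hard-coded toy values rather than the predictions of real deep networks, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label lists into one label per sample by majority vote.

    predictions: list of lists, one inner list of labels per model,
                 all of the same length (one label per sample).
    """
    fused = []
    for labels in zip(*predictions):  # labels from every model for one sample
        # most_common(1) returns the top label; ties go to the label seen first.
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


# Toy outputs of three hypothetical classifiers on three samples.
model_outputs = [
    ["cat", "dog", "dog"],
    ["cat", "cat", "dog"],
    ["dog", "cat", "dog"],
]
fused = majority_vote(model_outputs)
```

    Here no single model agrees with the fused labels on every sample, which is the basic intuition behind combining individual models.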

    Data analytics 2016: proceedings of the fifth international conference on data analytics


    Road distance and travel time for spatial urban modelling

    Interactions within and between urban environments include the price of houses, the flow of traffic and the intensity of noise pollution, which can all be restricted by various physical, regulatory and customary barriers. Examples of such restrictions include buildings, one-way systems and pedestrian crossings. These constrictive features create challenges for predictive modelling in urban space, which are not fully captured when proximity-based models rely on the typically used Euclidean (straight line) distance metric. Over the course of this thesis, I ask three key questions in an attempt to identify how to improve spatial models in restricted urban areas. These are: (1) which distance function best models real world spatial interactions in an urban setting? (2) when, if ever, are non-Euclidean distance functions valid for urban spatial models? and (3) what is the best way to estimate the generalisation performance of urban models utilising spatial data? This thesis answers each of these questions through three contributions supporting the interdisciplinary domain of Urban Sciences. These contributions are: (1) the provision of an improved approximation of road distance and travel time networks to model urban spatial interactions; (2) the approximation of valid distance metrics from non-Euclidean inputs for improved spatial predictions; and (3) the presentation of a road distance and travel time cross-validation metric to improve the estimation of urban model generalisation. Each of these contributions provides improvements against the current state-of-the-art. Throughout, all experiments utilise real world datasets in England and Wales; these datasets contain information on restricted roads, travel times, house sales and traffic counts. With these datasets, I present a number of case studies which show up to a 32% improvement in model accuracy against Euclidean distances and, in some cases, a 90% improvement in the estimation of model generalisation performance.
Combined, the contributions improve the way that proximity-based urban models perform and also provide a more accurate estimate of generalisation performance for predictive models in urban space. The main implication of these contributions to Urban Science is the ability to better model the challenges within a city based on how they interact with themselves and each other using an improved function of urban mobility, compared with the current state-of-the-art. Such challenges may include selecting the optimal locations for emergency services, identifying the causes of traffic incidents or estimating the density of air pollution. Additionally, the key implication of this research for geostatistics is that it provides the motivation and means of undertaking non-Euclidean based research for non-urban applications, for example predicting with alternative, non-road based mobility patterns such as migrating animals, rivers and coastlines. Finally, the implication of my research for the real estate industry is significant: one can now improve the accuracy of the industry's state-of-the-art nationwide house price predictor, whilst also being able to present its accuracy estimates more appropriately for robustness.

    Design and validation of novel methods for long-term road traffic forecasting

    132 p. Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for short-term forecasts, and their performance degrades when the prediction horizon is increased. Long-term forecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate. The main core of the work in this Thesis has revolved around data-driven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe

    Physics-Guided Deep Learning for Dynamical Systems: A survey

    Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are interpretable but rely on rigid assumptions. Direct numerical approximation, meanwhile, is usually computationally intensive, requiring significant computational resources and expertise. While deep learning (DL) provides novel alternatives for efficiently recognizing complex patterns and emulating nonlinear dynamics, it does not necessarily obey the governing laws of physical systems, nor does it generalize well across different systems. Thus, the study of physics-guided DL emerged and has made great progress. It aims to take the best from both physics-based modeling and state-of-the-art DL models to better solve scientific problems. In this paper, we provide a structured overview of existing methodologies for integrating prior physical knowledge or physics-based modeling into DL and discuss the emerging opportunities.

    Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities

    Traffic prediction plays a crucial role in alleviating traffic congestion, which represents a critical problem globally, resulting in negative consequences such as lost hours of additional travel time and increased fuel consumption. Integrating emerging technologies into transportation systems provides opportunities for improving traffic prediction significantly and brings about new research problems. In order to lay the foundation for understanding the open research challenges in traffic prediction, this survey aims to provide a comprehensive overview of traffic prediction methodologies. Specifically, we focus on the recent advances and emerging research opportunities in Artificial Intelligence (AI)-based traffic prediction methods, due to their recent success and potential in traffic prediction, with an emphasis on multivariate traffic time series modeling. We first provide a list and explanation of the various data types and resources used in the literature. Next, the essential data preprocessing methods within the traffic prediction context are categorized, and the prediction methods and applications are subsequently summarized. Lastly, we present primary research challenges in traffic prediction and discuss some directions for future research. Comment: Published in Transportation Research Part C: Emerging Technologies (TR_C), Volume 145, 202