453 research outputs found

    Data mining based cyber-attack detection

    Large-Scale Traffic Flow Prediction Using Deep Learning in the Context of Smart Mobility

    Designing and developing a new generation of cities around the world (termed smart cities) is fast becoming one of the ultimate solutions to overcome cities' problems such as population growth, pollution, energy crisis, and pressure on existing transportation infrastructure. One of the major aspects of a smart city is smart mobility. Smart mobility aims at improving transportation systems in several aspects: city logistics, info-mobility, and people-mobility. The emergence of the Internet of Car (IoC) phenomenon alongside the development of Intelligent Transportation Systems (ITSs) opens opportunities for improving traffic management systems and assisting travelers and authorities in their decision-making process. However, this has given rise to the generation of huge amounts of data originating from human-device and device-device interaction. This is both an opportunity and a challenge, and smart mobility will not meet its full potential unless valuable insights are extracted from these big data. Although the smart city environment and IoC allow for the generation and exchange of large amounts of data, there are not yet well-defined and mature approaches for mining this wealth of information to benefit drivers and traffic authorities. The main reason is most likely related to fundamental challenges in dealing with big data of various types and uncertain frequency coming from diverse sources. Mainly, the issues of types of data and uncertainty analysis in the predictions are indicated as the most challenging areas of study that have not been tackled yet. Important issues such as the nature of the data, i.e., stationary or non-stationary, and the prediction tasks, i.e., short-term or long-term, should also be taken into consideration. Based on this observation, a data-driven traffic flow prediction framework within the context of a big data environment is proposed in this thesis.
The main goal of this framework is to enhance the quality of traffic flow predictions, which can be used to assist travelers and traffic authorities in the decision-making process (whether for travel or management purposes). The proposed framework is focused on four main aspects that tackle major data-driven traffic flow prediction problems: the fusion of hard data for traffic flow prediction; the fusion of soft data for traffic flow prediction; prediction of non-stationary traffic flow; and prediction of multi-step traffic flow. All these aspects are investigated and formulated as computational tools, algorithms, and approaches tailored to the nature of the data at hand. The first tool tackles the inherent big data problems and deals with the uncertainty in the prediction. It relies on the ability of deep learning approaches to handle huge amounts of data generated by a large-scale and complex transportation system with limited prior knowledge. Furthermore, motivated by the close correlation between road traffic and weather conditions, a novel deep-learning-based approach that predicts traffic flow by fusing the traffic history and weather data is proposed. The second tool fuses the streams of data (hard data) and event-based data (soft data) using Dempster-Shafer Evidence Theory (DSET). One of the main features of DSET is its ability to capture uncertainties in probabilities. Subsequently, an extension of DSET, namely Dempster's conditional rules for updating belief, is used to fuse traffic prediction beliefs coming from streams of data and event-based data sources. The third tool consists of a method to detect non-stationarities in the traffic flow and an algorithm to perform online adaptations of the traffic prediction model. The proposed detection approach is developed by monitoring the evolution of the spectral contents of the traffic flow.
Furthermore, the approach is specifically developed to work in conjunction with state-of-the-art machine learning methods such as Deep Neural Networks (DNNs). By combining the power of frequency domain features and the known generalization capability and scalability of DNNs in handling real-world data, it is expected that high prediction performance can be achieved. The last tool is developed to improve multi-step traffic flow prediction in the recursive and multi-output settings. In the recursive setting, an algorithm that augments the information about the current time-step is proposed. This algorithm is called Conditional Data as Demonstrator (C-DaD) and is an extension of an algorithm called Data as Demonstrator (DaD). Furthermore, in the multi-output setting, a novel approach is developed that uses a Conditional Generative Adversarial Network (C-GAN) to generate new history-future pairs of data, which are aggregated with the original training data. To demonstrate the capabilities of the proposed approaches, a series of experiments using artificial and real-world data are conducted. Each of the proposed approaches is compared with the state-of-the-art or currently existing approaches.
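
The recursive multi-step setting mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's C-DaD or DaD algorithm; it only shows the plain recursive strategy, in which a one-step model is applied repeatedly and each prediction is fed back as input. The least-squares autoregressive model, the function names `fit_ar` and `recursive_forecast`, and the synthetic daily-cycle flow series are all assumptions made for illustration, standing in for the deep models and real data the thesis describes.

```python
import numpy as np

def fit_ar(series, order):
    """Fit a one-step autoregressive model (with bias) by least squares."""
    rows = [series[i:i + order] for i in range(len(series) - order)]
    X = np.column_stack([np.array(rows), np.ones(len(rows))])  # last column is the bias
    y = np.array(series[order:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def recursive_forecast(history, coef, steps):
    """Predict `steps` values ahead, feeding each prediction back as input."""
    order = len(coef) - 1
    window = [float(v) for v in history[-order:]]
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(coef[:-1], window) + coef[-1])
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return preds

# Hypothetical hourly flow with a 24-hour cycle (synthetic, not real traffic data).
t = np.arange(72)
series = 100.0 + 20.0 * np.sin(2 * np.pi * t / 24)

coef = fit_ar(series, order=2)
preds = recursive_forecast(series, coef, steps=6)
```

Because each step consumes earlier predictions rather than observations, one-step errors can compound over the horizon; that compounding is the weakness that DaD-style methods are designed to reduce.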

    Temporospatial Context-Aware Vehicular Crash Risk Prediction

    With the demand for more vehicles increasing, road safety is becoming a growing concern. Traffic collisions take many lives and cost billions of dollars in losses. This explains the growing interest of governments, academic institutions and companies in road safety. The vastness and availability of road accident data have provided new opportunities for gaining a better understanding of accident risk factors and for developing more effective accident prediction and prevention regimes. Much of the empirical research on road safety and accident analysis utilizes statistical models which capture limited aspects of crashes. On the other hand, data mining has recently gained interest as a reliable approach for investigating road-accident data and for providing predictive insights. While some risk factors contribute more frequently to the occurrence of a road accident, the importance of driver behavior, temporospatial factors, and real-time traffic dynamics has been underestimated. This study proposes a framework for predicting crash risk based on historical accident data. The proposed framework incorporates machine learning and data analytics techniques to identify driving patterns and other risk factors associated with potential vehicle crashes. These techniques include clustering, association rule mining, information fusion, and Bayesian networks. Swarm intelligence based association rule mining is employed to uncover the underlying relationships and dependencies in collision databases. Data segmentation methods are employed to eliminate the effect of dependent variables. Extracted rules can be used along with real-time mobility data to predict crashes and their severity in real time. The National Collision Database of Canada (NCDB) is used in this research to generate association rules with crash risk oriented subsequents, and to compare the performance of the swarm intelligence based approach with that of other association rule miners.
Many industry-demanding datasets, including road-accident datasets, are deficient in descriptive factors. This is a significant barrier to uncovering meaningful risk factor relationships. To resolve this issue, this study proposes a knowledgebase approximation framework to enhance the crash risk analysis by integrating pieces of evidence discovered from disparate datasets capturing different aspects of mobility. Dempster-Shafer theory is utilized as a key element of this knowledgebase approximation. This method can integrate association rules with acceptable accuracy under certain circumstances that are discussed in this thesis. The proposed framework is tested on the lymphography dataset and the road-accident database of Great Britain. The derived insights are then used as the basis for constructing a Bayesian network that can estimate crash likelihood and risk levels so as to warn drivers and prevent accidents in real time. This Bayesian network approach offers a way to implement a naturalistic driving analysis process for predicting traffic collision risk based on the findings from the data-driven model. A traffic incident detection and localization method is also proposed as a component of the risk analysis model. Detecting and localizing traffic incidents enables timely response to accidents and facilitates effective and efficient traffic flow management. The results obtained from the experimental work conducted on this component are indicative of the capability of our Dempster-Shafer data-fusion-based incident detection method in overcoming the challenges arising from erroneous and noisy sensor readings.
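
Several of the abstracts in this listing rely on Dempster-Shafer evidence fusion. As a rough illustration, the standard Dempster's rule of combination can be sketched as below. This is the textbook rule, not the knowledgebase approximation or the conditional update rule the theses develop; the function name `combine`, the two-hypothesis frame, and the sensor mass values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical frame of discernment: is there an incident on a road segment?
INCIDENT = frozenset({"incident"})
NORMAL = frozenset({"normal"})
EITHER = INCIDENT | NORMAL  # mass here expresses ignorance

sensor_a = {INCIDENT: 0.6, EITHER: 0.4}
sensor_b = {INCIDENT: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = combine(sensor_a, sensor_b)
```

Mass placed on `EITHER` lets a noisy sensor say "I do not know" instead of splitting belief between the two outcomes, which is one reason the approach is attractive for erroneous or incomplete readings.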

    Deep Learning and Dempster-Shafer Theory Based Insider Threat Detection

    Organizations' own personnel now have a greater ability than ever before to misuse their access to critical organizational assets. Insider threat detection is a key component in identifying rare anomalies in context, which is a growing concern for many organizations. Existing perimeter security mechanisms are proving to be ineffective against insider threats. As a prospective filter for human analysts, a new deep learning based insider threat detection method that uses Dempster-Shafer theory is proposed to handle both accidental and intentional insider threats via an organization's channels of communication in real time. The long short-term memory (LSTM) architecture is applied to a recurrent neural network (RNN) in this work to detect anomalous network behavior patterns. Furthermore, belief is updated with Dempster's conditional rule and utilized to fuse evidence to achieve enhanced prediction. The CERT Insider Threat Dataset v6.2 is used to train the behavior model. Through performance evaluation, our proposed method is proven to be effective as an insider threat detection technique.

    Review of data fusion methods for real-time and multi-sensor traffic flow analysis

    Recent developments in intelligent transportation systems (ITS) require the input of various kinds of data in real time and from multiple sources, which imposes additional research and application challenges. Ongoing studies on Data Fusion (DF) have produced significant improvement in ITS and manifested an enormous impact on its growth. This paper reviews the implementation of DF methods in ITS to facilitate traffic flow analysis (TFA) and solutions that entail the prediction of various traffic variables such as driving behavior, travel time, speed, density, incident, and traffic flow. It attempts to identify and discuss real-time and multi-sensor data sources that are used for various traffic domains, including road/highway management, traffic state estimation, and traffic controller optimization. Moreover, it attempts to associate abstractions of data level fusion, feature level fusion, and decision level fusion with DF methods to better understand the role of DF in TFA and ITS. Consequently, the main objective of this paper is to review DF methods used for real-time and multi-sensor (heterogeneous) TFA studies. The review outcomes are (i) a guideline for constructing DF methods which involve preprocessing, filtering, decision, and evaluation as core steps, (ii) a description of the recent DF algorithms or methods that adopt real-time and multi-sensor data sources and the impact of these data sources on the improvement of TFA, (iii) an examination of the testing and evaluation methodologies and the popular datasets and (iv) an identification of several research gaps, some current challenges, and new research trends.

    Development of a process for identification of the operational mode of industrial sites using high dimensional multi-modal data

    Many algorithms exist to determine the physical contents of an image. Target detection or anomaly detection algorithms, for example, use statistical and geometric approaches in high dimensional space to locate objects within a scene. Instead of target detection, however, it has become of interest of late to delve deeper into the field of remote sensing in order to perform process detection. Process detection refers to the ability to identify the operational mode of an industrial facility. Accurately completing this task will require a new set of analysis tools. This thesis discusses a method that can be used to perform process detection with multi-modal remotely sensed data. Using a local industrial facility, operational modes were identified, as well as the subtle differences between them. Hourly data, sparse data, and latent variables were combined through analytical tools, and a prediction of the process taking place at different moments was performed using both real and simulated data sets. An advanced analyst environment is also discussed, with a few demonstrations from a test environment developed by a small team at RIT. Temporal analysis, multi-modal data integration, and the use of process models to make latent observables are discussed. This thesis shows the utility of such an environment and demonstrates the need for its further development.