144 research outputs found

    Online failure prediction in air traffic control systems

    This thesis introduces a novel approach to online failure prediction for mission-critical distributed systems whose distinctive features are that it is black-box, non-intrusive, and online. The approach combines Complex Event Processing (CEP) and Hidden Markov Models (HMM) to analyze symptoms of failures that may occur in the form of anomalous conditions of performance metrics identified for that purpose. The thesis presents an architecture named CASPER, based on CEP and HMM, that relies solely on information sniffed from the communication network of a mission-critical system to predict anomalies that can lead to software failures. An instance of CASPER has been implemented, trained, and tuned to monitor a real Air Traffic Control (ATC) system developed by Selex ES, a Finmeccanica company. An extensive experimental evaluation of CASPER is presented. The results show (i) a very low percentage of false positives under both normal and stress conditions, and (ii) a failure prediction time long enough for the system to apply appropriate recovery procedures.
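    The HMM side of such an approach can be sketched in a few lines. This is a minimal illustration, not CASPER's actual trained model: all probabilities, the two states, and the two symptom symbols ("ok"/"anom", as a CEP stage might emit them) are assumptions chosen for the example.

    ```python
    # Hedged sketch: a two-state discrete HMM scoring whether a stream of
    # network symptoms indicates an impending failure. Parameters are
    # illustrative, not CASPER's trained values.

    STATES = ("normal", "faulty")
    START = {"normal": 0.95, "faulty": 0.05}
    TRANS = {"normal": {"normal": 0.9, "faulty": 0.1},
             "faulty": {"normal": 0.2, "faulty": 0.8}}
    # Observation symbols: "ok" (no anomaly) or "anom" (anomalous metric condition).
    EMIT = {"normal": {"ok": 0.9, "anom": 0.1},
            "faulty": {"ok": 0.3, "anom": 0.7}}

    def failure_probability(observations):
        """Forward algorithm: return P(state == 'faulty' | observations)."""
        alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
        for obs in observations[1:]:
            alpha = {s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                     for s in STATES}
        total = sum(alpha.values())
        return alpha["faulty"] / total

    # A run of anomalous symptoms should push the faulty-state posterior up.
    print(failure_probability(["ok", "ok", "ok"]))        # low
    print(failure_probability(["anom", "anom", "anom"]))  # high
    ```

    In a real deployment the emission alphabet would come from the CEP stage's anomaly classification, and the parameters from training on recorded failure runs.
    
    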

    Towards efficient error detection in large-scale HPC systems

    The need for computer systems to be reliable has become increasingly important as users depend more on their correct functioning. The failure of these systems can be very costly in terms of time and money. However much system designers try to design fault-free systems, it is practically impossible, since many factors can affect them. To achieve reliability, fault tolerance methods are usually deployed; these methods help the system produce acceptable results even in the presence of faults. Root cause analysis, a dependability method in which the causes of failures are diagnosed in order to correct them or prevent future occurrences, is less efficient: it is reactive and cannot prevent the first failure from occurring. For this reason, methods with predictive capabilities are preferred; failure prediction methods are employed to anticipate potential failures so that preventive measures can be applied. Most predictive methods have been supervised, requiring accurate knowledge of the system's failures, errors, and faults. However, with changing system components and system updates, supervised methods become ineffective. Error detection methods allow error patterns to be detected early so that preventive measures can be applied. Performing this detection in an unsupervised way can be more effective, as changes or updates to the system affect such a solution less. In this thesis, we introduce an unsupervised approach to detecting error patterns in a system using its own data. More specifically, the thesis investigates the use of both event logs and resource utilization data to detect error patterns, addressing both the spatial and temporal aspects of achieving system dependability. The proposed unsupervised error detection method has been applied to real data from two different production systems. The results are positive, showing an average detection F-measure of about 75%.
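    The two ingredients described above, an unsupervised detector over resource-utilization data and an F-measure evaluation against known error windows, can be sketched as follows. The z-score rule and threshold are illustrative assumptions, not the thesis's actual method.

    ```python
    # Hedged sketch of unsupervised error-pattern detection on a resource
    # counter: flag points that deviate strongly from the series mean, then
    # score the detector with the F-measure, as the thesis reports (~75%).

    from statistics import mean, stdev

    def detect_anomalies(series, z_threshold=3.0):
        """Return indices of points more than z_threshold std-devs from the mean."""
        mu, sigma = mean(series), stdev(series)
        if sigma == 0:
            return []
        return [i for i, x in enumerate(series) if abs(x - mu) / sigma > z_threshold]

    def f_measure(predicted, actual):
        """F1 score: harmonic mean of precision and recall over flagged indices."""
        predicted, actual = set(predicted), set(actual)
        tp = len(predicted & actual)
        if tp == 0:
            return 0.0
        precision = tp / len(predicted)
        recall = tp / len(actual)
        return 2 * precision * recall / (precision + recall)

    # A flat CPU-utilization series with one abnormal spike at index 20.
    series = [10] * 20 + [100]
    flagged = detect_anomalies(series)
    print(flagged)                      # the spike's index
    print(f_measure(flagged, [20]))     # perfect detection on this toy series
    ```

    The unsupervised appeal is visible even in this toy: nothing here needs labeled failure data, only the series itself.
    
    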

    Features correlation-based workflows for high-performance computing systems diagnosis

    Analysing failures to improve the reliability of high performance computing systems and data centres is important. The primary source of information for diagnosing system failures is the system logs, and it is widely known that system logs alone are an incomplete basis for finding the cause of a system failure. Resource utilisation data, recently made available, is another potentially useful source of information for failure analysis. However, large High-Performance Computing (HPC) systems generate a great deal of data, and processing this huge amount of data presents a significant challenge for online failure diagnosis. Most work on failure diagnosis has studied only errors that lead to system failures; little work has studied, on real data, errors that lead to either a system failure or a recovery. In this thesis, we design, implement and evaluate two failure diagnostics frameworks, named CORRMEXT and EXERMEST. We implement Data Type Extraction, Feature Extraction, Correlation and Time-bin Extraction modules. CORRMEXT integrates the Data Type Extraction, Correlation and Time-bin Extraction modules; it identifies error cases that occur frequently and reports the success and failure of error recovery protocols. EXERMEST integrates the Feature Extraction and Correlation modules; it extracts significant errors and resource use counters and identifies error cases that are rare. We apply the diagnostics frameworks to the resource use data and system logs of three HPC systems operated by the Texas Advanced Computing Center (TACC).
    Our results show that: (i) multiple correlation methods are required to identify more dates on which groups of resource use counters and groups of errors are correlated, (ii) the earliest hour of change in system behaviour can only be identified by using the correlated resource use counters and correlated errors, (iii) multiple feature extraction methods are required to identify the rare error cases, and (iv) time-bins of multiple granularities are necessary to identify the rare error cases. CORRMEXT and EXERMEST are publicly available to support system administrators in failure diagnosis.
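    The correlation step shared by both frameworks can be illustrated with a plain Pearson coefficient between per-time-bin error counts and a resource-use counter. This is a hedged single-method sketch; the abstract stresses that multiple correlation methods are needed, and the counter name and sample values here are invented for the example.

    ```python
    # Hedged sketch of the correlation module's core idea: correlate per-time-bin
    # error counts with a resource-use counter to find counters that move with
    # errors. Pearson only; the frameworks combine several correlation methods.

    from math import sqrt

    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length series."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy) if sx and sy else 0.0

    # Hourly time-bins: error count per bin vs. a hypothetical I/O-wait counter.
    errors  = [0, 1, 0, 5, 7, 6, 1, 0]
    io_wait = [2, 3, 2, 9, 11, 10, 3, 2]
    print(round(pearson(errors, io_wait), 3))  # strongly positive
    ```

    In the frameworks this computation would be repeated per counter and per time-bin granularity, which is why the choice of bin size matters for catching the rare cases.
    
    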

    BIG DATA ANALYTICS IN TRANSPORTATION NETWORKS USING THE NPMRDS

    Urban traffic congestion is common and a cause of lost productivity (due to trip delays) and higher risk to passenger safety (due to increased time in the automobile), not to mention increased fuel consumption, pollution, and vehicle wear. The financial effect is a tremendous burden for citizens and states alike. One way to alleviate these ill effects is to increase state roadway and highway capacity; doing so, however, is cost-prohibitive. A better option is improving performance measurement in an effort to manage current roadway assets, improve traffic flow, and reduce road congestion. Variables like segment travel time, speed, delay, and origin-to-destination trip time are measures frequently used to monitor traffic and improve traffic flow on state roadways. In 2014, ODOT was given access to the FHWA's National Performance Management Research Data Set (NPMRDS), which includes average travel times divided into contiguous segments, with travel time measured every 5 minutes. Travel times are segregated into passenger vehicle travel time and freight travel time, both calculated using GPS locations transmitted by participating drivers traveling along interstate highways. This thesis presents research detailing the use of the NPMRDS, consisting of highway vehicle travel times, for computing performance measurements in the state of Oklahoma. Data extraction, preprocessing, and statistical analysis were performed on the dataset. A comprehensive study of the dataset characteristics, including influencing variables that affect data measurements, is presented. A process for identifying anomalies is developed, and recommendations for improving accuracy and alleviating data anomalies are reported. Furthermore, a process for filtering and removing speed data outliers across multiple road segments is developed, and a comparative analysis of raw baseline speed data and cleansed data is performed.
    Travel time reliability performance measures are identified and computationally compared. A method for improved congestion detection is investigated and developed. Finally, traffic analytics using machine learning is performed to identify and classify congested segments, and a novel approach for identifying non-recurrent congestion sources using Bayesian inference on speed data is developed and introduced.
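    One common family of speed-outlier filters of the kind described above is a median-absolute-deviation (MAD) filter over a segment's 5-minute readings. This is an illustrative single-segment pass under assumed values, not the thesis's actual multi-segment process.

    ```python
    # Hedged sketch of speed-outlier removal on one road segment using a
    # median-absolute-deviation (MAD) filter. The cutoff k and the 1.4826
    # normal-consistency constant are conventional, illustrative choices.

    from statistics import median

    def filter_speed_outliers(speeds, k=3.0):
        """Drop readings farther than k scaled MADs from the segment median."""
        med = median(speeds)
        mad = median(abs(s - med) for s in speeds)
        if mad == 0:
            return list(speeds)
        return [s for s in speeds if abs(s - med) / (1.4826 * mad) <= k]

    # 5-minute NPMRDS-style speed readings (mph) with one implausible spike.
    raw = [62, 64, 63, 61, 65, 120, 62, 63]
    print(filter_speed_outliers(raw))  # the 120 mph reading is removed
    ```

    The MAD is used instead of the standard deviation because a single extreme reading inflates the standard deviation and can mask the very outlier being hunted.
    
    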

    National Performance Management Research Dataset (NPMRDS) - Speed Validation for Traffic Performance Measures (FHWA-OK-17-02)

    This report presents research detailing the use of the first version of the National Performance Management Research Data Set (NPMRDS v.1), comprised of highway vehicle travel times, for computing performance measurements in the state of Oklahoma. Data extraction, preprocessing, and statistical analysis were performed on the dataset, and a comprehensive study of dataset characteristics, influencing variables, outliers, and anomalies was carried out. In addition, a study on filtering and removing speed data outliers across multiple road segments is developed, and a comparative analysis of raw baseline speed data and cleansed data is performed. A method for improved congestion detection is investigated and developed. Identification and a computational comparison of travel time reliability performance metrics for both the raw and cleansed datasets is shown. An outlier removal framework is formulated, and a cleansed and complete version of NPMRDS v.1 is generated. Finally, a validation analysis on the cleansed dataset is presented. In the end, the research affirms that understanding domain-specific characteristics is vital for filtering data outliers and anomalies in this dataset, which in turn is key to calculating accurate performance measurements. Thus, careful consideration of outlier removal must be taken into account when computing travel time reliability metrics using the NPMRDS. (Reporting period: October 2015 to October 2017.)
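    A standard travel-time reliability metric of the kind compared on raw versus cleansed data is the planning time index, the 95th-percentile travel time divided by the free-flow travel time. The nearest-rank percentile method and the sample values below are illustrative assumptions, not the report's figures.

    ```python
    # Hedged sketch: planning time index (PTI), a common travel-time
    # reliability metric. A single untreated outlier in the travel-time
    # series can shift the 95th percentile, which is why outlier removal
    # matters before computing reliability metrics.

    def percentile(values, p):
        """Nearest-rank percentile (p in [0, 100]) of a list of values."""
        ordered = sorted(values)
        rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[rank]

    def planning_time_index(travel_times, free_flow_time):
        """PTI = 95th-percentile travel time / free-flow travel time."""
        return percentile(travel_times, 95) / free_flow_time

    # Travel times (minutes) for one segment across twenty 5-minute epochs.
    times = [10, 10, 11, 10, 12, 11, 10, 25, 10, 11,
             10, 10, 10, 11, 10, 10, 12, 10, 11, 10]
    print(planning_time_index(times, free_flow_time=10.0))
    ```

    A PTI of 1.2 reads as "budget 20% extra time to arrive reliably"; a spurious 25-minute reading left in a smaller sample could inflate that figure noticeably.
    
    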

    14th Conference on DATA ANALYSIS METHODS for Software Systems

    DAMSS-2023 is the 14th International Conference on Data Analysis Methods for Software Systems, held annually in Druskininkai, Lithuania, at the same venue and time each year. The exception was 2020, when the world was gripped by the COVID-19 pandemic and the movement of people was severely restricted. After a year's break, the conference was back on track, and the next edition succeeded in its primary goal of lively scientific communication. The conference focuses on live interaction among participants; for more efficient communication, most presentations are posters, a format that has proven highly effective, though there are also several oral sessions. The history of the conference dates back to 2009, when 16 papers were presented. It began as a workshop and has evolved into a well-known conference. The idea of such a workshop originated at the Institute of Mathematics and Informatics, now the Institute of Data Science and Digital Technologies of Vilnius University. The Lithuanian Academy of Sciences and the Lithuanian Computer Society supported this idea, which gained enthusiastic acceptance from both the Lithuanian and international scientific communities. This year's conference features 84 presentations, with 137 registered participants from 11 countries. The conference serves as a gathering point for researchers from six Lithuanian universities, making it the main annual meeting for Lithuanian computer scientists. Its primary aim is to showcase research conducted at Lithuanian and foreign universities in the fields of data science and software engineering, and its annual organization facilitates the rapid exchange of new ideas within the scientific community. Seven IT companies supported the conference this year, indicating the relevance of the conference topics to the business sector.
    In addition, the conference is supported by the Lithuanian Research Council and the National Science and Technology Council (Taiwan, R.O.C.). The conference covers a wide range of topics, including Applied Mathematics, Artificial Intelligence, Big Data, Bioinformatics, Blockchain Technologies, Business Rules, Software Engineering, Cybersecurity, Data Science, Deep Learning, High-Performance Computing, Data Visualization, Machine Learning, Medical Informatics, Modelling Educational Data, Ontological Engineering, Optimization, Quantum Computing, and Signal Processing. This book provides an overview of all presentations from the DAMSS-2023 conference.

    Big data driven assessment of probe-sourced data

    Presently, there is expanding interest among transportation agencies and state Departments of Transportation in augmenting traffic data collection with probe-based services, such as INRIX. The objective is to decrease the cost of deploying and maintaining sensors and to increase coverage under constrained budgets. This dissertation documents a study evaluating the opportunities and challenges of using INRIX data in the Midwest. The objective of this study is threefold: (1) quantitative analysis of probe data characteristics: coverage, speed bias, and congestion detection precision; (2) improving the accuracy of probe-based congestion performance metrics by using change point detection; and (3) assessing the impact of game day schedules and opponents on travel patterns and route choice. The first study utilizes real-time and historical traffic data collected from two different sources, INRIX and Wavetronix. The INRIX probe data stream is compared to a benchmarked Wavetronix sensor data source in order to explain some of the challenges and opportunities associated with using wide-area probe data. INRIX performance is then thoroughly evaluated against three major criteria: coverage and penetration, speed bias, and congestion detection precision. The second study focuses on the number of congested events and congested hours as two important performance measures. To improve their accuracy and reliability, this study addresses a significant issue in calculating performance measures by comparing Wavetronix against INRIX. We examine the traditional and common method of congestion detection and congested-hour calculation, which uses a fixed threshold, and show how unreliable and erroneous that method can be. A novel traffic congestion identification method is then proposed, and the number of congested events and congested hours are computed as two performance measures.
    After evaluating the accuracy and reliability of INRIX probe data in Chapters 2 and 3, the last study, in Chapter 4, assesses the impacts of game day on travel patterns and route choice behaviors using INRIX as an accurate and reliable data source. It is shown that the impacts vary depending on the schedule and the opponents. Novel methods are also proposed for hotspot detection and prediction. Overall, this dissertation evaluates probe-sourced streaming data from INRIX to study its characteristics as a data source and the challenges and opportunities associated with using wide-area probe data, and finally makes use of INRIX as a reliable data source for travel behavior analysis.
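    The weakness of a fixed congestion threshold can be shown in a few lines. This sketch contrasts a fixed cutoff with a segment-relative rule; the 45 mph cutoff, the 60%-of-median rule, and the sample speeds are all illustrative assumptions, not the dissertation's actual method (which uses change point detection).

    ```python
    # Hedged sketch: why one absolute speed cutoff misbehaves across segments.
    # A slow arterial where 35 mph is normal gets every epoch labeled
    # "congested" by a fixed 45 mph rule, while a segment-relative rule
    # flags only genuine slowdowns.

    from statistics import median

    def congested_fixed(speeds, cutoff=45.0):
        """Flag epochs whose speed falls below one absolute cutoff (mph)."""
        return [s < cutoff for s in speeds]

    def congested_relative(speeds, factor=0.6):
        """Flag epochs below a fraction of the segment's own typical speed."""
        reference = median(speeds)
        return [s < factor * reference for s in speeds]

    # 5-minute speeds (mph) on a segment whose normal speed is ~35 mph,
    # with a real slowdown in the middle.
    speeds = [35, 34, 36, 35, 15, 14, 35, 36]
    print(sum(congested_fixed(speeds)))     # every epoch flagged
    print(sum(congested_relative(speeds)))  # only the genuine slowdown
    ```

    Change point detection goes a step further than the relative rule by locating where the speed distribution itself shifts, rather than comparing each epoch to any threshold at all.
    
    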

    Intelligent Sensor Networks

    In the last decade, wireless and wired sensor networks have attracted much attention. However, most designs target general sensor network issues, including the protocol stack (routing, MAC, etc.) and security. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks, covering sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.