
    Proactive Anomaly Detection in Large-Scale Cloud-Native Databases

    This disclosure describes techniques to identify anomalous patterns in customer workloads from database logs and to enable timely, corrective action that ensures uninterrupted operation of the database. Examples of anomalies include sudden increases (bursts) in the number of error messages written to a log file. An adaptive behavior norm is defined for each message type. Time instances or periods when the gap between messages of a given type in the database log deviates from the expected behavior norm are detected. A deviation from the behavior norm is a potential indicator of database problems. An anomaly detection tool outputs a ranked list of log statements exhibiting spikes of activity, along with their time intervals, that a database administrator (DBA) can examine to take corrective action. By automating anomaly detection, the valuable time of DBAs can be spent acting on issues rather than finding them.
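As an illustration of the gap-based idea in the abstract above, here is a minimal sketch. It is not the disclosure's method: the exponentially weighted norm, the function names `detect_bursts` and `rank_message_types`, and the parameters `alpha`, `ratio`, and `warmup` are all assumptions chosen for the example.

```python
from collections import Counter

def detect_bursts(events, alpha=0.2, ratio=0.25, warmup=3):
    """Flag messages whose gap to the previous message of the same type
    is much shorter than that type's running norm.

    events: iterable of (timestamp_seconds, message_type), sorted by time.
    alpha, ratio, warmup are hypothetical tuning knobs, not from the source.
    Returns a list of (message_type, timestamp, gap, expected_gap).
    """
    last_seen = {}          # message type -> timestamp of previous message
    expected_gap = {}       # message type -> adaptive norm (EWMA of gaps)
    gaps_seen = Counter()   # message type -> number of gaps observed so far
    flagged = []
    for ts, mtype in events:
        if mtype in last_seen:
            gap = ts - last_seen[mtype]
            norm = expected_gap.get(mtype)
            # Only judge a gap once the norm has seen enough history.
            if norm is not None and gaps_seen[mtype] >= warmup and gap < ratio * norm:
                flagged.append((mtype, ts, gap, norm))
            # Adapt the norm toward the newest gap.
            expected_gap[mtype] = gap if norm is None else (1 - alpha) * norm + alpha * gap
            gaps_seen[mtype] += 1
        last_seen[mtype] = ts
    return flagged

def rank_message_types(flagged):
    """Rank message types by how many burst gaps they produced."""
    return Counter(item[0] for item in flagged).most_common()
```

Calling `rank_message_types(detect_bursts(events))` yields message types ordered by burst count, loosely mirroring the ranked list the abstract mentions. A real tool would also need to handle norms that drift during a long burst.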

    Next stop 'NoOps': enabling cross-system diagnostics through graph-based composition of logs and metrics

    Performing diagnostics in IT systems is an increasingly complicated task, and it is not doable in satisfactory time by even the most skillful operators. Systems and their architecture change very rapidly in response to business and user demand. Many organizations see value in the maintenance and management model of NoOps that stands for No Operations. One of the implementations of this model is a system that is maintained automatically without any human intervention. The path to NoOps involves not only precise and fast diagnostics but also reusing as much knowledge as possible after the system is reconfigured or changed. The biggest challenge is to leverage knowledge on one IT system and reuse this knowledge for diagnostics of another, different system. We propose a framework of weighted graphs which can transfer knowledge, and perform high-quality diagnostics of IT systems. We encode all possible data in a graph representation of a system state and automatically calculate weights of these graphs. Then, thanks to the evaluation of similarity between graphs, we transfer knowledge about failures from one system to another and use it for diagnostics. We successfully evaluate the proposed approach on Spark, Hadoop, Kafka and Cassandra systems. Peer Reviewed. Postprint (author's final draft).

    Probabilistic Approach to Structural Change Prediction in Evolving Social Networks

    We propose a predictive model of structural changes in elementary subgraphs of a social network based on a Mixture of Markov Chains. The model is trained and verified on a dataset from a large corporate social network analyzed in short, one-day-long time windows, and reveals distinctive patterns of evolution of connections on the level of local network topology. We argue that the network investigated in such short timescales is highly dynamic and therefore immune to classic methods of link prediction and structural analysis, and show that in the case of complex networks, the dynamic subgraph mining may lead to better prediction accuracy. The experiments were carried out on the logs from the Wroclaw University of Technology mail server.

    Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

    The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges like heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques to match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this concept, we conducted an empirical study. We optimized the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation. This addresses the issue of unseen log events in training data, enhancing log representation. Our study compared seven log-based anomaly detection methods, including four DL-based, two traditional, and the optimized PCA technique, using public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves similar effectiveness to advanced supervised/semi-supervised DL methods while being more stable with limited training data and resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.

    Telemetry Fault-Detection Algorithms: Applications for Spacecraft Monitoring and Space Environment Sensing

    Algorithms have been developed that identify unusual behavior in satellite health telemetry. Telemetry from solid-state power amplifiers and amplifier thermistors from 32 geostationary Earth orbit communications satellites from 1991 to 2015 is examined. Transient event detection and change-point event detection techniques that use a sliding window-based median are used, statistically evaluating the telemetry stream compared to the local norm. This approach allows application of the algorithms to any spacecraft platform because there is no reliance in the algorithms on satellite- or component-specific parameters, and it does not require a priori knowledge about the data distribution. Individual telemetry data streams are analyzed with the event detection algorithms, resulting in a compiled list of unusual events for each satellite. This approach identifies up to six events that affect 51 of 53 telemetry streams at once, indicative of a spacecraft system-level event. In two satellites, the same top event date (4 December 2008) occurs over more than 10 years of telemetry from both satellites. Of the five spacecraft with known maneuvers, the algorithms identify the maneuvers in all cases. Event dates are compared to known operational activities, space weather events, and available anomaly lists to assess the use of event detection algorithms for spacecraft monitoring and sensing of the space environment. The authors would like to acknowledge the U.S. Air Force Office of Sponsored Research grant FA9550-13-1-0099 and NASA for funding this work through NASA Space Technology and Research Fellowship grant NNX16AM74H.
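A sliding window-based median test of the kind the abstract above describes could look roughly like the sketch below. The abstract does not give the actual statistic, so the trailing window, the median absolute deviation (MAD) scale, and the names `detect_transients`, `window`, `threshold`, and `min_scale` are assumptions made for illustration only.

```python
from statistics import median

def detect_transients(values, window=5, threshold=5.0, min_scale=1e-6):
    """Return indices of samples that sit far from the local norm.

    The local norm is the median of the preceding `window` samples; the
    deviation is scaled by that window's median absolute deviation (MAD).
    No sensor-specific constants or distribution assumptions are used.
    All parameter defaults here are hypothetical.
    """
    events = []
    for i in range(window, len(values)):
        local = values[i - window:i]
        local_median = median(local)
        mad = median(abs(v - local_median) for v in local)
        # Floor the scale so a perfectly flat window does not divide by zero.
        scale = max(mad, min_scale)
        if abs(values[i] - local_median) / scale > threshold:
            events.append(i)
    return events
```

Because the median and MAD are robust to a single outlier, a spike that enters the trailing window does not by itself cause the following normal samples to be flagged. Change-point detection, which the abstract also mentions, would need a separate test on shifts in the window median and is not covered here.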

    Anomaly detection and event mining in cold forming manufacturing processes

    Predictive maintenance is one of the main goals within the Industry 4.0 trend. Advances in data-driven techniques offer new opportunities in terms of cost reduction, improved quality control, and increased work safety. This work brings data-driven techniques to two predictive maintenance tasks, anomaly detection and event prediction, applied in the real-world use case of a cold forming manufacturing line for consumer lifestyle products by using acoustic emission sensors in proximity to the dies of the press module. The proposed models are robust and able to cope with problems such as noise, missing values, and irregular sampling. The detected anomalies are investigated by experts and confirmed to correspond to deviations in the normal operation of the machine. Moreover, we are able to find patterns which are related to the events of interest.