An Unsupervised Anomaly Detection Framework for Detecting Anomalies in Real Time through Network System’s Log Files Analysis
Nowadays, almost every computer system uses log files to keep records of occurring events. Those log files are then used for analyzing and debugging system failures. Because of this important utility, researchers have worked on finding fast and efficient ways to detect anomalies in a computer system by analyzing its log records. Research in log-based anomaly detection can be divided into two main categories: batch log-based anomaly detection and streaming log-based anomaly detection. Batch log-based anomaly detection is computationally heavy and does not allow anomalies to be detected instantaneously. Streaming anomaly detection, on the other hand, allows for immediate alerts; however, current streaming approaches are mainly supervised. In this work, we propose a fully unsupervised framework that can detect anomalies in real time. We test our framework on HDFS log files and successfully detect anomalies with an F1 score of 83%.
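The unsupervised streaming idea described above can be illustrated with a minimal sketch. This is not the paper's actual method; it is a simple rare-template heuristic under assumed names (the template strings and thresholds are hypothetical): maintain running counts of log templates and flag an event whose template remains rare relative to the stream seen so far.

```python
from collections import defaultdict

class StreamingAnomalyDetector:
    """Minimal unsupervised streaming sketch: a log event is flagged as
    anomalous while its template's observed frequency stays below a
    threshold fraction of all events seen so far."""

    def __init__(self, rarity_threshold=0.01, warmup=10):
        self.counts = defaultdict(int)
        self.total = 0
        self.rarity_threshold = rarity_threshold
        self.warmup = warmup  # do not flag anything before this many events

    def observe(self, template):
        # Update counts first, then score the event against the stream so far.
        self.counts[template] += 1
        self.total += 1
        if self.total < self.warmup:
            return False  # not enough history to judge rarity yet
        return self.counts[template] / self.total < self.rarity_threshold

# Hypothetical HDFS-like stream: one rare event among routine ones.
detector = StreamingAnomalyDetector(rarity_threshold=0.05, warmup=5)
stream = ["block_received"] * 50 + ["checksum_error"] + ["block_received"] * 10
flags = [detector.observe(t) for t in stream]
```

Because scoring happens per event as it arrives, this style of detector raises an alert immediately, unlike a batch pass over the whole log.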
Understanding error log event sequence for failure analysis
As large-scale parallel systems evolve, they are increasingly employed for mission-critical applications, so anticipating and accommodating failure occurrences is crucial to their design. Failure is a commonplace feature of these large-scale systems and cannot be treated as an exception. The system state is mostly captured through logs, so a proper understanding of these error logs is extremely important for failure analysis: the logs contain the "health" information of the system. In this paper we design an approach that seeks to find similarities in the patterns of log events that lead to failures. Our experiments show that several root causes of soft-lockup failures can be traced through the logs. We capture the behavior of failure-inducing patterns and find that the log patterns of failure and non-failure cases are dissimilar. Keywords: Failure Sequences; Cluster; Error Logs; HPC; Similarity
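One simple way to make "similarity in patterns of log events" concrete is a set-overlap measure over the event types observed in windows preceding a failure. The sketch below is an illustration, not the paper's method, and the event names are hypothetical: Jaccard similarity is high between two failure-preceding windows and low between a failure window and a normal one.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of log event types."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical event windows preceding a soft-lockup failure vs. normal operation.
failure_window_1 = ["mem_alloc_fail", "cpu_stall", "watchdog_timeout"]
failure_window_2 = ["mem_alloc_fail", "cpu_stall", "irq_disabled"]
normal_window    = ["heartbeat_ok", "job_start", "job_end"]

sim_failures = jaccard(failure_window_1, failure_window_2)  # shared stall events
sim_mixed    = jaccard(failure_window_1, normal_window)     # no overlap
```

A similarity score like this lets failure-inducing windows cluster together while normal windows stay apart, which matches the abstract's observation that failure and non-failure patterns are dissimilar.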
PerfXplain: Debugging MapReduce Job Performance
While users today have access to many tools that assist in performing large-scale data analysis tasks, understanding the performance characteristics of their parallel computations, such as MapReduce jobs, remains difficult. We present PerfXplain, a system that enables users to ask questions about the relative performance (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain provides a new query language for articulating performance queries and an algorithm for generating explanations from a log of past MapReduce job executions. We formally define the notion of an explanation together with three metrics, relevance, precision, and generality, that measure explanation quality. We present the explanation-generation algorithm, which is based on techniques related to decision-tree building. We evaluate the approach on a log of past executions on Amazon EC2 and show that it can generate quality explanations, outperforming two naive explanation-generation methods. Comment: VLDB201
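The flavor of decision-tree-based explanation generation can be sketched with a single decision stump (a one-level tree) over hypothetical job-pair records. This is not PerfXplain's actual algorithm or query language; the feature names and records below are invented for illustration: each record compares two runs, and we pick the configuration difference that best predicts which run was slower.

```python
# Hypothetical job-pair records: each compares two MapReduce runs. Features
# are booleans ("did the configurations differ in X?"); the label says
# whether the second run was slower.
records = [
    {"fewer_reducers": True,  "smaller_input": False, "slower": True},
    {"fewer_reducers": True,  "smaller_input": True,  "slower": True},
    {"fewer_reducers": False, "smaller_input": True,  "slower": False},
    {"fewer_reducers": False, "smaller_input": False, "slower": False},
    {"fewer_reducers": True,  "smaller_input": False, "slower": True},
]

def stump_accuracy(records, feature):
    """Accuracy of the one-rule explanation 'feature differs => slower'."""
    hits = sum(1 for r in records if r[feature] == r["slower"])
    return hits / len(records)

features = ["fewer_reducers", "smaller_input"]
best = max(features, key=lambda f: stump_accuracy(records, f))
```

A full decision tree would recurse on the remaining records after each split; the stump already shows how "precision" of an explanation can be measured against logged executions.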
Unsupervised Threat Hunting using Continuous Bag of Terms and Time (CBoTT)
Threat hunting is the act of sifting through system logs to detect malicious activities that might have bypassed existing security measures. It can be performed in several ways, one of which is based on detecting anomalies. We propose an unsupervised framework, called continuous bag-of-terms-and-time (CBoTT), and publish its application programming interface (API) to help researchers and cybersecurity analysts perform anomaly-based threat hunting in SIEM logs geared toward process auditing on endpoint devices. Analyses show that our framework consistently outperforms benchmark approaches: when logs are sorted by likelihood of being an anomaly (from most likely to least), our approach surfaces anomalies nearer the top of the list (at percentiles between 1.82 and 6.46), while benchmark approaches identify the same anomalies further down (at percentiles between 3.25 and 80.92). This framework can be used by other researchers to conduct benchmark analyses and by cybersecurity analysts to find anomalies in SIEM logs.
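A bag-of-terms representation augmented with a time feature can be sketched as follows. This is not the published CBoTT algorithm or its API; the event terms and the hour-of-day feature below are assumptions for illustration: each event becomes a term-count vector plus a time component, and events far (by cosine distance) from the centroid of past events score as more anomalous.

```python
import math
from collections import Counter

def vectorize(terms, hour):
    """Bag-of-terms counts plus a coarse time-of-day feature (hour / 24)."""
    v = Counter(terms)
    v["__hour__"] = hour / 24.0
    return v

def cosine_distance(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

# Hypothetical process-audit events: routine daytime activity vs. an
# unusual encoded-download at 3 a.m.
history = [vectorize(["svchost", "start"], h) for h in (9, 10, 11)]
centroid = Counter()
for v in history:
    centroid.update(v)

routine = vectorize(["svchost", "start"], 10)
unusual = vectorize(["powershell", "encoded", "download"], 3)

# Sort events from most to least anomalous, as in the abstract.
scores = sorted([("routine", cosine_distance(centroid, routine)),
                 ("unusual", cosine_distance(centroid, unusual))],
                key=lambda p: p[1], reverse=True)
```

Sorting by this score is exactly the ranking the abstract evaluates: a good detector places true anomalies at low percentiles of the sorted list.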
Intelligent Log Analysis for Anomaly Detection
Computer logs are a rich source of information that can be analyzed to detect various issues. The large volume of logs limits the effectiveness of manual approaches to log analysis. The earliest automated log analysis tools took a rule-based approach, which can only detect known issues covered by existing rules. Anomaly detection approaches, on the other hand, can detect new or unknown issues by looking for unusual behavior that differs from the norm, often utilizing machine learning (ML) or deep learning (DL) models. In this project, we evaluated various ML and DL techniques used for log anomaly detection. We propose a hybrid neural network (NN) we call CausalConvLSTM for modeling log sequences, which takes advantage of the strengths of both Convolutional Neural Networks and Long Short-Term Memory networks. Furthermore, we evaluated and proposed a concrete strategy for retraining NN anomaly detection models to maintain a low false-positive rate in a drifting environment.
CloudLens, a scripting language for analyzing semi-structured data
When debugging applications that run in the cloud, programmers often have no choice but to trace the execution of their code by adding print statements. This practice generates enormous quantities of trace files that must then be analyzed to find the causes of a bug. Moreover, applications often combine many microservices, and programmers have only partial control over the format of the data they manipulate. This article presents CloudLens, a language dedicated to the analysis of this kind of so-called semi-structured data. It is based on a dataflow programming model that makes it possible both to analyze the sources of an error and to monitor a running application.
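The dataflow model behind this kind of log analysis can be sketched in Python with chained generator stages. This is an analogue, not CloudLens itself or its syntax; the log lines and field names are hypothetical: one stage imposes loose structure on semi-structured lines, and the next filters on the extracted fields, processing the stream lazily as a running application would emit it.

```python
import re

def parse(lines):
    """Stage 1: extract a loose structure from semi-structured log lines."""
    pattern = re.compile(r"(?P<level>INFO|WARN|ERROR)\s+(?P<msg>.*)")
    for line in lines:
        m = pattern.search(line)
        if m:
            yield m.groupdict()  # lines that do not match are passed over

def only_errors(records):
    """Stage 2: keep records whose level is ERROR."""
    for r in records:
        if r["level"] == "ERROR":
            yield r

# Hypothetical microservice log fragment.
log = [
    "2024-01-01 INFO service started",
    "2024-01-01 ERROR connection refused by db",
    "2024-01-01 WARN retrying",
    "2024-01-01 ERROR connection refused by cache",
]
errors = list(only_errors(parse(log)))
```

Because the stages are generators, the same pipeline works both for post-mortem analysis of trace files and for monitoring a live stream, mirroring the dual use described in the abstract.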
Logram: Efficient Log Parsing Using n-gram Model
Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of automated log analysis is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing is challenging, because logs are produced by static templates in the source code (i.e., logging statements), yet the templates are usually inaccessible when parsing logs. Prior work proposed automated log parsing approaches that have achieved high accuracy. However, as the volume of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log parsing. In this work, we propose an automated log parsing approach, Logram, which leverages n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log datasets and compared Logram with five state-of-the-art log parsing approaches. We found that Logram achieves a higher parsing accuracy than the best existing approaches (i.e., at least 10% higher, on average) and also outperforms these approaches in efficiency (i.e., 1.8 to 5.1 times faster than the second-fastest approaches in terms of end-to-end parsing time). Furthermore, we deployed Logram on Spark and found that Logram scales out efficiently with the number of Spark nodes (e.g., with near-linear scalability for some logs) without sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective online parsing of logs, achieving similar parsing results and efficiency to the offline mode.
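The core intuition of dictionary-based parsing can be sketched briefly. This is a simplification, not the Logram algorithm: Logram builds 2-gram and 3-gram dictionaries, while the sketch below uses plain token counts for brevity, and the log lines are hypothetical. Tokens generated by the static template recur across lines and become frequent; tokens that are dynamic parameters (block IDs, IP addresses) stay rare and are masked.

```python
from collections import Counter

def parse_logs(lines, threshold=2):
    """Dictionary-based parsing sketch in the spirit of Logram: tokens whose
    corpus frequency falls below the threshold are treated as dynamic
    parameters and replaced with <*>, recovering the static template."""
    tokenized = [line.split() for line in lines]
    counts = Counter(tok for tokens in tokenized for tok in tokens)
    return [" ".join(tok if counts[tok] >= threshold else "<*>" for tok in tokens)
            for tokens in tokenized]

# Hypothetical HDFS-style lines produced by one logging statement.
lines = [
    "Received block blk_1 from 10.0.0.1",
    "Received block blk_2 from 10.0.0.2",
    "Received block blk_3 from 10.0.0.3",
]
templates = parse_logs(lines)
```

Because the dictionaries are built in a single counting pass and lookups are constant-time, this style of parsing avoids the pairwise comparisons that make other parsers slow on large logs, which is the efficiency argument the abstract makes.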