Mining of uncertain Web log sequences with access history probabilities
An uncertain data sequence is a sequence of data items that each exist only with some level of doubt or probability. Each item in an uncertain sequence is represented by a label and a probability value, referred to as its existential probability, ranging from 0 to 1.
Existing algorithms are either unsuitable or inefficient for discovering frequent sequences in uncertain data. This thesis presents a method for mining uncertain Web sequences that combines access history probabilities from several Web log sessions with features of the PLWAP Web sequential miner. The method, the Uncertain Position Coded Pre-order Linked Web Access Pattern (U-PLWAP) algorithm, mines frequent sequential patterns in uncertain Web logs. While PLWAP considers only a single session of Web logs, U-PLWAP takes multiple sessions, from which existential probabilities are generated. Experiments show that U-PLWAP is at least 100% faster than U-Apriori and 33% faster than UF-growth. UF-growth also fails to take the order of items into consideration, making U-PLWAP a richer algorithm in terms of the information its results contain.
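As an illustration of the expected-support idea that uncertain sequence miners such as U-PLWAP rest on, here is a minimal sketch, assuming each click in a session exists independently with its existential probability. The function name and the dynamic-programming formulation are illustrative only, not U-PLWAP's actual position-coded tree procedure.

```python
def match_probability(session, pattern):
    """P(pattern occurs as a subsequence of the session), where each
    session item (label, p) exists independently with probability p.

    state[j] holds the probability mass of possible worlds in which a
    greedy matcher has matched exactly j pattern symbols so far.
    """
    m = len(pattern)
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for label, p in session:
        # Iterate j in descending order so one item advances a world's
        # match position by at most one step.
        for j in range(m - 1, -1, -1):
            if label == pattern[j]:
                advance = state[j] * p
                state[j] -= advance
                state[j + 1] += advance
    return state[m]

def expected_support(sessions, pattern):
    # Expected support of a pattern over a set of uncertain sessions.
    return sum(match_probability(s, pattern) for s in sessions)
```

Summing `match_probability` over all sessions gives the expected support against which a minimum-support threshold can be checked.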
LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction
Fully supervised log anomaly detection methods suffer the heavy burden of
annotating massive unlabeled log data. Recently, many semi-supervised methods
have been proposed to reduce annotation costs with the help of parsed
templates. However, these methods consider each keyword independently, which
disregards the correlation between keywords and the contextual relationships
among log sequences. In this paper, we propose a novel weakly supervised log
anomaly detection framework, named LogLG, to explore the semantic connections
among keywords from sequences. Specifically, we design an end-to-end iterative
process, where the keywords of unlabeled logs are first extracted to construct
a log-event graph. Then, we build a subgraph annotator to generate pseudo
labels for unlabeled log sequences. To ameliorate the annotation quality, we
adopt a self-supervised task to pre-train a subgraph annotator. After that, a
detection model is trained with the generated pseudo labels. Conditioned on the
classification results, we re-extract the keywords from the log sequences and
update the log-event graph for the next iteration. Experiments on five
benchmarks validate the effectiveness of LogLG for detecting anomalies on
unlabeled log data and demonstrate that LogLG, as the state-of-the-art weakly
supervised method, achieves significant performance improvements compared to
existing methods.
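The log-event graph construction described above can be sketched as follows. This is a simplified co-occurrence version, assuming only that keywords extracted from the same log sequence become connected nodes; LogLG's actual construction and its subgraph annotator are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def build_keyword_graph(log_keyword_sets):
    """Build a keyword co-occurrence graph: nodes are keywords, and an
    edge's weight counts how many log sequences contain both keywords."""
    edges = Counter()
    for keywords in log_keyword_sets:
        # Sort so each undirected pair is stored under one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges
```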
Adversarially Reweighted Sequence Anomaly Detection With Limited Log Data
In the realm of safeguarding digital systems, the ability to detect anomalies in log sequences is paramount, with applications spanning cybersecurity, network surveillance, and financial transaction monitoring. This thesis presents AdvSVDD, a sophisticated deep learning model designed for sequence anomaly detection. Built upon the foundation of Deep Support Vector Data Description (Deep SVDD), AdvSVDD stands out by incorporating Adversarial Reweighted Learning (ARL) to enhance its performance, particularly when confronted with limited training data. By leveraging the Deep SVDD technique to map normal log sequences into a hypersphere and harnessing the amplification effects of Adversarial Reweighted Learning, AdvSVDD demonstrates remarkable efficacy in anomaly detection. Empirical evaluations on the BlueGene/L (BG/L) and Thunderbird supercomputer datasets showcase AdvSVDD's superiority over conventional machine learning and deep learning approaches, including the foundational Deep SVDD framework. Performance metrics such as Precision, Recall, F1-Score, ROC AUC, and PR AUC attest to its proficiency. Furthermore, the study emphasizes AdvSVDD's effectiveness under constrained training data and offers valuable insights into the role the adversarial component plays in enhancing anomaly detection.
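The Deep SVDD scoring rule that AdvSVDD builds on can be illustrated with a minimal sketch: sequences are embedded (here assumed already embedded as plain vectors), a center is fixed from normal data, and anomalies are scored by squared distance to that center. The neural embedding, the ARL reweighting, and all training details are omitted; function names are illustrative.

```python
def svdd_center(embeddings):
    """Center c of the hypersphere: the mean of normal-data embeddings
    (the common initialization in Deep SVDD)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]

def anomaly_score(embedding, center):
    """Squared Euclidean distance to the center; larger means more
    anomalous. A threshold on this score yields the detection decision."""
    return sum((x - c) ** 2 for x, c in zip(embedding, center))
```

In the full method, the embedding network is trained to pull normal sequences toward the center, and ARL reweights training samples adversarially to emphasize hard ones, which is what the abstract credits for robustness under limited data.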
LogGPT: Log Anomaly Detection via GPT
Detecting system anomalies based on log data is important for ensuring the
security and reliability of computer systems. Recently, deep learning models
have been widely used for log anomaly detection. The core idea is to model the
log sequences as natural language and adopt deep sequential models, such as
LSTM or Transformer, to encode the normal patterns in log sequences via
language modeling. However, there is a gap between language modeling and
anomaly detection as the objective of training a sequential model via a
language modeling loss is not directly related to anomaly detection. To fill
this gap, we propose LogGPT, a novel framework that employs GPT for log anomaly
detection. LogGPT is first trained to predict the next log entry based on the
preceding sequence. To further enhance the performance of LogGPT, we propose a
novel reinforcement learning strategy to finetune the model specifically for
the log anomaly detection task. The experimental results on three datasets show
that LogGPT significantly outperforms existing state-of-the-art approaches.
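The next-entry-prediction idea above can be sketched with a toy model. The top-k decision rule shown is a common convention in predictive log anomaly detection and an assumption here, since the abstract does not spell out LogGPT's exact rule; the `BigramPredictor` is a deliberately simple stand-in for a GPT-style model, and its API is hypothetical.

```python
from collections import Counter, defaultdict

class BigramPredictor:
    """Toy stand-in for a next-log-entry model (illustration only)."""
    def __init__(self, sequences):
        self.counts = defaultdict(Counter)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1

    def predict_topk(self, prefix, k):
        # Most likely next entries given the last observed entry.
        return [e for e, _ in self.counts[prefix[-1]].most_common(k)]

def is_anomalous(model, sequence, k=2):
    """Flag a sequence if any actual next entry falls outside the model's
    top-k predictions for the preceding prefix."""
    return any(nxt not in model.predict_topk(sequence[:i], k)
               for i, nxt in enumerate(sequence[1:], start=1))
```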
CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method
Anomaly detection based on system logs plays an important role in intelligent
operations, which is a challenging task due to the extremely complex log
patterns. Existing methods detect anomalies by capturing the sequential
dependencies in log sequences, which ignore the interactions of subsequences.
To this end, we propose CSCLog, a Component Subsequence Correlation-Aware Log
anomaly detection method, which not only captures the sequential dependencies
in subsequences, but also models the implicit correlations of subsequences.
Specifically, subsequences are extracted from log sequences based on components
and the sequential dependencies in subsequences are captured by Long Short-Term
Memory Networks (LSTMs). An implicit correlation encoder is introduced to model
the implicit correlations of subsequences adaptively. In addition, Graph
Convolution Networks (GCNs) are employed to accomplish the information
interactions of subsequences. Finally, attention mechanisms are exploited to
fuse the embeddings of all subsequences. Extensive experiments on four publicly
available log datasets demonstrate the effectiveness of CSCLog, outperforming
the best baseline by an average of 7.41% in Macro F1-Measure.
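The first step described above, extracting per-component subsequences from a log sequence, can be sketched as follows; this covers only the extraction, not CSCLog's LSTM, implicit correlation encoder, or GCN stages, and the representation of a log line as a (component, event) pair is an assumption.

```python
from collections import defaultdict

def split_by_component(log_sequence):
    """Group a log sequence into per-component subsequences, preserving
    the order of events within each component."""
    subseqs = defaultdict(list)
    for component, event in log_sequence:
        subseqs[component].append(event)
    return dict(subseqs)
```

Each resulting subsequence would then be encoded independently (by an LSTM in CSCLog) before the cross-subsequence correlations are modeled.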
LogGD: Detecting Anomalies from System Logs by Graph Neural Networks
Log analysis is one of the main techniques engineers use to troubleshoot
faults of large-scale software systems. During the past decades, many log
analysis approaches have been proposed to detect system anomalies reflected by
logs. They usually take log event counts or sequential log events as inputs and
utilize machine learning algorithms including deep learning models to detect
system anomalies. These anomalies are often identified as violations of
quantitative relational patterns or sequential patterns of log events in log
sequences. However, existing methods fail to leverage the spatial structural
relationships among log events, resulting in potential false alarms and
unstable performance. In this study, we propose a novel graph-based log anomaly
detection method, LogGD, to effectively address the issue by transforming log
sequences into graphs. We exploit the powerful capability of Graph Transformer
Neural Network, which combines graph structure and node semantics for log-based
anomaly detection. We evaluate the proposed method on four widely-used public
log datasets. Experimental results show that LogGD can outperform
state-of-the-art quantitative-based and sequence-based methods and achieve
stable performance under different window size settings. The results confirm
that LogGD is effective in log-based anomaly detection.
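The transformation of a log sequence into a graph, the core move described above, can be sketched in a simplified form: nodes are log events and directed edges are weighted by transition counts. LogGD's actual construction and its Graph Transformer network are not reproduced; this only illustrates the structural view of a sequence.

```python
from collections import Counter

def sequence_to_graph(events):
    """Build a directed transition graph from a log-event sequence.
    Returns (nodes, edges) with edge weights = transition counts."""
    nodes = sorted(set(events))
    edges = Counter(zip(events, events[1:]))
    return nodes, dict(edges)
```

A graph neural network then operates on this structure, so repeated transitions and branching patterns are visible to the model rather than flattened into a single chain.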
Log-based Anomaly Detection of CPS Using a Statistical Method
Detecting anomalies of a cyber physical system (CPS), which is a complex
system consisting of both physical and software parts, is important because a
CPS often operates autonomously in an unpredictable environment. However,
because of the ever-changing nature and lack of a precise model for a CPS,
detecting anomalies is still a challenging task. To address this problem, we
propose applying an outlier detection method to a CPS log. By using a log
obtained from an actual aquarium management system, we evaluated the
effectiveness of our proposed method by analyzing outliers that it detected. By
investigating the outliers with the developer of the system, we confirmed that
some outliers indicate actual faults in the system. For example, our method
detected failures of mutual exclusion in the control system that were unknown
to the developer. Our method also detected transient losses of functionalities
and unexpected reboots. On the other hand, our method failed to detect anomalies
that occurred too frequently and were too similar to one another. In addition,
our method reported rare but unproblematic concurrent combinations of operations
as anomalies. Thus, our approach is effective at finding anomalies, but there is
still room for improvement.
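A minimal sketch of one statistical outlier rule of the kind described above: flag time windows whose event count deviates from the mean by more than a few standard deviations. The abstract does not specify the paper's exact method, so the z-score rule and threshold here are assumptions for illustration.

```python
import math

def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean (population std; constant input yields
    no outliers)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]
```

Applied to per-window event counts from a CPS log, such a rule surfaces windows worth manual inspection, which matches the developer-in-the-loop evaluation the abstract describes.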
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
The rapid progress of modern computing systems has led to a growing interest
in informative run-time logs. Various log-based anomaly detection techniques
have been proposed to ensure software reliability. However, their
implementation in the industry has been limited due to the lack of high-quality
public log resources as training datasets.
While some log datasets are available for anomaly detection, they suffer from
limitations in (1) comprehensiveness of log events; (2) scalability over
diverse systems; and (3) flexibility of log utility. To address these
limitations, we propose AutoLog, the first automated log generation methodology
for anomaly detection. AutoLog uses program analysis to generate run-time log
sequences without actually running the system. AutoLog starts with probing
comprehensive logging statements associated with the call graphs of an
application. Then, it constructs execution graphs for each method after pruning
the call graphs to find log-related execution paths in a scalable manner.
Finally, AutoLog propagates the anomaly label to each acquired execution path
based on human knowledge. It generates flexible log sequences by walking along
the log execution paths with controllable parameters. Experiments on 50 popular
Java projects show that AutoLog acquires significantly more (9x-58x) log events
than existing log datasets from the same system, and generates log messages
much faster (15x) with a single machine than existing passive data collection
approaches. We hope AutoLog can facilitate the benchmarking and adoption of
automated log analysis techniques. (Accepted at ASE 2023, Research Track.)
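The final generation step, walking log-related execution paths to emit sequences, can be sketched as a toy random walk. This is illustrative only: AutoLog derives its execution graphs by static program analysis, and the graph, log map, and parameters below are hypothetical.

```python
import random

def generate_log_sequence(graph, logs, start, max_len=20, seed=0):
    """Walk a (hypothetical) execution graph from `start`, emitting the
    log events attached to each visited node, up to `max_len` steps."""
    rng = random.Random(seed)  # seeded for reproducible generation
    node, seq = start, []
    for _ in range(max_len):
        seq.extend(logs.get(node, []))
        successors = graph.get(node)
        if not successors:
            break
        node = rng.choice(successors)
    return seq
```

Varying the walk length, branching choices, and which paths carry anomaly labels corresponds to the "controllable parameters" the abstract mentions.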
- …