Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors
Although Python's dynamic type system makes it easier for developers to write
programs, it also introduces type errors at run time. Rule-based approaches
exist for automatically repairing Python type errors. These approaches can
generate accurate patches, but they require domain experts to design patch
synthesis rules and suffer from low template coverage of real-world type
errors. Learning-based approaches alleviate the manual effort of designing
patch synthesis rules. Among them, the prompt-based approach, which leverages
the knowledge base of code pre-trained models via pre-defined prompts, obtains
state-of-the-art performance in general
program repair tasks. However, such prompts are manually defined and do not
involve any specific clues for repairing Python type errors, resulting in
limited effectiveness. How to automatically improve prompts with domain
knowledge for type error repair is challenging yet under-explored. In this
paper, we present TypeFix, a novel prompt-based approach with fix templates
incorporated for repairing Python type errors. TypeFix first mines generalized
fix templates via a novel hierarchical clustering algorithm. The identified fix
templates indicate the common edit patterns and contexts of existing type error
fixes. TypeFix then generates code prompts for code pre-trained models by
employing the generalized fix templates as domain knowledge, in which the masks
are adaptively located for each type error instead of being pre-determined.
Experiments on two benchmarks, including BugsInPy and TypeBugs, show that
TypeFix successfully repairs 26 and 55 type errors, outperforming the best
baseline approach by 9 and 14, respectively. Moreover, the proposed fix
template mining approach covers 75% of developers' patches in both benchmarks,
exceeding the best rule-based approach, PyTER, by more than 30%.
Comment: This paper has been accepted by ICSE'2
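To make the idea of template-guided prompts concrete, here is a minimal Python sketch of how a generalized fix template might be turned into a masked code prompt whose mask sits at the faulty expression; the FixTemplate class and build_prompt helper are hypothetical illustrations, not TypeFix's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class FixTemplate:
    # A generalized edit pattern: the faulty expression is wrapped in a call
    # whose callee is left as a mask for the code model to fill in.
    before: str  # context pattern around the error, e.g. "len(<expr>)"
    after: str   # edit pattern with a hole, e.g. "len(<mask>(<expr>))"

def build_prompt(buggy_line: str, expr: str, template: FixTemplate) -> str:
    """Place the mask at the faulty expression rather than at a fixed,
    pre-determined position in the prompt."""
    matched = template.before.replace("<expr>", expr)
    hole = template.after.replace("<expr>", expr)
    return buggy_line.replace(matched, hole)

# Example: `len(value)` raises a TypeError when `value` is None or an int.
template = FixTemplate(before="len(<expr>)", after="len(<mask>(<expr>))")
prompt = build_prompt("n = len(value)", "value", template)
print(prompt)  # "n = len(<mask>(value))"; a code model could fill <mask> with "str"
```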
EvLog: Evolving Log Analyzer for Anomalous Logs Identification
Software logs record system activities, aiding maintainers in identifying the
underlying causes of failures and enabling prompt mitigation actions. However,
maintainers need to inspect a large volume of daily logs to identify the
anomalous logs that reveal failure details for further diagnosis. Thus, how to
automatically distinguish these anomalous logs from normal logs becomes a
critical problem. Existing approaches alleviate the burden on software
maintainers, but they are built upon an improper yet critical assumption:
logging statements in the software remain unchanged. In practice, software
keeps evolving, and our empirical study finds that this evolution brings three
challenges: log parsing errors, evolving log events, and unstable log
sequences.
In this paper, we propose a novel unsupervised approach named Evolving Log
analyzer (EvLog) to mitigate these challenges. We first build a multi-level
representation extractor that processes logs without parsing, preventing
errors introduced by the parser. The multi-level representations preserve the
essential semantics of logs while discarding insignificant changes in evolving
events. EvLog then implements an anomaly discriminator with an attention
mechanism to identify anomalous logs and avoid the issues caused by unstable
log sequences. EvLog has
shown its effectiveness on two real-world system evolution log datasets, with
average F1 scores of 0.955 and 0.847 in the intra-version and inter-version
settings, respectively, outperforming other state-of-the-art approaches by a
wide margin. To the best of our knowledge, this is the first study on tackling
anomalous logs over software evolution. We believe our work sheds new light on
the impact of software evolution on log analysis and offers corresponding
solutions for the community.
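As a rough illustration of parser-free log representations (a simplified stand-in for EvLog's multi-level extractor, using hashed word unigrams rather than a learned model), the sketch below shows how an evolved rewording of a log event stays close to the original vector while an unrelated event does not:

```python
import hashlib
import math
import re

DIM = 256  # size of the hashed feature space

def embed(log_line: str) -> list[float]:
    # Hash each word into a fixed-size vector; no template parsing is needed,
    # so a reworded event only shifts a few dimensions instead of breaking.
    vec = [0.0] * DIM
    for tok in re.findall(r"[A-Za-z]+", log_line.lower()):
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

old = embed("Connection to node-1 lost, retrying in 5s")
new = embed("Connection to node-1 has been lost, retry in 5s")  # evolved wording
odd = embed("User login succeeded for alice")                   # unrelated event
print(cosine(old, new) > cosine(old, odd))  # True: the evolved event stays close
```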
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
The rapid progress of modern computing systems has led to a growing interest
in informative run-time logs. Various log-based anomaly detection techniques
have been proposed to ensure software reliability. However, their
implementation in the industry has been limited due to the lack of high-quality
public log resources as training datasets.
While some log datasets are available for anomaly detection, they suffer from
limitations in (1) comprehensiveness of log events; (2) scalability over
diverse systems; and (3) flexibility of log utility. To address these
limitations, we propose AutoLog, the first automated log generation methodology
for anomaly detection. AutoLog uses program analysis to generate run-time log
sequences without actually running the system. AutoLog starts by probing the
comprehensive set of logging statements associated with the call graphs of an
application. Then, it constructs execution graphs for each method after pruning
the call graphs to find log-related execution paths in a scalable manner.
Finally, AutoLog propagates the anomaly label to each acquired execution path
based on human knowledge. It generates flexible log sequences by walking along
the log execution paths with controllable parameters. Experiments on 50 popular
Java projects show that AutoLog acquires significantly more (9x-58x) log events
than existing log datasets from the same system, and generates log messages on
a single machine much faster (15x) than existing passive data collection
approaches. We hope AutoLog can facilitate the benchmarking and adoption of
automated log analysis techniques.
Comment: The paper has been accepted by ASE 2023 (Research Track).
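The path-walking step can be pictured with a toy example. The sketch below enumerates log sequences by depth-first walking a small hand-written execution graph and labels any path that hits an ERROR statement as anomalous; the graph, the messages, and the walk function are illustrative assumptions, not AutoLog's implementation.

```python
from typing import Dict, List

# Edges of a pruned, log-related execution graph: node -> successors.
GRAPH: Dict[str, List[str]] = {
    "start":     ["open_conn"],
    "open_conn": ["read_ok", "read_fail"],
    "read_ok":   ["close"],
    "read_fail": ["retry", "close"],
    "retry":     ["read_ok"],
    "close":     [],
}
# Logging statements attached to graph nodes.
LOGS = {
    "open_conn": "INFO  opening connection",
    "read_ok":   "INFO  read succeeded",
    "read_fail": "ERROR read failed",
    "retry":     "WARN  retrying read",
    "close":     "INFO  connection closed",
}

def walk(node: str, path: List[str], out: List[List[str]], max_len: int = 6) -> None:
    """Depth-first walk with a length bound (a controllable parameter) that
    emits one synthetic log sequence per complete path."""
    if node in LOGS:
        path = path + [LOGS[node]]
    successors = GRAPH.get(node, [])
    if not successors or len(path) >= max_len:
        out.append(path)
        return
    for nxt in successors:
        walk(nxt, path, out, max_len)

sequences: List[List[str]] = []
walk("start", [], sequences)
for seq in sequences:
    # A path touching an ERROR statement could inherit an anomaly label.
    label = "anomalous" if any(msg.startswith("ERROR") for msg in seq) else "normal"
    print(label, seq)
```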
Go Static: Contextualized Logging Statement Generation
Logging practices have been extensively investigated to assist developers in
writing appropriate logging statements for documenting software behaviors.
Although numerous automatic logging approaches have been proposed, their
performance remains unsatisfactory due to the constraint of single-method
input, which lacks informative programming context outside the method.
Specifically, we identify three inherent limitations with single-method
context: limited static scope of logging statements, inconsistent logging
styles, and missing type information of logging variables. To tackle these
limitations, we propose SCLogger, the first contextualized logging statement
generation approach with inter-method static contexts. First, SCLogger extracts
inter-method contexts with static analysis to construct the contextualized
prompt for language models to generate a tentative logging statement. The
contextualized prompt consists of an extended static scope and sampled similar
methods, ordered by the chain-of-thought (CoT) strategy. Second, SCLogger
refines the access to logging variables by formulating a new refinement prompt
for language models, which incorporates detailed type information of variables
in the tentative logging statement. The evaluation results show that SCLogger
surpasses the state-of-the-art approach by 8.7% in logging position accuracy,
32.1% in level accuracy, 19.6% in variable precision, and 138.4% in text BLEU-4
score. Furthermore, SCLogger consistently boosts the performance of logging
statement generation across a range of large language models, thereby
showcasing the generalizability of this approach.
Comment: This paper was accepted by the ACM International Conference on the
Foundations of Software Engineering (FSE 2024).
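A rough sketch of what such a contextualized prompt could look like is shown below; the build_prompt helper, the prompt wording, and the Java snippets are illustrative assumptions rather than SCLogger's actual prompt format.

```python
from typing import List

def build_prompt(focal_method: str, callee_signatures: List[str],
                 similar_methods: List[str]) -> str:
    # Combine the extended static scope (signatures of reachable methods) with
    # sampled similar methods, then instruct the model step by step, in a
    # chain-of-thought style, to produce a single logging statement.
    parts = [
        "Methods reachable from the focal method (extended static scope):",
        *callee_signatures,
        "",
        "Similar methods that already contain logging statements:",
        *similar_methods,
        "",
        "Think step by step: choose a position, a level, and the variables to",
        "log, then write exactly one logging statement for this method:",
        focal_method,
    ]
    return "\n".join(parts)

prompt = build_prompt(
    focal_method="void transfer(Account src, Account dst, long amount) { ... }",
    callee_signatures=["boolean debit(Account a, long amount)",
                       "void credit(Account a, long amount)"],
    similar_methods=['void deposit(...) { log.info("deposit {} to {}", amount, id); }'],
)
print(prompt)  # fed to a language model to draft the tentative logging statement
```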
A Large-scale Benchmark for Log Parsing
Log data is pivotal in activities like anomaly detection and failure
diagnosis in the automated maintenance of software systems. Because log data
is unstructured, log parsing is often required to transform it into a
structured format for automated analysis. A variety of log parsers exist,
making it vital to benchmark these tools to comprehend their features and
performance. However, existing datasets for log parsing are limited in terms of
scale and representativeness, posing challenges for studies that aim to
evaluate or develop log parsers. This problem becomes more pronounced when
these parsers are evaluated for production use. To address these issues, we
introduce a new collection of large-scale annotated log datasets, named LogPub,
which more accurately mirrors log data observed in real-world software systems.
LogPub comprises 14 datasets, each averaging 3.6 million log lines. Utilizing
LogPub, we re-evaluate 15 log parsers in a more rigorous and practical setting.
We also propose a new evaluation metric to lessen the sensitivity of current
metrics to imbalanced data distribution. Furthermore, we are the first to
scrutinize the detailed performance of log parsers on logs that represent rare
system events and offer comprehensive information for system troubleshooting.
Parsing such logs accurately is vital yet challenging. We believe that our work
could shed light on the design and evaluation of log parsers in more realistic
settings, thereby facilitating their implementation in production systems.
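As a hedged illustration of why a distribution-aware metric matters (a simplified stand-in, not LogPub's actual proposed metric), the sketch below contrasts plain message-level accuracy with a template-averaged variant that weights rare events equally:

```python
from collections import defaultdict
from typing import List

def message_accuracy(pred: List[str], truth: List[str]) -> float:
    # Fraction of individual log messages assigned the correct template.
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def template_averaged_accuracy(pred: List[str], truth: List[str]) -> float:
    # Average per-template accuracy, so a rare event counts as much as a
    # frequent one and imbalance no longer hides parsing failures.
    per_template = defaultdict(lambda: [0, 0])  # template -> [correct, total]
    for p, t in zip(pred, truth):
        per_template[t][0] += (p == t)
        per_template[t][1] += 1
    return sum(c / n for c, n in per_template.values()) / len(per_template)

truth = ["T_common"] * 98 + ["T_rare"] * 2   # imbalanced: one rare system event
pred  = ["T_common"] * 98 + ["WRONG"] * 2    # parser misses only the rare event
print(message_accuracy(pred, truth))            # 0.98: looks nearly perfect
print(template_averaged_accuracy(pred, truth))  # 0.5: rare-event failure exposed
```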