60 research outputs found
EvLog: Evolving Log Analyzer for Anomalous Logs Identification
Software logs record system activities, aiding maintainers in identifying the
underlying causes for failures and enabling prompt mitigation actions. However,
maintainers need to inspect a large volume of daily logs to identify the
anomalous logs that reveal failure details for further diagnosis. Thus, how to
automatically distinguish these anomalous logs from normal logs becomes a
critical problem. Existing approaches alleviate the burden on software
maintainers, but they are built upon an improper yet critical assumption:
logging statements in the software remain unchanged. Software, however, keeps
evolving, and our empirical study finds that this evolution brings three
challenges: log parsing errors, evolving log events, and unstable log
sequences.
In this paper, we propose a novel unsupervised approach named Evolving Log
analyzer (EvLog) to mitigate these challenges. We first build a multi-level
representation extractor to process logs without parsing to prevent errors from
the parser. The multi-level representations preserve the essential semantics of
logs while leaving out insignificant changes in evolving events. EvLog then
implements an anomaly discriminator with an attention mechanism to identify the
anomalous logs and avoid the issues brought by unstable sequences. EvLog
achieves average F1 scores of 0.955 and 0.847 in the intra-version and
inter-version settings, respectively, on two real-world system evolution log
datasets, outperforming other state-of-the-art approaches by a wide margin. To
the best of our knowledge, this is the first study to tackle anomalous log
identification over software evolution. We believe our work sheds new light on
the impact of software evolution on log analysis and offers corresponding
solutions for the community.
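As a rough illustration of what a parsing-free log representation might look
like (a toy sketch with a hypothetical abstraction rule, not the authors'
actual multi-level extractor), the following keeps semantic word tokens while
abstracting away volatile values such as numbers, hex identifiers, and paths:

```python
import re

def multi_level_repr(log_line):
    """Toy sketch of a parsing-free log representation (hypothetical,
    not EvLog's actual extractor): keep word tokens as the semantic
    level and abstract away volatile values."""
    semantic = []
    for tok in log_line.lower().split():
        # Numbers, hex ids, and paths tend to change across versions,
        # so replace them with a placeholder token.
        if re.fullmatch(r"(0x)?[0-9a-f]+", tok) or "/" in tok:
            semantic.append("<*>")
        else:
            semantic.append(tok)
    return semantic

print(multi_level_repr("Receiving block 0x91af from /10.0.0.1"))
# → ['receiving', 'block', '<*>', 'from', '<*>']
```

Because no template must be matched, a changed logging statement degrades the
representation gracefully instead of causing a hard parsing error.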
Recommending Analogical APIs via Knowledge Graph Embedding
Library migration, which re-implements the same software behavior by using a
different library instead of using the current one, has been widely observed in
software evolution. One essential part of library migration is to find an
analogical API that could provide the same functionality as current ones.
However, given the large number of libraries/APIs, manually finding an
analogical API could be very time-consuming and error-prone. Researchers have
developed multiple automated analogical API recommendation techniques.
Documentation-based methods have particularly attracted significant interest.
Despite their potential, these methods have limitations, such as a lack of
comprehensive semantic understanding in documentation and scalability
challenges. In this work, we propose KGE4AR, a novel documentation-based
approach that leverages knowledge graph (KG) embedding to recommend analogical
APIs during library migration. Specifically, KGE4AR proposes a novel unified
API KG to comprehensively and structurally represent three types of knowledge
in documentation, which can better capture the high-level semantics. KGE4AR
then embeds the unified API KG into vectors, enabling more effective and
scalable similarity calculation. We build KGE4AR's unified API
KG for 35,773 Java libraries and assess it in two API recommendation scenarios:
with and without target libraries. Our results show that KGE4AR substantially
outperforms state-of-the-art documentation-based techniques in both evaluation
scenarios in terms of all metrics (e.g., 47.1%-143.0% and 11.7%-80.6% MRR
improvements in each scenario). Additionally, we explore KGE4AR's scalability,
confirming that it scales effectively as the number of libraries grows.
Comment: Accepted by FSE 202
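The retrieval step, ranking candidate APIs by embedding similarity, can be
sketched with hand-made vectors (hypothetical embeddings and API names; the
paper derives its vectors from the unified API KG, which is not reproduced
here):

```python
import math

# Hypothetical, hand-made API embeddings for illustration only.
api_vectors = {
    "org.json.JSONObject.getString": [0.9, 0.1, 0.2],
    "com.google.gson.JsonObject.get": [0.85, 0.15, 0.25],
    "java.util.List.add": [0.1, 0.9, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recommend(query_api, k=1):
    """Rank candidate APIs by embedding similarity to the query API."""
    q = api_vectors[query_api]
    candidates = [(cosine(q, v), name)
                  for name, v in api_vectors.items() if name != query_api]
    return [name for _, name in sorted(candidates, reverse=True)][:k]

print(recommend("org.json.JSONObject.getString"))
# → ['com.google.gson.JsonObject.get']
```

Because similarity reduces to vector arithmetic, the comparison cost stays low
as the candidate set grows, which is the scalability argument made above.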
Is Your Quantum Program Bug-Free?
Quantum computers are becoming more mainstream. As more programmers are
starting to look at writing quantum programs, they face an inevitable task of
debugging their code. How should the programs for quantum computers be
debugged? In this paper, we discuss existing debugging tactics, used in
developing programs for classical computers, and show which ones can be readily
adopted. We also highlight quantum-computer-specific debugging issues and list
novel techniques that are needed to address these issues. The practitioners can
readily apply some of these tactics to their process of writing quantum
programs, while researchers can learn about opportunities for future work.Comment: 12 pages, 2 figures, accepted for publication in Proceedings of the
42nd International Conference on Software Engineering: New Ideas and Emerging
Results, 202
Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention
Prompt and accurate detection of system anomalies is essential to ensure the
reliability of software systems. Unlike manual efforts that exploit all
available run-time information, existing approaches usually leverage only a
single type of monitoring data (often logs or metrics) or fail to make
effective use of the joint information among different types of data.
Consequently, many false predictions occur. To better understand the
manifestations of system anomalies, we conduct a systematic study on a large
amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates
that logs and metrics can manifest system anomalies collaboratively and
complementarily, and that neither alone is sufficient. Thus, integrating
heterogeneous data can help recover the complete picture of a system's health
status. In this context, we propose Hades, the first end-to-end semi-supervised
approach to effectively identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global
representation of the system status by fusing log semantics and metric
patterns. It captures discriminative features and meaningful interactions from
heterogeneous data via a cross-modal attention module, trained in a
semi-supervised manner. We evaluate Hades extensively on large-scale simulated
data and datasets from Huawei Cloud. The experimental results demonstrate the
effectiveness of our model in detecting system anomalies. We also release the
code and the annotated dataset for replication and future research.
Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on
Software Engineering (ICSE). arXiv admin note: substantial text overlap with
arXiv:2207.0291
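The cross-modal attention idea, in which one modality's representation attends
over the other's before fusion, can be sketched minimally (a toy single-head
example with hand-made vectors; not Hades' actual architecture or training
setup):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_modal_attention(log_vec, metric_vecs):
    """Toy single-head cross-modal attention: the log representation
    acts as the query and attends over metric-window representations,
    returning the attention weights and the fused vector."""
    scores = [sum(q * k for q, k in zip(log_vec, m)) for m in metric_vecs]
    weights = softmax(scores)
    dim = len(metric_vecs[0])
    fused = [sum(w * m[i] for w, m in zip(weights, metric_vecs))
             for i in range(dim)]
    return weights, fused

# The log vector is most similar to the first metric window, so that
# window dominates the fused representation.
weights, fused = cross_modal_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

The learned interaction lets evidence in one modality (e.g., a CPU spike)
reweight how the other modality (e.g., error logs) contributes to the anomaly
decision.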
Combining Solution Reuse and Bound Tightening for Efficient Analysis of Evolving Systems
Software engineers have long employed formal verification to ensure the safety and validity of their system designs. As a system changes, often via predictable, domain-specific operations, its models must also change, requiring system designers to repeatedly execute the same formal verification on similar system models. State-of-the-art formal verification techniques can be expensive at scale, and that cost is multiplied by repeated analysis. This paper presents a novel analysis technique, implemented in a tool called SoRBoT, which can automatically determine domain-specific optimizations that dramatically reduce the cost of repeatedly analyzing evolving systems. Unlike all prior approaches, which focus on either tightening the bounds for analysis or reusing all or part of prior solutions, SoRBoT's automated derivation of domain-specific optimizations combines the benefits of both solution reuse and bound tightening while avoiding the main pitfalls of each. We experimentally evaluate SoRBoT against state-of-the-art techniques for verifying evolving specifications, demonstrating that SoRBoT substantially exceeds their run-time performance while introducing only negligible overhead, in contrast to the expensive additional computations required by the state-of-the-art verification techniques.
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors
Although the dynamic type system of Python facilitates the developers in
writing Python programs, it also brings type errors at run-time. There exist
rule-based approaches for automatically repairing Python type errors. The
approaches can generate accurate patches but they require domain experts to
design patch synthesis rules and suffer from low template coverage of
real-world type errors. Learning-based approaches alleviate the manual efforts
in designing patch synthesis rules. Among them, the prompt-based approach,
which leverages the knowledge base of code pre-trained models via pre-defined
prompts, obtains state-of-the-art performance in general
program repair tasks. However, such prompts are manually defined and do not
involve any specific clues for repairing Python type errors, resulting in
limited effectiveness. How to automatically improve prompts with the domain
knowledge for type error repair is challenging yet under-explored. In this
paper, we present TypeFix, a novel prompt-based approach with fix templates
incorporated for repairing Python type errors. TypeFix first mines generalized
fix templates via a novel hierarchical clustering algorithm. The identified fix
templates indicate the common edit patterns and contexts of existing type error
fixes. TypeFix then generates code prompts for code pre-trained models by
employing the generalized fix templates as domain knowledge, in which the masks
are adaptively located for each type error instead of being pre-determined.
Experiments on two benchmarks, including BugsInPy and TypeBugs, show that
TypeFix successfully repairs 26 and 55 type errors, outperforming the best
baseline approach by 9 and 14 fixes, respectively. Moreover, the proposed fix
template mining approach covers 75% of developers' patches in both benchmarks,
exceeding the best rule-based approach, PyTER, by more than 30%.
Comment: This paper has been accepted by ICSE'2
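To illustrate the general idea of template-derived masked prompts (a
hypothetical template format and example; the paper's mined templates and
adaptive mask-placement algorithm are more elaborate), consider:

```python
def build_prompt(buggy_expr, template):
    """Toy sketch of turning a mined fix template into a masked prompt
    for a code pre-trained model. '<mask>' marks the slots the model
    must fill, and '{buggy}' is where the offending expression from
    the type error is spliced in."""
    return template.replace("{buggy}", buggy_expr)

# A common type-error fix pattern: wrap the offending expression in a
# conversion call whose name the model predicts (e.g. str(...)).
template = "<mask>({buggy})"
prompt = build_prompt('user_id + "-suffix"', template)
print(prompt)  # → <mask>(user_id + "-suffix")
```

The key contrast with earlier prompt-based repair is that the mask position
follows the template's edit pattern rather than a fixed, pre-determined slot.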
Logram: Efficient Log Parsing Using n-Gram Dictionaries
Software systems usually record important runtime information in their logs.
Logs help practitioners understand system runtime behaviors and diagnose field
failures. As logs are usually very large in size, automated log analysis is
needed to assist practitioners in their software operation and maintenance
efforts. Typically, the first step of automated log analysis is log parsing,
i.e., converting unstructured raw logs into structured data. However, log
parsing is challenging, because logs are produced by static templates in the
source code (i.e., logging statements) yet the templates are usually
inaccessible when parsing logs. Prior work proposed automated log parsing
approaches that have achieved high accuracy. However, as the volume of logs
grows rapidly in the era of cloud computing, efficiency becomes a major concern
in log parsing. In this work, we propose an automated log parsing approach,
Logram, which leverages n-gram dictionaries to achieve efficient log parsing.
We evaluated Logram on 16 public log datasets and compared Logram with five
state-of-the-art log parsing approaches. We found that Logram achieves
parsing accuracy similar to the best existing approaches while outperforming
them in efficiency (i.e., 1.8 to 5.1 times faster than the second-fastest
approach). Furthermore, we deployed Logram on Spark and found that
Logram scales out efficiently with the number of Spark nodes (e.g., with
near-linear scalability) without sacrificing parsing accuracy. In addition, we
demonstrated that Logram can support effective online parsing of logs,
achieving parsing results and efficiency similar to the offline mode.
Comment: 13 pages, IEEE journal format
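A minimal sketch of the n-gram-dictionary idea (heavily simplified and not
Logram's exact algorithm, which combines n-gram and (n-1)-gram dictionaries
with more careful rules) might look like:

```python
from collections import Counter

def parse_with_ngrams(log_lines, threshold=2):
    """Toy dictionary-based log parsing in the spirit of Logram:
    count 2-grams and 1-grams over all lines, then treat a token as a
    dynamic variable (abstracted to <*>) if every bigram containing it
    is rare and the token itself is rare."""
    tokenized = [line.split() for line in log_lines]
    bigrams, unigrams = Counter(), Counter()
    for tokens in tokenized:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    templates = []
    for tokens in tokenized:
        out = []
        for i, tok in enumerate(tokens):
            around = [bigrams[g] for g in
                      (tuple(tokens[i - 1:i + 1]), tuple(tokens[i:i + 2]))
                      if len(g) == 2]
            if all(c < threshold for c in around) and unigrams[tok] < threshold:
                out.append("<*>")   # rare in every context: a variable
            else:
                out.append(tok)     # frequent: static template text
        templates.append(" ".join(out))
    return templates

logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
]
print(parse_with_ngrams(logs))
# → ['Connection from <*> closed', 'Connection from <*> closed']
```

Since the dictionaries are simple counters, lookups are constant-time and the
counting itself is trivially parallelizable, which is what makes the approach
fast and easy to scale out on platforms such as Spark.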
A Preliminary Analysis of Alloy Model Repair Using Sketching
The size and complexity of modern software systems clearly show the need for, and the importance of, attending to the early stages of software development. In particular, one of these stages, which provides a more abstract and general understanding of the system as a whole, is modeling. Although a wide range of languages exists for this purpose, two characteristics we consider important when choosing one are whether its output serves as input for subsequent stages and its versatility for analysis. In this sense, languages with some degree of underlying formalism prevail, since they allow building automatic or semi-automatic tools for processing them.
As in the later stages of development, modeling, being a human activity, is not free of errors.
To address this problem, different techniques and tools have been proposed. In this work we propose combining two well-known techniques with the goal of repairing possible errors in models specified in Alloy. Using testing as a tool for fault localization, our technique employs the concept of Sketching to discover and propose a possible repair for those faults.
XVI Workshop Ingeniería de Software. Red de Universidades con Carreras en Informática