Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing
GUI testing checks if a software system behaves as expected when users
interact with its graphical interface, e.g., testing specific functionality or
validating relevant use case scenarios. Currently, deciding what to test at
this high level is a manual task since automated GUI testing tools target lower
level adequacy metrics such as structural code coverage or activity coverage.
We propose DroidAgent, an autonomous GUI testing agent for Android, for
semantic, intent-driven automation of GUI testing. It is based on Large
Language Models and support mechanisms such as long- and short-term memory.
Given an Android app, DroidAgent sets relevant task goals and subsequently
tries to achieve them by interacting with the app. Our empirical evaluation of
DroidAgent using 15 apps from the Themis benchmark shows that it can set up and
perform realistic tasks, with a higher level of autonomy. For example, when
testing a messaging app, DroidAgent created a second account and added a first
account as a friend, testing a realistic use case, without human intervention.
On average, DroidAgent achieved 61% activity coverage, compared to 51% for
current state-of-the-art GUI testing techniques. Further, manual analysis shows
that 317 out of the 374 autonomously created tasks are realistic and relevant
to app functionalities, and also that DroidAgent interacts deeply with the apps
and covers more features.
Comment: 10 pages
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
Many automated test generation techniques have been developed to aid
developers with writing tests. To facilitate full automation, most existing
techniques aim to either increase coverage, or generate exploratory inputs.
However, existing test generation techniques largely fall short of achieving
more semantic objectives, such as generating tests to reproduce a given bug
report. Reproducing bugs is nonetheless important, as our empirical study shows
that the number of tests added in open source repositories due to issues was
about 28% of the corresponding project test suite size. Meanwhile, due to the
difficulties of transforming the expected program semantics in bug reports into
test oracles, existing failure reproduction techniques tend to deal exclusively
with program crashes, a small subset of all bug reports. To automate test
generation from general bug reports, we propose LIBRO, a framework that uses
Large Language Models (LLMs), which have been shown to be capable of performing
code-related tasks. Since LLMs themselves cannot execute the target buggy code,
we focus on post-processing steps that help us discern when LLMs are effective,
and rank the produced tests according to their validity. Our evaluation of
LIBRO shows that, on the widely studied Defects4J benchmark, LIBRO can generate
failure reproducing test cases for 33% of all studied cases (251 out of 750),
while suggesting a bug reproducing test in first place for 149 bugs. To
mitigate data contamination, we also evaluate LIBRO against 31 bug reports
submitted after the collection of the LLM training data terminated: LIBRO
produces bug reproducing tests for 32% of the studied bug reports. Overall, our
results show LIBRO has the potential to significantly enhance developer
efficiency by automatically generating tests from bug reports.
Comment: Accepted to IEEE/ACM International Conference on Software Engineering
2023 (ICSE 2023)
Towards Autonomous Testing Agents via Conversational Large Language Models
Software testing is an important part of the development cycle, yet it
requires specialized expertise and substantial developer effort to adequately
test software. The recent discoveries of the capabilities of large language
models (LLMs) suggest that they can be used as automated testing assistants,
and thus provide helpful information and even drive the testing process. To
highlight the potential of this technology, we present a taxonomy of LLM-based
testing agents based on their level of autonomy, and describe how a greater
level of autonomy can benefit developers in practice. An example use of LLMs as
a testing assistant is provided to demonstrate how a conversational framework
for testing can help developers. This also highlights how the often criticized
hallucination of LLMs can be beneficial while testing. We identify other
tangible benefits that LLM-driven testing agents can bestow, and also discuss
some potential limitations.
The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications
Large Language Models (LLMs) have demonstrated strong natural language
processing and code synthesis capabilities, which has led to their rapid
adoption in software engineering applications. However, details about LLM
training data are often not made public, which has caused concern as to whether
existing bug benchmarks are included. In lieu of the training data for the
popular GPT models, we examine the training data of the open-source LLM
StarCoder, and find it likely that data from the widely used Defects4J
benchmark was included, raising the possibility of its inclusion in GPT
training data as well. This makes it difficult to tell how well LLM-based
results on Defects4J would generalize, as for any results it would be unclear
whether a technique's performance is due to LLM generalization or memorization.
To remedy this issue and facilitate continued research on LLM-based SE, we
present the GitHub Recent Bugs (GHRB) dataset, which includes 76 real-world
Java bugs that were gathered after the OpenAI data cut-off point.
Recommended from our members
Interface polarization model for a 2-dimensional electron gas at the BaSnO3/LaInO3 interface
To explain the experimental sheet carrier density n2D at the BaSnO3/LaInO3 interface, we consider a model based on the presence of interface polarization in LaInO3 that extends over 2 pseudocubic unit cells from the interface and disappears over the next 2 unit cells. Including this interface polarization in calculations based on the 1D Poisson-Schrödinger equations, we consistently explain how the sheet carrier density of BaSnO3/LaInO3 heterointerfaces depends on the thickness of the LaInO3 layer and on the La doping of the BaSnO3 layer. Our model is supported by a quantitative analysis of atomic positions obtained from high-resolution transmission electron microscopy, which shows suppression of the octahedral tilt and a vertical lattice expansion in LaInO3 over 2–3 pseudocubic unit cells at the coherently strained interface.
Structural basis for arginine glycosylation of host substrates by bacterial effector proteins
The bacterial effector proteins SseK and NleB glycosylate host proteins on arginine residues, leading to reduced NF-κB-dependent responses to infection. Salmonella SseK1 and SseK2 are E. coli NleB1 orthologs that behave as NleB1-like glycosyltransferases (GTs), although they differ in protein substrate specificity. Here we report that these enzymes are retaining glycosyltransferases composed of a helix-loop-helix (HLH) domain, a lid domain, and a catalytic domain. A conserved HEN motif (His-Glu-Asn) in the active site is important for enzyme catalysis and bacterial virulence. We observe differences between SseK1 and SseK2 in interactions with substrates and identify substrate residues that are critical for enzyme recognition. Long molecular dynamics simulations suggest that the HLH domain determines substrate specificity and that the lid domain regulates the opening of the active site. Overall, our data suggest a front-face SNi mechanism, explain differences in activities among these effectors, and have implications for future drug development against enteric pathogens.
Immunohistochemical localization of galectin-3 in the granulomatous lesions of paratuberculosis-infected bovine intestine
The presence of galectin-3 was immunohistochemically quantified in bovine intestines infected with paratuberculosis (Johne's disease) to determine whether galectin-3 was involved in the formation of granulation tissue associated with the disease. Mycobacterium avium subsp. paratuberculosis infection was histochemically confirmed using Ziehl-Neelsen staining and molecularly diagnosed through rpoB DNA sequencing. Galectin-3 was detected in the majority of inflammatory cells, possibly macrophages, in the granulomatous lesions within affected tissues, including the ileum. These findings suggest that galectin-3 is associated with the formation of chronic granulation tissues in bovine paratuberculosis, probably through cell adhesion and anti-apoptosis mechanisms.
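2011 Economic Development Experience Modularization Project: Establishment and Use of Intellectual Property Rights Systems and Policies at Each Stage of Korea's Industrial Development: Focusing on the Patent System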
Prologue
Summary
Chapter 1 Introduction
1. Existing Discussions on the IPR System and Industrial Development
2. Observation of Korea’s IPR System in the Context of Industrial Development
3. Proposing Three Developmental Phases of Korea’s IPR System
Chapter 2 The First Developmental Phase of Korea’s IPR System: Introduction Period (1900s-70s)
1. Overview of the Introduction Period
2. Outline of Patent System and Policy
3. Major Details of Patent Administration and Infrastructure
Chapter 3 The Second Developmental Phase of Korea’s IPR System: Settlement Period (1980-late 1990s)
1. Overview of the Settlement Period (Second Period: from 1980 to the late 1990s)
2. Major Details of Patent System and Policy
3. Administration Institutes and Infrastructure for Patents
Chapter 4 The Third Developmental Phase of Korea’s IPR System: Advancement Period (late 1990s-Present)
1. Overview of the Advancement Period
2. Execution of Patent Institutions and Policies
3. Restructuring of Patent Administration and Infrastructure
Chapter 5 Evaluation and Implications
1. Three Keys to Korea’s Success
2. Discussion on Limitations
3. Closing Remarks
References
Appendix