Communicative Agents for Software Development
Software engineering is a domain characterized by intricate decision-making
processes, often relying on nuanced intuition and consultation. Recent
advancements in deep learning have started to revolutionize software
engineering practices through elaborate designs implemented at various stages
of software development. In this paper, we present an innovative paradigm that
leverages large language models (LLMs) throughout the entire software
development process, streamlining and unifying key processes through natural
language communication, thereby eliminating the need for specialized models at
each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered
software development company that mirrors the established waterfall model,
meticulously dividing the development process into four distinct chronological
stages: designing, coding, testing, and documenting. Each stage engages a team
of agents, such as programmers, code reviewers, and test engineers, fostering
collaborative dialogue and facilitating a seamless workflow. The chat chain
acts as a facilitator, breaking down each stage into atomic subtasks. This
enables dual roles, allowing for proposing and validating solutions through
context-aware communication, leading to efficient resolution of specific
subtasks. The instrumental analysis of ChatDev highlights its remarkable
efficacy in software generation, enabling the completion of the entire software
development process in under seven minutes at a cost of less than one dollar.
It not only identifies and alleviates potential vulnerabilities but also
rectifies potential hallucinations while maintaining commendable efficiency and
cost-effectiveness. The potential of ChatDev unveils fresh possibilities for
integrating LLMs into the realm of software development.
Comment: 25 pages, 9 figures, 2 tables
Empowering LLM to use Smartphone for Intelligent Task Automation
Mobile task automation is an attractive technique that aims to enable
voice-based hands-free user interaction with smartphones. However, existing
approaches suffer from poor scalability due to the limited language
understanding ability and the non-trivial manual efforts required from
developers or end-users. The recent advance of large language models (LLMs) in
language understanding and reasoning inspires us to rethink the problem from a
model-centric perspective, where task preparation, comprehension, and execution
are handled by a unified language model. In this work, we introduce AutoDroid,
a mobile task automation system that can handle arbitrary tasks on any Android
application without manual efforts. The key insight is to combine the
commonsense knowledge of LLMs and domain-specific knowledge of apps through
automated dynamic analysis. The main components include a functionality-aware
UI representation method that bridges the UI with the LLM, exploration-based
memory injection techniques that augment the app-specific domain knowledge of
LLM, and a multi-granularity query optimization module that reduces the cost of
model inference. We integrate AutoDroid with off-the-shelf LLMs including
online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a
new benchmark for memory-augmented Android task automation with 158 common
tasks. The results demonstrated that AutoDroid is able to precisely generate
actions with an accuracy of 90.9%, and complete tasks with a success rate of
71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo,
benchmark suites, and source code of AutoDroid will be released at
https://autodroid-sys.github.io/
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon.
This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Chemistry and materials science are complex. Recently, there have been great
successes in addressing this complexity using data-driven or computational
techniques. Yet, the necessity of input structured in very specific forms and
the fact that there is an ever-growing number of tools creates usability and
accessibility challenges. Coupled with the reality that much data in these
disciplines is unstructured, the effectiveness of these tools is limited.
Motivated by recent works that indicated that large language models (LLMs)
might help address some of these issues, we organized a hackathon event on the
applications of LLMs in chemistry, materials science, and beyond. This article
chronicles the projects built as part of this hackathon. Participants employed
LLMs for various applications, including predicting properties of molecules and
materials, designing novel interfaces for tools, extracting knowledge from
unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Evaluating outputs of large language models (LLMs) is challenging, requiring
making -- and making sense of -- many responses. Yet tools that go beyond basic
prompting tend to require knowledge of programming APIs, focus on narrow
domains, or are closed-source. We present ChainForge, an open-source visual
toolkit for prompt engineering and on-demand hypothesis testing of text
generation LLMs. ChainForge provides a graphical interface for comparison of
responses across models and prompt variations. Our system was designed to
support three tasks: model selection, prompt template design, and hypothesis
testing (e.g., auditing). We released ChainForge early in its development and
iterated on its design with academics and online users. Through in-lab and
interview studies, we find that a range of people could use ChainForge to
investigate hypotheses that matter to them, including in real-world settings.
We identify three modes of prompt engineering and LLM hypothesis testing:
opportunistic exploration, limited evaluation, and iterative refinement.
Comment: 23 pages, 7 figures, in submission
Generative Artificial Intelligence for Software Engineering -- A Research Agenda
Generative Artificial Intelligence (GenAI) tools have become increasingly
prevalent in software development, offering assistance to various managerial
and technical project activities. Notable examples of these tools include
OpenAI's ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. Although many recent
publications have explored and evaluated the application of GenAI, a
comprehensive understanding of the current development, applications,
limitations, and open challenges remains unclear to many. Particularly, we do
not have an overall picture of the current state of GenAI technology in
practical software engineering usage scenarios. We conducted a literature
review and focus groups for a duration of five months to develop a research
agenda on GenAI for Software Engineering. We identified 78 open Research
Questions (RQs) in 11 areas of Software Engineering. Our results show that it
is possible to explore the adoption of GenAI in partial automation and support
decision-making in all software development activities. While the current
literature is skewed toward software implementation, quality assurance and
software maintenance, other areas, such as requirements engineering, software
design, and software engineering education, would need further research
attention. Common considerations when implementing GenAI include industry-level
assessment, dependability and accuracy, data accessibility, transparency, and
sustainability aspects associated with the technology. GenAI is bringing
significant changes to the field of software engineering. Nevertheless, the
state of research on the topic still remains immature. We believe that this
research agenda holds significance and practical value for informing both
researchers and practitioners about current applications and guiding future
research.
Language models in molecular discovery
The success of language models, especially transformer-based architectures,
has trickled into other domains giving rise to "scientific language models"
that operate on small molecules, proteins or polymers. In chemistry, language
models contribute to accelerating the molecule discovery cycle as evidenced by
promising recent findings in early-stage drug discovery. Here, we review the
role of language models in molecular discovery, underlining their strength in
de novo drug design, property prediction and reaction chemistry. We highlight
valuable open-source software assets thus lowering the entry barrier to the
field of scientific language modeling. Last, we sketch a vision for future
molecular design that combines a chatbot interface with access to computational
chemistry tools. Our contribution serves as a valuable resource for
researchers, chemists, and AI enthusiasts interested in understanding how
language models can and will be used to accelerate chemical discovery.
Comment: Under review
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their
potential to generate code from natural language descriptions across a wide
range of programming tasks. Several benchmarks have recently emerged to
evaluate the ability of LLMs to generate functionally correct code from natural
language intent with respect to a set of hidden test cases. This has enabled
the research community to identify significant and reproducible advancements in
LLM capabilities. However, there is currently a lack of benchmark datasets for
assessing the ability of LLMs to generate functionally correct code edits based
on natural language descriptions of intended changes. This paper aims to
address this gap by motivating the problem NL2Fix of translating natural
language descriptions of code changes (namely bug fixes described in Issue
reports in repositories) into correct code fixes. To this end, we introduce
Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J
dataset augmented with high-level descriptions of bug fixes, and empirically
evaluate the performance of several state-of-the-art LLMs for this task.
Results show that these LLMs together are capable of generating plausible fixes
for 64.6% of the bugs, and the best LLM-based technique can achieve up to
21.20% top-1 and 35.68% top-5 accuracy on this benchmark.
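As a rough illustration of how top-1 and top-5 figures like those above are commonly computed, the sketch below counts a bug as solved at k if any of the first k ranked candidate fixes passes its tests. The function name `top_k_accuracy` and the sample data are hypothetical and are not taken from the paper's evaluation harness.

```python
from typing import Sequence


def top_k_accuracy(results: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of bugs where at least one of the first k candidates passes."""
    if not results:
        return 0.0
    solved = sum(1 for candidates in results if any(candidates[:k]))
    return solved / len(results)


# Hypothetical outcomes: one row per bug, one flag per ranked candidate fix
# (True means the candidate passed that bug's tests).
example = [
    [True, False, False, False, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
    [False, True, False, False, False],
]

top1 = top_k_accuracy(example, 1)
top5 = top_k_accuracy(example, 5)
print(f"top-1: {top1:.2%}, top-5: {top5:.2%}")
```

Because the first k candidates always include the first candidate, top-5 accuracy can never be lower than top-1 accuracy on the same set of bugs.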
PROPOSED MIDDLEWARE SOLUTION FOR RESOURCE-CONSTRAINED DISTRIBUTED EMBEDDED NETWORKS
The explosion in processing power of embedded systems has enabled distributed embedded networks to perform more complicated tasks. Middleware are sets of encapsulations of common and network/operating system-specific functionality into generic, reusable frameworks to manage such distributed networks. This thesis will survey and categorize popular middleware implementations into three adapted layers: host-infrastructure, distribution, and common services. This thesis will then apply a quantitative approach to grading and proposing a single middleware solution from all layers for two target platforms: CubeSats and autonomous unmanned aerial vehicles (UAVs). CubeSats are 10x10x10 cm nanosatellites that are popular university-level space missions, and impose power and volume constraints. Autonomous UAVs are similarly popular hobbyist-level vehicles that exhibit similar power and volume constraints. The MAVLink middleware from the host-infrastructure layer is proposed as the middleware to manage the distributed embedded networks powering these platforms in future projects. Finally, this thesis presents a performance analysis on MAVLink managing the ARM Cortex-M 32-bit processors that power the target platforms.