Search CORE

699 research outputs found

Communicative Agents for Software Development

Author: Chen Weize
Cong Xin
Liu Zhiyuan
Qian Chen
Su Yusheng
Sun Maosong
Xu Juyuan
Yang Cheng
Publication venue
Publication date: 18/07/2023
Field of study

Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of agents, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles, allowing for proposing and validating solutions through context-aware communication, leading to efficient resolution of specific subtasks. The instrumental analysis of ChatDev highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies potential hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of ChatDev unveils fresh possibilities for integrating LLMs into the realm of software development.Comment: 25 pages, 9 figures, 2 table

arXiv.org e-Print Archive

Empowering LLM to use Smartphone for Intelligent Task Automation

Author: Jiang Shiqi
Li Toby Jia-Jun
Li Yuanchun
Liu Guohong
Liu Yunhao
Liu Yunxin
Wen Hao
Yu Tao
Zhang Yaqin
Zhao Shanhui
Publication venue
Publication date: 06/09/2023
Field of study

Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at url{https://autodroid-sys.github.io/}

arXiv.org e-Print Archive

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

Author: Benjamin Weiser
Christoph Völker
Ghezal Ahmad Zia
Other Authors
Sabine Kruschwitz
Publication venue
Publication date: 21/09/2023
Field of study

Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines

ZENODO

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

Author: Ai Qianxiang
Al-Feghali Alexander
Badhwar Shruti
Blaiszik Ben
Bran Joshua D. Bocarsly Andres M
Bringuier Stefan
Brinson L. Catherine
Choudhary Kamal
Circi Defne
Cox Sam
de Jong Wibe A.
Evans Matthew L.
Foster Ian
Gastellu Nicolas
Genzling Jerome
Gil María Victoria
Gupta Ankur K.
Hong Zhi
Imran Alishba
Jablonka Kevin Maik
Kruschwitz Sabine
Labarre Anne
Liu Tao
Lála Jakub
Ma Steven
Majumdar Sauradeep
Merz Garrett W.
Moitessier Nicolas
Moubarak Elias
Mouriño Beatriz
Pelkie Brenden
Pieler Michael
Ramos Mayk Caldas
Ranković Bojana
Rodriques Samuel G.
Sanders Jacob N.
Schmidt KJ
Schwaller Philippe
Schwarting Marcus
Scourtas Aristana
Shi Jiale
Smit Berend
Smith Ben E.
Van Heck Joren
Völker Christoph
Ward Logan
Warren Sean
Weiser Benjamin
White Andrew D.
Zhang Sylvester
Zhang Xiaoqi
Zia Ghezal Ahmad
Publication venue
Publication date: 13/06/2023
Field of study

Chemistry and materials science are complex. Recently, there have been great successes in addressing this complexity using data-driven or computational techniques. Yet, the necessity of input structured in very specific forms and the fact that there is an ever-growing number of tools creates usability and accessibility challenges. Coupled with the reality that much data in these disciplines is unstructured, the effectiveness of these tools is limited. Motivated by recent works that indicated that large language models (LLMs) might help address some of these issues, we organized a hackathon event on the applications of LLMs in chemistry, materials science, and beyond. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines

arXiv.org e-Print Archive

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Author: Arawjo Ian
Glassman Elena
Swoopes Chelse
Vaithilingam Priyan
Wattenberg Martin
Publication venue
Publication date: 16/09/2023
Field of study

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.Comment: 23 pages, 7 figures, in submissio

arXiv.org e-Print Archive

Generative Artificial Intelligence for Software Engineering -- A Research Agenda

Author: Abrahamsson Pekka
Arora Chetan
Cabrero-Daniel Beatriz
Guerra Eduardo
Herda Tomas
Kemell Kai-Kristian
Khanna Dron
Le Huy
Melegati Jorge
Nguyen-Duc Anh
Przybylek Adam
Quan Tho
Rafiq Usman
Saari Mika
Zhang Zheying
Publication venue
Publication date: 28/10/2023
Field of study

Generative Artificial Intelligence (GenAI) tools have become increasingly prevalent in software development, offering assistance to various managerial and technical project activities. Notable examples of these tools include OpenAIs ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. Although many recent publications have explored and evaluated the application of GenAI, a comprehensive understanding of the current development, applications, limitations, and open challenges remains unclear to many. Particularly, we do not have an overall picture of the current state of GenAI technology in practical software engineering usage scenarios. We conducted a literature review and focus groups for a duration of five months to develop a research agenda on GenAI for Software Engineering. We identified 78 open Research Questions (RQs) in 11 areas of Software Engineering. Our results show that it is possible to explore the adoption of GenAI in partial automation and support decision-making in all software development activities. While the current literature is skewed toward software implementation, quality assurance and software maintenance, other areas, such as requirements engineering, software design, and software engineering education, would need further research attention. Common considerations when implementing GenAI include industry-level assessment, dependability and accuracy, data accessibility, transparency, and sustainability aspects associated with the technology. GenAI is bringing significant changes to the field of software engineering. Nevertheless, the state of research on the topic still remains immature. We believe that this research agenda holds significance and practical value for informing both researchers and practitioners about current applications and guiding future research

arXiv.org e-Print Archive

Language models in molecular discovery

Author: Born Jannis
Erdmann Tim
Janakarajan Nikita
Laino Teodoro
Swaminathan Sarath
Publication venue
Publication date: 28/09/2023
Field of study

The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.Comment: Under revie

arXiv.org e-Print Archive

Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions

Author: Chakraborty Saikat
Fakhoury Sarah
Lahiri Shuvendu K.
Musuvathi Madan
Publication venue
Publication date: 07/04/2023
Field of study

Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hidden test cases. This has enabled the research community to identify significant and reproducible advancements in LLM capabilities. However, there is currently a lack of benchmark datasets for assessing the ability of LLMs to generate functionally correct code edits based on natural language descriptions of intended changes. This paper aims to address this gap by motivating the problem NL2Fix of translating natural language descriptions of code changes (namely bug fixes described in Issue reports in repositories) into correct code fixes. To this end, we introduce Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J dataset augmented with high-level descriptions of bug fixes, and empirically evaluate the performance of several state-of-the-art LLMs for the this task. Results show that these LLMS together are capable of generating plausible fixes for 64.6% of the bugs, and the best LLM-based technique can achieve up to 21.20% top-1 and 35.68% top-5 accuracy on this benchmark

arXiv.org e-Print Archive

PROPOSED MIDDLEWARE SOLUTION FOR RESOURCE-CONSTRAINED DISTRIBUTED EMBEDDED NETWORKS

Author: Rexroat Jason T
Publication venue: UKnowledge
Publication date: 01/01/2014
Field of study

The explosion in processing power of embedded systems has enabled distributed embedded networks to perform more complicated tasks. Middleware are sets of encapsulations of common and network/operating system-specific functionality into generic, reusable frameworks to manage such distributed networks. This thesis will survey and categorize popular middleware implementations into three adapted layers: host-infrastructure, distribution, and common services. This thesis will then apply a quantitative approach to grading and proposing a single middleware solution from all layers for two target platforms: CubeSats and autonomous unmanned aerial vehicles (UAVs). CubeSats are 10x10x10cm nanosatellites that are popular university-level space missions, and impose power and volume constraints. Autonomous UAVs are similarly-popular hobbyist-level vehicles that exhibit similar power and volume constraints. The MAVLink middleware from the host-infrastructure layer is proposed as the middleware to manage the distributed embedded networks powering these platforms in future projects. Finally, this thesis presents a performance analysis on MAVLink managing the ARM Cortex-M 32-bit processors that power the target platforms

University of Kentucky