5,076 research outputs found

Engineering design using game-enhanced CAD: The potential to augment the user experience with game elements

Authors: Kosmadoudi Zoe; Lim Theodore; Liu Ying; Louchart Sandy; Ritchie James Millar; Sung Raymond
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Heriot Watt Pure

Crossref

Agent AI: Surveying the Horizons of Multimodal Interaction

Authors: Choi Yejin; Durante Zane; Fei-Fei Li; Gao Jianfeng; Gong Ran; Huang Qiuyuan; Ikeuchi Katsushi; Noda Yusuke; Park Jae Sung; Sarkar Bidipta; Taori Rohan; Terzopoulos Demetri; Vo Hoi; Wake Naoki
Publication date: 25/01/2024

Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.

arXiv.org e-Print Archive

Large Language Model-based Human-Agent Collaboration for Complex Task Solving

Authors: Chen Xu; Chen Zhi-Yuan; Feng Xueyang; Lin Yankai; Liu Zhiyuan; Qin Yujia; Wen Ji-Rong
Publication date: 20/02/2024

In recent developments within the research community, the integration of Large Language Models (LLMs) in creating fully autonomous agents has garnered significant interest. Despite this, LLM-based agents frequently demonstrate notable shortcomings in adjusting to dynamic environments and fully grasping human needs. In this work, we introduce the problem of LLM-based human-agent collaboration for complex task-solving, exploring their synergistic potential. In addition, we propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process. We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning environment. Our validation tests confirm the model's effectiveness. The results demonstrate that the synergistic efforts of humans and LLM-based agents significantly improve performance in complex tasks, primarily through well-planned, limited human intervention. Datasets and code are available at: https://github.com/XueyangFeng/ReHAC

arXiv.org e-Print Archive

Skill and Knowledge Sharing in Cyber-Augmented Collaborative Physical Work Systems with HUB-CI

Authors: Ajidarma Praditya; Nof Shimon Y.
Publication venue: 'Purdue University (bepress)'
Publication date: 20/07/2022

Purdue E-Pubs

Communicative Agents for Software Development

Authors: Chen Weize; Cong Xin; Liu Zhiyuan; Qian Chen; Su Yusheng; Sun Maosong; Xu Juyuan; Yang Cheng
Publication date: 18/07/2023

Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of agents, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles, allowing for proposing and validating solutions through context-aware communication, leading to efficient resolution of specific subtasks. The instrumental analysis of ChatDev highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies potential hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of ChatDev unveils fresh possibilities for integrating LLMs into the realm of software development. Comment: 25 pages, 9 figures, 2 tables

arXiv.org e-Print Archive

Large Multimodal Agents: A Survey

Authors: Chen Zhihong; Li Guanbin; Wan Xiang; Xie Junlin; Zhang Ruifei
Publication date: 23/02/2024

Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with decision-making and reasoning abilities akin to humans. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereby handling more intricate and nuanced tasks. In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short). First, we introduce the essential components involved in developing LMAs and categorize the current body of research into four distinct types. Subsequently, we review the collaborative frameworks integrating multiple LMAs, enhancing collective efficacy. One of the critical challenges in this field is the diverse evaluation methods used across existing studies, hindering effective comparison among different LMAs. Therefore, we compile these evaluation methodologies and establish a comprehensive framework to bridge the gaps. This framework aims to standardize evaluations, facilitating more meaningful comparisons. Concluding our review, we highlight the extensive applications of LMAs and propose possible future research directions. Our discussion aims to provide valuable insights and guidelines for future research in this rapidly evolving field. An up-to-date resource list is available at https://github.com/jun0wanan/awesome-large-multimodal-agents. Comment: 15 pages, 4 figures

arXiv.org e-Print Archive