
67 research outputs found

Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks

Authors: Corcodel Radu, Hori Chiori, Jain Siddarth, Jha Devesh K., Romeres Diego, Sun Lingfeng, Tomizuka Masayoshi, Zhu Xinghao
Publication date: 11/12/2023

Designing robotic agents to perform open vocabulary tasks has been a long-standing goal in robotics and AI. Recently, Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. However, planning for these tasks in the presence of uncertainties is challenging, as it requires "chain-of-thought" reasoning, aggregating information from the environment, updating state estimates, and generating actions based on the updated state estimates. In this paper, we present an interactive planning technique for partially observable tasks using LLMs. In the proposed method, an LLM is used to collect missing information from the environment using a robot and to infer the state of the underlying problem from the collected observations while guiding the robot to perform the required actions. We also use a Llama 2 model fine-tuned via self-instruct and compare its performance against a pre-trained LLM like GPT-4. Results are demonstrated on several tasks in simulation as well as in real-world environments. A video describing our work along with some results can be found here. Comment: 22 pages, 4 figures

arXiv.org e-Print Archive

A survey of robot manipulation in contact

Authors: Karayiannidis Yiannis, Kyrki Ville, Suomalainen Markku
Publication venue: 'Elsevier BV'
Publication date: 03/12/2021

In this survey, we present the current status of robots performing manipulation tasks that require varying contact with the environment, such that the robot must either implicitly or explicitly control the contact force with the environment to complete the task. Robots can perform more and more manipulation tasks that are still done by humans, and there is a growing number of publications on the topics of (1) performing tasks that always require contact and (2) mitigating uncertainty by leveraging the environment in tasks that, under perfect information, could be performed without contact. Recent trends have seen robots perform tasks previously left to humans, such as massage, while in classical tasks, such as peg-in-hole, there is more efficient generalization to other similar tasks, better error tolerance, and faster planning or learning of the tasks. Thus, in this survey we cover the current state of robots performing such tasks, starting from surveying all the different in-contact tasks robots can perform, observing how these tasks are controlled and represented, and finally presenting the learning and planning of the skills required to complete these tasks.

arXiv.org e-Print Archive

Lund University Publications

Aaltodoc Publication Archive

Chalmers Research

University of Oulu Repository - Jultika


Visual Dynamics Models for Robotic Planning and Control

Author: Lee Alex Xavier
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

For a robot to interact with its environment, it must perceive the world and understand how the world evolves as a consequence of its actions. This thesis studies a few methods that a robot can use to respond to its observations, with a focus on instances that can leverage visual dynamics models. In general, these are models of how the visual observations of a robot evolve as a consequence of its actions. This could be in the form of predictive models that directly predict the future in the space of image pixels, in the space of visual features extracted from these images, or in the space of compact learned latent representations. The three instances that this thesis studies are in the context of visual servoing, visual planning, and representation learning for reinforcement learning. In the first case, we combine learned visual features with learned single-step predictive dynamics models and reinforcement learning to learn visual servoing mechanisms. In the second case, we use a deterministic multi-step video prediction model to achieve various manipulation tasks through visual planning. In addition, we show that conventional video prediction models are unequipped to model uncertainty and multiple futures, which could limit the planning capabilities of the robot. To address this, we propose a stochastic video prediction model that is trained with a combination of variational losses, adversarial losses, and perceptual losses, and show that this model can predict futures that are more realistic, diverse, and accurate. Unlike the first two cases, in which the dynamics model is used to make predictions for decision-making, the third case learns the model solely for representation learning. We learn a stochastic sequential latent variable model to obtain a latent representation, and then use it as an intermediate representation for reinforcement learning. We show that this approach improves final performance and sample efficiency.

eScholarship - University of California

Deep Learning for Decision Making and Autonomous Complex Systems

Author: Lore Kin Gwn
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016

Deep learning consists of various machine learning algorithms that aim to learn multiple levels of abstraction from data in a hierarchical manner. It is a tool for constructing models from data that mimic a real-world process without exceedingly tedious modelling of the actual process. We show that deep learning is a viable solution to decision making in mechanical engineering problems and complex physical systems. In this work, we demonstrate the application of this data-driven method in the design of microfluidic devices, where it serves as a map between the user-defined cross-sectional shape of the flow and the corresponding arrangement of micropillars in the flow channel that contributes to the flow deformation. We also present how deep learning can be used in the early detection of combustion instability for prognostics and health monitoring of a combustion engine, such that appropriate measures can be taken to prevent the detrimental effects of unstable combustion. One of the applications in complex systems concerns robotic path planning via the systematic learning of policies and associated rewards. In this context, a deep architecture is implemented to infer the expected value of information gained by performing an action based on the states of the environment. We also apply deep learning-based methods to enhance natural low-light images in the context of a surveillance framework and autonomous robots. Further, we look at how machine learning methods can be used to perform root-cause analysis in cyber-physical systems subjected to a wide variety of operation anomalies. In all studies, the proposed frameworks have been shown to demonstrate promising feasibility and to provide credible results for large-scale implementation in industry.

Digital Repository @ Iowa State University (ISU)

Multi-Robot Symbolic Task and Motion Planning Leveraging Human Trust Models: Theory and Applications

Author: Zheng Huanfei
Publication venue: Clemson University Libraries
Publication date: 01/11/2022

Multi-robot systems (MRS) can accomplish more complex tasks with two or more robots and have produced a broad set of applications. The presence of a human operator in an MRS can guarantee the safety of task performance, but the operator can be subject to heavier stress and cognitive workload when collaborating with an MRS than with a single robot. It is therefore important for the MRS to have a provably correct task and motion planning solution for a complex task, which can reduce the human workload while supervising the task and improve the reliability of human-MRS collaboration. This dissertation relies on formal verification to provide provably correct solutions for the robotic system. One of the challenges in task and motion planning under temporal logic task specifications is developing computationally efficient MRS frameworks. The dissertation first presents an automaton-based task and motion planning framework for MRS to satisfy finite words of linear temporal logic (LTL) task specifications in parallel and concurrently. Furthermore, the dissertation develops a computational trust model to improve human-MRS collaboration for a motion task. Notably, existing work commonly underemphasizes environmental attributes when investigating the factors that affect human trust in robots. Our computational trust model builds a linear state-space (LSS) equation to capture the influence of environmental attributes on human trust in an MRS. A Bayesian optimization based experimental design (BOED) is proposed to sequentially learn the human-MRS trust model parameters in a data-efficient way. Finally, the dissertation shapes a reward function for the complex human-MRS collaborative task by referring to the above LTL task specification and computational trust model. A Bayesian active reinforcement learning (RL) algorithm is used to concurrently learn the shaped reward function and explore the most trustworthy task and motion planning solution.

Clemson University: TigerPrints

Path planning and control of flying robots with account of human’s safety perception

Author: Yoon Hyung Jin
Publication date: 01/05/2019

In this dissertation, a framework for planning and control of flying robots that accounts for the human’s safety perception is presented. The framework enables the flying robot to consider the human’s perceived safety in path planning. First, a data-driven model of the human’s safety perception is estimated from human test data collected in a virtual reality environment. A hidden Markov model (HMM) is considered for estimation of latent variables such as the user’s attention, intention, and emotional state. Then, an optimal motion planner generates a trajectory, parameterized in Bernstein polynomials, which minimizes the cost related to the mission objectives while satisfying the constraints on the predicted human’s safety perception. Using the Model Predictive Path Integral (MPPI) framework, the algorithm can be executed in real time while measuring the human’s spatial position and the changes in the environment. An HMM-based Q-learning is considered for computing the online optimal policy. The HMM-based Q-learning estimates the hidden state of the human in interactions with the robot. The state estimator in the HMM-based Q-learning infers the hidden states of the human based on past observations and actions. The convergence of the HMM-based Q-learning for a partially observable Markov decision process (POMDP) with finite state space is proved using a stochastic approximation technique. As a future research direction, one can consider using recurrent neural networks to estimate the hidden state in a continuous state space. The analysis of the convergence of the HMM-based Q-learning algorithm suggests that the training of the recurrent neural network needs to consider both the state estimation accuracy and the optimality principle.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Vision-Language Foundation Models as Effective Robot Imitators

Authors: Cheang Chilam, Jing Ya, Kong Tao, Li Hang, Li Xinghang, Liu Huaping, Liu Minghuan, Wu Hongtao, Xu Jie, Yu Cunjun, Zhang Hanbo, Zhang Weinan
Publication date: 04/02/2024

Recent progress in vision-language foundation models has shown their ability to understand multimodal data and resolve complicated vision-language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLM OpenFlamingo. Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step vision-language comprehension, models sequential history information with an explicit policy head, and is slightly fine-tuned by imitation learning only on language-conditioned manipulation datasets. Such a decomposition provides RoboFlamingo with the flexibility for open-loop control and deployment on low-performance platforms. By exceeding the state-of-the-art performance by a large margin on the tested benchmark, we show that RoboFlamingo can be an effective and competitive alternative for adapting VLMs to robot control. Our extensive experimental results also reveal several interesting conclusions regarding the behavior of different pre-trained VLMs on manipulation tasks. We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy. Comment: Fix typos. Project page: https://roboflamingo.github.i

arXiv.org e-Print Archive

Information-theoretic Reasoning in Distributed and Autonomous Systems

Author: Cliff Oliver
Publication venue: Faculty of Engineering and Information Technologies, School of Aerospace, Mechanical and Mechatronic Engineering
Publication date: 01/01/2019

The increasing prevalence of distributed and autonomous systems is transforming decision making in industries as diverse as agriculture, environmental monitoring, and healthcare. Despite significant efforts, challenges remain in robustly planning under uncertainty. In this thesis, we present a number of information-theoretic decision rules for improving the analysis and control of complex adaptive systems. We begin with the problem of quantifying the data storage (memory) and transfer (communication) within information processing systems. We develop an information-theoretic framework to study nonlinear interactions within cooperative and adversarial scenarios, solely from observations of each agent's dynamics. This framework is applied to simulations of robotic soccer games, where the measures reveal insights into team performance, including correlations of the information dynamics to the scoreline. We then study the communication between processes with latent nonlinear dynamics that are observed only through a filter. By using methods from differential topology, we show that the information-theoretic measures commonly used to infer communication in observed systems can also be used in certain partially observed systems. For robotic environmental monitoring, the quality of data depends on the placement of sensors. These locations can be improved by either better estimating the quality of future viewpoints or by a team of robots operating concurrently. By robustly handling the uncertainty of sensor model measurements, we are able to present the first end-to-end robotic system for autonomously tracking small dynamic animals, with a performance comparable to human trackers. We then solve the issue of coordinating multi-robot systems through distributed optimisation techniques. These allow us to develop non-myopic robot trajectories for these tasks and, importantly, show that these algorithms provide guarantees for convergence rates to the optimal payoff sequence

Sydney eScholarship