67 research outputs found
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks
Designing robotic agents to perform open vocabulary tasks has been a
long-standing goal in robotics and AI. Recently, Large Language Models (LLMs)
have achieved impressive results in creating robotic agents for performing open
vocabulary tasks. However, planning for these tasks in the presence of
uncertainties is challenging as it requires "chain-of-thought"
reasoning, aggregating information from the environment, updating state
estimates, and generating actions based on the updated state estimates. In this
paper, we present an interactive planning technique for partially observable
tasks using LLMs. In the proposed method, an LLM is used to collect missing
information from the environment using a robot and infer the state of the
underlying problem from collected observations while guiding the robot to
perform the required actions. We also use a fine-tuned Llama 2 model via
self-instruct and compare its performance against a pre-trained LLM like GPT-4.
Results are demonstrated on several tasks in simulation as well as real-world
environments. A video describing our work along with some results can be
found here. Comment: 22 pages, 4 figures
A survey of robot manipulation in contact
In this survey, we present the current status of robots performing manipulation tasks that require varying contact with the environment, such that the robot must either implicitly or explicitly control the contact force with the environment to complete the task. Robots can perform more and more manipulation tasks that are still done by humans, and there is a growing number of publications on the topics of (1) performing tasks that always require contact and (2) mitigating uncertainty by leveraging the environment in tasks that, under perfect information, could be performed without contact. Recent trends have seen robots perform tasks previously left to humans, such as massage, and in classical tasks, such as peg-in-hole, there is more efficient generalization to other similar tasks, better error tolerance, and faster planning or learning of the tasks. Thus, in this survey we cover the current state of robots performing such tasks, starting from surveying all the different in-contact tasks robots can perform, observing how these tasks are controlled and represented, and finally presenting the learning and planning of the skills required to complete these tasks
Visual Dynamics Models for Robotic Planning and Control
For a robot to interact with its environment, it must perceive the world and understand how the world evolves as a consequence of its actions. This thesis studies a few methods that a robot can use to respond to its observations, with a focus on instances that can leverage visual dynamics models. In general, these are models of how the visual observations of a robot evolve as a consequence of its actions. This could be in the form of predictive models that directly predict the future in the space of image pixels, in the space of visual features extracted from these images, or in the space of compact learned latent representations. The three instances that this thesis studies are in the context of visual servoing, visual planning, and representation learning for reinforcement learning. In the first case, we combine learned visual features with learned single-step predictive dynamics models and reinforcement learning to learn visual servoing mechanisms. In the second case, we use a deterministic multi-step video prediction model to achieve various manipulation tasks through visual planning. In addition, we show that conventional video prediction models are unequipped to model uncertainty and multiple futures, which could limit the planning capabilities of the robot. To address this, we propose a stochastic video prediction model that is trained with a combination of variational losses, adversarial losses, and perceptual losses, and show that this model can predict futures that are more realistic, diverse, and accurate. Unlike the first two cases, in which the dynamics model is used to make predictions for decision-making, the third case learns the model solely for representation learning. We learn a stochastic sequential latent variable model to learn a latent representation, and then use it as an intermediate representation for reinforcement learning. We show that this approach improves final performance and sample efficiency
Deep Learning for Decision Making and Autonomous Complex Systems
Deep learning consists of various machine learning algorithms that aim to learn multiple levels of abstraction from data in a hierarchical manner. It is a tool for constructing models from data that mimic a real-world process without exceedingly tedious modelling of the actual process. We show that deep learning is a viable solution to decision making in mechanical engineering problems and complex physical systems.
In this work, we demonstrated the application of this data-driven method in the design of microfluidic devices to serve as a map between the user-defined cross-sectional shape of the flow and the corresponding arrangement of micropillars in the flow channel that contributed to the flow deformation. We also present how deep learning can be used in the early detection of combustion instability for prognostics and health monitoring of a combustion engine, such that appropriate measures can be taken to prevent detrimental effects as a result of unstable combustion.
One of the applications in complex systems concerns robotic path planning via the systematic learning of policies and associated rewards. In this context, a deep architecture is implemented to infer the expected value of information gained by performing an action based on the states of the environment. We also applied deep learning-based methods to enhance natural low-light images in the context of a surveillance framework and autonomous robots. Further, we looked at how machine learning methods can be used to perform root-cause analysis in cyber-physical systems subjected to a wide variety of operation anomalies. In all studies, the proposed frameworks have been shown to demonstrate promising feasibility and provided credible results for large-scale implementation in the industry
Multi-Robot Symbolic Task and Motion Planning Leveraging Human Trust Models: Theory and Applications
Multi-robot systems (MRS) can accomplish more complex tasks with two or more robots and have produced a broad set of applications. The presence of a human operator in an MRS can guarantee that the task is performed safely, but human operators can be subject to heavier stress and cognitive workload when collaborating with an MRS than with a single robot. It is important for the MRS to have a provably correct task and motion planning solution for a complex task, which can reduce the human workload while supervising the task and improve the reliability of human-MRS collaboration. This dissertation relies on formal verification to provide a provably correct solution for the robotic system. One of the challenges in task and motion planning under temporal logic task specifications is developing computationally efficient MRS frameworks. The dissertation first presents an automaton-based task and motion planning framework for MRS to satisfy finite words of linear temporal logic (LTL) task specifications in parallel and concurrently. Furthermore, the dissertation develops a computational trust model to improve human-MRS collaboration for a motion task. Notably, existing work commonly underemphasizes environmental attributes when investigating the factors that affect human trust in robots. Our computational trust model builds a linear state-space (LSS) equation to capture the influence of environment attributes on human trust in an MRS. A Bayesian optimization based experimental design (BOED) is proposed to sequentially learn the human-MRS trust model parameters in a data-efficient way. Finally, the dissertation shapes a reward function for the human-MRS collaborative complex task by referring to the above LTL task specification and computational trust model. A Bayesian active reinforcement learning (RL) algorithm is used to concurrently learn the shaped reward function and explore the most trustworthy task and motion planning solution
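To make the idea of a linear state-space trust equation concrete, the following is a minimal sketch, not the dissertation's actual model. It assumes a scalar trust state updated as trust_{k+1} = a * trust_k + sum_i b[i] * z_k[i], where z_k is a vector of environment attributes. The function name `trust_step`, the coefficients `a` and `b`, and the attribute values are all hypothetical, chosen only for illustration.

```python
def trust_step(trust, env_attributes, a, b):
    """One step of a scalar linear state-space update (illustrative only).

    trust_{k+1} = a * trust_k + sum_i b[i] * env_attributes[i]
    """
    return a * trust + sum(bi * zi for bi, zi in zip(b, env_attributes))


# Hypothetical coefficients: trust decays by half each step, the first
# attribute raises trust and the second lowers it.
a = 0.5
b = [0.25, -0.5]

trust = 1.0
history = [trust]
for z in [[1.0, 0.0], [1.0, 1.0]]:  # hypothetical environment attributes
    trust = trust_step(trust, z, a, b)
    history.append(trust)

print(history)
```

In a data-driven setting, coefficients such as `a` and `b` would be the parameters to estimate from human responses; the abstract states that this is done with Bayesian optimization based experimental design, which this sketch does not implement.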
Path planning and control of flying robots with account of human’s safety perception
In this dissertation, a framework for planning and control of flying robots that accounts for the human’s safety perception is presented. The framework enables the flying robot to consider the human’s perceived safety in path planning. First, a data-driven model of the human’s safety perception is estimated from human test data collected in a virtual reality environment. A hidden Markov model (HMM) is considered for estimation of latent variables, such as the user’s attention, intention, and emotional state. Then, an optimal motion planner generates a trajectory, parameterized in Bernstein polynomials, which minimizes the cost related to the mission objectives while satisfying the constraints on the predicted human’s safety perception. Using the Model Predictive Path Integral (MPPI) framework, the algorithm can be executed in real time while measuring the human’s spatial position and the changes in the environment.
An HMM-based Q-learning is considered for computing the online optimal policy. The HMM-based Q-learning estimates the hidden state of the human in interactions with the robot. The state estimator in the HMM-based Q-learning infers the hidden states of the human based on past observations and actions. The convergence of the HMM-based Q-learning for a partially observable Markov decision process (POMDP) with finite state space is proved using a stochastic approximation technique.
As a future research direction, one can consider using recurrent neural networks to estimate the hidden state in continuous state space. The analysis of the convergence of the HMM-based Q-learning algorithm suggests that the training of the recurrent neural network needs to consider both the state estimation accuracy and the optimality principle
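The hidden-state estimation described in this abstract can be illustrated with the standard HMM forward (Bayes) filter. The sketch below is a generic textbook filter step, not the dissertation's estimator: it omits the dependence on robot actions and the Q-learning update. The two-state example ("calm" and "anxious") and all probabilities are hypothetical.

```python
def hmm_filter_step(belief, transition, likelihood):
    """One step of the HMM forward (Bayes) filter.

    belief[i]        : P(state = i | past observations)
    transition[i][j] : P(next state = j | state = i)
    likelihood[j]    : P(current observation | next state = j)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correct: weight by the observation likelihood, then normalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]


# Hypothetical two-state example: 0 = "calm", 1 = "anxious".
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
likelihood = [0.3, 0.7]  # the observation is more likely when anxious

posterior = hmm_filter_step(belief, transition, likelihood)
print(posterior)
```

A learning agent in a partially observable setting could condition its value estimates on a belief like `posterior` rather than on the unobserved state itself.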
Vision-Language Foundation Models as Effective Robot Imitators
Recent progress in vision language foundation models has shown their ability
to understand multimodal data and resolve complicated vision language tasks,
including robotics manipulation. We seek a straightforward way of making use of
existing vision-language models (VLMs) with simple fine-tuning on robotics
data. To this end, we derive a simple and novel vision-language manipulation
framework, dubbed RoboFlamingo, built upon the open-source VLMs, OpenFlamingo.
Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step
vision-language comprehension, models sequential history information with an
explicit policy head, and is slightly fine-tuned by imitation learning only on
language-conditioned manipulation datasets. Such a decomposition provides
RoboFlamingo the flexibility for open-loop control and deployment on
low-performance platforms. By exceeding the state-of-the-art performance with a
large margin on the tested benchmark, we show RoboFlamingo can be an effective
and competitive alternative to adapt VLMs to robot control. Our extensive
experimental results also reveal several interesting conclusions regarding the
behavior of different pre-trained VLMs on manipulation tasks. We believe
RoboFlamingo has the potential to be a cost-effective and easy-to-use solution
for robotics manipulation, empowering everyone with the ability to fine-tune
their own robotics policy. Comment: Fix typos. Project page: https://roboflamingo.github.i
Information-theoretic Reasoning in Distributed and Autonomous Systems
The increasing prevalence of distributed and autonomous systems is transforming decision making in industries as diverse as agriculture, environmental monitoring, and healthcare. Despite significant efforts, challenges remain in robustly planning under uncertainty. In this thesis, we present a number of information-theoretic decision rules for improving the analysis and control of complex adaptive systems. We begin with the problem of quantifying the data storage (memory) and transfer (communication) within information processing systems. We develop an information-theoretic framework to study nonlinear interactions within cooperative and adversarial scenarios, solely from observations of each agent's dynamics. This framework is applied to simulations of robotic soccer games, where the measures reveal insights into team performance, including correlations of the information dynamics to the scoreline. We then study the communication between processes with latent nonlinear dynamics that are observed only through a filter. By using methods from differential topology, we show that the information-theoretic measures commonly used to infer communication in observed systems can also be used in certain partially observed systems. For robotic environmental monitoring, the quality of data depends on the placement of sensors. These locations can be improved by either better estimating the quality of future viewpoints or by a team of robots operating concurrently. By robustly handling the uncertainty of sensor model measurements, we are able to present the first end-to-end robotic system for autonomously tracking small dynamic animals, with a performance comparable to human trackers. We then solve the issue of coordinating multi-robot systems through distributed optimisation techniques. These allow us to develop non-myopic robot trajectories for these tasks and, importantly, show that these algorithms provide guarantees for convergence rates to the optimal payoff sequence
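As a small illustration of the kind of quantity that information-theoretic analyses build on, the sketch below estimates the mutual information between two discrete sequences from empirical counts. This is plain plug-in mutual information, not the storage and transfer measures the thesis develops; the function name `mutual_information` and the sample sequences are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


# Hypothetical samples: one pair of identical sequences, one independent pair.
copied = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(copied, independent)
```

Plug-in estimates of this kind are biased for short sequences, so in practice they are usually computed over long observation records or paired with a bias correction.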