Designing Weakly Coupled MEMS Resonators with a Machine Learning-Based Method
We demonstrate a design scheme for weakly coupled resonators (WCRs) that integrates supervised learning (SL) with a genetic algorithm (GA). In this work, three distinctive achievements have been accomplished: 1) the precise prediction of the coupling characteristics of WCRs with an accuracy of 98.7% via SL; 2) the stepwise evolutionary optimization of WCR geometries while maintaining their geometric connectivity via GA; and 3) the highly efficient generation of WCR designs with a mean coupling factor down to 0.0056, which outperforms 98% of random designs. The coupling behavior analysis and prediction are validated with experimental data on coupled microcantilevers from a published work. As such, this newly proposed scheme could shed light on structural optimization methods for high-performance MEMS devices with a high degree of design freedom.
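The SL + GA loop described above can be illustrated with a minimal sketch: a surrogate model predicts the coupling factor of a candidate geometry, and a genetic algorithm evolves geometries toward low predicted coupling. The surrogate here is a stand-in quadratic function, not the paper's trained SL model, and all parameter names are illustrative.

```python
import random

def surrogate_coupling(geom):
    # Hypothetical surrogate: predicts a coupling factor from geometry
    # parameters; the paper would use a trained SL model instead.
    return sum((g - 0.5) ** 2 for g in geom)

def evolve(pop_size=20, n_params=4, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=surrogate_coupling)      # rank by predicted coupling
        parents = pop[: pop_size // 2]        # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_params)       # point mutation, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.05)))
            children.append(child)
        pop = parents + children
    return min(pop, key=surrogate_coupling)

best = evolve()
```

The paper's scheme additionally enforces geometric connectivity during evolution, a constraint this sketch omits.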
Sample-efficient deep reinforcement learning from single agent to multiple agents
University of Technology Sydney, Faculty of Engineering and Information Technology.
Deep reinforcement learning (DRL) has recently become a very popular topic in the academic field. However, it often suffers from sample inefficiency due to ineffective exploration, instability, or the temporal credit assignment problem. High sample complexity incurs a huge computational cost and hinders the practical deployment of DRL techniques. Despite the many methods proposed to address this challenge, further improvements are still needed. This thesis develops sample-efficient DRL methods for continuous control from two perspectives: single agent and multiple agents. Specifically, the key contributions include an uncertainty-regularized policy learning method for the single-agent setting and two ensemble learning frameworks for the multi-agent setting. Importantly, this thesis highlights that the multi-agent methods can be seen as bridging gaps among on-policy RL, off-policy RL, and evolutionary algorithms. Moreover, our approach achieves consistent improvements over baseline methods and gives novel insight into effectively combining different methods to get the best of each.
Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem
Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to at most 21 while exceeding the value of the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for every deck size from one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we train an agent (a decision maker) to optimize over the decision space of the game, treating the procedure as a finite Markov decision process. We mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference learning. This study presents the performance of these distinct model-free policy iteration techniques, framing the game as a reinforcement learning problem.
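A minimal first-visit Monte Carlo sketch for a simplified blackjack of this kind follows, assuming (as in the abstract) that the dealer stands on its first two cards. The card values, the fixed threshold policy, and the reward scheme are illustrative stand-ins, not the study's exact rules.

```python
import random

def draw(rng):
    # Face cards count as 10; aces are simplified to value 1 here.
    return min(rng.randint(1, 13), 10)

def episode(rng, stand_at=17):
    dealer = draw(rng) + draw(rng)      # dealer plays no role after two draws
    player, states = 0, []
    while player < stand_at:
        states.append(player)           # record each pre-draw state
        player += draw(rng)
    if player > 21:
        reward = -1                     # player busts
    elif player > dealer:
        reward = 1
    elif player == dealer:
        reward = 0
    else:
        reward = -1
    return states, reward

def mc_value(n_episodes=5000, seed=1):
    # First-visit Monte Carlo evaluation of the fixed threshold policy.
    rng = random.Random(seed)
    returns, counts = {}, {}
    for _ in range(n_episodes):
        states, r = episode(rng)
        for s in set(states):           # first-visit update per state
            returns[s] = returns.get(s, 0.0) + r
            counts[s] = counts.get(s, 0) + 1
    return {s: returns[s] / counts[s] for s in returns}

V = mc_value()
```

Extending this to Monte Carlo control, as in the study, would add a hit/stand action per state and greedy policy improvement over the estimated action values.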
Learning to View: Decision Transformers for Active Object Detection
Active perception describes a broad class of techniques that couple planning and perception, moving the robot to gather more information about its environment. In most robotic systems, perception is independent of motion planning: traditional object detection is passive, operating only on the images it receives. However, results can improve if planning consumes detection signals and moves the robot to collect views that maximize detection quality. In this paper, we use reinforcement learning (RL) methods to control the robot so as to obtain images that maximize detection quality. Specifically, we propose using a Decision Transformer with online fine-tuning, which first optimizes the policy on a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the proposed method on an interactive dataset collected from an indoor scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including the expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.
Comment: Accepted to ICRA 202
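The core data format behind a Decision Transformer, as used above, can be sketched briefly: each trajectory is flattened into (return-to-go, state, action) triples so the model can be conditioned on a desired return. This sketch only builds the conditioning sequence; the transformer itself and the paper's online fine-tuning loop are omitted, and the token labels are illustrative.

```python
def returns_to_go(rewards):
    # Suffix sums of rewards: the return achievable from each timestep on.
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def build_sequence(states, actions, rewards):
    # Interleave (return-to-go, state, action) per timestep, as a Decision
    # Transformer consumes them.
    rtg = returns_to_go(rewards)
    seq = []
    for g, s, a in zip(rtg, states, actions):
        seq.extend([("R", g), ("s", s), ("a", a)])
    return seq

seq = build_sequence(states=[0, 1, 2], actions=[1, 0, 1], rewards=[0.0, 0.0, 1.0])
```

At inference time, the first return-to-go token is set to a target return, and the model autoregressively predicts actions.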
Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills
Reinforcement learning has received wide interest due to its success in competitive games. Yet its adoption in everyday applications (e.g., industrial, home, healthcare) remains limited. In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework comprises three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
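Planning over offline skills, as in the framework above, can be sketched as graph search: nodes are abstract states, edges are learned skills, and a plan is the skill sequence along a path to the goal. The skills, states, and graph below are illustrative; in the paper, each skill would be a policy trained with offline RL.

```python
from collections import deque

def plan(skill_graph, start, goal):
    # Breadth-first search over abstract states; returns the sequence of
    # skills to execute, or None if the goal is unreachable.
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, skills = frontier.popleft()
        if state == goal:
            return skills
        for skill, nxt in skill_graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, skills + [skill]))
    return None

# Hypothetical skill graph for a pick-and-place task.
graph = {
    "home": [("reach", "above_object")],
    "above_object": [("grasp", "holding")],
    "holding": [("place", "done")],
}
plan_out = plan(graph, "home", "done")
```

Executing the returned plan then amounts to invoking each skill's policy in order, with the graph providing the long-horizon structure that individual offline skills lack.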
VAPOR: Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
We present VAPOR, a novel method for autonomous legged robot navigation in
unstructured, densely vegetated outdoor environments using offline
Reinforcement Learning (RL). Our method trains a novel RL policy using an
actor-critic network and arbitrary data collected in real outdoor vegetation.
Our policy uses height and intensity-based cost maps derived from 3D LiDAR
point clouds, a goal cost map, and processed proprioception data as state
inputs, and learns the physical and geometric properties of the surrounding
obstacles such as height, density, and solidity/stiffness. The fully-trained
policy's critic network is then used to evaluate the quality of dynamically
feasible velocities generated from a novel context-aware planner. Our planner
adapts the robot's velocity space based on the presence of entrapment-inducing
vegetation and narrow passages in dense environments. We demonstrate our
method's capabilities on a Spot robot in complex real-world outdoor scenes,
including dense vegetation. We observe that VAPOR's actions improve success
rates by up to 40%, decrease the average current consumption by up to 2.9%, and
decrease the normalized trajectory length by up to 11.2% compared to existing
end-to-end offline RL and other outdoor navigation methods.
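Using a critic to score dynamically feasible velocities, as in the planner above, can be sketched as follows. The critic here is a hand-written stand-in rather than VAPOR's trained network, and the state labels and velocity grid are illustrative.

```python
def feasible_velocities(v_max=1.0, w_max=0.5, n=5):
    # Sample a small grid of dynamically feasible (linear, angular)
    # velocity candidates.
    return [(v_max * i / (n - 1), w_max * (2 * j / (n - 1) - 1))
            for i in range(n) for j in range(n)]

def critic(state, action):
    # Stand-in Q-function: reward forward progress, penalize turning
    # more heavily in "dense vegetation" states.
    v, w = action
    turn_penalty = 2.0 if state == "dense" else 0.5
    return v - turn_penalty * abs(w)

def best_velocity(state):
    # Pick the candidate the critic scores highest.
    return max(feasible_velocities(), key=lambda a: critic(state, a))

cmd = best_velocity("dense")
```

The paper's planner additionally adapts the candidate set itself (the velocity space) based on detected vegetation and passage width, which this sketch does not model.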
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning
The pre-train and fine-tune paradigm in machine learning has had dramatic
success in a wide range of domains because the use of existing data or
pre-trained models on the internet enables quick and easy learning of new
tasks. We aim to enable this paradigm in robotic reinforcement learning,
allowing a robot to learn a new task with little human effort by leveraging
data and models from the Internet. However, reinforcement learning often
requires significant human effort in the form of manual reward specification or
environment resets, even if the policy is pre-trained. We introduce RoboFuME, a
reset-free fine-tuning system that pre-trains a multi-task manipulation policy
from diverse datasets of prior experiences and self-improves online to learn a
target task with minimal human intervention. Our insights are to utilize
calibrated offline reinforcement learning techniques to ensure efficient online
fine-tuning of a pre-trained policy in the presence of distribution shifts and
leverage pre-trained vision language models (VLMs) to build a robust reward
classifier for autonomously providing reward signals during the online
fine-tuning process. In a diverse set of five real robot manipulation tasks, we
show that our method can incorporate data from an existing robot dataset
collected at a different institution and improve on a target task within as
little as 3 hours of autonomous real-world experience. We also demonstrate in
simulation experiments that our method outperforms prior works that use
different RL algorithms or different approaches for predicting rewards. Project
website: https://robofume.github.i
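The reward-classifier idea above can be sketched minimally: a pre-trained success classifier scores each observation, and its confidence is thresholded into a sparse reward for online fine-tuning. The classifier here is reduced to a stand-in probability input, not an actual VLM, and the threshold is an illustrative choice.

```python
def reward_from_classifier(success_prob, threshold=0.5):
    # Sparse reward: 1 when the classifier is confident the task succeeded.
    return 1.0 if success_prob >= threshold else 0.0

def rollout_rewards(success_probs):
    # Label every step of an autonomous rollout without human supervision.
    return [reward_from_classifier(p) for p in success_probs]

rews = rollout_rewards([0.1, 0.4, 0.6, 0.9])
```

In the full system, these autonomously generated labels replace manual reward specification, which is what makes reset-free, largely unattended fine-tuning possible.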
ToP-ToM: Trust-aware Robot Policy with Theory of Mind
Theory of Mind (ToM) is a fundamental cognitive capability that endows humans with the ability to attribute mental states to others. Humans infer the desires, beliefs, and intentions of others by observing their behavior and, in turn, adjust their actions to facilitate better interpersonal communication and team collaboration. In this paper, we investigate a trust-aware robot policy with theory of mind in a multiagent setting where a human collaborates with a robot against another human opponent. We show that by focusing only on team performance, the robot may resort to the reverse psychology trick, which poses a significant threat to trust maintenance: the human's trust in the robot collapses when they discover the robot's deceptive behavior. To mitigate this problem, we adopt a robot theory of mind model to infer the human's trust beliefs, including true belief and false belief (an essential element of ToM). We design a dynamic trust-aware reward function based on these trust beliefs to guide robot policy learning, aiming to avoid human trust collapse caused by robot reverse psychology while maintaining team performance. The experimental results demonstrate the importance of a ToM-based robot policy for human-robot trust and the effectiveness of our approach in multiagent interaction settings.
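The shape of a dynamic trust-aware reward like the one above can be sketched briefly. The trust-belief inference and the weighting below are illustrative placeholders, not the paper's model: the idea is only that deception is penalized more when the human's inferred trust is high, since discovery would then cause a larger trust collapse.

```python
def trust_aware_reward(team_reward, deceptive, human_trust, w=0.5):
    # Penalize deceptive actions in proportion to the human's inferred
    # trust; honest actions keep the raw team-performance reward.
    penalty = w * human_trust if deceptive else 0.0
    return team_reward - penalty

r_honest = trust_aware_reward(1.0, deceptive=False, human_trust=0.9)
r_deceptive = trust_aware_reward(1.0, deceptive=True, human_trust=0.9)
```

A policy trained against such a reward trades a little immediate team performance for long-run trust, which is the balance the paper targets.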
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Learning from demonstration (LfD) is a popular technique that uses expert
demonstrations to learn robot control policies. However, the difficulty in
acquiring expert-quality demonstrations limits the applicability of LfD
methods: real-world data collection is often costly, and the quality of the
demonstrations depends greatly on the demonstrator's abilities and safety
concerns. A number of works have leveraged data augmentation (DA) to
inexpensively generate additional demonstration data, but most DA works
generate augmented data in a random fashion and ultimately produce highly
suboptimal data. In this work, we propose Guided Data Augmentation (GuDA), a
human-guided DA framework that generates expert-quality augmented data. The key
insight of GuDA is that while it may be difficult to demonstrate the sequence
of actions required to produce expert data, a user can often easily identify
when an augmented trajectory segment represents task progress. Thus, the user
can impose a series of simple rules on the DA process to automatically generate
augmented samples that approximate expert behavior. To extract a policy from
GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning
algorithms. We evaluate GuDA on a physical robot soccer task as well as
simulated D4RL navigation tasks, a simulated autonomous driving task, and a
simulated soccer task. Empirically, we find that GuDA enables learning from a
small set of potentially suboptimal demonstrations and substantially
outperforms a DA strategy that samples augmented data randomly.
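The guided-augmentation idea above can be sketched minimally: random transforms are applied to a demonstration segment, but only augmented candidates that satisfy a user-supplied "task progress" rule are kept. The transform (a 1-D trajectory shift) and the rule below are illustrative, not GuDA's actual augmentation functions.

```python
import random

def augment(segment, rng):
    # Hypothetical transform: translate a 1-D trajectory by a random offset.
    shift = rng.uniform(-1.0, 1.0)
    return [x + shift for x in segment]

def guided_augment(segment, progress_rule, n=100, seed=0):
    # Generate n random augmentations and keep only those the user's
    # rule accepts as representing task progress.
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        candidate = augment(segment, rng)
        if progress_rule(candidate):
            kept.append(candidate)
    return kept

# Illustrative rule: the segment must move forward and end at/past a goal
# at position 2.0.
rule = lambda seg: seg[-1] > seg[0] and seg[-1] >= 2.0
data = guided_augment([0.0, 0.5, 1.5], rule)
```

The filtered set then feeds an off-the-shelf offline RL or behavior cloning algorithm, exactly because the rule has steered the augmented data toward expert-like behavior.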