63 research outputs found

Abstractions in Reasoning for Long-Term Autonomy

Author: Wray Kyle Hollins
Publication venue: ScholarWorks@UMass Amherst
Publication date: 02/07/2019

The path to building adaptive, robust, intelligent agents has led researchers to develop a suite of powerful models and algorithms for agents with a single objective. However, in recent years, attempts to use this monolithic approach to solve an ever-expanding set of complex real-world problems, which increasingly include long-term autonomous deployments, have illuminated challenges in its ability to scale. Consequently, a fragmented collection of hierarchical and multi-objective models was developed. This trend continues into the algorithms as well, as each approximates an optimal solution in a different manner for scalability. These models and algorithms represent an attempt to solve pieces of an overarching problem: how can an agent explicitly model and integrate the necessary aspects of reasoning required to achieve long-term autonomy? This thesis presents a general hierarchical and multi-objective model called a policy network that unifies prior fragmented solutions into a single graphical decision-making structure. Policy networks are broadly useful for solving numerous real-world problems. This thesis focuses on autonomous vehicle (AV) problems: (1) route-planning with multiple objectives; (2) semi-autonomy with proactive transfer of control; and (3) intersection decision-making for reasoning online about any number of other vehicles and pedestrians. Formal models are presented for each of the distinct problems. Solutions are evaluated using real-world map data in simulation and demonstrated on a fully operational AV prototype driving on real public roads. Policy networks serve as a shared underlying framework for all three, enabling their seamless integration as parts of an overall solution for rich, real-world, scalable decision-making in agents with long-term autonomy.

ScholarWorks@UMass Amherst

Problems with Using Evolutionary Theory in Philosophy

Author: A Bird
B Fraassen van
D Papineau
E Ruttkamp-Bloem
E Sober
H Cruz De
H Putnam
K Atkins
KB Wray
M Mizrahi
P Kyle Stanford
S Park
S Psillos
Seungbae Park
Publication date: 01/01/2017

Does science move toward truths? Are present scientific theories (approximately) true? Should we invoke truths to explain the success of science? Do our cognitive faculties track truths? Some philosophers say yes, while others say no, to these questions. Interestingly, both groups use the same scientific theory, viz., evolutionary theory, to defend their positions. I argue that it begs the question for the former group to do so because their positive answers imply that evolutionary theory is warranted, whereas it is self-defeating for the latter group to do so because their negative answers imply that evolutionary theory is unwarranted.

Active teacher selection for reinforcement learning from human feedback

Author: Freedman Rachel
Russell Stuart
Svegliato Justin
Wray Kyle
Publication date: 23/10/2023

Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite querying a range of distinct teachers. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that the Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively selecting when and which teacher to query. The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.

arXiv.org e-Print Archive

Constrained Hierarchical Monte Carlo Belief-State Planning

Author: Buurmeijer Hugo
Corso Anthony
Jamgochian Arec
Kochenderfer Mykel J.
Wray Kyle H.
Publication date: 30/10/2023

Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot. Comment: Under review for the 2024 IEEE International Conference on Robotics and Automation (ICRA

arXiv.org e-Print Archive

Integrated cooperation and competition in multi-agent decision-making

Author: KUMAR Akshat
WRAY Kyle Hollins
ZILBERSTEIN Shlomo
Publication venue: AAAI Press
Publication date: 01/02/2018

Institutional Knowledge at Singapore Management University

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Author: Baier Hendrik
Dubey Abhishek
Laszka Aron
Luo Baiting
Mukhopadhyay Ayan
Pettet Ava
Wray Kyle
Zhang Yunuo
Publication date: 20/01/2024

Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings: on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines. Comment: Extended Abstract accepted for presentation at AAMAS 202

arXiv.org e-Print Archive

Experience Filter: Using Past Experiences on Unseen Tasks or Environments

Author: Corso Anthony L.
Kochenderfer Mykel J.
Witwicki Stefan J.
Wray Kyle H.
Yel Esen
Yildiz Anil
Publication date: 29/05/2023

One of the bottlenecks of training autonomous vehicle (AV) agents is the variability of training environments. Since learning optimal policies for unseen environments is often very costly and requires substantial data collection, it becomes computationally intractable to train the agent on every possible environment or task the AV may encounter. This paper introduces a zero-shot filtering approach to interpolate learned policies of past experiences to generalize to unseen ones. We use an experience kernel to correlate environments. These correlations are then exploited to produce policies for new tasks or environments from learned policies. We demonstrate our methods on an autonomous vehicle driving through T-intersections with different characteristics, where its behavior is modeled as a partially observable Markov decision process (POMDP). We first construct compact representations of learned policies for POMDPs with unknown transition functions given a dataset of sequential actions and observations. Then, we filter parameterized policies of previously visited environments to generate policies for new, unseen environments. We demonstrate our approaches on both an actual AV and a high-fidelity simulator. Results indicate that our experience filter offers a fast, low-effort, and near-optimal solution to create policies for tasks or environments never seen before. Furthermore, the generated new policies outperform the policy learned using the entire data collected from past environments, suggesting that the correlation among different environments can be exploited and irrelevant ones can be filtered out. Comment: Accepted at IEEE Intelligent Vehicles Symposium (IV) 202

arXiv.org e-Print Archive