Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Recent successes combine reinforcement learning algorithms and deep neural
networks, yet reinforcement learning is still not widely applied to robotics
and real-world scenarios. This can largely be attributed to the fact that
current state-of-the-art, end-to-end reinforcement learning approaches require
thousands or millions of data samples to converge to a satisfactory policy and
are prone to catastrophic failures during training. Conversely, in real-world
scenarios and after just a few data samples, humans are able to provide
demonstrations of the task, intervene to prevent catastrophic actions, or
simply evaluate whether the policy is performing correctly. This research
investigates how to integrate these human interaction modalities into the
reinforcement learning loop, increasing sample efficiency and enabling
real-time reinforcement learning in robotics and real-world scenarios. This
novel theoretical foundation is called the Cycle-of-Learning, a reference to
how different human interaction modalities, namely task demonstration,
intervention, and evaluation, are cycled and combined with reinforcement
learning algorithms. Results presented in this work show that a reward signal
learned from human interaction accelerates the learning rate of reinforcement
learning algorithms, and that learning from a combination of human
demonstrations and interventions is faster and more sample-efficient than
traditional supervised learning approaches. Finally, the Cycle-of-Learning
provides an effective transition from policies learned through human
demonstrations and interventions to reinforcement learning. The
theoretical foundation developed by this research opens new research paths to
human-agent teaming scenarios where autonomous agents are able to learn from
human teammates and adapt to mission performance metrics in real-time and in
real-world scenarios.
Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more
information, see https://vggoecks.com
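
To make the loop concrete, below is a minimal sketch of how the three
interaction modalities could be cycled with a reinforcement learner. The env,
policy, human, and reward_model interfaces are hypothetical stand-ins used for
illustration, not the thesis' actual implementation.

```python
# Illustrative sketch of a Cycle-of-Learning-style loop. All interfaces here
# (env, policy, human, reward_model) are hypothetical stand-ins; the thesis'
# actual algorithms and update rules differ.

def cycle_of_learning(env, policy, human, reward_model, n_cycles=3):
    demos, interventions, evaluations = [], [], []
    for _ in range(n_cycles):
        # 1) Task demonstration: the human drives the system; log (obs, action) pairs.
        demos += human.demonstrate(env)
        policy.fit(demos)  # imitation learning (behavior cloning)

        # 2) Intervention: the policy acts while the human overrides unsafe actions;
        #    the corrective (obs, action) pairs are added to the dataset.
        interventions += human.intervene(env, policy)
        policy.fit(demos + interventions)

        # 3) Evaluation: the human scores policy rollouts; the scores supervise a
        #    learned reward model that can replace a hand-crafted reward signal.
        rollouts = [policy.rollout(env) for _ in range(5)]
        evaluations += [(rollout, human.evaluate(rollout)) for rollout in rollouts]
        reward_model.fit(evaluations)

        # 4) Reinforcement learning on the human-shaped reward, warm-started
        #    from the imitation-learned policy.
        policy.reinforce(env, reward_fn=reward_model.predict)
    return policy
```

The intent of such a cycle is that each modality demands progressively less
human effort as the policy improves: full demonstrations first, interventions
only when the agent errs, and lightweight evaluations thereafter.
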
COA-GPT: Generative Pre-trained Transformers for Accelerated Course of Action Development in Military Operations
The development of Courses of Action (COAs) in military operations is
traditionally a time-consuming and intricate process. Addressing this
challenge, this study introduces COA-GPT, a novel algorithm employing Large
Language Models (LLMs) for rapid and efficient generation of valid COAs.
COA-GPT incorporates military doctrine and domain expertise into LLMs through
in-context learning, allowing commanders to input mission information, in both
text and image formats, and receive strategically aligned COAs for review and
approval. Uniquely, COA-GPT not only accelerates COA development, producing
initial COAs within seconds, but also facilitates real-time refinement based on
commander feedback. This work evaluates COA-GPT in a military-relevant scenario
within a militarized version of the StarCraft II game, comparing its
performance against state-of-the-art reinforcement learning algorithms. Our
results demonstrate COA-GPT's superiority in generating strategically sound
COAs more swiftly, with added benefits of enhanced adaptability and alignment
with commander intentions. COA-GPT's capability to rapidly adapt and update
COAs during missions presents a transformative potential for military planning,
particularly in addressing planning discrepancies and capitalizing on emergent
windows of opportunity.
Comment: Accepted at the NATO Science and Technology Organization Symposium
(ICMCIS) organized by the Information Systems Technology (IST) Panel,
IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 202
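
As an illustration of the in-context-learning loop described above, the sketch
below injects doctrine and mission information through the prompt and folds
commander feedback back into the context. The generate callable stands in for
any multimodal LLM API; the prompt wording, function names, and refinement
loop are assumptions for illustration, not the actual COA-GPT implementation.

```python
from typing import Callable, Optional

def coa_generation_loop(
    generate: Callable[[str, Optional[bytes]], str],  # hypothetical LLM call: (prompt, image) -> text
    doctrine: str,                                     # distilled doctrine and domain guidance
    mission_text: str,                                 # commander-provided mission information
    mission_image: Optional[bytes] = None,             # e.g. an annotated map image
    max_refinements: int = 3,
) -> str:
    # Doctrine and mission context are injected purely through the prompt
    # (in-context learning); no model weights are updated.
    prompt = (
        "You are a military planning assistant.\n"
        f"Doctrine and constraints:\n{doctrine}\n\n"
        f"Mission information:\n{mission_text}\n\n"
        "Produce a course of action (COA) with task organization, "
        "scheme of maneuver, and assessed risks."
    )
    coa = generate(prompt, mission_image)

    for _ in range(max_refinements):
        feedback = input("Commander feedback (blank to approve): ").strip()
        if not feedback:
            break  # COA approved as-is
        # Feedback is folded back into the context for real-time refinement.
        prompt += (
            f"\n\nPrevious COA:\n{coa}\n\n"
            f"Commander feedback:\n{feedback}\nRevise the COA accordingly."
        )
        coa = generate(prompt, mission_image)
    return coa
```

The design choice illustrated here is that refinement is a prompt-level
operation: each round of commander feedback is appended to the context and the
model regenerates, which is consistent with the abstract's claim of initial
COAs within seconds and real-time updates during a mission.
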
Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to utilize different forms of human interaction
to safely train autonomous systems in real-time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, which is our framework for combining
multiple modalities of human interaction. The current effort uses human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct undesired behaviors produced by the
imitation learner, enabling an autonomous agent to learn novel tasks safely
after only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems.
Comment: 9 pages, 6 figures
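
Below is a minimal sketch of the demonstration-plus-intervention scheme
described above, under an assumed gym-style environment and duck-typed policy
and human interfaces; none of these names come from the paper. The
behavior-cloned policy acts, the human overrides unsafe actions, and only the
corrective labels are appended before retraining.

```python
# Illustrative sketch only: env, policy, and human are hypothetical interfaces,
# not the paper's code. Intervention data (human overrides of the imitation
# learner) is aggregated with the original demonstrations, and the policy is
# retrained on the combined dataset after each episode.

def train_with_interventions(env, policy, human, n_episodes=10):
    data = list(human.demonstrate(env))     # initial (obs, action) demonstrations
    policy.fit(data)                        # imitation learning (behavior cloning)

    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            override = human.maybe_override(obs, action)   # None if no intervention
            if override is not None:
                data.append((obs, override))               # keep only corrective labels
                action = override
            obs, _, done, _ = env.step(action)             # gym-style step
        policy.fit(data)                    # retrain on demonstrations + interventions
    return policy
```

In spirit this resembles a human-gated form of DAgger: new supervision arrives
only in the states where the learner actually misbehaves, which is one
plausible reading of why fewer samples are needed than with demonstrations
alone.
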
Re-Envisioning Command and Control
Future warfare will require Command and Control (C2) decision-making to occur
in more complex, fast-paced, ill-structured, and demanding conditions. C2 will
be further complicated by operational challenges such as Denied, Degraded,
Intermittent, and Limited (DDIL) communications and the need to account for
many data streams, potentially across multiple domains of operation. Yet,
current C2 practices -- which stem from the industrial era rather than the
emerging intelligence era -- are linear and time-consuming. Critically, these
approaches may fail to maintain overmatch against adversaries on the future
battlefield. To address these challenges, we propose a vision for future C2
based on robust partnerships between humans and artificial intelligence (AI)
systems. This future vision is encapsulated in three operational impacts:
streamlining the C2 operations process, maintaining unity of effort, and
developing adaptive collective knowledge systems. This paper illustrates the
envisaged future C2 capabilities, discusses the assumptions that shaped them,
and describes how the proposed developments could transform C2 in future
warfare.
Comment: Accepted at the NATO Science and Technology Organization Symposium
(ICMCIS) organized by the Information Systems Technology (IST) Panel,
IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 202
DIP-RL: Demonstration-Inferred Preference Learning in Minecraft
In machine learning for sequential decision-making, an algorithmic agent
learns to interact with an environment while receiving feedback in the form of
a reward signal. However, in many unstructured real-world settings, such a
reward signal is unknown and humans cannot reliably craft a reward signal that
correctly captures desired behavior. To solve tasks in such unstructured and
open-ended environments, we present Demonstration-Inferred Preference
Reinforcement Learning (DIP-RL), an algorithm that leverages human
demonstrations in three distinct ways: training an autoencoder, seeding
reinforcement learning (RL) training batches with demonstration data, and
inferring preferences over behaviors to learn a reward function that guides
RL. We evaluate DIP-RL in a tree-chopping task in Minecraft. Results suggest
that the method can guide an RL agent to learn a reward function that reflects
human preferences and that DIP-RL performs competitively relative to baselines.
DIP-RL is inspired by our previous work on combining demonstrations and
pairwise preferences in Minecraft, which was awarded a research prize at the
2022 NeurIPS MineRL BASALT competition, Learning from Human Feedback in
Minecraft. Example trajectory rollouts of DIP-RL and baselines are located at
https://sites.google.com/view/dip-rl.
Comment: Paper accepted at The Many Facets of Preference Learning Workshop at
the International Conference on Machine Learning (ICML), Honolulu, Hawaii,
USA, 202
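
As an illustration of the preference-inference component (the third use of
demonstrations listed above), the sketch below fits a linear reward model with
a Bradley-Terry-style logistic loss, assuming demonstration segments are
preferred over agent segments. The feature dimensionality, linear reward form,
and training constants are illustrative assumptions, not DIP-RL's actual
architecture; the autoencoder and batch-seeding components are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # assumed size of encoded state features (e.g. from an autoencoder)

def segment_return(w, segment):
    # Predicted return of a segment = sum of per-step linear rewards.
    return sum(features @ w for features in segment)

def fit_preference_reward(demo_segments, agent_segments, lr=0.05, epochs=200):
    """Fit reward weights so demonstration segments score higher than agent segments."""
    w = np.zeros(DIM)
    for _ in range(epochs):
        for demo, agent in zip(demo_segments, agent_segments):
            # Bradley-Terry model: P(demo preferred) = sigmoid(R(demo) - R(agent)).
            diff = segment_return(w, demo) - segment_return(w, agent)
            p = 1.0 / (1.0 + np.exp(-diff))
            # Gradient ascent on the log-likelihood of the preference.
            w += lr * (1.0 - p) * (sum(demo) - sum(agent))
    return w

# Toy usage with random encoded segments standing in for real features.
demo_segments  = [[rng.normal(1.0, 1.0, DIM) for _ in range(10)] for _ in range(20)]
agent_segments = [[rng.normal(0.0, 1.0, DIM) for _ in range(10)] for _ in range(20)]
w = fit_preference_reward(demo_segments, agent_segments)
reward_of_state = lambda encoded_state: encoded_state @ w  # learned per-step reward
```

The learned per-step reward can then be plugged into a standard RL algorithm in
place of an environment reward, which is the role the abstract describes for
the inferred preferences.
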
Scalable Interactive Machine Learning for Future Command and Control
Future warfare will require Command and Control (C2) personnel to make
decisions at shrinking timescales in complex and potentially ill-defined
situations. Given the need for robust decision-making processes and
decision-support tools, integration of artificial and human intelligence holds
the potential to revolutionize the C2 operations process to ensure adaptability
and efficiency in rapidly changing operational environments. We propose to
leverage recent promising breakthroughs in interactive machine learning, in
which humans cooperate with machine learning algorithms to guide algorithm
behavior. This paper identifies several gaps in
state-of-the-art science and technology that future work should address to
extend these approaches to function in complex C2 contexts. In particular, we
describe three research focus areas that together aim to enable scalable
interactive machine learning (SIML): 1) developing human-AI interaction
algorithms to enable planning in complex, dynamic situations; 2) fostering
resilient human-AI teams through optimizing roles, configurations, and trust;
and 3) scaling algorithms and human-AI teams for flexibility across a range of
potential contexts and situations.
Comment: Accepted at the NATO Science and Technology Organization Symposium
(ICMCIS) organized by the Information Systems Technology (IST) Panel,
IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 202