Intelligent and High-Performance Behavior Design of Autonomous Systems via Learning, Optimization and Control
Growing societal demands have rapidly accelerated the development of autonomous systems that interact closely with humans in many application domains, from manufacturing to transportation and from workplaces to daily life. The shift from isolated working environments to human-dominated spaces requires autonomous systems to handle not only environmental uncertainties, such as external vibrations, but also interaction uncertainties arising from human behavior, which is by nature probabilistic, causal but not strictly rational, internally hierarchical, and socially compliant. This dissertation is concerned with the design of intelligent, high-performance behavior for such autonomous systems, drawing on control, optimization, learning, and cognitive science. The work consists of two parts. Part I addresses the problem of high-level hybrid human-machine behavior design, with the goal of achieving safe, efficient, and human-like interaction with people. A framework based on theory of mind, utility theories, and imitation learning is proposed to efficiently represent and learn the complicated behavior of humans. Building on it, machine behaviors at three levels - the perceptual level, the reasoning level, and the action level - are designed via imitation learning, optimization, and online adaptation, allowing the system to interpret, reason, and behave as humans do, particularly when a variety of uncertainties exist. Applications to autonomous driving are considered throughout Part I. Part II concerns the design of high-performance low-level individual machine behavior in the presence of model uncertainties and external disturbances. Advanced control laws based on adaptation, iterative learning, and the internal structure of the uncertainties and disturbances are developed to ensure that the high-level interactive behaviors can be reliably executed. Applications to robot manipulators and high-precision motion systems are discussed in this part.
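Part II's iterative learning control component can be illustrated with a minimal sketch: a P-type ILC law refines a feedforward input over repeated trials by reusing the previous trial's tracking error. The first-order plant, reference trajectory, and learning gain below are illustrative assumptions, not the dissertation's actual systems.

```python
import numpy as np

# P-type iterative learning control (ILC) sketch for a repetitive tracking task.
# Plant, reference, and gain are illustrative assumptions only.

def simulate(u, a=0.9, b=0.5):
    """Roll out y[t+1] = a*y[t] + b*u[t] and return the outputs driven by u."""
    y = 0.0
    out = np.zeros(len(u))
    for t, ut in enumerate(u):
        y = a * y + b * ut
        out[t] = y
    return out

T = 50
ref = np.sin(np.linspace(0.0, np.pi, T))   # desired trajectory, repeated every trial
u = np.zeros(T)                            # feedforward input refined across trials
L = 0.8                                    # learning gain; |1 - L*b| < 1 ensures convergence

for trial in range(30):
    e = ref - simulate(u)                  # tracking error of the current trial
    u = u + L * e                          # reuse this trial's error on the next trial

print("final RMS tracking error:", np.sqrt(np.mean((ref - simulate(u)) ** 2)))
```

With these values the error contracts geometrically across trials, which is the basic mechanism the abstract refers to: errors observed in one repetition improve the feedforward input for the next.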
Quantum inspired algorithms for learning and control of stochastic systems
Motivated by the limitations of the current reinforcement learning and optimal control techniques, this dissertation proposes quantum theory inspired algorithms for learning and control of both single-agent and multi-agent stochastic systems.
A common problem encountered in traditional reinforcement learning techniques is the exploration-exploitation trade-off. To address this issue, an action selection procedure inspired by a quantum search algorithm called Grover's iteration is developed. This procedure does not require an explicit design parameter to specify the relative frequency of explorative and exploitative actions.
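A minimal sketch of the general idea is given below: Grover-style amplitude amplification boosts the currently greedy action, and the action is then sampled with probability equal to its squared amplitude, so exploration arises from the residual amplitude mass rather than an explicit epsilon. The mapping from value estimates to the number of amplification steps is an illustrative assumption, not the dissertation's exact procedure.

```python
import numpy as np

# Quantum-inspired action selection (sketch): keep a unit-norm amplitude vector over
# actions, apply Grover-style amplitude amplification toward the currently greedy
# action, and "measure" by sampling with probability = squared amplitude.
# The link between value estimates and the number of steps is an assumption.

def grover_amplify(amplitudes, target, steps):
    """Apply `steps` Grover iterations that boost the amplitude of `target`."""
    a = amplitudes.astype(float).copy()
    for _ in range(steps):
        a[target] *= -1.0                  # oracle: flip the phase of the target action
        a = 2.0 * a.mean() - a             # diffusion: inversion about the mean
    return a

def select_action(q_values, amplitudes):
    greedy = int(np.argmax(q_values))
    steps = int(np.clip(np.max(q_values), 0, 3))   # assumed mapping from value to steps
    amplitudes = grover_amplify(amplitudes, greedy, steps)
    probs = amplitudes ** 2
    probs /= probs.sum()
    return np.random.choice(len(q_values), p=probs), amplitudes

n_actions = 4
amps = np.ones(n_actions) / np.sqrt(n_actions)     # uniform superposition to start
action, amps = select_action(np.array([0.1, 2.0, 0.3, 0.0]), amps)
print("selected action:", action, "probabilities:", amps ** 2 / np.sum(amps ** 2))
```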
The second part of this dissertation extends the powerful adaptive critic design methodology to solve finite-horizon stochastic optimal control problems. Numerically solving the stochastic Hamilton-Jacobi-Bellman equation, which characterizes the optimal expected cost function, requires a large number of trajectory samples. The proposed methodology overcomes this difficulty by using the path integral control formulation to adaptively sample trajectories of importance.
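The trajectory-weighting step at the heart of path integral control can be sketched as follows: sample perturbed control sequences, weight each rollout by its exponentiated negative cost, and update the nominal controls with the weighted average perturbation. The one-dimensional dynamics, cost, and temperature below are illustrative assumptions rather than the dissertation's formulation.

```python
import numpy as np

# Path-integral control sketch: sample perturbed control trajectories, weight each
# rollout by exp(-cost / lambda), and update the nominal controls with the weighted
# average perturbation. Dynamics, cost, and temperature are illustrative choices.

def rollout_cost(u_seq, x0=2.0, dt=0.1):
    """Quadratic state/control cost for a 1-D single-integrator rollout."""
    x, cost = x0, 0.0
    for u in u_seq:
        x = x + dt * u
        cost += dt * (x ** 2 + 0.1 * u ** 2)
    return cost + 10.0 * x ** 2            # terminal cost: end near the origin

T, K, lam, sigma = 20, 256, 1.0, 1.0
u = np.zeros(T)                            # nominal control sequence

for _ in range(50):
    eps = sigma * np.random.randn(K, T)    # control perturbations for K rollouts
    costs = np.array([rollout_cost(u + eps[k]) for k in range(K)])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()                           # softmax weights over sampled trajectories
    u = u + w @ eps                        # importance-weighted update of the controls

print("terminal state:", 2.0 + 0.1 * np.sum(u))
```

Low-cost rollouts dominate the weighted average, which is the "sampling trajectories of importance" idea in a toy setting.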
The third part of this dissertation presents two quantum-inspired coordination models to dynamically assign targets to agents operating in a stochastic environment. The first approach uses a quantum decision theory model that explains irrational action choices in human decision making. The second approach uses a quantum game theory model that exploits the quantum mechanical phenomenon of 'entanglement' to increase individual pay-off in multi-player games. The efficiency and scalability of the proposed coordination models are demonstrated through simulations of a large-scale multi-agent system.
Is Behavioral Economics Doomed?
It is fashionable to criticize economic theory for focusing too much on rationality and ignoring the imperfect and emotional way in which real economic decisions are reached. All of us facing the global economic crisis wonder just how rational economic men and women can be. Behavioral economics—an effort to incorporate psychological ideas into economics—has become all the rage. This book by well-known economist David K. Levine questions the idea that behavioral economics is the answer to economic problems. It explores the successes and failures of contemporary economics both inside and outside the laboratory. It then asks whether popular behavioral theories of psychological biases are solutions to the failures. It not only provides an overview of popular behavioral theories and their history, but also gives the reader the tools for scrutinizing them. Levine's book is essential reading for students and teachers of economic theory and anyone interested in the psychology of economics.
The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for
studying how well machines can do sophisticated decision making. In recent
years, machine learning has made dramatic advances with artificial agents
reaching superhuman performance in challenge domains like Go, Atari, and some
variants of poker. As with their predecessors of chess, checkers, and
backgammon, these game domains have driven research by providing sophisticated
yet well-defined challenges for artificial intelligence practitioners. We
continue this tradition by proposing the game of Hanabi as a new challenge
domain with novel problems that arise from its combination of purely
cooperative gameplay with two to five players and imperfect information. In
particular, we argue that Hanabi elevates reasoning about the beliefs and
intentions of other agents to the foreground. We believe developing novel
techniques for such theory of mind reasoning will not only be crucial for
success in Hanabi, but also in broader collaborative efforts, especially those
with human partners. To facilitate future research, we introduce the
open-source Hanabi Learning Environment, propose an experimental framework for
the research community to evaluate algorithmic advances, and assess the
performance of current state-of-the-art techniques.
Comment: 32 pages, 5 figures, In Press (Artificial Intelligence)
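To give a sense of the environment's interface, the sketch below runs a random agent for one episode through the open-source Hanabi Learning Environment's Python wrapper. The module, function, and observation-field names (rl_env.make, player_observations, legal_moves) follow my reading of the project's code and should be treated as assumptions if your installed version differs.

```python
# Random-agent episode in the Hanabi Learning Environment (assumed wrapper API).
import random
from hanabi_learning_environment import rl_env

env = rl_env.make(environment_name="Hanabi-Full", num_players=2)
observations = env.reset()
done = False
episode_reward = 0

while not done:
    current = observations["current_player"]
    obs = observations["player_observations"][current]
    move = random.choice(obs["legal_moves"])      # pick any legal hint/play/discard
    observations, reward, done, _ = env.step(move)
    episode_reward += reward

print("episode score:", episode_reward)
```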
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must be able to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though models of them are necessarily fallible. This dissertation aims to lay down the foundations needed to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. After this, it draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g. reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors—flying in unmodeled wind and among human pedestrians—and simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
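A recurring pattern behind such online safety assurances is the least-restrictive safety filter: a performance-driven (possibly learned) controller runs freely, and a safe fallback takes over whenever a model-based safety check on the resulting state fails. The sketch below illustrates that pattern on a toy 1-D double integrator with a braking-based check; the dynamics, limits, and check are illustrative assumptions, not the dissertation's reachability analysis.

```python
import numpy as np

# Least-restrictive safety filter sketch (toy 1-D double integrator): the desired
# action is kept only if a worst-case braking maneuver from the resulting state
# keeps the position inside the safe set [-X_LIMIT, X_LIMIT].

DT, A_MAX, X_LIMIT = 0.1, 1.0, 5.0

def step(x, v, a):
    a = float(np.clip(a, -A_MAX, A_MAX))
    return x + DT * v, v + DT * a

def braking_recovers(x, v):
    """Brake to rest from (x, v); recoverable if the position never leaves the safe set."""
    while abs(v) > 1e-9:
        x, v = step(x, v, -np.clip(v / DT, -A_MAX, A_MAX))   # brake without overshooting rest
        if abs(x) > X_LIMIT:
            return False
    return True

def safety_filter(x, v, a_desired):
    x_next, v_next = step(x, v, a_desired)
    if braking_recovers(x_next, v_next):
        return a_desired                                      # desired action is recoverable
    return -np.clip(v / DT, -A_MAX, A_MAX)                    # else: first step of the recovery plan

x, v = 0.0, 0.0
for _ in range(200):
    x, v = step(x, v, safety_filter(x, v, a_desired=A_MAX))   # controller always accelerates
print("final position stays within the safe set:", abs(x) <= X_LIMIT)
```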
When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans
In order to collaborate safely and efficiently, robots need to anticipate how
their human partners will behave. Some of today's robots model humans as if
they were also robots, and assume users are always optimal. Other robots
account for human limitations, and relax this assumption so that the human is
noisily rational. Both of these models make sense when the human receives
deterministic rewards: i.e., gaining either $100 or $130 with certainty. But in
real world scenarios, rewards are rarely deterministic. Instead, we must make
choices subject to risk and uncertainty--and in these settings, humans exhibit
a cognitive bias towards suboptimal behavior. For example, when deciding
between gaining $100 with certainty or $130 only 80% of the time, people tend
to make the risk-averse choice--even though it leads to a lower expected gain!
In this paper, we adopt a well-known Risk-Aware human model from behavioral
economics called Cumulative Prospect Theory and enable robots to leverage this
model during human-robot interaction (HRI). In our user studies, we offer
supporting evidence that the Risk-Aware model more accurately predicts
suboptimal human behavior. We find that this increased modeling accuracy
results in safer and more efficient human-robot collaboration. Overall, we
extend existing rational human models so that collaborative robots can
anticipate and plan around suboptimal human behavior during HRI.
Comment: ACM/IEEE International Conference on Human-Robot Interaction
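The bias described in the abstract falls out of the standard Cumulative Prospect Theory value and probability-weighting functions for gains. The sketch below evaluates the certain $100 against the risky $130 at 80%, using Tversky and Kahneman's commonly cited parameter estimates purely as illustrative assumptions.

```python
# Cumulative Prospect Theory (CPT) sketch for the gamble in the abstract:
# a certain $100 versus $130 with probability 0.8. Parameter values are the
# commonly cited Tversky-Kahneman fits, used here as illustrative assumptions.

ALPHA = 0.88   # curvature of the value function for gains
GAMMA = 0.61   # curvature of the probability weighting function for gains

def value(x):
    return x ** ALPHA                      # gains only in this example

def weight(p):
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

expected_risky = 0.8 * 130                 # objective expected value = $104 > $100
cpt_certain = weight(1.0) * value(100)
cpt_risky = weight(0.8) * value(130)

print(f"expected value of risky option: ${expected_risky:.0f}")
print(f"CPT utility, certain $100: {cpt_certain:.2f}")
print(f"CPT utility, risky $130 @ 80%: {cpt_risky:.2f}")
print("CPT predicts the risk-averse choice:", cpt_certain > cpt_risky)
```

With these parameters the certain option's CPT utility exceeds the risky option's even though the risky option has the higher expected value, matching the risk-averse tendency the paper models.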
Generalized asset integrity games
Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints. Decision making must rapidly scale in the presence of uncertainty, complexity, and intelligent adversaries.
This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures which admit topological performance indicators, such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active/live asset model and a reference configuration to produce an integrity score. The utility derived from the asset is monotonically proportional to this integrity score, which represents the proximity to ideal conditions. The present work considers the situation between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score.
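The integrity score rests on the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). A minimal sketch of that computation is given below, using zlib as a stand-in compressor, toy serialized edge lists in place of GAIGE's asset models, and 1 - NCD as one possible integrity score.

```python
import zlib

# Normalized compression distance (NCD) sketch with zlib as the compressor.
# The serialized "asset states" below are toy edge lists, not GAIGE's actual models.

def c(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

reference = b"A-B;B-C;C-D;D-E;E-A;" * 20                     # reference (ideal) configuration
intact    = b"A-B;B-C;C-D;D-E;E-A;" * 19 + b"A-B;B-C;C-D;D-E;E-X;"
degraded  = b"A-B;B-X;X-Q;Q-E;E-A;" * 20                     # heavily perturbed configuration

for name, state in [("intact", intact), ("degraded", degraded)]:
    d = ncd(reference, state)
    print(f"{name:9s} NCD = {d:.3f}  integrity ~= {1.0 - d:.3f}")   # closer to 1 is closer to ideal
```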
Results demonstrate the performance and scalability of the GAIGE. This approach represents a first step towards domain-oblivious architectures for complex asset governance and anytime planning.
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask and answer what makes classical reinforcement
learning cooperative. Cooperating in social dilemma situations is vital for
animals, humans, and machines. While evolutionary theory revealed a range of
mechanisms promoting cooperation, the conditions under which agents learn to
cooperate are contested. Here, we demonstrate which individual elements of the
multi-agent learning setting lead to cooperation, and how. Specifically, we
consider the widely used temporal-difference reinforcement learning algorithm
with epsilon-greedy exploration in the classic environment of an iterated
Prisoner's dilemma with one-period memory. Each of the two learning agents
learns a strategy that conditions the following action choices on both agents'
action choices of the last round. We find that, alongside a strong regard for
future rewards, a low exploration rate, and a small learning rate, it is
primarily the intrinsic stochastic fluctuations of the reinforcement learning
process which double the final rate of cooperation to up to 80%. Thus, inherent
noise is not
a necessary evil of the iterative learning process. It is a critical asset for
the learning of cooperation. However, we also point out the trade-off between a
high likelihood of cooperative behavior and achieving this in a reasonable
amount of time. Our findings are relevant for purposefully designing
cooperative algorithms and regulating undesired collusive effects.
Comment: 9 pages, 4 figures
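The learning setup described above is compact enough to reproduce in a few lines. The sketch below pairs two epsilon-greedy temporal-difference (Q-learning) agents with one-period memory in the iterated Prisoner's Dilemma; the payoff matrix and hyperparameters are illustrative assumptions rather than the paper's exact values.

```python
import numpy as np

# Two Q-learning agents with one-period memory in the iterated Prisoner's Dilemma.
# State = both agents' previous actions (4 states); actions: 0 = cooperate, 1 = defect.
# Payoffs and hyperparameters are illustrative assumptions, not the paper's values.

PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
GAMMA, ALPHA, EPSILON = 0.95, 0.05, 0.05     # high care for the future, small alpha/epsilon

rng = np.random.default_rng(0)
Q = [np.zeros((4, 2)), np.zeros((4, 2))]     # one Q-table per agent: 4 states x 2 actions

def choose(q_row):
    if rng.random() < EPSILON:
        return int(rng.integers(2))          # explore
    return int(np.argmax(q_row))             # exploit

state, coop, steps = 0, 0, 100_000           # start as if both cooperated last round
for _ in range(steps):
    a0, a1 = choose(Q[0][state]), choose(Q[1][state])
    r0, r1 = PAYOFF[(a0, a1)]
    next_state = 2 * a0 + a1                 # encode last round's joint action
    for i, (a, r) in enumerate([(a0, r0), (a1, r1)]):
        target = r + GAMMA * Q[i][next_state].max()
        Q[i][state, a] += ALPHA * (target - Q[i][state, a])
    coop += (a0 == 0) + (a1 == 0)
    state = next_state

print("fraction of cooperative actions:", coop / (2 * steps))
```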
Towards Safe Artificial General Intelligence
The field of artificial intelligence has recently experienced a
number of breakthroughs thanks to progress in deep learning and
reinforcement learning. Computer algorithms now outperform humans
at Go, Jeopardy, image classification, and lip reading, and are
becoming very competent at driving cars and interpreting natural
language. This rapid development has led many to conjecture that
artificial intelligence with greater-than-human ability on a wide
range of tasks may not be far off. This in turn raises concerns
about whether we know how to control such systems, should we
successfully build them.
Indeed, if humanity were to find itself in conflict with a system
of much greater intelligence than itself, human society would
likely lose. One way to make sure we avoid such a conflict
is to ensure that any future AI system with potentially
greater-than-human-intelligence has goals that are aligned with
the goals of the rest of humanity. For example, it should not
wish to kill humans or steal their resources.
The main focus of this thesis will therefore be goal alignment,
i.e. how to design artificially intelligent agents with goals
coinciding with the goals of their designers. Focus will mainly
be directed towards variants of reinforcement learning, as
reinforcement learning currently seems to be the most promising
path towards powerful artificial intelligence. We identify and
categorize goal misalignment problems in reinforcement learning
agents as designed today, and give examples of how these agents
may cause catastrophes in the future. We also suggest a number of
reasonably modest modifications that can be used to avoid or
mitigate each identified misalignment problem. Finally, we also
study various choices of decision algorithms, and conditions for
when a powerful reinforcement learning system will permit us to
shut it down.
The central conclusion is that while reinforcement learning
systems as designed today are inherently unsafe to scale to human
levels of intelligence, there are ways to potentially address
many of these issues without straying too far from the currently
so successful reinforcement learning paradigm. However, much work
remains in turning the high-level proposals suggested in this
thesis into practical algorithms.