Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
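As a rough illustration of the second category, the sketch below assigns a count-based bonus via state hashing. The discretization, hash function, and bonus coefficient `beta` are illustrative assumptions, not the construction used in the surveyed work (which hashes learned features, e.g., with SimHash).

```python
from collections import defaultdict

import numpy as np

class HashingCountBonus:
    """Exploration bonus that decays with the visit count of a state's
    hash code: bonus = beta / sqrt(count)."""

    def __init__(self, beta=0.1, n_bins=16):
        self.beta = beta            # bonus scale (hypothetical default)
        self.n_bins = n_bins        # coarseness of the discretization
        self.counts = defaultdict(int)

    def _hash(self, state):
        # Illustrative hash: discretize the state and hash the tuple.
        binned = tuple((np.asarray(state) * self.n_bins).astype(int))
        return hash(binned)

    def bonus(self, state):
        code = self._hash(state)
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])

# Usage: add the bonus to the environment reward during training.
tracker = HashingCountBonus()
shaped_reward = 1.0 + tracker.bonus(np.array([0.3, -0.7]))
```

Less-visited hash codes receive larger bonuses, steering the agent toward novel states.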
Energy efficiency optimization in MIMO interference channels: A successive pseudoconvex approximation approach
In this paper, we consider the (global and sum) energy efficiency
optimization problem in downlink multi-input multi-output multi-cell systems,
where all users suffer from multi-user interference. This is a challenging
problem due to several reasons: 1) it is a nonconvex fractional programming
problem, 2) the transmission rate functions are characterized by
(complex-valued) transmit covariance matrices, and 3) the processing-related
power consumption may depend on the transmission rate. We tackle this problem
by the successive pseudoconvex approximation approach, and we argue that
pseudoconvex optimization plays a fundamental role in designing novel iterative
algorithms, not only because every locally optimal point of a pseudoconvex
optimization problem is also globally optimal, but also because a descent
direction is easily obtained from every optimal point of a pseudoconvex
optimization problem. The proposed algorithms have the following advantages: 1)
fast convergence as the structure of the original optimization problem is
preserved as much as possible in the approximate problem solved in each
iteration, 2) easy implementation as each approximate problem is suitable for
parallel computation and its solution has a closed-form expression, and 3)
guaranteed convergence to a stationary point or a Karush-Kuhn-Tucker point. The
advantages of the proposed algorithm are also illustrated numerically.
Comment: submitted to IEEE Transactions on Signal Processing
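For context, the key property invoked above is the textbook definition of pseudoconvexity; stating it makes the claim about local and global optima concrete (this is standard material, not taken from the paper):

```latex
% A differentiable f is pseudoconvex on a convex set X if
\[
  \nabla f(\mathbf{x})^{\mathsf{T}}(\mathbf{y}-\mathbf{x}) \ge 0
  \;\Longrightarrow\; f(\mathbf{y}) \ge f(\mathbf{x}),
  \qquad \forall\, \mathbf{x},\mathbf{y} \in \mathcal{X}.
\]
% Hence any stationary point is a global minimizer, which is why each
% pseudoconvex approximate problem can be solved to global optimality.
```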
Inference-Based Strategy Alignment for General-Sum Differential Games
In many settings where multiple agents interact, the optimal choices for each
agent depend heavily on the choices of the others. These coupled interactions
are well-described by a general-sum differential game, in which players have
differing objectives, the state evolves in continuous time, and optimal play
may be characterized by one of many equilibrium concepts, e.g., a Nash
equilibrium. Often, problems admit multiple equilibria. From the perspective of
a single agent in such a game, this multiplicity of solutions can introduce
uncertainty about how other agents will behave. This paper proposes a general
framework for resolving ambiguity between equilibria by reasoning about the
equilibrium other agents are aiming for. We demonstrate this framework in
simulations of a multi-player human-robot navigation problem that yields two
main conclusions: First, by inferring which equilibrium humans are operating
at, the robot is able to predict trajectories more accurately, and second, by
discovering and aligning itself to this equilibrium, the robot is able to reduce
the cost for all players.
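A hedged sketch of the inference step: maintain a belief over candidate equilibria and update it by how well each equilibrium's predicted trajectory explains observed motion. The Gaussian likelihood, the candidate set, and the noise scale `sigma` are illustrative assumptions, not the paper's model.

```python
import numpy as np

def update_equilibrium_belief(belief, observed, predictions, sigma=0.5):
    """Bayesian update over K candidate equilibria.

    belief:      prior probabilities, shape (K,)
    observed:    observed trajectory, shape (T, d)
    predictions: predicted trajectory per equilibrium, shape (K, T, d)
    """
    # Gaussian log-likelihood of the observation under each equilibrium.
    sq_err = ((predictions - observed[None]) ** 2).sum(axis=(1, 2))
    log_post = np.log(belief) - sq_err / (2.0 * sigma ** 2)
    log_post -= log_post.max()            # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Usage: plan against the most probable equilibrium after each update.
belief = np.array([0.5, 0.5])
obs = np.zeros((10, 2))
preds = np.stack([np.zeros((10, 2)), np.ones((10, 2))])
print(update_equilibrium_belief(belief, obs, preds))  # mass -> equilibrium 0
```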
The 1990 progress report and future plans
This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research through engineering development to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA's short-, medium-, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and with AI applications groups at all NASA centers.
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compressed
sensing with imitation learning to solve text-based games with combinatorial
action spaces. Specifically, we introduce a new compressed sensing algorithm,
named IK-OMP, which can be seen as an extension to the Orthogonal Matching
Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning
setting and show that the combined approach (Sparse Imitation Learning,
Sparse-IL) solves the entire text-based game of Zork1 with an action space of
approximately 10 million actions, given both perfect and noisy demonstrations.
Comment: Under review at IJCAI 202
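For reference, the base algorithm that IK-OMP extends is standard Orthogonal Matching Pursuit. The sketch below is textbook OMP on a toy sparse-recovery instance, not the paper's IK-OMP variant; the dictionary and sparsity level are illustrative.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k columns of the
    dictionary D to approximate y, re-fitting coefficients each step."""
    residual, support = y.copy(), []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares re-fit on the selected support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Usage on a toy instance with 3 active atoms out of 256.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)      # unit-norm columns
x_true = np.zeros(256)
x_true[[3, 77, 200]] = [1.0, -2.0, 0.5]
x_hat = omp(D, D @ x_true, k=3)
print(np.nonzero(x_hat)[0])         # typically recovers {3, 77, 200}
```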
Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control
Generating sequential decision processes from huge amounts of measured
process data is an emerging research direction for collaborative factory
automation: making full use of online and offline process data to directly
design flexible decision-making policies and to evaluate their performance.
The key challenges are to generate sequential decision-making policies online
and to transfer knowledge across task domains. Most multi-task
policy-generating algorithms fail to capture sufficient cross-task sharing
structure in the discrete-time nonlinear systems that arise in applications.
This paper proposes multi-task generative adversarial nets with shared memory
for cross-domain coordination control, which can generate sequential decision
policies directly from the raw sensory input of all tasks and evaluate the
performance of system actions online in discrete-time nonlinear systems.
Experiments have been undertaken using a professional flexible manufacturing
testbed deployed within a smart factory of Weichai Power in China. Results on
three groups of discrete-time nonlinear control tasks show that the proposed
model can effectively improve the performance of a task with the help of
other related tasks.
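To make the shared-memory idea concrete, here is a minimal, hypothetical PyTorch sketch in which task-specific heads read from one memory matrix via attention; the layer sizes, attention scheme, and head structure are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedMemoryGenerator(nn.Module):
    """Task-conditioned generator that attends over a memory matrix
    shared by all tasks, so structure learned on one task is readable
    from the others."""

    def __init__(self, n_tasks, obs_dim, act_dim, mem_slots=32, mem_dim=64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(mem_slots, mem_dim))  # shared
        self.encoder = nn.Linear(obs_dim, mem_dim)
        self.heads = nn.ModuleList(                                  # per task
            [nn.Linear(2 * mem_dim, act_dim) for _ in range(n_tasks)]
        )

    def forward(self, obs, task_id):
        q = self.encoder(obs)                          # (batch, mem_dim)
        attn = torch.softmax(q @ self.memory.T, dim=-1)
        read = attn @ self.memory                      # (batch, mem_dim)
        return self.heads[task_id](torch.cat([q, read], dim=-1))

# Usage: one generator serving three control tasks.
gen = SharedMemoryGenerator(n_tasks=3, obs_dim=8, act_dim=2)
actions = gen(torch.randn(5, 8), task_id=1)            # shape (5, 2)
```

A discriminator per task domain, trained adversarially against these outputs, would complete the GAN setup.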
Application of Market Models to Network Equilibrium Problems
We present a general two-sided market model with divisible commodities and
price functions of participants. A general existence result on unbounded sets
is obtained from its variational inequality reformulation. We describe an
extension of the network flow equilibrium problem with elastic demands and a
new equilibrium type model for resource allocation problems in wireless
communication networks, which appear to be particular cases of the general
market model. This enables us to obtain new existence results for these models
as adjustments of the result for the market model. Under certain additional
conditions the general market model can be reduced to a decomposable
optimization problem where the goal function is the sum of two functions and
one of them is convex separable, whereas the feasible set is the corresponding
Cartesian product. We discuss some versions of the partial linearization
method, which can be applied to these network equilibrium problems.
Comment: 18 pages, 3 tables
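To sketch the decomposable structure mentioned above: with objective f = f_1 + f_2 and f_2 convex separable, partial linearization keeps f_2 intact and linearizes only f_1 at the current iterate. The step below is the generic textbook form under those assumptions, not a formula taken from the paper.

```latex
% Partial linearization step for  min_{x in X} f_1(x) + f_2(x):
\[
  y^{k} \in \arg\min_{y \in X}\;
    \nabla f_1(x^{k})^{\mathsf{T}}(y - x^{k}) + f_2(y),
  \qquad
  x^{k+1} = x^{k} + \lambda_k (y^{k} - x^{k}), \quad \lambda_k \in (0,1].
\]
% Separability of f_2 and the Cartesian-product form of X let the
% subproblem split into independent per-component problems.
```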
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists in estimating the value of an optimal policy, a
value from which a policy can be recovered, while the other, called policy
search, directly works in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer
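As a minimal illustration of the value-based family, here is a tabular Q-learning sketch. The `env` interface (reset/step returning next state, reward, done), the learning rate, and the epsilon-greedy exploration are assumed toy choices, not the chapter's presentation.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning: estimate Q(s, a), then act greedily w.r.t. it."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # TD update toward the Bellman optimality target.
            target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]
```

The final line shows the "value from which a policy can be recovered" point: the greedy policy is read directly off the learned Q-table.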
MIST: Missing Person Intelligence Synthesis Toolkit
Each day in the United States, approximately 500 missing person cases occur
that go unsolved or unresolved. The non-profit organization known as the Find
Me Group (FMG), led by former law enforcement professionals, is dedicated to
solving or resolving these cases. This paper introduces the Missing Person
Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant
of geospatial abductive inference. The system takes search locations provided
by a group of experts and rank-orders them by the probability assigned to
each area based on the experts' prior performance as a group. We evaluate our
approach against the current practices employed by the Find Me Group and find
that it significantly reduces the search area, yielding a reduction of 31
square miles over the 24 cases examined in our experiments. Currently, we are
using MIST to aid the Find Me Group in an active missing person case.
Comment: 10 pages, 12 figures, Accepted in CIKM 201
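A hedged sketch of the ranking idea: weight each expert by historical accuracy, accumulate weighted votes over candidate areas, and rank. The weighting scheme and data layout are illustrative assumptions, not MIST's actual inference model.

```python
from collections import defaultdict

def rank_search_areas(suggestions, expert_accuracy):
    """Rank candidate areas by expert votes weighted by past hit rate.

    suggestions:     dict expert -> list of suggested area ids
    expert_accuracy: dict expert -> fraction of past suggestions that
                     located the missing person (assumed available)
    """
    scores = defaultdict(float)
    for expert, areas in suggestions.items():
        for area in areas:
            scores[area] += expert_accuracy.get(expert, 0.0)
    total = sum(scores.values()) or 1.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(area, s / total) for area, s in ranked]

# Usage with toy data: area "B" ranks first (backed by both experts).
print(rank_search_areas({"e1": ["A", "B"], "e2": ["B", "C"]},
                        {"e1": 0.8, "e2": 0.4}))
```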
On the use of biased-randomized algorithms for solving non-smooth optimization problems
Soft constraints are quite common in real-life applications. For example, in freight transportation, the fleet size can be enlarged by outsourcing part of the distribution service, and some deliveries to customers can be postponed as well; in inventory management, it is possible to consider stock-outs generated by unexpected demands; and in manufacturing processes and project management, some deadlines frequently cannot be met due to delays in critical steps of the supply chain. However, capacity-, size-, and time-related limitations are included in many optimization problems as hard constraints, while it would usually be more realistic to consider them as soft ones, i.e., constraints that can be violated to some extent by incurring a penalty cost. Most of the time, this penalty cost will be nonlinear and even noncontinuous, which might transform the objective function into a non-smooth one. Despite their many practical applications, non-smooth optimization problems are quite challenging, especially when the underlying optimization problem is NP-hard in nature. In this paper, we propose the use of biased-randomized algorithms as an effective methodology to cope with NP-hard and non-smooth optimization problems in many practical applications. Biased-randomized algorithms extend constructive heuristics by introducing a nonuniform randomization pattern into them. Hence, they can be used to explore promising areas of the solution space without the limitations of gradient-based approaches, which assume the existence of smooth objective functions. Moreover, biased-randomized algorithms can be easily parallelized, thus requiring short computing times while exploring a large number of promising regions. This paper discusses these concepts in detail, reviews existing work in different application areas, and highlights current trends and open research lines.
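To illustrate the biased-randomization pattern described above, the sketch below replaces a greedy heuristic's deterministic pick with a geometrically skewed choice from the cost-sorted candidate list; the geometric distribution and its `beta` parameter are common choices in this literature, used here as assumptions.

```python
import math
import random

def biased_pick(n, beta, rng):
    """Sample an index with a geometric bias: position 0 (the greedy
    choice) is most likely, but any candidate can be selected."""
    idx = int(math.log(1.0 - rng.random()) / math.log(1.0 - beta))
    return min(idx, n - 1)

def biased_randomized_construct(candidates, cost, beta=0.3, seed=0):
    """Constructive heuristic with biased randomization: instead of
    always taking the cheapest candidate, sample near the top."""
    rng = random.Random(seed)
    remaining = sorted(candidates, key=cost)
    solution = []
    while remaining:
        solution.append(remaining.pop(biased_pick(len(remaining), beta, rng)))
    return solution

# Usage: repeated runs with different seeds yield diverse, still
# cost-aware constructions, suitable for non-smooth objectives.
print(biased_randomized_construct([5, 2, 9, 1, 7], cost=lambda x: x))
```

Because the construction never uses gradients, a penalty-based, non-smooth objective can be evaluated as a black box when comparing the solutions produced across runs.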