Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
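As a rough illustration of the second category, the sketch below assigns a count-based bonus via state hashing. The discretization, hash function, and bonus coefficient `beta` are illustrative assumptions, not the construction used in the surveyed work (which hashes learned features, e.g., with SimHash).

```python
from collections import defaultdict

import numpy as np

class HashingCountBonus:
    """Exploration bonus that decays with the visit count of a state's
    hash code: bonus = beta / sqrt(count)."""

    def __init__(self, beta=0.1, n_bins=16):
        self.beta = beta            # bonus scale (hypothetical default)
        self.n_bins = n_bins        # coarseness of the discretization
        self.counts = defaultdict(int)

    def _hash(self, state):
        # Illustrative hash: discretize the state and hash the tuple.
        binned = tuple((np.asarray(state) * self.n_bins).astype(int))
        return hash(binned)

    def bonus(self, state):
        code = self._hash(state)
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])

# Usage: add the bonus to the environment reward during training.
tracker = HashingCountBonus()
shaped_reward = 1.0 + tracker.bonus(np.array([0.3, -0.7]))
```

Less-visited hash codes receive larger bonuses, steering the agent toward novel states.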
Energy efficiency optimization in MIMO interference channels: A successive pseudoconvex approximation approach
In this paper, we consider the (global and sum) energy efficiency
optimization problem in downlink multi-input multi-output multi-cell systems,
where all users suffer from multi-user interference. This is a challenging
problem due to several reasons: 1) it is a nonconvex fractional programming
problem, 2) the transmission rate functions are characterized by
(complex-valued) transmit covariance matrices, and 3) the processing-related
power consumption may depend on the transmission rate. We tackle this problem
by the successive pseudoconvex approximation approach, and we argue that
pseudoconvex optimization plays a fundamental role in designing novel iterative
algorithms, not only because every locally optimal point of a pseudoconvex
optimization problem is also globally optimal, but also because a descent
direction is easily obtained from every optimal point of a pseudoconvex
optimization problem. The proposed algorithms have the following advantages: 1)
fast convergence as the structure of the original optimization problem is
preserved as much as possible in the approximate problem solved in each
iteration, 2) easy implementation as each approximate problem is suitable for
parallel computation and its solution has a closed-form expression, and 3)
guaranteed convergence to a stationary point or a Karush-Kuhn-Tucker point. The
advantages of the proposed algorithm are also illustrated numerically.
Comment: submitted to IEEE Transactions on Signal Processing
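For context, the key property invoked above is the textbook definition of pseudoconvexity; stating it makes the claim about local and global optima concrete (this is standard material, not taken from the paper):

```latex
% A differentiable f is pseudoconvex on a convex set X if
\[
  \nabla f(\mathbf{x})^{\mathsf{T}}(\mathbf{y}-\mathbf{x}) \ge 0
  \;\Longrightarrow\; f(\mathbf{y}) \ge f(\mathbf{x}),
  \qquad \forall\, \mathbf{x},\mathbf{y} \in \mathcal{X}.
\]
% Hence any stationary point is a global minimizer, which is why each
% pseudoconvex approximate problem can be solved to global optimality.
```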
Inference-Based Strategy Alignment for General-Sum Differential Games
In many settings where multiple agents interact, the optimal choices for each
agent depend heavily on the choices of the others. These coupled interactions
are well-described by a general-sum differential game, in which players have
differing objectives, the state evolves in continuous time, and optimal play
may be characterized by one of many equilibrium concepts, e.g., a Nash
equilibrium. Often, problems admit multiple equilibria. From the perspective of
a single agent in such a game, this multiplicity of solutions can introduce
uncertainty about how other agents will behave. This paper proposes a general
framework for resolving ambiguity between equilibria by reasoning about the
equilibrium other agents are aiming for. We demonstrate this framework in
simulations of a multi-player human-robot navigation problem that yields two
main conclusions: First, by inferring which equilibrium humans are operating
at, the robot is able to predict trajectories more accurately, and second, by
discovering and aligning itself to this equilibrium, the robot is able to reduce
the cost for all players.
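A hedged sketch of the inference step: maintain a belief over candidate equilibria and update it by how well each equilibrium's predicted trajectory explains observed motion. The Gaussian likelihood, the candidate set, and the noise scale `sigma` are illustrative assumptions, not the paper's model.

```python
import numpy as np

def update_equilibrium_belief(belief, observed, predictions, sigma=0.5):
    """Bayesian update over K candidate equilibria.

    belief:      prior probabilities, shape (K,)
    observed:    observed trajectory, shape (T, d)
    predictions: predicted trajectory per equilibrium, shape (K, T, d)
    """
    # Gaussian log-likelihood of the observation under each equilibrium.
    sq_err = ((predictions - observed[None]) ** 2).sum(axis=(1, 2))
    log_post = np.log(belief) - sq_err / (2.0 * sigma ** 2)
    log_post -= log_post.max()            # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Usage: plan against the most probable equilibrium after each update.
belief = np.array([0.5, 0.5])
obs = np.zeros((10, 2))
preds = np.stack([np.zeros((10, 2)), np.ones((10, 2))])
print(update_equilibrium_belief(belief, obs, preds))  # mass -> equilibrium 0
```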
The 1990 progress report and future plans
This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research through engineering development to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA's short-, medium-, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and with AI applications groups at all NASA centers.
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compressed
sensing with imitation learning to solve text-based games with combinatorial
action spaces. Specifically, we introduce a new compressed sensing algorithm,
named IK-OMP, which can be seen as an extension to the Orthogonal Matching
Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning
setting and show that the combined approach (Sparse Imitation Learning,
Sparse-IL) solves the entire text-based game of Zork1 with an action space of
approximately 10 million actions, given both perfect and noisy demonstrations.
Comment: Under review at IJCAI 202
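For reference, the base algorithm that IK-OMP extends is standard Orthogonal Matching Pursuit. The sketch below is textbook OMP on a toy sparse-recovery instance, not the paper's IK-OMP variant; the dictionary and sparsity level are illustrative.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k columns of the
    dictionary D to approximate y, re-fitting coefficients each step."""
    residual, support = y.copy(), []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares re-fit on the selected support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Usage on a toy instance with 3 active atoms out of 256.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)      # unit-norm columns
x_true = np.zeros(256)
x_true[[3, 77, 200]] = [1.0, -2.0, 0.5]
x_hat = omp(D, D @ x_true, k=3)
print(np.nonzero(x_hat)[0])         # typically recovers {3, 77, 200}
```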
Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control
Generating sequential decision processes from huge amounts of measured
process data is an emerging research direction for collaborative factory
automation: making full use of online and offline process data to directly
design flexible decision-making policies and to evaluate their performance.
The key challenges are to generate sequential decision-making policies online
and to transfer knowledge across task domains. Most multi-task
policy-generating algorithms fail to capture sufficient cross-task sharing
structure in the discrete-time nonlinear systems that arise in applications.
This paper proposes multi-task generative adversarial nets with shared memory
for cross-domain coordination control, which can generate sequential decision
policies directly from the raw sensory input of all tasks and evaluate the
performance of system actions online in discrete-time nonlinear systems.
Experiments have been undertaken using a professional flexible manufacturing
testbed deployed within a smart factory of Weichai Power in China. Results on
three groups of discrete-time nonlinear control tasks show that the proposed
model can effectively improve the performance of a task with the help of
other related tasks.
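To make the shared-memory idea concrete, here is a minimal, hypothetical PyTorch sketch in which task-specific heads read from one memory matrix via attention; the layer sizes, attention scheme, and head structure are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedMemoryGenerator(nn.Module):
    """Task-conditioned generator that attends over a memory matrix
    shared by all tasks, so structure learned on one task is readable
    from the others."""

    def __init__(self, n_tasks, obs_dim, act_dim, mem_slots=32, mem_dim=64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(mem_slots, mem_dim))  # shared
        self.encoder = nn.Linear(obs_dim, mem_dim)
        self.heads = nn.ModuleList(                                  # per task
            [nn.Linear(2 * mem_dim, act_dim) for _ in range(n_tasks)]
        )

    def forward(self, obs, task_id):
        q = self.encoder(obs)                          # (batch, mem_dim)
        attn = torch.softmax(q @ self.memory.T, dim=-1)
        read = attn @ self.memory                      # (batch, mem_dim)
        return self.heads[task_id](torch.cat([q, read], dim=-1))

# Usage: one generator serving three control tasks.
gen = SharedMemoryGenerator(n_tasks=3, obs_dim=8, act_dim=2)
actions = gen(torch.randn(5, 8), task_id=1)            # shape (5, 2)
```

A discriminator per task domain, trained adversarially against these outputs, would complete the GAN setup.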
Application of Market Models to Network Equilibrium Problems
We present a general two-sided market model with divisible commodities and
price functions of participants. A general existence result on unbounded sets
is obtained from its variational inequality reformulation. We describe an
extension of the network flow equilibrium problem with elastic demands and a
new equilibrium type model for resource allocation problems in wireless
communication networks, which appear to be particular cases of the general
market model. This enables us to obtain new existence results for these models
as adjustments of the result for the market model. Under certain additional
conditions the general market model can be reduced to a decomposable
optimization problem where the goal function is the sum of two functions and
one of them is convex separable, whereas the feasible set is the corresponding
Cartesian product. We discuss some versions of the partial linearization
method, which can be applied to these network equilibrium problems.
Comment: 18 pages, 3 tables
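To sketch the decomposable structure mentioned above: with objective f = f_1 + f_2 and f_2 convex separable, partial linearization keeps f_2 intact and linearizes only f_1 at the current iterate. The step below is the generic textbook form under those assumptions, not a formula taken from the paper.

```latex
% Partial linearization step for  min_{x in X} f_1(x) + f_2(x):
\[
  y^{k} \in \arg\min_{y \in X}\;
    \nabla f_1(x^{k})^{\mathsf{T}}(y - x^{k}) + f_2(y),
  \qquad
  x^{k+1} = x^{k} + \lambda_k (y^{k} - x^{k}), \quad \lambda_k \in (0,1].
\]
% Separability of f_2 and the Cartesian-product form of X let the
% subproblem split into independent per-component problems.
```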
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists in estimating the value of an optimal policy, a
value from which a policy can be recovered, while the other, called policy
search, directly works in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer
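As a minimal illustration of the value-based family, here is a tabular Q-learning sketch. The `env` interface (reset/step returning next state, reward, done), the learning rate, and the epsilon-greedy exploration are assumed toy choices, not the chapter's presentation.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning: estimate Q(s, a), then act greedily w.r.t. it."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # TD update toward the Bellman optimality target.
            target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]
```

The final line shows the "value from which a policy can be recovered" point: the greedy policy is read directly off the learned Q-table.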
MIST: Missing Person Intelligence Synthesis Toolkit
Each day in the United States, approximately 500 missing person cases occur
that go unsolved or unresolved. The non-profit organization known as the Find
Me Group (FMG), led by former law enforcement professionals, is dedicated to
solving or resolving these cases. This paper introduces the Missing Person
Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant
of geospatial abductive inference. The system takes search locations provided
by a group of experts and rank-orders them by the probability assigned to
each area based on the experts' prior performance as a group. We evaluate our
approach against the current practices employed by the Find Me Group and find
that it significantly reduces the search area, yielding a reduction of 31
square miles over the 24 cases examined in our experiments. Currently, we are
using MIST to aid the Find Me Group in an active missing person case.
Comment: 10 pages, 12 figures, Accepted in CIKM 201
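A hedged sketch of the ranking idea: weight each expert by historical accuracy, accumulate weighted votes over candidate areas, and rank. The weighting scheme and data layout are illustrative assumptions, not MIST's actual inference model.

```python
from collections import defaultdict

def rank_search_areas(suggestions, expert_accuracy):
    """Rank candidate areas by expert votes weighted by past hit rate.

    suggestions:     dict expert -> list of suggested area ids
    expert_accuracy: dict expert -> fraction of past suggestions that
                     located the missing person (assumed available)
    """
    scores = defaultdict(float)
    for expert, areas in suggestions.items():
        for area in areas:
            scores[area] += expert_accuracy.get(expert, 0.0)
    total = sum(scores.values()) or 1.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(area, s / total) for area, s in ranked]

# Usage with toy data: area "B" ranks first (backed by both experts).
print(rank_search_areas({"e1": ["A", "B"], "e2": ["B", "C"]},
                        {"e1": 0.8, "e2": 0.4}))
```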
On the use of biased-randomized algorithms for solving non-smooth optimization problems
Soft constraints are quite common in real-life applications. For example, in freight transportation, the fleet size can be enlarged by outsourcing part of the distribution service, and some deliveries to customers can be postponed as well; in inventory management, it is possible to consider stock-outs generated by unexpected demands; and in manufacturing processes and project management, some deadlines frequently cannot be met due to delays in critical steps of the supply chain. However, capacity-, size-, and time-related limitations are included in many optimization problems as hard constraints, while it would usually be more realistic to consider them as soft ones, i.e., constraints that can be violated to some extent by incurring a penalty cost. Most of the time, this penalty cost will be nonlinear and even noncontinuous, which might transform the objective function into a non-smooth one. Despite their many practical applications, non-smooth optimization problems are quite challenging, especially when the underlying optimization problem is NP-hard in nature. In this paper, we propose the use of biased-randomized algorithms as an effective methodology to cope with NP-hard and non-smooth optimization problems in many practical applications. Biased-randomized algorithms extend constructive heuristics by introducing a nonuniform randomization pattern into them. Hence, they can be used to explore promising areas of the solution space without the limitations of gradient-based approaches, which assume the existence of smooth objective functions. Moreover, biased-randomized algorithms can be easily parallelized, thus requiring short computing times while exploring a large number of promising regions. This paper discusses these concepts in detail, reviews existing work in different application areas, and highlights current trends and open research lines.
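To illustrate the biased-randomization pattern described above, the sketch below replaces a greedy heuristic's deterministic pick with a geometrically skewed choice from the cost-sorted candidate list; the geometric distribution and its `beta` parameter are common choices in this literature, used here as assumptions.

```python
import math
import random

def biased_pick(n, beta, rng):
    """Sample an index with a geometric bias: position 0 (the greedy
    choice) is most likely, but any candidate can be selected."""
    idx = int(math.log(1.0 - rng.random()) / math.log(1.0 - beta))
    return min(idx, n - 1)

def biased_randomized_construct(candidates, cost, beta=0.3, seed=0):
    """Constructive heuristic with biased randomization: instead of
    always taking the cheapest candidate, sample near the top."""
    rng = random.Random(seed)
    remaining = sorted(candidates, key=cost)
    solution = []
    while remaining:
        solution.append(remaining.pop(biased_pick(len(remaining), beta, rng)))
    return solution

# Usage: repeated runs with different seeds yield diverse, still
# cost-aware constructions, suitable for non-smooth objectives.
print(biased_randomized_construct([5, 2, 9, 1, 7], cost=lambda x: x))
```

Because the construction never uses gradients, a penalty-based, non-smooth objective can be evaluated as a black box when comparing the solutions produced across runs.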