Design and evaluation of a hybrid multi-task learning model for optimizing deep reinforcement learning agents
Driven by recent technological advances in artificial intelligence, deep learning has emerged as a promising representation learning technique. This in turn has given rise to deep reinforcement learning, which combines deep learning with reinforcement learning methods. However, the performance optimization achieved by reinforcement learning agents designed with model-free approaches has been largely limited to systems in which the algorithm learns a single task. Such models prove data-inefficient whenever agents must interact with more complex, data-rich environments. This thesis introduces a hybrid multi-task learning approach for optimizing deep reinforcement learning agents that operate in different but semantically similar environments with related tasks. Empirical results obtained in the OpenAI Gym Atari 2600 video-game environment demonstrate that the proposed hybrid multi-task learning model successfully addresses key challenges in the performance optimization of deep reinforcement learning agents.
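The multi-task idea sketched in the abstract is commonly realized with a shared trunk and task-specific output heads. The following is a minimal illustrative sketch (not the thesis's actual model); all names (`MultiTaskQNet`, `W_shared`, `heads`) and the tiny linear/tanh architecture are assumptions chosen for brevity.

```python
import numpy as np

# Hypothetical sketch of a multi-task Q-network: a shared trunk learns a
# common representation across related tasks, while lightweight task-specific
# heads output Q-values for each environment's own action space.
class MultiTaskQNet:
    def __init__(self, obs_dim, hidden_dim, action_dims, seed=0):
        rng = np.random.default_rng(seed)
        # Shared representation layer, reused by every task.
        self.W_shared = rng.normal(0.0, 0.1, (obs_dim, hidden_dim))
        # One output head per task (tasks may have different action spaces).
        self.heads = [rng.normal(0.0, 0.1, (hidden_dim, a)) for a in action_dims]

    def q_values(self, obs, task_id):
        h = np.tanh(obs @ self.W_shared)   # shared features
        return h @ self.heads[task_id]     # task-specific Q-values

net = MultiTaskQNet(obs_dim=8, hidden_dim=16, action_dims=[4, 6])
obs = np.zeros(8)
print(net.q_values(obs, 0).shape)  # (4,)
print(net.q_values(obs, 1).shape)  # (6,)
```

Because the trunk is shared, gradient updates from any task refine the representation used by all tasks, which is the usual source of the data-efficiency gains the abstract refers to.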
D4.2 Intelligent D-Band wireless systems and networks initial designs
This deliverable presents the results of the ARIADNE project's Task 4.2: machine-learning-based network intelligence. It covers work on various aspects of network management to deliver system-level, qualitative solutions that leverage diverse machine learning techniques. The chapters present system-level, simulation, and algorithmic models based on multi-agent reinforcement learning, deep reinforcement learning, learning automata for complex event forecasting, a system-level model for proactive handovers and resource allocation, model-driven deep-learning-based channel estimation and feedback, as well as strategies for deploying machine-learning-based solutions. In short, D4.2 reports on promising AI- and ML-based methods investigated in the ARIADNE project, along with their limitations and potential.
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation
Research on reinforcement learning has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainty, and large state dimensionality (e.g., multi-agent systems or hyper-redundant robots). To alleviate this problem, we present DOP, a deep model-based reinforcement learning algorithm that exploits action values to both (1) guide the exploration of the state space and (2) plan effective policies. Specifically, we exploit deep neural networks to learn Q-functions that are used to attack the curse of dimensionality during a Monte-Carlo tree search. Our algorithm constructs upper confidence bounds on the learned value function to select actions optimistically. We implement and evaluate DOP in different scenarios: (1) a cooperative navigation problem, (2) a fetching task for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot (both in simulation and on the real robot). The results show the effectiveness of DOP in the chosen applications, where action values drive exploration and reduce the computational demand of planning while achieving good performance.
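The optimistic action selection the abstract describes can be sketched with a standard UCB rule over a learned Q-function. This is an illustrative stand-in, not DOP's actual implementation; the names (`select_action`, `c_explore`) and the specific UCB form are assumptions.

```python
import math

# Hedged sketch of optimistic action selection: an upper confidence bound
# over learned Q-values steers the tree search toward promising actions.
def select_action(q, visits, c_explore=1.4):
    """q: dict action -> learned Q-value; visits: dict action -> visit count."""
    total = sum(visits.values()) + 1
    def ucb(a):
        n = visits[a]
        if n == 0:
            return float("inf")   # unvisited actions are maximally optimistic
        return q[a] + c_explore * math.sqrt(math.log(total) / n)
    return max(q, key=ucb)

q = {"left": 0.2, "right": 0.5}
visits = {"left": 10, "right": 1}
print(select_action(q, visits))  # "right": higher value and rarely visited
```

The exploration bonus shrinks as an action's visit count grows, so the search gradually concentrates its budget on actions whose learned values remain high.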
Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving
Reinforcement learning has received high research interest for developing planning approaches in automated driving. Most prior works consider the end-to-end planning task, which yields direct control commands, and rarely deploy their algorithms on real vehicles. In this work, we propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning. By populating an abstract objective interface, established motion planning algorithms can be leveraged to derive smooth and drivable trajectories. Given the current environment model, we propose using a built-in simulator to predict the traffic scene over a given future horizon. The behavior of automated vehicles in mixed traffic is determined by querying the learned policy. To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner, and as such it lacks a state-of-the-art benchmark. Thus, we validate the proposed approach by comparing an idealistic single-shot plan with cyclic replanning through the learned policy. Experiments with a real test vehicle on proving grounds demonstrate the potential of our approach to shrink the simulation-to-real-world gap of deep reinforcement learning based planning approaches. Additional simulative analyses reveal that more complex multi-agent maneuvers can be managed by employing the cyclic replanning approach.

Comment: 8 pages, 10 figures, to be published in the 34th IEEE Intelligent Vehicles Symposium (IV)
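The split the abstract proposes, where a learned policy selects a high-level behavior that populates an abstract objective interface for a conventional motion planner, can be sketched as follows. This is a simplified illustration, not the paper's code: the `Objective` fields, the behavior set, and the stub `policy` function are all assumptions.

```python
from dataclasses import dataclass

# Abstract objective interface: the only information the downstream
# motion planner needs from the behavior layer.
@dataclass
class Objective:
    target_lane: int
    target_speed: float  # m/s

# Illustrative discrete behavior set mapped to objective builders.
BEHAVIORS = {
    0: ("keep_lane",    lambda s: Objective(s["lane"],     s["speed_limit"])),
    1: ("change_left",  lambda s: Objective(s["lane"] - 1, s["speed_limit"])),
    2: ("change_right", lambda s: Objective(s["lane"] + 1, s["speed_limit"])),
}

def policy(state):
    # Stub standing in for the trained RL policy: overtake a slow leader.
    return 1 if state["leader_speed"] < 0.8 * state["speed_limit"] else 0

def plan_step(state):
    name, make_objective = BEHAVIORS[policy(state)]
    objective = make_objective(state)
    # A sampling-based motion planner would turn `objective` into a
    # smooth, drivable trajectory; omitted here.
    return name, objective

state = {"lane": 1, "speed_limit": 27.8, "leader_speed": 15.0}
name, obj = plan_step(state)
print(name, obj.target_lane)  # change_left 0
```

Running this step cyclically at a fixed rate, with the state refreshed from the environment model each cycle, corresponds to the cyclic replanning scheme the paper evaluates against a single-shot plan.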
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization
Recently, neural heuristics based on deep reinforcement learning have shown promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they still struggle to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned for a few steps to solve the corresponding single-objective subproblems. Specifically, during training, a (partially) architecture-shared multi-task model is leveraged to achieve parallel learning for the meta-model and thus speed up training; meanwhile, a scaled symmetric sampling method with respect to the weight vectors is designed to stabilize training. For fine-tuning, an efficient hierarchical method is proposed to systematically tackle all the subproblems. Experimental results on the multi-objective traveling salesman problem (MOTSP), the multi-objective capacitated vehicle routing problem (MOCVRP), and the multi-objective knapsack problem (MOKP) show that EMNH outperforms state-of-the-art neural heuristics in solution quality and learning efficiency, and yields solutions competitive with strong traditional heuristics while consuming much less time.

Comment: Accepted at NeurIPS 202
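Decomposition-based approaches like the one described reduce a MOCOP to single-objective subproblems via weight vectors on the simplex. The sketch below illustrates that idea only; the simple "weight and its reversal" pairing is a simplified stand-in for EMNH's scaled symmetric sampling, whose exact form is not given in the abstract, and all names here are assumptions.

```python
import numpy as np

# Hedged illustration of weight-vector decomposition for multi-objective
# combinatorial optimization: each subproblem scalarizes the objective
# vector with a weight vector drawn from the probability simplex.
def sample_symmetric_weights(n_obj, rng):
    w = rng.random(n_obj)
    w /= w.sum()                     # project onto the probability simplex
    return w, w[::-1].copy()         # simplified "symmetric" counterpart

def scalarize(costs, w):
    # Weighted-sum scalarization of one solution's objective vector.
    return float(np.dot(costs, w))

rng = np.random.default_rng(42)
w, w_sym = sample_symmetric_weights(2, rng)
costs = np.array([3.0, 1.0])         # e.g. (tour length for obj 1, for obj 2)
# For 2 objectives, a weight and its reversal always cover the simplex
# symmetrically, so the two scalarized costs sum to costs.sum():
print(round(scalarize(costs, w) + scalarize(costs, w_sym), 6))  # 4.0
```

Pairing each sampled weight with a symmetric counterpart keeps the set of subproblems balanced across the objective space, which is one plausible reading of why such sampling stabilizes training.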