Measuring collaborative emergent behavior in multi-agent reinforcement learning

Authors: E Rovira, G Klein, L Matignon, R Parasuraman, V Mnih
Publication date: 23/07/2018

Multi-agent reinforcement learning (RL) has important implications for the future of human-agent teaming. We show that improved performance with multi-agent RL is not a guarantee of the collaborative behavior thought to be important for solving multi-agent tasks. To address this, we present a novel approach for quantitatively assessing collaboration in continuous spatial tasks with multi-agent RL. Such a metric is useful for measuring collaboration between computational agents and may serve as a training signal for collaboration in future RL paradigms involving humans.
Comment: 1st International Conference on Human Systems Engineering and Design, 6 pages, 2 figures, 1 table

arXiv.org e-Print Archive

Crossref

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

Authors: A Martinoli, C Kube, C Moeslinger, F Arvin, FA Oliehoek, J Foerster, JK Gupta, L Bayındır, N Correll, P Basu, S Nouyan, V Mnih
Publication date: 01/01/2018

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201

arXiv.org e-Print Archive

TUbiblio

Crossref

Self-Modification of Policy and Utility Function in Rational Agents

Authors: B Hibbard, D Dewey, D Silver, J Schmidhuber, L Orseau, LP Kaelbling, M Hutter, M Ring, N Bostrom, R Sutton, RV Yampolskiy, S Legg, V Mnih
Publication date: 10/05/2016

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.
Comment: Artificial General Intelligence (AGI) 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification

Authors: B Varadarajan, C Lea, D Silver, FC Ghesu, L Tao, N Ahmidi, R DiPietro, RS Sutton, V Mnih
Publication date: 21/06/2018

Recognizing surgical gestures is crucial for surgical skill assessment and efficient surgery training. Prior work on this task is based either on variants of graphical models, such as HMMs and CRFs, or on deep learning models, such as Recurrent Neural Networks and Temporal Convolutional Networks. Most current approaches suffer from over-segmentation and therefore low segment-level edit scores. In contrast, we present an essentially different methodology by modeling the task as a sequential decision-making process. An intelligent agent is trained using reinforcement learning with hierarchical features from a deep model. Temporal consistency is integrated into our action design and reward mechanism to reduce over-segmentation errors. Experiments on the JIGSAWS dataset demonstrate that the proposed method performs better than state-of-the-art methods in terms of the edit score and on par in frame-wise accuracy. Our code will be released later.
Comment: 8 pages, 2 figures, accepted for MICCAI 201

arXiv.org e-Print Archive

Crossref

The basic laparoscopic skills long-term survival: new prediction scale

Authors: Artyomenko V., Chumak Z., Kozhukhar A., Mnih L., Nastradina N., Shapoval N.
Publication date: 01/01/2019

Introduction: The most important principle of pedagogy, including medical pedagogy, is to correctly assess the acquisition of knowledge and skills, since they must be restored during study and retained for use in further professional practice. The survival of knowledge and skills over time is of particular importance in medicine, since determining it reveals the time needed before repeated training and allows the efficiency of the medical education system to be evaluated in general. Objective: To develop a scale for predicting the long-term survival of basic laparoscopy skills in the medical education system. Materials and methods: The training results and assessments of 48 Odessa National Medical University medical students were studied using the basic laparoscopy skills module of a 3-D laparoscopy simulator. The students trained on this module in the 5th year, forming a comparative group, CG (for obtaining initial mathematical prediction indicators), and repeated it in the 6th year, forming the main group, MG (used for the knowledge survival calculations and for reaching the main goal). They passed all the module tasks, with at least 10 trainings per module and the number of repetitions ranging from 1 to 4. Time was recorded for the practical skills, security parameters, visual-motor coordination, the selection and release of devices, pedaling, diathermy, aspiration, and irrigation, with video camera viewing angles of 30° and 0°. The initial (1st training), intermediate (5th training) and final (10th training) levels of practical skills of the CG and MG students during two years of training, forming 6 groups accordingly, were determined, summarized and generalized for each group using the trainee self-assessment coefficient (SAC), which we developed from the questionnaires, and the practical skills coefficient (PSC), based on the evaluation sheets. Total assessments on the 6-point Likert scale (0-5) of competence levels were used

Crossref

Odessa National Medical University Institutional Repository

Learning to Selectively Transfer: Reinforced Transfer Learning for Deep Text Matching

Authors: Arulkumaran K., Bahdanau D., Chen M., Fan Y., Feng J., Huang J., III., Khot T., Kingma D. P., Mnih V., Patel Y., Puterman M. L., Rummery G. A., Shen J., Silver D., Socher IR., Wang S., Yang Z., Yin NW., Yosinski J.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/12/2018

Deep text matching approaches have been widely studied for many applications including question answering and information retrieval systems. To deal with a domain that has insufficient labeled data, these approaches can be used in a Transfer Learning (TL) setting to leverage labeled data from a resource-rich source domain. To achieve better performance, source domain data selection is essential in this process to prevent the "negative transfer" problem. However, the emerging deep transfer models do not fit well with most existing data selection methods, because the data selection policy and the transfer learning model are not jointly trained, leading to sub-optimal training efficiency. In this paper, we propose a novel reinforced data selector to select high-quality source domain data to help the TL model. Specifically, the data selector "acts" on the source domain data to find a subset for optimization of the TL model, and the performance of the TL model can provide "rewards" in turn to update the selector. We build the reinforced data selector based on the actor-critic framework and integrate it into a DNN-based transfer learning model, resulting in a Reinforced Transfer Learning (RTL) method. We perform a thorough experimental evaluation on two major tasks for text matching, namely, paraphrase identification and natural language inference. Experimental results show the proposed RTL can significantly improve the performance of the TL model. We further investigate different settings of states, rewards, and policy optimization methods to examine the robustness of our method. Finally, we conduct a case study on the selected data and find our method is able to select source domain data whose Wasserstein distance is close to the target domain data. This is reasonable and intuitive, as such source domain data can provide more transferability power to the model.
Comment: Accepted to WSDM 201

arXiv.org e-Print Archive

Crossref

Innovative methods efficiency in obstetricians-gynecologists’ postgraduate education

Authors: Artyomenko V., Chumak Z., Kozhukhar A., Mnih L., Nastradina N., Shapoval N.
Publication date: 01/01/2019

Introduction: The creation of a new, reformed governmental program for undergraduate and postgraduate medical education was important for Ukraine. At Odessa State Medical University it is today the main innovative guideline for the modern practical training of physicians, especially in obstetrics and gynecology. Objective: To determine the efficiency of innovative methods in the postgraduate education of obstetricians-gynecologists. Materials and methods: The learning outcomes and assessments of 320 OB/GYN postgraduates were studied with the help of simulated virtual platforms for obstetricians-gynecologists, birth simulators, a virtual operating room and a virtual labor room. The physicians' average age was 39.4 ± 0.7 years; their average work experience was 12.6 ± 0.9 years. They underwent seminar (10%) and practical (90%) classes on normal and pathological labor, obstetrical operations, urgent cases and emergencies in obstetrics, with several different assessment protocols, including initial and final testing, anonymous self-assessment of their practical skills, structured checklists with intermediate and final steps for each practical skill (from 12 to 18 points), and teamwork with changing roles using video monitoring and debriefing by case-study checklists (29 positions; from 2 to 5 points for each). Results: According to the results of the final tests, the low initial assessment scores for the complicated childbirth and obstetric surgery themes increased significantly, by one and a half times (p < 0.001). After completion of the course, theoretical training improved by one and a half times (p < 0.05). The final theoretical test results of postgraduates with at least five years of work experience were 1.3 times lower than those of postgraduates with more than fifteen years of experience (p < 0.001). On course completion, theoretical test performance had increased one and a half times, and skills had doubled. The initial assessment score for the vacuum extraction operation increased significantly, almost doubling (p < 0.001). The teamwork evaluation at the end of the course more than doubled (p < 0.001)

Crossref

Odessa National Medical University Institutional Repository

3D, haptics and virtual reality technologies implementation results in quality assessment and assurance of obstetrician-gynecologists’ hysteroscopy training

Authors: Artyomenko V., Kozhukhar A., Mnih L., Nastradina N., Nosenko V., Shapoval N.
Publication date: 01/01/2021

Modern practical gynecology requires mandatory diagnostic and therapeutic hysteroscopic (HS) procedures. Performing them safely and to a high standard depends on repeated training to master motor skills, which should ultimately lead to a tactile sense and an understanding of depth, fulcrum and force of impact. Simulation training is especially important: it gives novice gynecologists effective theoretical and practical training in basic HS manipulations, and the ability to conduct and evaluate their teamwork. The virtual simulator is widely used for objective assessment of clinical skills and abilities associated with the competence

Odessa National Medical University Institutional Repository

Safe Crossover of Neural Networks Through Neuron Alignment

Authors: Darrell L, Glorot Xavier, Gomez Faustino, Gomez Faustino J, Goodfellow Ian J, Hotelling Harold, Mnih Volodymyr, Montana David J, Raghu Maithra, Schulman John, Wieland Alexis P
Publication venue: Association for Computing Machinery (ACM)
Publication date: 04/05/2020

One of the main and largely unexplored challenges in evolving the weights of neural networks using genetic algorithms is to find a sensible crossover operation between parent networks. Indeed, naive crossover leads to functionally damaged offspring that do not retain information from the parents. This is because neural networks are invariant to permutations of neurons, giving rise to multiple ways of representing the same solution. This is often referred to as the competing conventions problem. In this paper, we propose a two-step safe crossover (SC) operator. First, the neurons of the parents are functionally aligned by computing how well they correlate, and only then are the parents recombined. We compare two ways of measuring relationships between neurons: Pairwise Correlation (PwC) and Canonical Correlation Analysis (CCA). We test our safe crossover operators (SC-PwC and SC-CCA) on MNIST and CIFAR-10 by performing arithmetic crossover on the weights of feed-forward neural network pairs. We show that it effectively transmits information from parents to offspring and significantly improves upon naive crossover. Our method is computationally fast, can serve as a way to explore the fitness landscape more efficiently, and makes safe crossover a potentially promising operator in future neuroevolution research and applications

arXiv.org e-Print Archive

Crossref

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

Authors: A Hussein, D Silver, H Idrees, H Lu, H Xiong, IH Laradji, L Liu, L Van Hove, M Riedmiller, O Vinyals, R Guerrero-Gómez-Olmedo, T Stahl, V Mnih
Publication date: 01/01/2020

We formulate counting as a sequential decision problem and present a novel crowd counting model solvable by deep reinforcement learning. In contrast to existing counting models that directly output count values, we divide one-step estimation into a sequence of much easier and more tractable sub-decision problems. This sequential decision nature corresponds exactly to a real physical process: scale weighing. Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet, where the count value is analogized by weight. By virtually placing a crowd image on one side of a scale, LibraNet (agent) sequentially learns to place appropriate weights on the other side to match the crowd count. At each step, LibraNet chooses one weight (action) from the weight box (the pre-defined action pool) according to the current crowd image features and the weights placed on the scale pan (state). LibraNet is required to learn to balance the scale according to the feedback of the needle (Q values). We show that LibraNet exactly implements scale weighing by visualizing how LibraNet chooses actions in the decision process. Extensive experiments demonstrate the effectiveness of our design choices and report state-of-the-art results on a few crowd counting benchmarks. We also demonstrate good cross-dataset generalization of LibraNet. Code and models are made available at: https://git.io/libranet
Comment: Accepted to Proc. Eur. Conf. Computer Vision (ECCV) 202

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship