Measuring collaborative emergent behavior in multi-agent reinforcement learning
Multi-agent reinforcement learning (RL) has important implications for the
future of human-agent teaming. We show that improved performance with
multi-agent RL is not a guarantee of the collaborative behavior thought to be
important for solving multi-agent tasks. To address this, we present a novel
approach for quantitatively assessing collaboration in continuous spatial tasks
with multi-agent RL. Such a metric is useful for measuring collaboration
between computational agents and may serve as a training signal for
collaboration in future RL paradigms involving humans.
Comment: 1st International Conference on Human Systems Engineering and Design,
6 pages, 2 figures, 1 tabl
Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning
Swarm systems constitute a challenging problem for reinforcement learning
(RL) as the algorithm needs to learn decentralized control policies that can
cope with limited local sensing and communication abilities of the agents.
While it is often difficult to directly define the behavior of the agents,
simple communication protocols can be defined more easily using prior knowledge
about the given task. In this paper, we propose a number of simple
communication protocols that can be exploited by deep reinforcement learning to
find decentralized control policies in a multi-robot swarm environment. The
protocols are based on histograms that encode the local neighborhood relations
of the agents and can also transmit task-specific information, such as the
shortest distance and direction to a desired target. In our framework, we use
an adaptation of Trust Region Policy Optimization to learn complex
collaborative tasks, such as formation building and building a communication
link. We evaluate our findings in a simulated 2D-physics environment, and
compare the implications of different communication protocols.
Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201
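The histogram idea in the abstract above can be illustrated with a minimal sketch. It assumes each agent bins the distances to the neighbors inside its communication range, which gives a fixed-length observation regardless of how many neighbors are present. The function name `neighbor_histogram` and the parameters `comm_range` and `n_bins` are hypothetical and do not come from the paper, which may encode the neighborhood differently.

```python
import numpy as np

def neighbor_histogram(positions, agent, comm_range=1.0, n_bins=4):
    """Encode the neighbors of one agent as a fixed-length distance histogram."""
    positions = np.asarray(positions, dtype=float)
    # Offsets from this agent to every other agent
    rel = np.delete(positions, agent, axis=0) - positions[agent]
    dist = np.linalg.norm(rel, axis=1)
    # Only agents inside the communication range are visible
    visible = dist[dist <= comm_range]
    hist, _ = np.histogram(visible, bins=n_bins, range=(0.0, comm_range))
    return hist

# Hypothetical 2D swarm of four agents on a line
swarm = [[0.0, 0.0], [0.1, 0.0], [0.6, 0.0], [2.0, 0.0]]
obs = neighbor_histogram(swarm, agent=0)
print(obs)
```

The observation length stays at `n_bins` for any swarm size, which is what makes such an encoding convenient as input to a fixed-size policy network.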
Self-Modification of Policy and Utility Function in Rational Agents
Any agent that is part of the environment it interacts with and has versatile
actuators (such as arms and fingers) will in principle have the ability to
self-modify -- for example by changing its own source code. As we continue to
create more and more intelligent agents, chances increase that they will learn
about this ability. The question is: will they want to use it? For example,
highly intelligent systems may find ways to change their goals to something
more easily achievable, thereby `escaping' the control of their designers. In
an important paper, Omohundro (2008) argued that goal preservation is a
fundamental drive of any intelligent system, since a goal is more likely to be
achieved if future versions of the agent strive towards the same goal. In this
paper, we formalise this argument in general reinforcement learning, and
explore situations where it fails. Our conclusion is that the self-modification
possibility is harmless if and only if the value function of the agent
anticipates the consequences of self-modifications and uses the current utility
function when evaluating the future.
Comment: Artificial General Intelligence (AGI) 201
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification
Recognition of surgical gesture is crucial for surgical skill assessment and
efficient surgery training. Prior works on this task are based on either
variant graphical models such as HMMs and CRFs, or deep learning models such as
Recurrent Neural Networks and Temporal Convolutional Networks. Most
current approaches suffer from over-segmentation and therefore low
segment-level edit scores. In contrast, we present an essentially different
methodology by modeling the task as a sequential decision-making process. An
intelligent agent is trained using reinforcement learning with hierarchical
features from a deep model. Temporal consistency is integrated into our action
design and reward mechanism to reduce over-segmentation errors. Experiments on
the JIGSAWS dataset demonstrate that the proposed method performs better than
state-of-the-art methods in terms of the edit score and on par in frame-wise
accuracy. Our code will be released later.
Comment: 8 pages, 2 figures, accepted for MICCAI 201
The basic laparoscopic skills long-term survival: new prediction scale
Introduction: A central principle of pedagogy, including medical pedagogy, is the correct assessment of
knowledge and skills acquisition, since these must be restored during study and retained for use in
further professional practice.
The survival of knowledge and skills over time is of particular importance in medicine, since determining it
reveals the time after which repeated training is needed and allows the efficiency of the medical
education system to be evaluated as a whole.
Objective: To develop a scale for predicting the long-term survival of basic laparoscopy skills in the medical
education system.
Materials and methods: The training results and assessments of 48 medical students of Odessa National
Medical University were studied using the basic laparoscopy skills module of a 3-D
laparoscopy simulator. The students trained on this module in their 5th year, forming the comparative group, CG
(used to obtain the initial mathematical prediction indicators), and repeated it in their 6th year, forming the main
group, MG (used for the knowledge survival calculations and for reaching the main goal). They
completed all the module tasks, with at least 10 trainings per module and from 1 to 4 repetitions.
Time was recorded for the practical skills, safety parameters, visual-motor coordination, the
selection and release of devices, pedaling, diathermy, aspiration and irrigation, with video camera
viewing angles of 30° and 0°.
The initial (1st training), intermediate (5th training) and final (10th training) levels of
practical skills of the CG and MG students over the two years of training, forming 6 groups accordingly,
were determined, summarized and generalized for each group using the trainee self-assessment
coefficient (SAC), which we developed from questionnaires, and the practical skills coefficient (PSC),
based on evaluation sheets. A 6-point Likert scale (0-5) was used for the total assessments of
competence levels
Learning to Selectively Transfer: Reinforced Transfer Learning for Deep Text Matching
Deep text matching approaches have been widely studied for many applications
including question answering and information retrieval systems. To deal with a
domain that has insufficient labeled data, these approaches can be used in a
Transfer Learning (TL) setting to leverage labeled data from a resource-rich
source domain. To achieve better performance, source domain data selection is
essential in this process to prevent the "negative transfer" problem. However,
the emerging deep transfer models do not fit well with most existing data
selection methods, because the data selection policy and the transfer learning
model are not jointly trained, leading to sub-optimal training efficiency.
In this paper, we propose a novel reinforced data selector to select
high-quality source domain data to help the TL model. Specifically, the data
selector "acts" on the source domain data to find a subset for optimization of
the TL model, and the performance of the TL model can provide "rewards" in turn
to update the selector. We build the reinforced data selector based on the
actor-critic framework and integrate it into a DNN-based transfer learning model,
resulting in a Reinforced Transfer Learning (RTL) method. We perform a thorough
experimental evaluation on two major tasks for text matching, namely,
paraphrase identification and natural language inference. Experimental results
show the proposed RTL can significantly improve the performance of the TL
model. We further investigate different settings of states, rewards, and policy
optimization methods to examine the robustness of our method. Last, we conduct
a case study on the selected data and find our method is able to select source
domain data whose Wasserstein distance is close to the target domain data. This
is reasonable and intuitive as such source domain data can provide more
transferability power to the model.
Comment: Accepted to WSDM 201
Innovative methods efficiency in obstetricians-gynecologists’ postgraduate education
Introduction: The creation of a new government reform program for undergraduate and postgraduate
medical education was important for Ukraine. At Odessa State Medical University
it is today the main innovative guideline for the modern practical training of physicians, especially in
obstetrics and gynecology.
Objective: To determine the efficiency of innovative methods in the postgraduate education of
obstetrician-gynecologists.
Materials and methods: The learning outcomes and assessments of 320 OB/GYN postgraduates
were studied using simulation-based virtual platforms for obstetrician-gynecologists, birth
simulators, a virtual operating room and a virtual labor room. The physicians' average age was 39.4 ± 0.7
years and their average work experience was 12.6 ± 0.9 years. They attended seminar (10%) and practical
(90%) classes on normal and pathological labor, obstetrical operations, urgent cases and
emergencies in obstetrics, with several different assessment protocols, including initial and final
testing, anonymous self-assessment of their practical skills, structured check-lists with intermediate
and final steps for each practical skill (from 12 to 18 points), and team work with changing roles using
video-monitoring and debriefing by case-study check-lists (29 positions; from 2 to 5 points for each).
Results: According to the final tests, the initially low assessment scores on the
complicated childbirth and obstetric surgery topics increased significantly, by one and a half times
(p < 0.001). After completion of the course, theoretical training had improved by one and a half times
(p < 0.05). On the final theoretical test, postgraduates with work experience of at
least five years scored 1.3 times lower than postgraduates with more than fifteen years of experience
(p < 0.001). By course completion, theoretical test performance had increased
one and a half times and skills had doubled. The assessment of the vacuum extraction
operation increased significantly from its initial level, almost doubling (p < 0.001). The teamwork
evaluation at the end of the course had more than doubled (p < 0.001)
3D, haptics and virtual reality technologies implementation results in quality assessment and assurance of obstetrician-gynecologists’ hysteroscopy training
Modern practical gynecology requires mandatory diagnostic and therapeutic hysteroscopic (HS)
procedures. Performing them safely and to a high standard depends on repeated training to master motor
skills, which should ultimately develop tactile sensation and an understanding of depth, fulcrum and force
of impact. Simulation training is especially important: it gives novice gynecologists effective theoretical
and practical training in basic HS manipulations, along with the ability to conduct and evaluate their
teamwork. The virtual simulator is widely used for objective assessment of clinical skills and abilities
associated with the competence
Safe Crossover of Neural Networks Through Neuron Alignment
One of the main and largely unexplored challenges in evolving the weights of
neural networks using genetic algorithms is to find a sensible crossover
operation between parent networks. Indeed, naive crossover leads to
functionally damaged offspring that do not retain information from the parents.
This is because neural networks are invariant to permutations of neurons,
giving rise to multiple ways of representing the same solution. This is often
referred to as the competing conventions problem. In this paper, we propose a
two-step safe crossover (SC) operator. First, the neurons of the parents are
functionally aligned by computing how well they correlate, and only then are
the parents recombined. We compare two ways of measuring relationships between
neurons: Pairwise Correlation (PwC) and Canonical Correlation Analysis (CCA).
We test our safe crossover operators (SC-PwC and SC-CCA) on MNIST and CIFAR-10
by performing arithmetic crossover on the weights of feed-forward neural
network pairs. We show that it effectively transmits information from parents
to offspring and significantly improves upon naive crossover. Our method is
computationally fast, can serve as a way to explore the fitness landscape more
efficiently and makes safe crossover a potentially promising operator in future
neuroevolution research and applications
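The two-step operator described in the abstract above (align neurons by correlation, then recombine) can be sketched for a single hidden layer. This is an illustration under stated assumptions, not the authors' implementation: the greedy matching, the dictionary layout of the network, and the names `hidden_activations`, `match_neurons` and `safe_crossover` are all hypothetical, and the paper may use a different matching procedure.

```python
import numpy as np

def hidden_activations(net, x):
    """Hidden-layer activations of a one-hidden-layer tanh network."""
    return np.tanh(x @ net["W1"] + net["b1"])

def match_neurons(acts_a, acts_b):
    """Greedy pairwise-correlation matching (an assumed matching rule).

    Returns perm such that neuron perm[i] of B is paired with neuron i of A.
    """
    za = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    zb = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    corr = za.T @ zb / len(acts_a)  # corr[i, j]: A's neuron i vs B's neuron j
    perm = np.zeros(corr.shape[0], dtype=int)
    for _ in range(corr.shape[0]):
        # Take the most correlated remaining pair, then retire both neurons
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        perm[i] = j
        corr[i, :] = -np.inf
        corr[:, j] = -np.inf
    return perm

def safe_crossover(net_a, net_b, x, t=0.5):
    """Align B's hidden neurons to A's, then apply arithmetic crossover."""
    perm = match_neurons(hidden_activations(net_a, x),
                         hidden_activations(net_b, x))
    aligned_b = {"W1": net_b["W1"][:, perm],
                 "b1": net_b["b1"][perm],
                 "W2": net_b["W2"][perm, :]}
    return {k: t * net_a[k] + (1 - t) * aligned_b[k] for k in net_a}

# Demo: parent B is parent A with its hidden neurons shuffled, so both
# parents compute the same function under different "conventions".
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
net_a = {"W1": rng.normal(size=(5, 8)),
         "b1": rng.normal(size=8),
         "W2": rng.normal(size=(8, 3))}
q = rng.permutation(8)
net_b = {"W1": net_a["W1"][:, q], "b1": net_a["b1"][q], "W2": net_a["W2"][q, :]}
child = safe_crossover(net_a, net_b, x)
```

In this demo the aligned child recovers parent A's weights, whereas averaging the unaligned weights would mix unrelated neurons, which is the competing conventions problem the abstract refers to.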
Weighing Counts: Sequential Crowd Counting by Reinforcement Learning
We formulate counting as a sequential decision problem and present a novel
crowd counting model solvable by deep reinforcement learning. In contrast to
existing counting models that directly output count values, we divide one-step
estimation into a sequence of much easier and more tractable sub-decision
problems. Such sequential decision nature corresponds exactly to a real physical
process: scale weighing. Inspired by scale weighing, we propose a
novel 'counting scale' termed LibraNet where the count value is analogized by
weight. By virtually placing a crowd image on one side of a scale, LibraNet
(agent) sequentially learns to place appropriate weights on the other side to
match the crowd count. At each step, LibraNet chooses one weight (action) from
the weight box (the pre-defined action pool) according to the current crowd
image features and weights placed on the scale pan (state). LibraNet is
required to learn to balance the scale according to the feedback of the needle
(Q values). We show that LibraNet exactly implements scale weighing by
visualizing the decision process by which LibraNet chooses actions. Extensive
experiments demonstrate the effectiveness of our design choices and report
state-of-the-art results on a few crowd counting benchmarks. We also
demonstrate good cross-dataset generalization of LibraNet. Code and models are
made available at: https://git.io/libranet
Comment: Accepted to Proc. Eur. Conf. Computer Vision (ECCV) 202
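The scale-weighing analogy in the abstract above can be made concrete with a toy sketch. It is not LibraNet: the learned Q-function is replaced by a greedy rule that can see the true count, and the contents of `WEIGHT_BOX` are hypothetical rather than the action pool used in the paper. The sketch only shows how a single estimate decomposes into a sequence of small weight placements.

```python
# Hypothetical action pool; the actual weights used by LibraNet may differ.
WEIGHT_BOX = (-10, -5, -2, -1, 1, 2, 5, 10)

def weigh(target, max_steps=50):
    """Place weights one at a time until the scale balances the target count."""
    placed = []
    total = 0
    for _ in range(max_steps):
        if total == target:
            break  # the scale is balanced, so stop placing weights
        # Stand-in for the learned policy: choose the weight that leaves
        # the smallest remaining imbalance after it is placed.
        w = min(WEIGHT_BOX, key=lambda a: abs(target - (total + a)))
        placed.append(w)
        total += w
    return total, placed

total, placed = weigh(23)
print(total, placed)
```

In the real model the agent never sees the target count; it has to infer each step from image features and the weights already placed, with the Q values playing the role of the needle.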