12,407 research outputs found
Path integral policy improvement with differential dynamic programming
Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based model free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. Basically, a policy distribution is improved iteratively using reward weighted averaging of the corresponding rollouts. It was assumed that PI2-CMA somehow exploited gradient information that was contained by the reward weighted statistics. To our knowledge we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes in the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP) but we derive the method independently as a generalization of the framework of PI2-CMA. Our derivations suggest to implement some small variations to SaDDP so to increase performance. We validated our claims on a robot trajectory learning task
Learning to detect chest radiographs containing lung nodules using visual attention networks
Machine learning approaches hold great potential for the automated detection
of lung nodules in chest radiographs, but training the algorithms requires vary
large amounts of manually annotated images, which are difficult to obtain. Weak
labels indicating whether a radiograph is likely to contain pulmonary nodules
are typically easier to obtain at scale by parsing historical free-text
radiological reports associated to the radiographs. Using a repositotory of
over 700,000 chest radiographs, in this study we demonstrate that promising
nodule detection performance can be achieved using weak labels through
convolutional neural networks for radiograph classification. We propose two
network architectures for the classification of images likely to contain
pulmonary nodules using both weak labels and manually-delineated bounding
boxes, when these are available. Annotated nodules are used at training time to
deliver a visual attention mechanism informing the model about its localisation
performance. The first architecture extracts saliency maps from high-level
convolutional layers and compares the estimated position of a nodule against
the ground truth, when this is available. A corresponding localisation error is
then back-propagated along with the softmax classification error. The second
approach consists of a recurrent attention model that learns to observe a short
sequence of smaller image portions through reinforcement learning. When a
nodule annotation is available at training time, the reward function is
modified accordingly so that exploring portions of the radiographs away from a
nodule incurs a larger penalty. Our empirical results demonstrate the potential
advantages of these architectures in comparison to competing methodologies
Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction
Distant supervision leverages knowledge bases to automatically label
instances, thus allowing us to train relation extractor without human
annotations. However, the generated training data typically contain massive
noise, and may result in poor performances with the vanilla supervised
learning. In this paper, we propose to conduct multi-instance learning with a
novel Cross-relation Cross-bag Selective Attention (CSA), which leads to
noise-robust training for distant supervised relation extractor. Specifically,
we employ the sentence-level selective attention to reduce the effect of noisy
or mismatched sentences, while the correlation among relations were captured to
improve the quality of attention weights. Moreover, instead of treating all
entity-pairs equally, we try to pay more attention to entity-pairs with a
higher quality. Similarly, we adopt the selective attention mechanism to
achieve this goal. Experiments with two types of relation extractor demonstrate
the superiority of the proposed approach over the state-of-the-art, while
further ablation studies verify our intuitions and demonstrate the
effectiveness of our proposed two techniques.Comment: AAAI 201
Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks
We propose a distance supervised relation extraction approach for
long-tailed, imbalanced data which is prevalent in real-world settings. Here,
the challenge is to learn accurate "few-shot" models for classes existing at
the tail of the class distribution, for which little data is available.
Inspired by the rich semantic correlations between classes at the long tail and
those at the head, we take advantage of the knowledge from data-rich classes at
the head of the distribution to boost the performance of the data-poor classes
at the tail. First, we propose to leverage implicit relational knowledge among
class labels from knowledge graph embeddings and learn explicit relational
knowledge using graph convolution networks. Second, we integrate that
relational knowledge into relation extraction model by coarse-to-fine
knowledge-aware attention mechanism. We demonstrate our results for a
large-scale benchmark dataset which show that our approach significantly
outperforms other baselines, especially for long-tail relations.Comment: To be published in NAACL 201
- …