DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer
We present a novel Dynamic Differentiable Reasoning (DDR) framework for
jointly learning branching programs and the functions composing them; this
resolves a significant nondifferentiability inhibiting recent dynamic
architectures. We apply our framework to two settings in two highly compact and
data-efficient architectures: DDRprog for CLEVR Visual Question Answering and
DDRstack for reverse Polish notation expression evaluation. DDRprog uses a
recurrent controller to jointly predict and execute modular neural programs
that directly correspond to the underlying question logic; it explicitly forks
subprocesses to handle logical branching. By effectively leveraging additional
structural supervision, we achieve a large improvement over previous approaches
in subtask consistency and a small improvement in overall accuracy. We further
demonstrate the benefits of structural supervision in the RPN setting: the
inclusion of a stack assumption in DDRstack allows our approach to generalize
to long expressions on which an LSTM fails the task.
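The RPN task itself is easy to state symbolically. A minimal sketch of a stack-based evaluator (plain Python, not the paper's learned modules) shows the inductive bias DDRstack bakes in: each operator pops its operands and pushes a result, so nothing in the procedure depends on expression length.

```python
def eval_rpn(tokens):
    """Evaluate a reverse Polish notation expression with an explicit stack.

    This is the stack assumption DDRstack encodes structurally; the paper
    replaces the hand-written operators below with learned neural modules.
    """
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack[0]
```

Because the stack, not a fixed-length hidden state, carries intermediate results, the same procedure handles expressions far longer than any seen in training.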
Exploring the Syntactic Abilities of RNNs with Multi-task Learning
Recent work has explored the syntactic abilities of RNNs using the
subject-verb agreement task, which diagnoses sensitivity to sentence structure.
RNNs performed this task well in common cases, but faltered in complex
sentences (Linzen et al., 2016). We test whether these errors are due to
inherent limitations of the architecture or to the relatively indirect
supervision provided by most agreement dependencies in a corpus. We trained a
single RNN to perform both the agreement task and an additional task, either
CCG supertagging or language modeling. Multi-task training led to significantly
lower error rates, in particular on complex sentences, suggesting that RNNs
have the ability to evolve more sophisticated syntactic representations than
shown before. We also show that easily available agreement training data can
improve performance on other syntactic tasks, in particular when only a limited
amount of training data is available for those tasks. The multi-task paradigm
can also be leveraged to inject grammatical knowledge into language models.
Comment: To appear in CoNLL 201
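The multi-task setup can be sketched as one shared representation feeding two classification heads whose losses are summed. All names and dimensions below are hypothetical toy values, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared sentence representation from a hypothetical RNN encoder
# (16 dims, toy size).
h = rng.standard_normal(16)

# Two task heads over the same representation: number agreement
# (2 classes) and an auxiliary task such as CCG supertagging
# (50 classes here, arbitrary).
W_agr = rng.standard_normal((2, 16))
W_aux = rng.standard_normal((50, 16))

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def multitask_loss(h, y_agr, y_aux, alpha=0.5):
    # Cross-entropy on each head; gradients flow into the shared h from
    # both tasks, the mechanism credited with richer syntactic features.
    p_agr = softmax(W_agr @ h)
    p_aux = softmax(W_aux @ h)
    return -np.log(p_agr[y_agr]) - alpha * np.log(p_aux[y_aux])

loss = multitask_loss(h, y_agr=1, y_aux=7)
```

The auxiliary signal acts as extra, more direct supervision on the shared encoder, which is the paper's explanation for the lower agreement error.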
SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data
We present SwellShark, a framework for building biomedical named entity
recognition (NER) systems quickly and without hand-labeled data. Our approach
views biomedical resources like lexicons as function primitives for
autogenerating weak supervision. We then use a generative model to unify and
denoise this supervision and construct large-scale, probabilistically labeled
datasets for training high-accuracy NER taggers. In three biomedical NER tasks,
SwellShark achieves competitive scores with state-of-the-art supervised
benchmarks using no hand-labeled training data. In a drug name extraction task
using patient medical records, one domain expert using SwellShark achieved
within 5.1% of a crowdsourced annotation approach -- which originally utilized
20 teams over the course of several weeks -- in 24 hours.
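The core idea of lexicons as function primitives can be sketched as follows; the lexicon contents and the majority-vote denoising step are illustrative stand-ins for the paper's generative model.

```python
# Hypothetical mini-lexicons standing in for biomedical resources.
DRUG_LEXICON = {"aspirin", "ibuprofen"}
GENE_LEXICON = {"brca1", "tp53"}

def lf_drug(token):
    """Labeling function autogenerated from a drug lexicon."""
    return "DRUG" if token.lower() in DRUG_LEXICON else None

def lf_gene(token):
    """Labeling function autogenerated from a gene lexicon."""
    return "GENE" if token.lower() in GENE_LEXICON else None

def weak_label(token, lfs=(lf_drug, lf_gene)):
    """Combine the (possibly conflicting) votes into one probabilistic-style
    label; SwellShark uses a learned generative model, majority vote here."""
    votes = [lab for lf in lfs if (lab := lf(token)) is not None]
    return max(set(votes), key=votes.count) if votes else "O"
```

The resulting weakly labeled corpus is then used as ordinary training data for a discriminative NER tagger.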
Taskonomy: Disentangling Task Transfer Learning
Do visual tasks have a relationship, or are they unrelated? For instance,
could having surface normals simplify estimating the depth of an image?
Intuition answers these questions positively, implying the existence of a
structure among visual tasks. Knowing this structure has notable value; it is the
concept underlying transfer learning and provides a principled way for
identifying redundancies across tasks, e.g., to seamlessly reuse supervision
among related tasks or solve many tasks in one system without piling up the
complexity.
We propose a fully computational approach for modeling the structure of the
space of visual tasks. This is done by finding (first- and higher-order)
transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D,
and semantic tasks in a latent space. The product is a computational taxonomic
map for task transfer learning. We study the consequences of this structure,
e.g., nontrivial emergent relationships, and exploit them to reduce the demand
for labeled data. For example, we show that the total number of labeled
datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3
(compared to training independently) while keeping the performance nearly the
same. We provide a set of tools for computing and probing this taxonomical
structure including a solver that users can employ to devise efficient
supervision policies for their use cases.
Comment: CVPR 2018 (Oral). See project website and live demos at
http://taskonomy.vision
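One piece of such a pipeline, choosing a source task for each target from estimated transfer affinities, can be sketched as below; the task names and affinity numbers are made up for illustration.

```python
# Hypothetical transfer-affinity scores: affinity[(src, tgt)] estimates how
# well a model trained on `src` transfers to `tgt` (higher is better).
affinity = {
    ("normals", "depth"): 0.9, ("edges", "depth"): 0.4,
    ("normals", "keypoints"): 0.3, ("edges", "keypoints"): 0.8,
}

def best_sources(targets, sources):
    """For each target task, pick the source task with the highest
    estimated transfer affinity instead of training from scratch."""
    return {t: max(sources, key=lambda s: affinity.get((s, t), 0.0))
            for t in targets}
```

The full method solves a global optimization over such first- and higher-order transfers, but the label savings come from exactly this kind of reuse.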
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
Labeling training data is one of the most costly bottlenecks in developing
machine learning-based applications. We present a first-of-its-kind study
showing how existing knowledge resources from across an organization can be
used as weak supervision in order to bring development time and cost down by an
order of magnitude, and introduce Snorkel DryBell, a new weak supervision
management system for this setting. Snorkel DryBell builds on the Snorkel
framework, extending it in three critical aspects: flexible, template-based
ingestion of diverse organizational knowledge, cross-feature production
serving, and scalable, sampling-free execution. On three classification tasks
at Google, we find that Snorkel DryBell creates classifiers of comparable
quality to ones trained with tens of thousands of hand-labeled examples,
converts non-servable organizational resources to servable models for an
average 52% performance improvement, and executes over millions of data points
in tens of minutes.
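Template-based ingestion can be sketched as a higher-order function that turns an organizational resource into a labeling function; the keyword lists and the simple summed vote below are hypothetical simplifications of DryBell's generative modeling.

```python
def make_keyword_lf(name, keywords, label):
    """Template that turns an organizational resource (here, a keyword
    list; contents hypothetical) into a labeling function."""
    def lf(text):
        # Vote `label` if any keyword fires, abstain (0) otherwise.
        return label if any(k in text.lower() for k in keywords) else 0
    lf.__name__ = name
    return lf

lf_sports = make_keyword_lf("lf_sports", ["score", "match"], +1)
lf_not_sports = make_keyword_lf("lf_not_sports", ["earnings"], -1)

def vote(text, lfs):
    """Combine labeling-function votes; DryBell learns accuracies with a
    generative model, a plain sum stands in for that here."""
    s = sum(lf(text) for lf in lfs)
    return +1 if s > 0 else (-1 if s < 0 else 0)
```

Writing many such templated functions against existing knowledge resources is what replaces tens of thousands of hand labels.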
Learning the Structure of Generative Models without Labeled Data
Curating labeled training data has become the primary bottleneck in machine
learning. Recent frameworks address this bottleneck with generative models to
synthesize labels at scale from weak supervision sources. The generative
model's dependency structure directly affects the quality of the estimated
labels, but selecting a structure automatically without any labeled data is a
distinct challenge. We propose a structure estimation method that maximizes the
ℓ1-regularized marginal pseudolikelihood of the observed data. Our
analysis shows that the amount of unlabeled data required to identify the true
structure scales sublinearly in the number of possible dependencies for a broad
class of models. Simulations show that our method is 100× faster than a
maximum likelihood approach and selects 1/4 as many extraneous dependencies.
We also show that our method provides an average of 1.5 F1 points of
improvement over existing, user-developed information extraction applications
on real-world data such as PubMed journal abstracts.
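As a toy illustration of the objective (a one-parameter scalar stand-in, not the paper's model), an ℓ1 penalty on a log-pseudolikelihood drives weak dependencies exactly to zero, which is what performs structure selection without labels.

```python
import math

def l1_pseudolikelihood(theta, data, lam):
    """Toy ℓ1-regularized log-pseudolikelihood for a single dependency
    weight `theta`; `data` is (x, y) pairs, `lam` the penalty strength.
    Maximizing this prefers theta = 0 unless the data supports the edge."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    ll = sum(math.log(sigmoid(y * theta * x)) for x, y in data)
    return ll - lam * abs(theta)
```

With a strong penalty, the objective at theta = 0 beats any nonzero theta, i.e. the dependency is pruned; with no penalty, the data-fit term dominates and the edge is kept.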
A Survey of Deep Learning Methods for Relation Extraction
Relation Extraction is an important sub-task of Information Extraction which
has the potential of employing deep learning (DL) models with the creation of
large datasets using distant supervision. In this review, we compare the
contributions and pitfalls of the various DL models that have been used for the
task, to help guide the path ahead.
Indirect Supervision for Relation Extraction using Question-Answer Pairs
Automatic relation extraction (RE) for types of interest is of great
importance for interpreting massive text corpora in an efficient manner.
Traditional RE models have relied heavily on human-annotated corpora for
training; generating such labeled data is costly and becomes an obstacle
when dealing with more relation types. Thus, more RE systems have shifted
to being built upon training data automatically acquired by linking to
knowledge bases (distant supervision). However, due to the incompleteness of
knowledge bases and the context-agnostic labeling, the training data collected
via distant supervision (DS) can be very noisy. In recent years, as increasing
attention has been brought to tackling question-answering (QA) tasks, user
feedback or datasets of such tasks become more accessible. In this paper, we
propose a novel framework, ReQuest, to leverage question-answer pairs as an
indirect source of supervision for relation extraction, and study how to use
such supervision to reduce noise induced from DS. Our model jointly embeds
relation mentions, types, QA entity mention pairs and text features in two
low-dimensional spaces (RE and QA), where objects with the same relation types or
semantically similar question-answer pairs have similar representations. Shared
features connect these two spaces, carrying clearer semantic knowledge from
both sources. ReQuest then uses these learned embeddings to estimate the types
of test relation mentions. We formulate a global objective function and adopt a
novel margin-based QA loss to reduce noise in DS by exploiting semantic
evidence from the QA dataset. Our experiments show an average improvement of
11% in F1 score on two public RE datasets combined with the TREC QA
dataset.
Comment: 9 pages + 1 page reference. Accepted to WSDM 201
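The margin-based QA loss can be sketched in its generic hinge form; the exact formulation in ReQuest differs, so treat this as an assumed simplification.

```python
def margin_qa_loss(sim_pos, sim_neg, margin=1.0):
    """Hinge-style margin loss: push the embedding similarity of a relation
    mention to a semantically matching QA pair (`sim_pos`) above its
    similarity to a mismatched pair (`sim_neg`) by at least `margin`."""
    return max(0.0, margin - sim_pos + sim_neg)
```

A noisy distant-supervision label whose mention embeds far from every supporting QA pair keeps incurring loss, which is how the QA evidence down-weights DS noise.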
Machine Learning with World Knowledge: The Position and Survey
Machine learning has become pervasive in multiple domains, impacting a wide
variety of applications, such as knowledge discovery and data mining, natural
language processing, information retrieval, computer vision, social and health
informatics, ubiquitous computing, etc. Two essential problems of machine
learning are how to generate features and how to acquire labels for machines to
learn. In particular, labeling large amounts of data for each domain-specific
problem can be very time-consuming and costly. It has become a key obstacle in
making learning protocols realistic in applications. In this paper, we will
discuss how to use the existing general-purpose world knowledge to enhance
machine learning processes, by enriching the features or reducing the labeling
work. We start from the comparison of world knowledge with domain-specific
knowledge, and then introduce three key problems in using world knowledge in
learning processes, i.e., explicit and implicit feature representation,
inference for knowledge linking and disambiguation, and learning with direct or
indirect supervision. Finally, we discuss future directions of this research
topic.
Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach
It is commonly believed that knowledge of syntactic structure should improve
language modeling. However, incorporating syntactic structure into neural
language models effectively and efficiently has been a challenging topic. In
this paper, we make use of a multi-task objective, i.e.,
the models simultaneously predict words as well as ground truth parse trees in
a form called "syntactic distances", where information between these two
separate objectives shares the same intermediate representation. Experimental
results on the Penn Treebank and Chinese Treebank datasets show that when
ground truth parse trees are provided as additional training signals, the model
is able to achieve lower perplexity and induce trees of better quality.
Comment: ACL2
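The multi-task objective can be sketched as a weighted sum of the language-modeling loss and a regression penalty on predicted syntactic distances; the functional form below is an assumed simplification, with both predictions understood to come from the same intermediate representation.

```python
def joint_loss(lm_nll, dist_pred, dist_gold, beta=1.0):
    """Combine the language-model negative log-likelihood with a
    mean-squared-error penalty tying predicted syntactic distances
    to the ground-truth parse's distances."""
    mse = sum((p - g) ** 2 for p, g in zip(dist_pred, dist_gold)) / len(dist_gold)
    return lm_nll + beta * mse
```

When the predicted distances match the gold parse exactly, the penalty vanishes and only the perplexity term remains; `beta` trades off the two objectives.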