When Are Tree Structures Necessary for Deep Learning of Representations?
Recursive neural models, which use syntactic parse trees to recursively
generate representations bottom-up, are a popular architecture. But there have
not been rigorous evaluations showing for exactly which tasks this syntax-based
method is appropriate. In this paper we benchmark {\bf recursive} neural models
against sequential {\bf recurrent} neural models (simple recurrent and LSTM
models), enforcing apples-to-apples comparison as much as possible. We
investigate 4 tasks: (1) sentiment classification at the sentence level and
phrase level; (2) matching questions to answer-phrases; (3) discourse parsing;
(4) semantic relation extraction (e.g., {\em component-whole} between nouns).
Our goal is to understand better when, and why, recursive models can
outperform simpler models. We find that recursive models help mainly on tasks
(like semantic relation extraction) that require associating headwords across a
long distance, particularly on very long sequences. We then introduce a method
for allowing recurrent models to achieve similar performance: breaking long
sentences into clause-like units at punctuation and processing them separately
before combining. Our results thus help us understand the limitations of both
classes of models, and suggest directions for improving recurrent models.
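A minimal sketch of the clause-splitting idea described above, assuming a PyTorch LSTM encoder; all names and sizes here are illustrative, not the paper's code:

```python
# Minimal sketch of the clause-splitting idea: encode each clause-like
# unit separately with an LSTM, then combine the clause vectors.
import torch
import torch.nn as nn

class ClauseLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode_clause(self, token_ids):
        # token_ids: LongTensor of shape (1, clause_len)
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]  # (1, hidden_dim): final hidden state of the clause

    def forward(self, sentence_ids, punctuation_ids):
        # Break the token sequence into clause-like units at punctuation.
        clauses, current = [], []
        for tok in sentence_ids:
            current.append(tok)
            if tok in punctuation_ids:
                clauses.append(current)
                current = []
        if current:
            clauses.append(current)
        # Encode each clause independently, then combine by averaging.
        vecs = [self.encode_clause(torch.tensor([c])) for c in clauses]
        return torch.mean(torch.cat(vecs, dim=0), dim=0)
```

The point of the design is that each clause is short enough for the recurrent encoder to handle reliably, and the cheap combination step replaces the long-range dependencies the model would otherwise have to carry across the whole sentence.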
Named Entity Sequence Classification
Named Entity Recognition (NER) aims at locating and classifying named
entities in text. In some use cases of NER, including cases where detected
named entities are used in creating content recommendations, it is crucial to
have a reliable confidence level for the detected named entities. In this work
we study the problem of finding confidence levels for detected named entities.
We refer to this problem as Named Entity Sequence Classification (NESC). We
frame NESC as a binary classification problem and use NER together with
recurrent neural networks to estimate the probability that a candidate named
entity is a real named entity. We apply this approach to Tweet texts and show
how named entities can be found in Tweets with high confidence levels.
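A hedged sketch of how NESC can be framed as binary classification over a candidate span; the GRU encoder and pooling choice are assumptions, not the paper's exact model:

```python
# Illustrative sketch of NESC as binary classification: an RNN reads
# the tweet, the hidden states over the candidate span are pooled,
# and a sigmoid estimates P(candidate is a real named entity).
import torch
import torch.nn as nn

class NESCClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tweet_ids, span_start, span_end):
        # tweet_ids: (1, seq_len); [span_start, span_end) marks the candidate.
        states, _ = self.rnn(self.embed(tweet_ids))      # (1, seq_len, h)
        span_repr = states[0, span_start:span_end].mean(dim=0)
        return torch.sigmoid(self.out(span_repr))        # confidence in [0, 1]
```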
On Generalization and Regularization in Deep Learning
Why do large neural networks generalize so well on complex tasks such as image
classification or speech recognition? What exactly is the role of
regularization for them? These are arguably among the most important open
questions in machine learning today. In a recent and thought-provoking paper
[C. Zhang et al.], the authors performed a number of numerical experiments that
hint at the need for novel theoretical concepts to account for this phenomenon.
The paper stirred quite a lot of excitement in the machine learning community,
but at the same time it created some confusion, as discussions on
OpenReview.net testify. The aim of this pedagogical paper is to make this debate accessible
to a wider audience of data scientists without advanced theoretical knowledge
in statistical learning. The focus here is on explicit mathematical definitions
and on a discussion of relevant concepts, not on proofs for which we provide
references.
Comment: 11 pages, 3 figures, pedagogical paper
Fast Reading Comprehension with ConvNets
State-of-the-art deep reading comprehension models are dominated by recurrent
neural nets. Their sequential nature is a natural fit for language, but it also
precludes parallelization within an instance and often becomes the bottleneck
when deploying such models in latency-critical scenarios. This is particularly
problematic for longer texts. Here we present a convolutional architecture as
an alternative to these recurrent architectures. Using simple dilated
convolutional units in place of recurrent ones, we achieve results comparable
to the state of the art on two question answering tasks, while at the same time
achieving up to two orders of magnitude speedups for question answering.
Comment: 15 pages, 10 figures, submitted to ICLR 2018
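A minimal sketch of the core substitution, replacing recurrent units with stacked dilated 1-D convolutions; layer sizes are illustrative, not the paper's:

```python
# Stacked dilated convolutions as a drop-in sequence encoder: each
# layer doubles the dilation, so the receptive field grows
# exponentially with depth while all positions compute in parallel.
import torch
import torch.nn as nn

class DilatedConvEncoder(nn.Module):
    def __init__(self, channels=128, num_layers=4, kernel_size=3):
        super().__init__()
        layers = []
        for i in range(num_layers):
            dilation = 2 ** i
            layers.append(nn.Conv1d(
                channels, channels, kernel_size,
                padding=dilation * (kernel_size - 1) // 2,
                dilation=dilation))
            layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, seq_len, channels); Conv1d expects channels first.
        return self.net(x.transpose(1, 2)).transpose(1, 2)
```

The parallelism across positions, rather than any change in modeling power, is where the claimed speedup over recurrent units comes from.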
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
Video reviews are the natural evolution of written product reviews. In this
paper we target this phenomenon and introduce the first dataset created from
closed captions of YouTube product review videos as well as a new attention-RNN
model for aspect extraction and joint aspect extraction and sentiment
classification. Our model provides state-of-the-art performance on aspect
extraction without requiring hand-crafted features on the SemEval ABSA corpus,
while it outperforms the baseline on the joint task. In our
dataset, the attention-RNN model outperforms the baseline for both tasks, but
we observe important performance drops for all models in comparison to SemEval.
These results, as well as further experiments on domain adaptation for aspect
extraction, suggest that differences between speech and written text, which
have been discussed extensively in the literature, also extend to the domain of
product reviews, where they are relevant for fine-grained opinion mining.Comment: 8th Workshop on Computational Approaches to Subjectivity, Sentiment &
Social Media Analysis (WASSA
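A hedged reconstruction of what an attention-RNN tagger for aspect extraction can look like, assuming a BiLSTM encoder with self-attention and BIO tags; this is an illustrative sketch, not the authors' exact architecture:

```python
# A BiLSTM encodes the caption; for each token, attention over all
# hidden states builds a context vector that is concatenated with the
# token state before BIO tagging of aspect terms.
import torch
import torch.nn as nn

class AttentionRNNTagger(nn.Module):
    def __init__(self, vocab_size, num_tags=3, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        self.tagger = nn.Linear(4 * hidden_dim, num_tags)  # B / I / O

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))           # (1, n, 2h)
        scores = self.attn(states) @ states.transpose(1, 2)   # (1, n, n)
        context = torch.softmax(scores, dim=-1) @ states      # (1, n, 2h)
        return self.tagger(torch.cat([states, context], dim=-1))
```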
AppLP: A Dialogue on Applications of Logic Programming
This document describes the contributions of the 2016 Applications of Logic
Programming Workshop (AppLP), which was held on October 17 and associated with
the International Conference on Logic Programming (ICLP) in Flushing, New York
City.
Comment: David S. Warren and Yanhong A. Liu (Editors). 33 pages. Including
summaries by Christopher Kane and abstracts or position papers by M. Aref, J.
Rosenwald, I. Cervesato, E.S.L. Lam, M. Balduccini, J. Lobo, A. Russo, E.
Lupu, N. Leone, F. Ricca, G. Gupta, K. Marple, E. Salazar, Z. Chen, A. Sobhi,
S. Srirangapalli, C.R. Ramakrishnan, N. Bj{\o}rner, N.P. Lopes, A.
Rybalchenko, and P. Tarau.
Harry Potter and the Action Prediction Challenge from Natural Language
We explore the challenge of action prediction from textual descriptions of
scenes, a testbed to approximate whether text inference can be used to predict
upcoming actions. As a case study, we consider the world of the Harry Potter
fantasy novels and the task of inferring what spell will be cast next given a fragment of a
story. Spells act as keywords that abstract actions (e.g. 'Alohomora' to open a
door) and denote a response to the environment. This idea is used to
automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We
then evaluate different baselines. Among the tested models, an LSTM-based
approach obtains the best performance for frequent actions and large scene
descriptions, but approaches such as logistic regression behave well on
infrequent actions.
Comment: NAACL 2019 (short papers)
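A small sketch of the HPAC-style setup as text classification, shown here with the logistic-regression baseline the paper mentions; the fragments and labels below are toy examples for illustration only:

```python
# Given a story fragment, predict the next spell as the class label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy fragments paired with the spell cast immediately after them
# (fabricated examples, not actual HPAC samples).
fragments = ["the door was locked shut", "he raised his wand at the troll"]
spells = ["Alohomora", "Wingardium Leviosa"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(fragments, spells)
print(model.predict(["she found the gate locked"]))
```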
Flexible Operator Embeddings via Deep Learning
Integrating machine learning into the internals of database management
systems requires significant feature engineering, a human effort-intensive
process to determine the best way to represent the pieces of information that
are relevant to a task. In addition to being labor intensive, the process of
hand-engineering features must generally be repeated for each data management
task, and may make assumptions about the underlying database that are not
universally true. We introduce flexible operator embeddings, a deep learning
technique for automatically transforming query operators into feature vectors
that are useful for multiple data management tasks and are custom-tailored to
the underlying database. Our approach works by taking advantage of an
operator's context, resulting in a neural network that quickly transforms
sparse representations of query operators into dense, information-rich feature
vectors. Experimentally, we show that our flexible operator embeddings perform
well across a number of data management tasks, using both synthetic and
real-world datasets.
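An illustrative sketch of the core idea, assuming a simple feed-forward mapping from a sparse operator encoding to a dense embedding; the feature layout and layer sizes are assumptions for the example, not the paper's design:

```python
# A small network maps a sparse encoding of a query operator (its
# type, predicates, and context statistics) to a dense, reusable
# feature vector for downstream data management tasks.
import torch
import torch.nn as nn

class OperatorEmbedder(nn.Module):
    def __init__(self, sparse_dim=512, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sparse_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim))

    def forward(self, sparse_operator):
        # sparse_operator: (batch, sparse_dim) mostly-zero encoding.
        return self.net(sparse_operator)  # (batch, embed_dim) dense vector
```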
Green Machine Learning via Augmented Gaussian Processes and Multi-Information Source Optimization
Searching for accurate Machine and Deep Learning models is a computationally
expensive and energy-hungry process. A strategy that has recently been gaining
importance as a way to drastically reduce the computational time and energy
consumed is to exploit the availability of different information sources, with
different computational costs and different "fidelity", typically smaller
portions of a large dataset. The multi-source optimization strategy fits into
the scheme of Gaussian Process based Bayesian Optimization. An Augmented
Gaussian Process method exploiting multiple information sources (namely,
AGP-MISO) is proposed. The Augmented Gaussian Process is trained using only
"reliable" information among available sources. A novel acquisition function is
defined according to the Augmented Gaussian Process. Computational results are
reported related to the optimization of the hyperparameters of a Support Vector
Machine (SVM) classifier using two sources: a large dataset - the most
expensive one - and a smaller portion of it. A comparison with a traditional
Bayesian Optimization approach to optimize the hyperparameters of the SVM
classifier on the large dataset only is reported.
Comment: 22 pages, 4 figures, submitted to Soft Computing - Special Issue on
"Optimization methods for decision making: advances and applications"