12,553 research outputs found
A Systematic Assessment of Syntactic Generalization in Neural Language Models
While state-of-the-art neural network models continue to achieve lower
perplexity scores on language modeling benchmarks, it remains unknown whether
optimizing for broad-coverage predictive performance leads to human-like
syntactic knowledge. Furthermore, existing work has not provided a clear
picture about the model properties required to produce proper syntactic
generalizations. We present a systematic evaluation of the syntactic knowledge
of neural language models, testing 20 combinations of model types and data
sizes on a set of 34 English-language syntactic test suites. We find
substantial differences in syntactic generalization performance by model
architecture, with sequential models underperforming other architectures.
Factorially manipulating model architecture and training dataset size (1M--40M
words), we find that variability in syntactic generalization performance is
substantially greater by architecture than by dataset size for the corpora
tested in our experiments. Our results also reveal a dissociation between
perplexity and syntactic generalization performance.
Comment: To appear in the Proceedings of the Association for Computational Linguistics (ACL 2020).
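The targeted-evaluation paradigm described above can be illustrated with a minimal sketch: a test-suite item passes if the model assigns higher probability to the grammatical continuation than to the ungrammatical one. A real study would query a trained neural LM; the toy probability table below is invented purely for illustration.

```python
import math

# Hand-specified stand-in for a language model's conditional log-probabilities.
# All numbers are made up for illustration; a real evaluation queries a trained LM.
lm_logprob = {
    ("The keys to the cabinet", "are"): math.log(0.6),
    ("The keys to the cabinet", "is"):  math.log(0.1),
}

def passes_agreement_item(prefix, grammatical, ungrammatical):
    """An item 'passes' if the model prefers the grammatical continuation."""
    return lm_logprob[(prefix, grammatical)] > lm_logprob[(prefix, ungrammatical)]

print(passes_agreement_item("The keys to the cabinet", "are", "is"))  # True
```

Aggregating pass rates over many such items, grouped by construction, is what allows performance to be compared across architectures independently of perplexity.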
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Natural language is hierarchically structured: smaller units (e.g., phrases)
are nested within larger units (e.g., clauses). When a larger constituent ends,
all of the smaller constituents that are nested within it must also be closed.
While the standard LSTM architecture allows different neurons to track
information at different time scales, it does not have an explicit bias towards
modeling a hierarchy of constituents. This paper proposes to add such an
inductive bias by ordering the neurons; a vector of master input and forget
gates ensures that when a given neuron is updated, all the neurons that follow
it in the ordering are also updated. Our novel recurrent architecture, ordered
neurons LSTM (ON-LSTM), achieves good performance on four different tasks:
language modeling, unsupervised parsing, targeted syntactic evaluation, and
logical inference.
Comment: Published as a conference paper at ICLR 2019.
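The ordering mechanism described above rests on the cumulative-softmax (cumax) operation, which produces a monotonically non-decreasing gate in [0, 1]: once the gate "opens" at some neuron, it stays open for all neurons that follow it in the ordering. A minimal stdlib sketch of cumax:

```python
import math

def cumax(logits):
    """Cumulative softmax: softmax followed by a running sum.
    The result is monotonically non-decreasing and ends at 1.0, which is
    what lets a master gate enforce 'if this neuron updates, so do all
    neurons after it' as described above."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    out, running = [], 0.0
    for e in exps:
        running += e / total
        out.append(running)
    return out

gate = cumax([2.0, 0.5, -1.0, -1.0])
print([round(g, 3) for g in gate])  # non-decreasing, final value 1.0
```

In the full ON-LSTM, one such gate serves as the master forget gate and a mirrored one (1 minus a cumax) as the master input gate; this sketch shows only the shared primitive.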
How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
Long short-term memory (LSTM) networks and their variants are capable of
encapsulating long-range dependencies, which is evident from their performance
on a variety of linguistic tasks. On the other hand, simple recurrent networks
(SRNs), which appear more biologically grounded in terms of synaptic
connections, have generally been less successful at capturing long-range
dependencies as well as the loci of grammatical errors in an unsupervised
setting. In this paper, we seek to develop models that bridge the gap between
biological plausibility and linguistic competence. We propose a new
architecture, the Decay RNN, which incorporates the decaying nature of neuronal
activations and models the excitatory and inhibitory connections in a
population of neurons. Besides its biological inspiration, our model also shows
competitive performance relative to LSTMs on subject-verb agreement, sentence
grammaticality, and language modeling tasks. These results provide some
pointers towards probing the nature of the inductive biases required for RNN
architectures to model linguistic phenomena successfully.
Comment: 11 pages, 5 figures (including appendix); to appear at ACL SRW 2020.
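The decay idea above can be sketched as a leaky-integrator recurrence: the hidden state decays toward a new activation instead of being replaced wholesale. This is an illustrative sketch of the general mechanism, not the paper's exact parameterisation; the weight names and the decay constant `alpha` are assumptions.

```python
import math
import random

def decay_rnn_step(h_prev, x, W, U, alpha=0.9):
    """One leaky-integration step: h_t = alpha * h_{t-1} + (1 - alpha) * tanh(Wx + Uh).
    The decay term models the gradual fading of neuronal activations
    described above (alpha and the weight layout are illustrative assumptions)."""
    pre = [sum(w * xi for w, xi in zip(Wrow, x)) +
           sum(u * hi for u, hi in zip(Urow, h_prev))
           for Wrow, Urow in zip(W, U)]
    return [alpha * hp + (1 - alpha) * math.tanh(p)
            for hp, p in zip(h_prev, pre)]

rng = random.Random(0)
dim = 3
W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
U = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
h = [0.0] * dim
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h = decay_rnn_step(h, x, W, U)
print(h)
```

Because each step blends the previous state with a bounded tanh activation, the hidden state stays bounded while retaining a trace of earlier inputs, which is the property the abstract appeals to for long-range dependencies.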
Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs
This paper presents an approach to formalizing and enforcing a class of use
privacy properties in data-driven systems. In contrast to prior work, we focus
on use restrictions on proxies (i.e. strong predictors) of protected
information types. Our definition relates proxy use to intermediate
computations that occur in a program, and identifies two essential properties
that characterize this behavior: 1) its result is strongly associated with the
protected information type in question, and 2) it is likely to causally affect
the final output of the program. For a specific instantiation of this
definition, we present a program analysis technique that detects instances of
proxy use in a model, and provides a witness that identifies which parts of the
corresponding program exhibit the behavior. Recognizing that not all instances
of proxy use of a protected information type are inappropriate, we make use of
a normative judgment oracle that makes this inappropriateness determination for
a given witness. Our repair algorithm uses the witness of an inappropriate
proxy use to transform the model into one that provably does not exhibit proxy
use, while avoiding changes that unduly affect classification accuracy. Using a
corpus of social datasets, our evaluation shows that these algorithms are able
to detect proxy use instances that would be difficult to find using existing
techniques, and subsequently remove them while maintaining acceptable
classification performance.
Comment: Extended CCS 2017 camera-ready: several new discussions and complexity results added to the appendix.
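The two-part test described in the abstract (association with the protected type, plus causal influence on the output) can be illustrated with a toy model. The model, data, attribute names, and thresholds below are all invented for illustration; a real analysis would work over the program's actual intermediate computations.

```python
# Toy sketch of proxy-use detection: flag an intermediate computation if
# (1) it is strongly associated with the protected attribute and
# (2) ablating it changes the program's output (causal influence).

def model(features):
    zip_score = features["zip_code"] % 2          # intermediate computation
    return 1 if zip_score + features["income"] > 1 else 0

data = [
    {"zip_code": 10, "income": 0, "protected": 0},
    {"zip_code": 11, "income": 0, "protected": 1},
    {"zip_code": 12, "income": 1, "protected": 0},
    {"zip_code": 13, "income": 1, "protected": 1},
]

# (1) association: the intermediate value perfectly tracks the protected attribute
assoc = all((d["zip_code"] % 2) == d["protected"] for d in data)

# (2) influence: replace the intermediate value with a constant and compare outputs
def ablated(features):
    return 1 if 0 + features["income"] > 1 else 0

influence = any(model(d) != ablated(d) for d in data)
print("proxy use detected:", assoc and influence)  # True
```

A witness in the paper's sense would be the flagged sub-computation itself (`zip_code % 2` here), which the repair step could then replace or suppress while monitoring accuracy.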
Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that, with generalized
features (advertising speak and writing complexity scores), deception
detection performance can be further improved by adding deceptive reviews
from assorted domains to the training data. Finally, reviewer-level evaluation gives an
interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages. Accepted to Recent Advances in Natural Language Processing (RANLP) 2017.
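Writing-complexity scores of the kind mentioned above can be as simple as surface statistics over a review. The two features below (average sentence length and average word length) are illustrative stand-ins, not the paper's exact feature set.

```python
import re

def complexity_features(review):
    """Two simple writing-complexity signals a cross-domain deception
    detector might use: average sentence length (in words) and average
    word length (in characters). Illustrative only."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = re.findall(r"[A-Za-z']+", review)
    avg_sent_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return avg_sent_len, avg_word_len

print(complexity_features("Great product. I loved it!"))  # (2.5, 4.0)
```

Because such features do not depend on any product vocabulary, they transfer across domains, which is the property the abstract highlights.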
What can linguistics and deep learning contribute to each other?
Joe Pater's target article calls for greater interaction between neural
network research and linguistics. I expand on this call and show how such
interaction can benefit both fields. Linguists can contribute to research on
neural networks for language technologies by clearly delineating the linguistic
capabilities that can be expected of such systems, and by constructing
controlled experimental paradigms that can determine whether those desiderata
have been met. In the other direction, neural networks can benefit the
scientific study of language by providing infrastructure for modeling human
sentence processing and for evaluating the necessity of particular innate
constraints on language acquisition.
Comment: Response to Joe Pater, "Generative linguistics and neural networks at 60: foundation, friction, and fusion". To appear in Language.
Semantic Frame Parsing for Information Extraction : the CALOR corpus
This paper presents a publicly available corpus of French encyclopedic
history texts annotated according to the Berkeley FrameNet formalism. The main
difference in our approach compared to previous works on semantic parsing with
FrameNet is that we are not interested here in full-text parsing but rather
in partial parsing. The goal is to select from the FrameNet resources the
minimal set of frames that are useful for the target application, in our
case Information Extraction from encyclopedic documents. Such
an approach leverages the manual annotation of larger corpora than those
obtained through full text parsing and therefore opens the door to alternative
methods for Frame parsing than those used so far on the FrameNet 1.5 benchmark
corpus. The approaches compared in this study rely on an integrated sequence
labeling model which jointly optimizes frame identification and semantic role
segmentation and identification. The models compared are CRFs and multitask bi-LSTMs.
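An integrated sequence labeling model of the kind described above can encode both decisions in a single joint BIO tag per token (frame plus role), so that frame identification and role segmentation are optimized together. The sentence, frame, and role names below are invented for illustration.

```python
# Joint BIO label scheme: each tag carries both the frame and the role,
# e.g. "B-Attack:Victim". The example content is invented for illustration.
tokens = ["Napoleon", "invaded", "Russia", "in", "1812"]
joint_tags = [
    "B-Attack:Assailant",   # a role of the (hypothetical) Attack frame
    "B-Attack:TRIGGER",     # the frame-evoking word
    "B-Attack:Victim",
    "B-Attack:Time",
    "I-Attack:Time",
]

def decode(tokens, tags):
    """Recover (frame, role, span) triples from joint BIO tags."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            frame, role = tag[2:].split(":")
            current = (frame, role, [tok])
        elif tag.startswith("I-") and current:
            current[2].append(tok)
    if current:
        spans.append(current)
    return [(f, r, " ".join(ws)) for f, r, ws in spans]

print(decode(tokens, joint_tags))
```

Both a CRF and a bi-LSTM tagger can emit exactly this joint label inventory, which is what makes the two model families directly comparable in the study.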
Unsupervised Neural Text Simplification
The paper presents a first attempt towards unsupervised neural text
simplification that relies only on unlabeled text corpora. The core framework
is composed of a shared encoder and a pair of attentional decoders, and gains
knowledge of simplification through discrimination-based losses and denoising.
The framework is trained using unlabeled text collected from an English Wikipedia dump.
Our analysis (both quantitative and qualitative, involving human evaluators)
on public test data shows that the proposed model can perform text
simplification at both the lexical and syntactic levels, competitive with
existing supervised methods. Adding a few labelled pairs improves
performance further.
Comment: ACL 2019.
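The denoising part of such a framework corrupts a sentence and trains the encoder/decoder pair to reconstruct the original. A common corruption scheme is word dropping plus local shuffling, sketched below; the specific noise parameters are assumptions, not the paper's.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corruption for a denoising objective: randomly drop words, then
    locally shuffle the rest (each token moves at most about
    shuffle_window positions). The parameters are illustrative assumptions."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob]
    keyed = sorted(enumerate(kept),
                   key=lambda it: it[0] + rng.uniform(0, shuffle_window))
    return [t for _, t in keyed]

print(add_noise("the cat sat on the mat".split()))
```

Reconstructing the clean sentence from this noisy input forces the shared encoder to learn robust representations without any labelled simplification pairs.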
Multitask Parsing Across Semantic Representations
The ability to consolidate information of different types is at the core of
intelligence, and has tremendous practical value in allowing learning for one
task to benefit from generalizations learned for others. In this paper we
tackle the challenging task of improving semantic parsing performance, taking
UCCA parsing as a test case, and AMR, SDP and Universal Dependencies (UD)
parsing as auxiliary tasks. We experiment on three languages, using a uniform
transition-based system and learning architecture for all parsing tasks.
Despite notable conceptual, formal and domain differences, we show that
multitask learning significantly improves UCCA parsing in both in-domain and
out-of-domain settings.
Comment: Accepted to ACL 2018.
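The multitask setup described above amounts to one shared representation feeding several task-specific scorers (UCCA as the main task; AMR, SDP, and UD as auxiliaries). The encoder and scorers below are trivial placeholders standing in for the neural transition-based system; only the sharing pattern is the point.

```python
# Minimal sketch of hard parameter sharing: the shared encoder is computed
# once per sentence and reused by every task-specific head. The 'encoder'
# and heads are placeholder functions invented for illustration.

def shared_encoder(sentence):
    return [len(w) for w in sentence.split()]   # placeholder features

task_heads = {
    "ucca": lambda feats: sum(feats),           # placeholder scorers
    "amr":  lambda feats: max(feats),
    "ud":   lambda feats: len(feats),
}

def multitask_forward(sentence):
    feats = shared_encoder(sentence)            # computed once, shared
    return {task: head(feats) for task, head in task_heads.items()}

print(multitask_forward("Parsing shares structure across tasks"))
```

During training, gradients from the auxiliary tasks flow into the shared encoder, which is how generalizations learned for AMR, SDP, and UD parsing can benefit UCCA parsing.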
Catering to Your Concerns: Automatic Generation of Personalised Security-Centric Descriptions for Android Apps
Android users are increasingly concerned with the privacy of their data and
security of their devices. To improve the security awareness of users, recent
automatic techniques produce security-centric descriptions by performing
program analysis. However, the generated text does not always address users'
concerns, as it is generally too technical for ordinary users to understand.
Moreover, different users have varied linguistic preferences, which the
generated text does not accommodate. Motivated by this challenge, we develop an innovative
scheme to help users avoid malware and privacy-breaching apps by generating
security descriptions that explain the privacy and security related aspects of
an Android app in clear and understandable terms. We implement a prototype
system, PERSCRIPTION, which automatically learns users' security concerns
and linguistic preferences in order to generate personalised, user-oriented
security-centric descriptions. We evaluate our scheme through experiments
and user studies. The results demonstrate that PERSCRIPTION's descriptions
improve readability and users' security awareness compared to those of
existing description generators.