Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend not to cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language.
Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley).
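To make the two competing quantities concrete, the following minimal sketch (an illustration, not the paper's code; the example tree is invented) computes the total dependency length and the number of crossings of a sentence whose dependencies are given as (head, dependent) pairs of word positions:

def total_dependency_length(edges):
    """Sum of the linear distances spanned by the dependencies."""
    return sum(abs(h - d) for h, d in edges)

def count_crossings(edges):
    """Count pairs of dependencies that cross when drawn above the sentence.

    Two edges with four distinct endpoints cross iff exactly one endpoint
    of one edge lies strictly inside the span of the other.
    """
    spans = [tuple(sorted(e)) for e in edges]
    crossings = 0
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (a, b), (c, d) = spans[i], spans[j]
            if len({a, b, c, d}) == 4 and (a < c < b) != (a < d < b):
                crossings += 1
    return crossings

# Illustrative 5-word sentence; edges are (head, dependent) word positions.
edges = [(2, 1), (2, 4), (4, 3), (3, 5)]
print(total_dependency_length(edges))  # 1 + 2 + 1 + 2 = 6
print(count_crossings(edges))          # spans (2,4) and (3,5) cross -> 1

Under the side-effect hypothesis, pressure toward shorter dependencies alone should drive the crossing count toward zero, with no separate anti-crossing principle needed.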
An attentive neural architecture for joint segmentation and parsing and its application to real estate ads
In processing human produced text using natural language processing (NLP)
techniques, two fundamental subtasks that arise are (i) segmentation of the
plain text into meaningful subunits (e.g., entities), and (ii) dependency
parsing, to establish relations between subunits. In this paper, we develop a
relatively simple and effective neural joint model that performs both
segmentation and dependency parsing together, instead of one after the other as
in most state-of-the-art approaches. We focus in particular on the real estate
ad setting, aiming to convert an ad to a structured description, which we name
property tree, comprising the tasks of (1) identifying important entities of a
property (e.g., rooms) from classifieds and (2) structuring them into a tree
format. In this work, we propose a new joint model that is able to tackle the
two tasks simultaneously and construct the property tree by (i) avoiding the
error propagation that would arise from performing the subtasks one after the other in a
pipelined fashion, and (ii) exploiting the interactions between the subtasks.
For this purpose, we perform an extensive comparative study of the pipeline
methods and the new proposed joint model, reporting an improvement of over
three percentage points in the overall edge F1 score of the property tree.
Also, we propose attention methods to encourage our model to focus on salient
tokens during the construction of the property tree. Thus we experimentally
demonstrate the usefulness of attentive neural architectures for the proposed
joint model, showcasing a further improvement of two percentage points in edge
F1 score for our application.
Comment: Preprint - Accepted for publication in Expert Systems with Applications.
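To make the evaluation concrete, here is a minimal sketch of an edge F1 computation for the property tree (the metric name comes from the abstract; the implementation and the example edges are hypothetical):

def edge_f1(gold_edges, pred_edges):
    """F1 over tree edges; each edge is a hashable (parent, child) pair."""
    gold, pred = set(gold_edges), set(pred_edges)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical property-tree edges: (parent entity, child entity).
gold = [("property", "floor1"), ("floor1", "kitchen"), ("floor1", "bedroom")]
pred = [("property", "floor1"), ("floor1", "kitchen"), ("property", "bedroom")]
print(round(edge_f1(gold, pred), 3))  # precision = recall = 2/3 -> 0.667

Treating the trees as edge sets makes precision, recall, and F1 directly comparable between the pipeline methods and the joint model.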
Nightmare at test time: How punctuation prevents parsers from generalizing
Punctuation is a strong indicator of syntactic structure, and parsers trained
on text with punctuation often rely heavily on this signal. Punctuation is a
diversion, however, since human language processing does not rely on
punctuation to the same extent, and in informal texts we therefore often leave
it out. We also use punctuation ungrammatically for emphatic or
creative purposes, or simply by mistake. We show that (a) dependency parsers
are sensitive to both absence of punctuation and to alternative uses; (b)
neural parsers tend to be more sensitive than vintage parsers; (c) neural
parsers trained without punctuation outperform all out-of-the-box parsers
across all scenarios where usage departs from standard punctuation. Our
main experiments are on synthetically corrupted data to study the effect of
punctuation in isolation and avoid potential confounds, but we also show
effects on out-of-domain data.
Comment: Analyzing and interpreting neural networks for NLP, EMNLP 2018 workshop.
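To illustrate the kind of synthetic corruption studied, the sketch below (a reconstruction under our own assumptions, not the authors' code) removes punctuation tokens from a dependency-annotated sentence and remaps the surviving 1-based head indices:

import string

def strip_punctuation(tokens, heads):
    """Drop punctuation tokens and remap 1-based head indices (0 = root).

    Only ASCII punctuation is matched here, for simplicity. Tokens whose
    head was a removed punctuation mark are reattached to the root; in
    practice treebanks rarely head content words with punctuation.
    """
    keep = [i for i, t in enumerate(tokens) if t not in string.punctuation]
    # Map old 1-based positions of kept tokens to new 1-based positions.
    remap = {old + 1: new + 1 for new, old in enumerate(keep)}
    new_tokens = [tokens[i] for i in keep]
    new_heads = [remap.get(heads[i], 0) for i in keep]
    return new_tokens, new_heads

tokens = ["No", ",", "it", "did", "not", "."]
heads = [4, 4, 4, 0, 4, 4]  # "did" is the root
print(strip_punctuation(tokens, heads))
# (['No', 'it', 'did', 'not'], [3, 3, 0, 3])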