9,525 research outputs found
Incremental generation of plural descriptions : similarity and partitioning
Approaches to plural reference generation
emphasise descriptive brevity, but often lack
empirical backing. This paper describes
a corpus-based study of plural descriptions,
and proposes a psycholinguisticallymotivated
algorithm for plural reference
generation. The descriptive strategy is based
on partitioning and incorporates corpusderived
heuristics. An exhaustive evaluation
shows that the output closely matches human
data.peer-reviewe
Memory-Based Shallow Parsing
We present a memory-based learning (MBL) approach to shallow parsing in which
POS tagging, chunking, and identification of syntactic relations are formulated
as memory-based modules. The experiments reported in this paper show
competitive results, the F-value for the Wall Street Journal (WSJ) treebank is:
93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection and
79.0% for object detection.Comment: 8 pages, to appear in: Proceedings of the EACL'99 workshop on
Computational Natural Language Learning (CoNLL-99), Bergen, Norway, June 199
The CoNLL 2007 shared task on dependency parsing
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results
Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.Comment: the SIGNLL Conference on Computational Natural Language Learning
(CoNLL, 2018
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near state-of-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications.Comment: 10 pages, 2 figures, Proceedings of Conference on Computational
Natural Language Learning CoNLL'201
- …