2,222 research outputs found
Cross-lingual and cross-domain discourse segmentation of entire documents
Discourse segmentation is a crucial step in building end-to-end discourse
parsers. However, discourse segmenters only exist for a few languages and
domains. Typically they only detect intra-sentential segment boundaries,
assuming gold standard sentence and token segmentation, and relying on
high-quality syntactic parses and rich heuristics that are not generally
available across languages and domains. In this paper, we propose statistical
discourse segmenters for five languages and three domains that do not rely on
gold pre-annotations. We also consider the problem of learning discourse
segmenters when no labeled data is available for a language. Our fully
supervised system obtains 89.5% F1 for English newswire, with slight drops in
performance on other domains, and we report supervised and unsupervised
(cross-lingual) results for five languages in total.Comment: To appear in Proceedings of ACL 201
The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
The Parallel Meaning Bank is a corpus of translations annotated with shared,
formal meaning representations comprising over 11 million words divided over
four languages (English, German, Italian, and Dutch). Our approach is based on
cross-lingual projection: automatically produced (and manually corrected)
semantic annotations for English sentences are mapped onto their word-aligned
translations, assuming that the translations are meaning-preserving. The
semantic annotation consists of five main steps: (i) segmentation of the text
in sentences and lexical items; (ii) syntactic parsing with Combinatory
Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
(v) compositional semantic analysis based on Discourse Representation Theory.
These steps are performed using statistical models trained in a semi-supervised
manner. The employed annotation models are all language-neutral. Our first
results are promising.Comment: To appear at EACL 201
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.Comment: Accepted at Coling 201
Language-based multimedia information retrieval
This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these project aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality
Does syntax help discourse segmentation? Not so much
International audienceDiscourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages , for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies , or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation
Does syntax help discourse segmentation? Not so much
International audienceDiscourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages , for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies , or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
- …