Cross-lingual and cross-domain discourse segmentation of entire documents
Discourse segmentation is a crucial step in building end-to-end discourse
parsers. However, discourse segmenters only exist for a few languages and
domains. Typically they only detect intra-sentential segment boundaries,
assuming gold standard sentence and token segmentation, and relying on
high-quality syntactic parses and rich heuristics that are not generally
available across languages and domains. In this paper, we propose statistical
discourse segmenters for five languages and three domains that do not rely on
gold pre-annotations. We also consider the problem of learning discourse
segmenters when no labeled data is available for a language. Our fully
supervised system obtains 89.5% F1 for English newswire, with slight drops in
performance on other domains, and we report supervised and unsupervised
(cross-lingual) results for five languages in total.
Comment: To appear in Proceedings of ACL 201
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 page
Evaluating Scoped Meaning Representations
Semantic parsing offers many opportunities to improve natural language
understanding. We present a semantically annotated parallel corpus for English,
German, Italian, and Dutch where sentences are aligned with scoped meaning
representations in order to capture the semantics of negation, modals,
quantification, and presupposition triggers. The semantic formalism is based on
Discourse Representation Theory, but concepts are represented by WordNet
synsets and thematic roles by VerbNet relations. Translating scoped meaning
representations to sets of clauses enables us to compare them for the purpose
of semantic parser evaluation and checking translations. This is done by
computing precision and recall on matching clauses, much as is done
for Abstract Meaning Representations. We show that our matching tool for
evaluating scoped meaning representations is both accurate and efficient.
Applying this matching tool to three baseline semantic parsers yields F-scores
between 43% and 54%. A pilot study is performed to automatically find changes
in meaning by comparing meaning representations of translations. This
comparison turns out to be an additional way of (i) finding annotation mistakes
and (ii) finding instances where our semantic analysis needs to be improved.
Comment: Camera-ready for LREC 201
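The clause-matching evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that the variables in the two clause sets have already been aligned, so matching reduces to set overlap; the abstract does not describe how the actual tool aligns variables, and the function name `clause_prf` and the example clauses are hypothetical.

```python
def clause_prf(gold, predicted):
    """Precision, recall and F1 over two collections of clauses.

    Assumption: variable names are already aligned between the two
    representations, so identical tuples count as matching clauses.
    """
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical clause sets: the predicted parse picks the wrong verb synset.
gold = [
    ("b1", "REF", "x1"),
    ("b1", "person.n.01", "x1"),
    ("b1", "smile.v.01", "e1"),
    ("b1", "Agent", "e1", "x1"),
]
predicted = [
    ("b1", "REF", "x1"),
    ("b1", "person.n.01", "x1"),
    ("b1", "laugh.v.01", "e1"),
    ("b1", "Agent", "e1", "x1"),
]

precision, recall, f1 = clause_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Three of the four clauses match in each direction here, so precision, recall and F1 all come out at 0.75.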
An example-based approach to translating sign language
Users of sign languages are often forced to use a language in which they have reduced competence simply because documentation in their preferred format is not available. While some research exists on translating between natural and sign languages, we present here what we believe to be the first attempt to tackle this problem using an example-based (EBMT) approach.
Having obtained a set of English–Dutch Sign Language examples, we employ an approach to EBMT using the ‘Marker Hypothesis’ (Green, 1979), analogous to the successful system of (Way & Gough, 2003), (Gough & Way, 2004a) and (Gough & Way, 2004b). In a set of experiments, we show that encouragingly good translation quality may be obtained using such an approach.
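As a rough illustration of Marker Hypothesis segmentation, the sketch below starts a new chunk at each closed-class "marker" word, provided the current chunk already holds at least one non-marker word. The marker list and the function name `marker_chunks` are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical, deliberately tiny set of closed-class marker words.
MARKERS = {
    "the", "a", "an",            # determiners
    "in", "on", "of", "to", "with",  # prepositions
    "and", "but", "or",          # conjunctions
    "he", "she", "it", "they",   # pronouns
}


def marker_chunks(sentence):
    """Split a sentence into chunks that each begin at a marker word."""
    chunks, current = [], []
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        has_content = any(t.lower() not in MARKERS for t in current)
        # Open a new chunk only if the current one contains a content word,
        # so that runs of adjacent markers stay together.
        if is_marker and has_content:
            chunks.append(current)
            current = []
        current.append(token)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("the man walked to the shop with a dog"))
```

On this sentence the sketch yields the chunks "the man walked", "to the shop" and "with a dog". In an example-based system, chunks of this kind would be aligned with their target-side counterparts and reused as translation examples.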
Proceedings
Proceedings of the Workshop on Annotation and
Exploitation of Parallel Corpora AEPC 2010.
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 98 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15893
The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages
This paper gives a general description of the ideas behind the Parallel
Meaning Bank, a framework that aims to provide an easy way to annotate
compositional semantics for texts written in languages other than English. The
annotation procedure is semi-automatic, and comprises seven layers of
linguistic information: segmentation, symbolisation, semantic tagging, word
sense disambiguation, syntactic structure, thematic role labelling, and
co-reference. New languages can be added to the meaning bank as long as the
documents are based on translations from English, but they also introduce
interesting new challenges to the linguistic assumptions underlying the Parallel
Meaning Bank.
Comment: 13 pages, 5 figures, 1 tabl
Automated speech and audio analysis for semantic access to multimedia
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of techniques will be discussed from a media crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.