Refining Implicit Argument Annotation for UCCA
Predicate-argument structure analysis is a central component in meaning
representations of text. The fact that some arguments are not explicitly
mentioned in a sentence gives rise to ambiguity in language understanding, and
renders it difficult for machines to interpret text correctly. However, only
few resources represent implicit roles for NLU, and existing studies in NLP
only make coarse distinctions between categories of arguments omitted from
linguistic form. This paper proposes a typology for fine-grained implicit
argument annotation on top of Universal Conceptual Cognitive Annotation's
foundational layer. The proposed implicit argument categorisation is driven by
theories of implicit role interpretation and consists of six types: Deictic,
Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set. We
exemplify our design by revisiting part of the UCCA EWT corpus, providing a new
dataset annotated with the refinement layer, and making a comparative analysis
with other schemes. Comment: DMR 202
Irish treebanking and parsing: a preliminary evaluation
Language resources are essential for linguistic research and the development of NLP applications. Low-density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish – namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.
Combining ontologies and neural networks for analyzing historical language varieties: a case study in Middle Low German
In this paper, we describe experiments on the morphosyntactic annotation of historical language varieties for the example of Middle Low German (MLG), the official language of the German Hanse during the Middle Ages and a dominant language around the Baltic Sea at the time. To the best of our knowledge, this is the first experiment in automatically producing morphosyntactic annotations for Middle Low German, and accordingly, no part-of-speech (POS) tagset is currently agreed upon. In our experiment, we illustrate how ontology-based specifications of projected annotations can be employed to circumvent this issue: instead of training and evaluating against a given tagset, we decompose it into independent features which are predicted independently by a neural network. Using consistency constraints (axioms) from an ontology, the predicted feature probabilities are then decoded into a sound ontological representation. Using these representations, we can finally bootstrap a POS tagset capturing only morphosyntactic features which could be reliably predicted. In this way, our approach is capable of optimizing precision and recall of morphosyntactic annotations simultaneously with bootstrapping a tagset rather than performing iterative cycles.
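The decoding step described in this abstract can be illustrated with a minimal sketch. It assumes independent per-feature probabilities and a hand-written list of incompatible feature-value combinations standing in for the ontology axioms. The feature names, probabilities, and the functions `is_consistent` and `decode` are hypothetical and not taken from the paper, and exhaustive enumeration is used purely for illustration.

```python
from itertools import product

# Hypothetical per-feature probabilities, as an independent classifier might emit them.
probs = {
    "pos": {"noun": 0.55, "verb": 0.45},
    "tense": {"past": 0.7, "none": 0.3},
}

# Hypothetical consistency constraints: value combinations that may not co-occur.
forbidden = [
    {"pos": "noun", "tense": "past"},
    {"pos": "verb", "tense": "none"},
]

def is_consistent(assignment, forbidden):
    """Return True if the assignment matches none of the forbidden combinations."""
    return not any(
        all(assignment.get(feature) == value for feature, value in rule.items())
        for rule in forbidden
    )

def decode(probs, forbidden):
    """Pick the most probable joint assignment that satisfies every constraint."""
    features = sorted(probs)
    best, best_score = None, 0.0
    for values in product(*(probs[f] for f in features)):
        assignment = dict(zip(features, values))
        if not is_consistent(assignment, forbidden):
            continue
        score = 1.0
        for feature, value in assignment.items():
            score *= probs[feature][value]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

best, score = decode(probs, forbidden)
print(best, round(score, 3))
```

In this toy setting, taking the most probable value of each feature separately would give noun with past tense, which the constraints rule out. The constrained decoder returns the best consistent combination instead.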
References to graphical objects in interactive multimodal queries
This thesis describes a computational model for interpreting natural language expressions in an interactive multimodal query system integrating both natural language text
and graphic displays. The primary concern of the model is to interpret expressions that
might involve graphical attributes, and expressions whose referents could be objects
on the screen. Graphical objects on the screen are used to visualise entities in the application domain
and their attributes (in short, domain entities and domain attributes). This is why
graphical objects are treated as descriptions of those domain entities/attributes in
the literature. However, graphical objects and their attributes are visible during the
interaction, and are thus known by the participants of the interaction. Therefore, they
themselves should be part of the mutual knowledge of the interaction. This poses some interesting problems in language processing. As part of the mutual
knowledge, graphical attributes could be used in expressions, and graphical objects
could be referred to by expressions. In consequence, there could be ambiguities about
whether an attribute in an expression belongs to a graphical object or to a domain
entity. There could also be ambiguities about whether the referent of an expression is
a graphical object or a domain entity. The main contributions of this thesis consist of analysing the above ambiguities, and designing,
implementing and testing a computational model and a demonstration system
for resolving these ambiguities. Firstly, a structure and corresponding terminology are
set up, so these ambiguities can be clarified as ambiguities derived from referring to
different databases, the screen or the application domain (source ambiguities). Secondly, a meaning representation language is designed which explicitly represents the
information about which database an attribute/entity comes from. Several linguistic
regularities inside and among referring expressions are described so that they can be
used as heuristics in the ambiguity resolution. Thirdly, a computational model based
on constraint satisfaction is constructed to resolve simultaneously some reference ambiguities and source ambiguities. Then, a demonstration system integrating natural
language text and graphics is implemented, whose core is the computational model. This thesis ends with an evaluation of the computational model. It provides some
concrete evidence about the advantages and disadvantages of the above approach.
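The idea of resolving reference and source ambiguities together can be illustrated with a minimal sketch. It assumes two toy databases (the screen and the application domain) and uses brute-force enumeration in place of whatever constraint solver the thesis uses. The database contents, the `depicts` mapping, and the functions `lookup` and `resolve` are all hypothetical.

```python
from itertools import product

# Hypothetical screen database: graphical objects and their visible attributes.
screen_db = {
    "bar1": {"colour": "red", "height": "tall"},
    "bar2": {"colour": "blue", "height": "short"},
}
# Hypothetical domain database: entities in the application domain.
domain_db = {
    "car_sales": {"colour": "blue"},
    "van_sales": {"colour": "green"},
}
# Hypothetical mapping from each graphical object to the domain entity it visualises.
depicts = {"bar1": "car_sales", "bar2": "van_sales"}

def lookup(source, obj, attribute):
    """Read an attribute of a graphical object from the screen or from its domain entity."""
    if source == "screen":
        return screen_db[obj].get(attribute)
    return domain_db[depicts[obj]].get(attribute)

def resolve(constraints):
    """Enumerate (referent, attribute sources) pairs that satisfy every attribute constraint."""
    attributes = [attribute for attribute, _ in constraints]
    solutions = []
    for obj in sorted(screen_db):
        for sources in product(("screen", "domain"), repeat=len(constraints)):
            if all(
                lookup(source, obj, attribute) == value
                for source, (attribute, value) in zip(sources, constraints)
            ):
                solutions.append((obj, dict(zip(attributes, sources))))
    return solutions

# "the blue one": ambiguous between a blue bar and a bar depicting a blue entity.
print(resolve([("colour", "blue")]))
# "the tall blue one": the extra constraint leaves a single reading.
print(resolve([("height", "tall"), ("colour", "blue")]))
```

With one attribute the sketch returns two readings, one per source. Adding a second attribute that only exists on the screen fixes both the referent and the source of each attribute.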
Proceedings
Proceedings of the NODALIDA 2011 Workshop
Constraint Grammar Applications.
Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud.
NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/19231