The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages
This paper gives a general description of the ideas behind the Parallel
Meaning Bank, a framework that aims to provide an easy way to annotate
compositional semantics for texts written in languages other than English. The
annotation procedure is semi-automatic and comprises seven layers of
linguistic information: segmentation, symbolisation, semantic tagging, word
sense disambiguation, syntactic structure, thematic role labelling, and
co-reference. New languages can be added to the meaning bank as long as the
documents are based on translations from English, but they also introduce
interesting new challenges to the linguistic assumptions underlying the
Parallel Meaning Bank.
Comment: 13 pages, 5 figures, 1 table
DRS at MRP 2020: Dressing up Discourse Representation Structures as Graphs
Discourse Representation Theory (DRT) is a formal account for representing
the meaning of natural language discourse. Meaning in DRT is modeled via a
Discourse Representation Structure (DRS), a meaning representation with a
model-theoretic interpretation, which is usually depicted as nested boxes. In
contrast, a directed labeled graph is a common data structure used to encode
semantics of natural language texts. The paper describes the procedure of
dressing up DRSs as directed labeled graphs to include DRT as a new framework
in the 2020 shared task on Cross-Framework and Cross-Lingual Meaning
Representation Parsing. Since one of the goals of the shared task is to
encourage unified models for several semantic graph frameworks, the conversion
procedure was biased towards making the DRT graph framework somewhat similar to
other graph-based meaning representation frameworks.
Comment: 10 pages, 4 figures, 4 tables, CoNLL 2020 Shared Task
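The box-to-graph idea can be illustrated with a toy sketch. The encoding below is invented for illustration and does not reproduce the shared task's actual node and edge inventory: a single DRS box for "A man walks", introducing referent x with conditions man(x) and walk(x), is turned into labeled nodes and directed labeled edges.

```python
# Toy sketch (not the shared task's actual conversion): encode one DRS box
# as a directed labeled graph of (node -> label) and (source, label, target).

def drs_to_graph(box_id, referents, conditions):
    """Turn a single DRS box into (nodes, labeled directed edges)."""
    nodes = {box_id: "BOX"}
    edges = []
    for ref in referents:
        nodes[ref] = "REF"
        edges.append((box_id, "introduces", ref))   # box introduces referent
    for pred, args in conditions:
        cond_id = f"c_{pred}"
        nodes[cond_id] = pred                        # condition node, labeled by predicate
        edges.append((box_id, "condition", cond_id))
        for i, arg in enumerate(args):
            edges.append((cond_id, f"ARG{i}", arg))  # argument edges
    return nodes, edges

# DRS for "A man walks": box b0, referent x, conditions man(x) and walk(x).
nodes, edges = drs_to_graph("b0", ["x"], [("man", ["x"]), ("walk", ["x"])])
print(nodes)  # {'b0': 'BOX', 'x': 'REF', 'c_man': 'man', 'c_walk': 'walk'}
print(edges)
```

The nested-box picture disappears, but the same information survives as edge labels, which is what makes the representation compatible with graph-based parsers.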
Global and Local Hierarchy-aware Contrastive Framework for Implicit Discourse Relation Recognition
Due to the absence of explicit connectives, implicit discourse relation
recognition (IDRR) remains a challenging task in discourse analysis. The
critical step for IDRR is to learn high-quality discourse relation
representations between two arguments. Recent methods tend to integrate the
whole hierarchical information of senses into discourse relation
representations for multi-level sense recognition. However, they make
insufficient use of the static hierarchical structure containing all senses
(defined as the global hierarchy) and ignore the hierarchical sense-label
sequence corresponding to each instance (defined as the local hierarchy). To
fully exploit the global and local hierarchies of senses and learn better
discourse relation representations, we propose a novel GLobal and LOcal
Hierarchy-aware Contrastive Framework (GLOF) that models both kinds of
hierarchies with the aid of contrastive learning. Experimental results on the
PDTB dataset demonstrate that our method remarkably outperforms the current
state-of-the-art model at all hierarchical levels.
Comment: 13 pages, 10 figures
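The contrastive idea the framework builds on can be sketched with a generic InfoNCE-style objective. This is a plain illustration of contrastive learning, not GLOF's actual hierarchy-aware loss; the vectors and temperature below are made up for illustration.

```python
import math

# Generic InfoNCE-style contrastive loss sketch (not GLOF's actual objective):
# pull an anchor representation toward a positive (same sense label) and push
# it away from negatives (different sense labels).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, tau=0.1):
    pos = math.exp(dot(anchor, positive) / tau)
    neg = sum(math.exp(dot(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor    = [1.0, 0.0]
positive  = [0.9, 0.1]                 # similar: shares the anchor's sense label
negatives = [[-1.0, 0.0], [0.0, 1.0]]  # dissimilar: other sense labels

loss = info_nce(anchor, positive, negatives)
print(loss)  # small, since the anchor already sits close to its positive
```

In a hierarchy-aware setting the choice of positives and negatives would additionally depend on where labels sit in the sense hierarchy, which is the part this toy sketch leaves out.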
Character-based Neural Semantic Parsing
Humans and computers do not speak the same language. A lot of day-to-day tasks would be vastly more efficient if we could communicate with computers using natural language instead of relying on an interface. It is necessary, then, that the computer does not see a sentence as a collection of individual words, but instead can understand the deeper, compositional meaning of the sentence. A way to tackle this problem is to automatically assign a formal, structured meaning representation to each sentence, which is easy for computers to interpret. There have been quite a few attempts at this before, but these approaches were usually heavily reliant on predefined rules, word lists or representations of the syntax of the text, which made them complicated to use in general. In this thesis we employ an algorithm that can learn to automatically assign meaning representations to texts, without using any such external resource. Specifically, we use a type of artificial neural network called a sequence-to-sequence model, in a process that is often referred to as deep learning. The devil is in the details, but we find that this type of algorithm can produce high-quality meaning representations, with better performance than the more traditional methods. Moreover, a main finding of the thesis is that, counterintuitively, it is often better to represent the text as a sequence of individual characters, not words. This is likely because it helps the model in dealing with spelling errors, unknown words and inflections.
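The representational choice the thesis argues for can be illustrated with a minimal sketch. The vocabulary and sentences below are invented for illustration; a real sequence-to-sequence model would map these tokens to embedding indices.

```python
# Minimal sketch of word-level vs. character-level input representations.
# A misspelled word is out-of-vocabulary at the word level, but at the
# character level it still shares most of its input sequence with the
# correctly spelled form.

def word_tokens(sentence):
    return sentence.split()

def char_tokens(sentence):
    # Every character, including spaces, becomes its own token.
    return list(sentence)

vocab = {"the", "man", "walked"}   # hypothetical word vocabulary
typo = "the man walkked"

oov = [w for w in word_tokens(typo) if w not in vocab]
print(oov)                   # ['walkked'] -- unknown as a word token
print(char_tokens("man"))    # ['m', 'a', 'n'] -- every character is in-vocabulary
```

With a small, closed character inventory there are essentially no unknown input symbols, which is one plausible reason character-level input copes better with spelling errors and inflections.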
Discovering multiword expressions
In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.
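One classic association measure used for the collocational preferences mentioned above is pointwise mutual information (PMI), which scores how much more often two words co-occur than chance would predict. The sketch below is a generic illustration, not a specific method from the paper, and the toy corpus is invented.

```python
import math
from collections import Counter

# PMI(w1, w2) = log2( P(w1 w2) / (P(w1) * P(w2)) ), estimated from counts.
# High PMI suggests the pair co-occurs more than chance, a collocation cue.

def pmi(bigram, unigram_counts, bigram_counts, n):
    w1, w2 = bigram
    p_xy = bigram_counts[bigram] / n
    p_x = unigram_counts[w1] / n
    p_y = unigram_counts[w2] / n
    return math.log2(p_xy / (p_x * p_y))

# Tiny invented corpus with a recurring candidate MWE.
corpus = "kick the bucket kick the bucket kick the habit the man saw the dog".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

score = pmi(("kick", "the"), unigram_counts, bigram_counts, n)
print(score)  # positive: the pair co-occurs more often than chance
```

In practice PMI is known to over-reward rare pairs, so MWE discovery work typically combines it with frequency thresholds or other association measures.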
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely-used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.