998 research outputs found
The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
The Parallel Meaning Bank is a corpus of translations annotated with shared,
formal meaning representations comprising over 11 million words divided over
four languages (English, German, Italian, and Dutch). Our approach is based on
cross-lingual projection: automatically produced (and manually corrected)
semantic annotations for English sentences are mapped onto their word-aligned
translations, assuming that the translations are meaning-preserving. The
semantic annotation consists of five main steps: (i) segmentation of the text
in sentences and lexical items; (ii) syntactic parsing with Combinatory
Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
(v) compositional semantic analysis based on Discourse Representation Theory.
These steps are performed using statistical models trained in a semi-supervised
manner. The employed annotation models are all language-neutral. Our first
results are promising.Comment: To appear at EACL 201
Learning to embed semantic similarity for joint image-text retrieval
We present a deep learning approach for learning the joint semantic
embeddings of images and captions in a Euclidean space, such that the semantic
similarity is approximated by the L2 distances in the embedding space. For
that, we introduce a metric learning scheme that utilizes multitask learning to
learn the embedding of identical semantic concepts using a center loss. By
introducing a differentiable quantization scheme into the end-to-end trainable
network, we derive a semantic embedding of semantically similar concepts in
Euclidean space. We also propose a novel metric learning formulation using an
adaptive margin hinge loss, that is refined during the training phase. The
proposed scheme was applied to the MS-COCO, Flicke30K and Flickr8K datasets,
and was shown to compare favorably with contemporary state-of-the-art
approaches.Comment: in IEEE Transactions on Pattern Analysis and Machine Intelligence,
202
Thirty Musts for Meaning Banking
Meaning banking--creating a semantically annotated corpus for the purpose of
semantic parsing or generation--is a challenging task. It is quite simple to
come up with a complex meaning representation, but it is hard to design a
simple meaning representation that captures many nuances of meaning. This paper
lists some lessons learned in nearly ten years of meaning annotation during the
development of the Groningen Meaning Bank (Bos et al., 2017) and the Parallel
Meaning Bank (Abzianidze et al., 2017). The paper's format is rather
unconventional: there is no explicit related work, no methodology section, no
results, and no discussion (and the current snippet is not an abstract but
actually an introductory preface). Instead, its structure is inspired by work
of Traum (2000) and Bender (2013). The list starts with a brief overview of the
existing meaning banks (Section 1) and the rest of the items are roughly
divided into three groups: corpus collection (Section 2 and 3, annotation
methods (Section 4-11), and design of meaning representations (Section 12-30).
We hope this overview will give inspiration and guidance in creating improved
meaning banks in the future.Comment: https://www.aclweb.org/anthology/W19-3302
Text Mining the History of Medicine
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform
Crowdsourcing Question-Answer Meaning Representations
We introduce Question-Answer Meaning Representations (QAMRs), which represent
the predicate-argument structure of a sentence as a set of question-answer
pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled
with very little training, and gather a dataset with over 5,000 sentences and
100,000 questions. A detailed qualitative analysis demonstrates that the
crowd-generated question-answer pairs cover the vast majority of
predicate-argument relationships in existing datasets (including PropBank,
NomBank, QA-SRL, and AMR) along with many previously under-resourced ones,
including implicit arguments and relations. The QAMR data and annotation code
is made publicly available to enable future work on how best to model these
complex phenomena.Comment: 8 pages, 6 figures, 2 table
Automatic Rule Generation for Time Expression Normalization
The understanding of time expressions includes two sub-tasks: recognition and
normalization. In recent years, significant progress has been made in the
recognition of time expressions while research on normalization has lagged
behind. Existing SOTA normalization methods highly rely on rules or grammars
designed by experts, which limits their performance on emerging corpora, such
as social media texts. In this paper, we model time expression normalization as
a sequence of operations to construct the normalized temporal value, and we
present a novel method called ARTime, which can automatically generate
normalization rules from training data without expert interventions.
Specifically, ARTime automatically captures possible operation sequences from
annotated data and generates normalization rules on time expressions with
common surface forms. The experimental results show that ARTime can
significantly surpass SOTA methods on the Tweets benchmark, and achieves
competitive results with existing expert-engineered rule methods on the
TempEval-3 benchmark.Comment: Accepted to Findings of EMNLP 202
The Sigma-Semantics: A Comprehensive Semantics for Functional Programs
A comprehensive semantics for functional programs is presented, which generalizes the well-known call-by-value and call-by-name semantics. By permitting a separate choice between call-by value and call-by-name for every argument position of every function and parameterizing the semantics by this choice we abstract from the parameter-passing mechanism. Thus common and distinguishing features of all instances of the sigma-semantics, especially call-by-value and call-by-name semantics, are highlighted. Furthermore, a property can be validated for all instances of the sigma-semantics by a single proof. This is employed for proving the equivalence of the given denotational (fixed-point based) and two operational (reduction based) definitions of the sigma-semantics. We present and apply means for very simple proofs of equivalence with the denotational sigma-semantics for a large class of reduction-based sigma-semantics. Our basis are simple first-order constructor-based functional programs with patterns
- …