270 research outputs found
Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed,
acyclic and rooted graphs, aiming at capturing their meaning in a machine
readable format. AMR parsing converts natural language sentences into such
graphs. However, evaluating a parser on new data by means of comparison to
manually created AMR graphs is very costly. Also, we would like to be able to
detect parses of questionable quality, or preferring results of alternative
systems by selecting the ones for which we can assess good quality. We propose
AMR accuracy prediction as the task of predicting several metrics of
correctness for an automatically generated AMR parse - in absence of the
corresponding gold parse. We develop a neural end-to-end multi-output
regression model and perform three case studies: firstly, we evaluate the
model's capacity of predicting AMR parse accuracies and test whether it can
reliably assign high scores to gold parses. Secondly, we perform parse
selection based on predicted parse accuracies of candidate parses from
alternative systems, with the aim of improving overall results. Finally, we
predict system ranks for submissions from two AMR shared tasks on the basis of
their predicted parse accuracy averages. All experiments are carried out across
two different domains and show that our method is effective.Comment: accepted at *SEM 201
C-Pack: Packaged Resources To Advance General Chinese Embedding
We introduce C-Pack, a package of resources that significantly advance the
field of general Chinese embeddings. C-Pack includes three critical resources.
1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6
tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated
from labeled and unlabeled Chinese corpora for training embedding models. 3)
C-TEM is a family of embedding models covering multiple sizes. Our models
outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the
time of the release. We also integrate and optimize the entire suite of
training methods for C-TEM. Along with our resources on general Chinese
embedding, we release our data and models for English text embeddings. The
English models achieve state-of-the-art performance on MTEB benchmark;
meanwhile, our released English data is 2 times larger than the Chinese data.
All these resources are made publicly available at
https://github.com/FlagOpen/FlagEmbedding
Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations
To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception – its software and the complete set of representations – is available at https://github.com/SapienzaNLP/conception
An empirical evaluation of AMR parsing for legal documents
Many approaches have been proposed to tackle the problem of Abstract Meaning
Representation (AMR) parsing, helps solving various natural language processing
issues recently. In our paper, we provide an overview of different methods in
AMR parsing and their performances when analyzing legal documents. We conduct
experiments of different AMR parsers on our annotated dataset extracted from
the English version of Japanese Civil Code. Our results show the limitations as
well as open a room for improvements of current parsing techniques when
applying in this complicated domain
- …