53 research outputs found
The alignment of formal, structured and unstructured process descriptions
Organizations are currently experiencing a shift in the way processes are managed. On the one hand, formal notations such as Petri nets or Business Process Model and Notation (BPMN) enable unambiguous reasoning about, and automation of, designed processes. This way of eliciting processes by manual design, which originated decades ago, will remain important in the future. On the other hand, regulations require organizations to store their process executions in structured representations, so that they are known and can be analyzed. Finally, because stakeholders within an organization differ in background (ranging from the most technical members, e.g., developers, to less technical ones), textual descriptions of processes are also maintained so that everyone in the organization understands their processes.
In this paper I describe techniques for facilitating the interconnection between these three process representations. This requires interdisciplinary research connecting several fields: business process management, formal methods, natural language processing and process mining. Peer Reviewed. Postprint (author's final draft)
Lessons learned in multilingual grounded language learning
Recent work has shown how to learn better visual-semantic embeddings by
leveraging image descriptions in more than one language. Here, we investigate
in detail which conditions affect the performance of this type of grounded
language learning model. We show that multilingual training improves over
bilingual training, and that low-resource languages benefit from training with
higher-resource languages. We demonstrate that a multilingual model can be
trained equally well on either translations or comparable sentence pairs, and
that annotating the same set of images in multiple languages enables further
improvements via an additional caption-caption ranking objective. Comment: CoNLL 201
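As an illustration of what a caption-caption ranking objective can look like, the sketch below uses a max-margin (hinge) loss over a similarity matrix. The abstract does not state the exact formulation, so the loss form, the function name `ranking_loss`, and the margin value are all assumptions made for illustration.

```python
def ranking_loss(sim, margin=0.2):
    """Max-margin ranking loss over a square similarity matrix.

    sim[i][j] is the similarity between caption i in one language and
    caption j in another; matching captions (same image) sit on the diagonal.
    The hinge form and the default margin are illustrative assumptions.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Caption i should be closer to its match than to caption j.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Symmetric term: mismatched caption j ranked against match i.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss


# Toy similarity matrices (made-up values, not data from the paper).
well_separated = [[1.0, 0.0], [0.0, 1.0]]
not_separated = [[0.5, 0.5], [0.5, 0.5]]

loss_good = ranking_loss(well_separated)
loss_bad = ranking_loss(not_separated)
```

With these toy inputs the well-separated matrix incurs no loss, while the matrix that cannot tell matching from mismatched captions is penalised by the margin on every term.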
Semantic sentence similarity: size does not always matter
This study addresses the question of whether visually grounded speech
recognition (VGS) models learn to capture sentence semantics without access to
any prior linguistic knowledge. We produce synthetic and natural spoken
versions of a well-known semantic textual similarity database and show that our
VGS model produces embeddings that correlate well with human semantic
similarity judgements. Our results show that a model trained on a small
image-caption database outperforms two models trained on much larger databases,
indicating that database size is not all that matters. We also investigate the
importance of having multiple captions per image and find that this is indeed
helpful even if the total number of images is lower, suggesting that
paraphrasing is a valuable learning signal. While the general trend in the
field is to create ever larger datasets to train models on, our findings
indicate that other characteristics of the database can be just as important. Comment: This paper has been accepted at Interspeech 2021 where it will be
presented and appear in the conference proceedings in September 202
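The evaluation idea in this abstract, comparing embedding similarity with human similarity judgements, can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say which similarity measure or correlation coefficient is used, so cosine similarity and Pearson correlation are assumptions here, and the vectors and ratings are made-up toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sentence-pair embeddings and human ratings (toy values).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # near-paraphrases
    ([1.0, 0.0], [0.5, 0.5]),  # partially related
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated
]
human_scores = [4.8, 2.5, 0.2]

model_scores = [cosine(u, v) for u, v in pairs]
correlation = pearson(model_scores, human_scores)
```

A high value of `correlation` means the model ranks sentence pairs similarly to the human annotators.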
Automatic detection of parallel sentences from comparable biomedical texts
International audience. Parallel sentences provide semantically similar information which can vary on a given dimension, such as language or register. Parallel sentences with register variation (as in expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora which are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and are related to the biomedical area. Our purpose is to decide whether a given pair of specialized and simplified sentences is to be aligned or not. Manually created reference data show 0.76 inter-annotator agreement. We treat this task as binary classification (alignment/non-alignment). We perform experiments on balanced and imbalanced data. The results on balanced data reach up to 0.96 F-Measure. On imbalanced data, the results are lower but remain competitive when using classification models trained on balanced data. Besides, among the three datasets exploited (semantic equivalence and inclusions), the detection of equivalence pairs is more efficient
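For readers unfamiliar with the F-Measure reported above, the sketch below computes it for a binary alignment/non-alignment decision. The function name `f_measure` and the labels are hypothetical toy values, not the paper's data, and the abstract does not say which variant of the measure is used; the balanced F1 (beta = 1) is assumed.

```python
def f_measure(gold, predicted, beta=1.0):
    """F-measure for binary labels, where 1 = aligned and 0 = not aligned."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example: six sentence pairs, three of which should be aligned.
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f_measure(gold, predicted)
```

Here two of the three aligned pairs are found and one non-aligned pair is wrongly accepted, so precision and recall are both 2/3 and so is the F-measure.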
Generative or Contrastive? Phrase Reconstruction for Better Sentence Representation Learning
Though they offer strong contextualized token-level representations, current
pre-trained language models pay less attention to acquiring
sentence-level representations during their self-supervised pre-training. If
self-supervised learning is divided into two subcategories,
generative and contrastive, then most existing studies show that sentence
representation learning benefits more from contrastive methods than from
generative methods. However, contrastive learning is not well compatible
with the common token-level generative self-supervised learning, and does not
guarantee good performance on downstream semantic retrieval tasks. Thus, to
alleviate these inconveniences, we instead propose a novel generative
self-supervised learning objective based on phrase reconstruction. Empirical
studies show that our generative learning can yield sufficiently powerful sentence
representations and achieve performance on Semantic Textual Similarity (STS)
tasks on par with contrastive learning. Further, in the unsupervised
setting, our generative method outperforms the previous state-of-the-art SimCSE on
the benchmark of downstream semantic retrieval tasks. Comment: Preprint