7 research outputs found

    The alignment of formal, structured and unstructured process descriptions

    Nowadays organizations are experiencing a shift in the way processes are managed. On the one hand, formal notations like Petri nets or Business Process Model and Notation (BPMN) enable unambiguous reasoning about, and automation of, designed processes. This way of eliciting processes by manual design, which originated decades ago, will remain important in the future. On the other hand, regulations require organizations to store their process executions in structured representations, so that they are known and can be analyzed. Finally, because stakeholders within an organization differ in technical background (ranging from the most technical members, e.g., developers, to less technical ones), textual descriptions of processes are also maintained so that everyone in the organization understands them. In this paper I describe techniques for facilitating the interconnection between these three process representations. This requires interdisciplinary research connecting several fields: business process management, formal methods, natural language processing and process mining.
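    The abstract names a concrete sub-problem: linking the activities of a formal model (e.g., a BPMN diagram) to sentences of a textual process description. The following minimal sketch, in Python, illustrates one naive way such an alignment could work via lexical overlap; the function and variable names are illustrative, not the paper's actual system.

        import re

        def tokens(text):
            """Lowercased word tokens with punctuation stripped."""
            return set(re.findall(r"[a-z]+", text.lower()))

        def jaccard(a, b):
            """Jaccard overlap between two token sets."""
            return len(a & b) / len(a | b) if a | b else 0.0

        def align_activities(activities, sentences):
            """Greedily pair each model activity with the most lexically similar sentence."""
            return {act: max(sentences, key=lambda s: jaccard(tokens(act), tokens(s)))
                    for act in activities}

        # Toy fragment of a process model and its textual description.
        activities = ["Receive purchase order", "Check stock availability", "Ship goods"]
        description = [
            "First, the sales department receives a purchase order from the customer.",
            "Then the warehouse checks whether the item is in stock.",
            "Finally, the goods are shipped to the customer.",
        ]
        for activity, sentence in align_activities(activities, description).items():
            print(f"{activity!r} -> {sentence!r}")

    In practice the paper's techniques draw on natural language processing and process mining rather than plain token overlap, but the sketch shows the shape of the alignment problem.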

    Creació d’un corpus d’entailment en espanyol [Creation of an entailment corpus in Spanish]

    Bachelor's thesis (Treball Final de Grau) in Linguistics, Facultat de Filologia, Universitat de Barcelona, 2019-2020. Supervisor: Mariona Taulé Delor. The study of inference in natural language and its detection may represent an important advance in language technology. For this reason, corpora for the Natural Language Inference task, covering entailment and contradiction, have been created, although most of them are in English. This work presents the methodology for the manual creation and annotation of a corpus of entailed sentences in Spanish. In particular, it describes the process of creating sentences entailed from texts, following criteria that ensure the richness of the corpus, provide different levels of complexity, and avoid biased information. A total of 940 hypotheses were entailed from 470 initial sentences, which were extracted from 6 Wikipedia articles. The Spanish entailment corpus is part of a larger Natural Language Inference corpus being developed within a project of the CLiC (Centre de Llenguatge i Computació) research group at the Universitat de Barcelona.
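    As a minimal sketch of how the records of such a corpus might be organized (the abstract does not publish a schema, so the field names below are assumptions), each annotated item pairs a source sentence with a manually written entailed hypothesis:

        from dataclasses import dataclass

        @dataclass
        class EntailmentPair:
            premise: str         # sentence extracted from a Wikipedia article
            hypothesis: str      # manually written sentence entailed by the premise
            source_article: str  # article the premise was taken from
            complexity: str      # annotated complexity level, e.g. "low" or "high"

        # Illustrative example, not taken from the actual corpus.
        pair = EntailmentPair(
            premise="Barcelona es la segunda ciudad más poblada de España.",
            hypothesis="Barcelona está en España.",
            source_article="Barcelona",
            complexity="low",
        )
        print(pair)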

    SemEval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization

    This paper presents the first round of the task on Cross-lingual Textual Entailment for Content Synchronization, organized within SemEval-2012. The task was designed to promote research on semantic inference over texts written in different languages while targeting a real application scenario. Participants were presented with datasets for different language pairs, where multi-directional entailment relations (“forward”, “backward”, “bidirectional”, “no entailment”) had to be identified. We report on the training and test data used for evaluation, the process of their creation, the participating systems (10 teams, 92 runs), the approaches adopted, and the results achieved.
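    Since the task reduces to assigning one of four labels to each sentence pair, a minimal Python sketch of the label scheme and a plain accuracy score of the kind typically used for such shared tasks looks as follows (the official evaluation scripts may differ in detail):

        LABELS = {"forward", "backward", "bidirectional", "no_entailment"}

        def accuracy(gold, predicted):
            """Fraction of sentence pairs whose entailment direction is predicted correctly."""
            assert len(gold) == len(predicted)
            return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

        gold = ["forward", "bidirectional", "no_entailment", "backward"]
        pred = ["forward", "forward", "no_entailment", "backward"]
        print(f"accuracy = {accuracy(gold, pred):.2f}")  # 0.75 on this toy sample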

    Quantifying Cross-lingual Semantic Similarity for Natural Language Processing Applications

    Translation and cross-lingual access to information are key technologies in a global economy. Even though the quality of machine translation (MT) output is still far from the level of human translations, many real-world applications have emerged for which MT can be employed. Machine translation supports human translators in computer-assisted translation (CAT), providing the opportunity to improve translation systems based on human interaction and feedback. Moreover, many tasks that involve natural language processing operate in a cross-lingual setting, where there is no need for perfectly fluent translations and the transfer of meaning can be modeled by employing MT technology. This thesis describes cumulative work in the field of cross-lingual natural language processing in a user-oriented setting. A common denominator of the presented approaches is their anchoring in an alignment between texts in two different languages to quantify the similarity of their content.
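    A minimal sketch of this common denominator, assuming the simplest possible setup: project both texts into one language (here through a stub translator, standing in for a real MT system) and compare TF-IDF vectors. Everything below is illustrative rather than the thesis's actual pipeline.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def translate_to_english(text):
            """Stand-in for an MT system; a real application would call an actual translator."""
            toy_lexicon = {"die": "the", "katze": "cat", "schläft": "sleeps"}
            return " ".join(toy_lexicon.get(w, w) for w in text.lower().split())

        source_de = "Die Katze schläft"
        target_en = "the cat sleeps on the sofa"

        # Anchor the two texts in one language, then quantify content similarity.
        docs = [translate_to_english(source_de), target_en]
        tfidf = TfidfVectorizer().fit_transform(docs)
        print(f"similarity = {cosine_similarity(tfidf[0], tfidf[1])[0, 0]:.2f}")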

    Recognizing Textual Entailment Using Description Logic and Semantic Relatedness

    Textual entailment (TE) is a relation that holds between two pieces of text where someone reading the first piece can conclude that the second is most likely true. Accurate approaches to textual entailment can benefit various natural language processing (NLP) applications such as question answering, information extraction, summarization, and even machine translation. For this reason, research on textual entailment has attracted a significant amount of attention in recent years. A robust logic-based meaning representation of text is very hard to build, so the majority of textual entailment approaches rely on syntactic methods or shallow semantic alternatives. In addition, approaches that do use a logic-based meaning representation require a large knowledge base of axioms and inference rules that is rarely available. The goal of this thesis is to design an efficient description-logic-based approach for recognizing textual entailment that uses semantic relatedness information as an alternative to such a knowledge base. We propose a description logic and semantic relatedness approach to textual entailment, where the types of semantic relatedness axioms employed in aligning the description logic representations are used as indicators of textual entailment. In our approach, the text and the hypothesis are first represented in description logic. The representations are enriched with additional semantic knowledge acquired by using the web as a corpus. The hypothesis is then merged into the text representation by learning semantic relatedness axioms on demand, and a reasoner is used to reason over the aligned representation. Finally, the types of axioms employed by the reasoner are used to learn whether the text entails the hypothesis. To validate our approach we implemented an RTE system named AORTE and evaluated its performance on the fourth Recognizing Textual Entailment (RTE-4) challenge. Our approach achieved an accuracy of 68.8 on the two-way task and 61.6 on the three-way task, which ranked it 2nd among the participating runs in the same challenge. These results show that our description-logic-based approach can effectively be used to recognize textual entailment.
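    The core mechanism, drastically simplified: align the hypothesis to the text through semantic relatedness "axioms" and check whether everything the hypothesis asserts is covered by the text. The Python sketch below replaces AORTE's description logic reasoner with plain set operations, so it only illustrates the role the relatedness axioms play.

        # Hypothetical relatedness axioms: a term and a related term it may align to.
        RELATEDNESS_AXIOMS = {"purchased": "bought", "automobile": "car"}

        def normalize(atom):
            """Rewrite an atom's terms through the relatedness axioms."""
            return tuple(RELATEDNESS_AXIOMS.get(term, term) for term in atom)

        def entails(text_atoms, hypothesis_atoms):
            """True if every (normalized) hypothesis atom appears in the text."""
            text = {normalize(a) for a in text_atoms}
            return all(normalize(a) in text for a in hypothesis_atoms)

        text = {("john", "bought", "car")}
        hypothesis = {("john", "purchased", "automobile")}
        print(entails(text, hypothesis))  # True, via the two relatedness axioms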

    Language-Independent Methods for Identifying Cross-Lingual Similarity in Wikipedia

    The diversity and richness of multilingual information available in Wikipedia have increased its significance as a language resource. The information extracted from Wikipedia has been utilised for many tasks, such as Statistical Machine Translation (SMT) and supporting multilingual information access. These tasks often rely on gathering data from articles that describe the same topic in different languages, with the assumption that the contents are equivalent to each other. However, studies have shown that this might not be the case. Given the scale and use of Wikipedia, there is a need to develop an approach to measuring cross-lingual similarity across Wikipedia. Many existing similarity measures, however, require the availability of "language-dependent" resources, such as dictionaries or Machine Translation (MT) systems, to translate documents into the same language prior to comparison. This presents challenges for some language pairs, particularly those involving "under-resourced" languages where the required linguistic resources are not widely available. This study addresses the problem by, first, investigating cross-lingual similarity in Wikipedia and, secondly, developing "language-independent" approaches to measure it.

    This work provides two main contributions to identifying cross-lingual similarity in Wikipedia. The first is the development of a Wikipedia similarity corpus, used to understand the similarity characteristics of Wikipedia articles and to evaluate and compare approaches for measuring cross-lingual similarity. The author elicited manual judgments from people with the appropriate language skills to assess similarities between a set of 800 pairs of interlanguage-linked articles. This corpus contains Wikipedia articles for eight language pairs (all pairs involving English, and including well-resourced and under-resourced languages) of varying degrees of similarity.

    The second contribution is the development of language-independent approaches to measure cross-lingual similarity in Wikipedia. The author investigated the utility of a number of "lightweight" language-independent features in four experiments. The first experiment investigated the use of Wikipedia links to identify and align similar sentences, prior to aggregating the scores of the aligned sentences to represent the similarity of the document pair. The second experiment investigated the usefulness of content similarity features (such as char-n-gram overlap, link overlap, word overlap and word-length ratio). The third experiment analysed the use of structure similarity features (such as the ratio of section lengths and the similarity between section headings). Finally, the fourth experiment investigated a combination of these features in a classification and a regression approach. Most of these features are language-independent, whilst others utilise freely available resources (Wikipedia and Wiktionary) to assist in identifying overlapping information across languages. The approaches proposed are lightweight and can be applied to any language written in Latin script; non-Latin-script languages need to be transliterated prior to using them. The performance of these approaches was evaluated against the human judgments in the similarity corpus.

    Overall, the proposed language-independent approaches achieved promising results. The best performance was achieved with the combination of all features in a classification and a regression approach. The results show that the Random Forest classifier was able to classify 81.38% of document pairs correctly (F1 score = 0.79) in a binary classification problem and 50.88% of document pairs correctly (F1 score = 0.71) in a 5-class classification problem, and achieved an RMSE of 0.73 in a regression approach. These results are significantly better than those of a classifier utilising machine translation and cosine similarity of tf-idf scores. These findings show that language-independent approaches can be used to measure cross-lingual similarity between Wikipedia articles. Future work is needed to evaluate these approaches on more languages and to incorporate more features.
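    The abstract is concrete enough to sketch the feature-plus-classifier setup in a few lines of Python: lightweight language-independent features computed for a document pair, fed to a Random Forest. The toy training pairs and labels below are fabricated purely for illustration; the thesis trains on the 800-pair similarity corpus.

        from sklearn.ensemble import RandomForestClassifier

        def char_ngrams(text, n=3):
            return {text[i:i + n] for i in range(len(text) - n + 1)}

        def features(doc_a, doc_b):
            """Char n-gram overlap, word overlap and word-length ratio for a document pair."""
            ng_a, ng_b = char_ngrams(doc_a), char_ngrams(doc_b)
            w_a, w_b = set(doc_a.split()), set(doc_b.split())
            return [
                len(ng_a & ng_b) / max(len(ng_a | ng_b), 1),           # char n-gram overlap
                len(w_a & w_b) / max(len(w_a | w_b), 1),               # word overlap
                min(len(w_a), len(w_b)) / max(len(w_a), len(w_b), 1),  # length ratio
            ]

        # Toy interlanguage pairs labelled similar (1) or dissimilar (0).
        pairs = [
            ("wikipedia is a free encyclopedia", "wikipedia es una enciclopedia libre", 1),
            ("the cat sleeps", "historia de la música barroca", 0),
            ("barcelona is a city in spain", "barcelona es una ciudad de españa", 1),
            ("machine translation of text", "la fotosíntesis de las plantas", 0),
        ]
        X = [features(a, b) for a, b, _ in pairs]
        y = [label for *_, label in pairs]
        clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
        print(clf.predict([features("semantic similarity", "similitud semántica")]))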