
Bivariate Beta-LSTM

Authors: Jang JoonHo; Moon Il-Chul; Shin Seung jae; Song Kyungwoo
Publication date: 16/11/2019

Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, each of which models its output as a value in [0,1] through a sigmoid function. However, because of the gradual shape of the sigmoid function, the sigmoid gate is not flexible enough to represent multi-modality or skewness. In addition, previous models do not model the correlation between the gates, which would offer a new way to introduce an inductive bias on the relationship between the previous and the current input. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell, so that modelers can customize the cell state flow with priors and distributions. Moreover, we show theoretically that the gradient has a higher upper bound than with the sigmoid function, and we observe empirically that the bivariate Beta gate structure provides higher gradient values during training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation.

Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
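The Bivariate Beta-LSTM abstract contrasts the sigmoid gate with gates drawn from a bivariate Beta distribution. The Python sketch below illustrates both ideas under stated assumptions: `sigmoid_grad` shows that the sigmoid's derivative never exceeds 1/4, and the hypothetical helper `bivariate_beta_gates` samples a correlated input/forget gate pair using one known bivariate Beta construction, in which a shared Gamma variable couples two Beta marginals. The abstract does not specify the authors' parameterization, so this is an illustration of the idea, not their method.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function used by a standard LSTM gate."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative s(x) * (1 - s(x)); its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def bivariate_beta_gates(a1: float, a2: float, a3: float,
                         rng: random.Random) -> tuple[float, float]:
    """Sample a correlated (input, forget) gate pair, each in (0, 1).

    Assumed construction (hypothetical, not taken from the paper):
    U1 ~ Gamma(a1), U2 ~ Gamma(a2), U3 ~ Gamma(a3), then
      input  = U1 / (U1 + U3)  ~ Beta(a1, a3)
      forget = U2 / (U2 + U3)  ~ Beta(a2, a3)
    The shared U3 makes the two gates positively correlated.
    """
    u1 = rng.gammavariate(a1, 1.0)
    u2 = rng.gammavariate(a2, 1.0)
    u3 = rng.gammavariate(a3, 1.0)
    return u1 / (u1 + u3), u2 / (u2 + u3)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [bivariate_beta_gates(2.0, 5.0, 3.0, rng) for _ in range(20_000)]
    inputs = [p[0] for p in pairs]
    forgets = [p[1] for p in pairs]
    print("max sigmoid gradient:", sigmoid_grad(0.0))
    print("mean input gate:", sum(inputs) / len(inputs))     # close to 2 / (2 + 3) = 0.4
    print("mean forget gate:", sum(forgets) / len(forgets))  # close to 5 / (5 + 3) = 0.625
    print("gate correlation:", pearson(inputs, forgets))     # positive
```

The shape parameters (2, 5, 3) are arbitrary; in a learned model they would presumably be produced by the network from the previous hidden state and the current input.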

The alignment of formal, structured and unstructured process descriptions

Author: Carmona Vargas Josep
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Organizations today are experiencing a drift in the way processes are managed. On the one hand, formal notations like Petri nets or Business Process Model and Notation (BPMN) enable unambiguous reasoning about, and automation of, designed processes. This way of eliciting processes by manual design, which originated decades ago, will remain important in the future. On the other hand, regulations require organizations to store their process executions in structured representations, so that they are known and can be analyzed. Finally, because stakeholders within an organization differ in nature (ranging from the most technical members, e.g., developers, to less technical ones), textual descriptions of processes are also maintained so that everyone in the organization can understand their processes. In this paper I describe techniques that facilitate the interconnection between these three process representations. This requires interdisciplinary research connecting several fields: business process management, formal methods, natural language processing, and process mining.

Peer Reviewed; Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
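The process-descriptions abstract links formal models such as Petri nets with structured records of process executions. As a minimal illustration of that connection, the Python sketch below replays one recorded trace on a toy Petri net and reports whether the trace fits the model. This is simple token replay on a hypothetical net (`NET` and its activity names are made up); it is much simpler than the techniques the paper describes and is not the author's method.

```python
from collections import Counter

# Hypothetical place/transition net: activity -> (tokens consumed, tokens produced).
NET = {
    "register": ({"start": 1}, {"p1": 1}),
    "check":    ({"p1": 1}, {"p2": 1}),
    "approve":  ({"p2": 1}, {"end": 1}),
    "reject":   ({"p2": 1}, {"end": 1}),
}


def replays(trace, net, initial, final):
    """Return True if `trace` can be fired from `initial` and ends exactly in `final`."""
    marking = Counter(initial)
    for activity in trace:
        if activity not in net:
            return False  # the logged activity has no counterpart in the model
        consumed, produced = net[activity]
        if any(marking[place] < n for place, n in consumed.items()):
            return False  # transition is not enabled in the current marking
        marking.subtract(consumed)
        marking.update(produced)
    return +marking == Counter(final)  # unary + drops places with zero tokens


if __name__ == "__main__":
    start, end = {"start": 1}, {"end": 1}
    print(replays(["register", "check", "approve"], NET, start, end))  # True
    print(replays(["register", "approve"], NET, start, end))           # False
```

A yes/no answer is the crudest form of conformance; relating a log to a model in practice usually also requires locating where and why a trace deviates.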

Copy mechanism and tailored training for character-based data-to-text generation

Authors: Bonetta Giovanni; Cancelliere Rossella; Gallinari Patrick; Roberti Marco
Publication venue: Springer Science and Business Media LLC
Publication date: 11/05/2020

In the last few years, many methods have used deep recurrent neural networks for natural language generation. The most widely used sequence-to-sequence neural methods are word-based: as such, they need a pre-processing step called delexicalization (and the reverse post-processing step, relexicalization) to deal with uncommon or unknown words. These forms of processing, however, give rise to models that depend on the vocabulary used and are not completely neural. In this work, we present an end-to-end sequence-to-sequence model with an attention mechanism that reads and generates at the character level and no longer requires delexicalization, tokenization, or even lowercasing. Moreover, since characters are the common "building blocks" of every text, the model also allows a more general approach to text generation, making it possible to exploit transfer learning for training. These capabilities come from two major features: (i) the possibility to alternate between the standard generation mechanism and a copy mechanism, which allows input facts to be copied directly into the output, and (ii) an original training pipeline that further improves the quality of the generated texts. We also introduce a new dataset called E2E+, a modified version of the well-known E2E dataset used in the E2E Challenge, designed to highlight the copying capabilities of character-based models. We evaluated our model with five widely accepted metrics (including BLEU), showing that it yields competitive performance with respect to both character-based and word-based approaches.

Comment: ECML-PKDD 2019 (Camera ready version)

arXiv.org e-Print Archive

Crossref
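The copy-mechanism abstract describes a decoder that can alternate between generating a character and copying one from the input. A common way to realize such a switch is a pointer-generator-style mixture, in which a gate `p_gen` weights the decoder's character distribution against the attention weights over input positions. The Python sketch below assumes that formulation, which may differ from the paper's exact mechanism; `mix_generate_and_copy` is a hypothetical name. Because the model works at the character level, copying input position j simply emits the j-th input character.

```python
def mix_generate_and_copy(p_gen, vocab_probs, attention, source):
    """Combine a generation distribution and a copy distribution over characters.

    p_gen:       probability of generating rather than copying (assumed in [0, 1]).
    vocab_probs: dict mapping each output character to its softmax probability.
    attention:   attention weights over source positions (assumed to sum to 1).
    source:      the input string; copying position j emits source[j].
    """
    if len(attention) != len(source):
        raise ValueError("one attention weight per source character is required")
    final = {ch: p_gen * p for ch, p in vocab_probs.items()}
    for weight, ch in zip(attention, source):
        # Copy mass for a character accumulates over every position holding it.
        final[ch] = final.get(ch, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    # Toy decoding step: attention focuses on the first character of "Zizzi".
    source = "Zizzi"
    attention = [0.6, 0.1, 0.1, 0.1, 0.1]
    vocab_probs = {"a": 0.5, "b": 0.3, "Z": 0.2}
    dist = mix_generate_and_copy(0.25, vocab_probs, attention, source)
    print(max(dist, key=dist.get))  # Z
```

Here the generator alone would prefer "a", but the copy term makes "Z" the most likely next character, which is the behavior a copy mechanism is meant to provide for rare input values.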

Extended Parallel Corpus for Amharic-English Machine Translation

Authors: Bati Tesfaye Bayu; Gezmu Andargachew Mekonnen; Nürnberger Andreas
Publication date: 25/06/2021

This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus, which will be useful for machine translation of Amharic, an under-resourced language. The corpus is larger than previously compiled corpora and is released for research purposes. We trained neural machine translation and phrase-based statistical machine translation models on the corpus. In the automatic evaluation, the neural machine translation models outperform the phrase-based statistical machine translation models.

Comment: Accepted to 2nd AfricanNLP workshop at EACL 202

arXiv.org e-Print Archive
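The Amharic-English abstract lists alignment among the corpus-building steps but does not name the aligner. For readers unfamiliar with the step, the Python sketch below shows a simplified length-based sentence aligner in the spirit of Gale and Church: dynamic programming chooses 1-1, 1-2, 2-1, 1-0, or 0-1 groupings so that the character lengths of paired segments agree. The cost function, `ratio`, and `skip_cost` are illustrative assumptions, not values from the paper, and real length-based aligners use a probabilistic length model instead of an absolute difference.

```python
def align_by_length(src, tgt, ratio=1.0, skip_cost=50.0):
    """Align two sentence lists; return beads as (source indices, target indices).

    Allowed beads: 1-1, 1-2, 2-1 (cost = |ratio * source chars - target chars|)
    and 1-0 / 0-1 (a fixed skip_cost for an unmatched sentence).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]
    # Every move increases i or j, so row-major order visits each cell
    # only after all of its predecessors have been finalized.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = skip_cost
                else:
                    src_len = sum(len(s) for s in src[i:ni])
                    tgt_len = sum(len(t) for t in tgt[j:nj])
                    step = abs(ratio * src_len - tgt_len)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk back from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


if __name__ == "__main__":
    # Two short source sentences together match the first target sentence.
    src = ["aaaa", "bbbb", "cccccccc"]
    tgt = ["xxxxxxxx", "yyyyyyyy"]
    print(align_by_length(src, tgt))  # [((0, 1), (0,)), ((2,), (1,))]
```

For a pair such as Amharic and English, `ratio` would need to be estimated from data, since the two scripts do not produce comparable character counts for equivalent sentences.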

Root-Weighted Tree Automata and their Applications to Tree Kernels

Authors: Mignot Ludovic; Ouali-Sebti Nadia; Ziadi Djelloul
Publication date: 01/01/2015

In this paper, we define a new kind of weighted tree automata in which the weights are carried only by final states. We show that these automata are sequentializable, and we study their closure under classical regular and algebraic operations. We then use these automata to compute the subtree kernel of two finite tree languages efficiently. Finally, we present some perspectives involving root-weighted tree automata.

arXiv.org e-Print Archive

HAL - Normandie Université
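To make the quantity in the tree-automata abstract concrete, the Python sketch below computes a subtree kernel naively: every complete subtree (a node together with all of its descendants) is enumerated and counted, and the kernel sums the products of the counts of shared subtrees. It assumes that the kernel of two finite tree languages is the sum of the kernels over all pairs of trees, and it does not use root-weighted tree automata at all, so under that assumption it is only a brute-force reference for the value the paper computes efficiently.

```python
from collections import Counter

# A tree is a nested tuple (label, child_1, ..., child_k); a leaf is (label,).


def subtrees(tree):
    """Yield every complete subtree: each node together with all its descendants."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def subtree_counts(language):
    """Count complete-subtree occurrences across all trees of a finite language."""
    counts = Counter()
    for tree in language:
        counts.update(subtrees(tree))
    return counts


def subtree_kernel(lang1, lang2):
    """Sum over tree pairs of the number of matching subtree occurrences."""
    c1, c2 = subtree_counts(lang1), subtree_counts(lang2)
    return sum(n * c2[t] for t, n in c1.items())


if __name__ == "__main__":
    t1 = ("f", ("a",), ("b",))
    t2 = ("f", ("a",), ("a",))
    print(subtree_kernel([t1], [t2]))      # 2: only the leaf a is shared (1 x 2)
    print(subtree_kernel([t1, t2], [t2]))  # 7
```

Merging the counts of each language first is what turns the double sum over tree pairs into a single sum over distinct subtrees.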