41 research outputs found
Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) infers long-term dependencies through a cell
state maintained by the input and forget gate structures, each of which models
its gate output as a value in [0,1] through a sigmoid function. However, because
the sigmoid function changes gradually, a sigmoid gate is not flexible in
representing multi-modality or skewness. Moreover, previous models do not
model the correlation between the gates, which would offer a new way to
introduce an inductive bias on the relationship between previous and current input.
This paper proposes a new gate structure with the bivariate Beta distribution.
The proposed gate structure enables probabilistic modeling on the gates within
the LSTM cell so that the modelers can customize the cell state flow with
priors and distributions. Moreover, we theoretically show a higher upper
bound of the gradient compared to the sigmoid function, and we empirically
observe that the bivariate Beta distribution gate structure provides higher
gradient values in training. We demonstrate the effectiveness of the bivariate
Beta gate structure on sentence classification, image classification, polyphonic
music modeling, and image caption generation.
Comment: AAAI 202
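As a point of reference for the sigmoid gates discussed in this abstract, the following is a minimal scalar sketch of a standard LSTM cell-state update and of the well-known 0.25 ceiling on the sigmoid derivative. It does not implement the bivariate Beta gate proposed in the paper; the function names (`sigmoid_grad`, `lstm_cell_state`) and the scan grid are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic function: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def lstm_cell_state(c_prev: float, forget_pre: float,
                    input_pre: float, candidate_pre: float) -> float:
    """Scalar cell-state update: c_t = f_t * c_prev + i_t * tanh(candidate)."""
    f_t = sigmoid(forget_pre)   # forget gate in (0, 1)
    i_t = sigmoid(input_pre)    # input gate in (0, 1)
    return f_t * c_prev + i_t * math.tanh(candidate_pre)


# Scan pre-activations from -8.0 to 8.0 (hypothetical grid); the gate
# gradient peaks at x = 0 and never exceeds 0.25.
grid = [k / 10.0 for k in range(-80, 81)]
max_grad = max(sigmoid_grad(x) for x in grid)
```

The ceiling follows from s(1 - s) being maximized at s = 0.5; the abstract's claim is that the proposed gate structure admits a higher upper bound than this.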
The alignment of formal, structured and unstructured process descriptions
Nowadays organizations are experiencing a shift in the way processes are managed. On the one hand, formal notations like Petri nets or Business Process Model and Notation (BPMN) enable unambiguous reasoning about, and automation of, designed processes. This way of eliciting processes by manual design, which originated decades ago, will continue to play an important role in the future. On the other hand, regulations require organizations to store their process executions in structured representations, so that they are known and can be analyzed. Finally, because stakeholders within an organization differ in background (ranging from the most technical members, e.g., developers, to less technical ones), textual descriptions of processes are also maintained so that everyone in the organization understands their processes.
In this paper I will describe techniques for facilitating the interconnection between these three process representations. This requires interdisciplinary research to connect several fields: business process management, formal methods, natural language processing and process mining.
Peer Reviewed. Postprint (author's final draft)
Copy mechanism and tailored training for character-based data-to-text generation
In the last few years, many methods have focused on using deep recurrent
neural networks for natural language generation. The most widely used
sequence-to-sequence neural methods are word-based: as such, they need a
pre-processing step called delexicalization (and, conversely, relexicalization)
to deal with uncommon or unknown words. These forms of processing, however,
give rise to models that depend on the vocabulary used and are not completely
neural.
In this work, we present an end-to-end sequence-to-sequence model with an
attention mechanism which reads and generates at a character level, no longer
requiring delexicalization, tokenization, or even lowercasing. Moreover, since
characters constitute the common "building blocks" of every text, it also
allows a more general approach to text generation, making it possible to
exploit transfer learning for training. These capabilities are obtained thanks
to two major features: (i) the possibility to alternate between the standard
generation mechanism and a copy one, which allows input facts to be copied
directly to produce outputs, and (ii) the use of an original training pipeline
that further improves the quality of the generated texts.
We also introduce a new dataset called E2E+, a modified version of the
well-known E2E dataset used in the E2E Challenge, designed to highlight the
copying capabilities of character-based models. We tested our model
according to five broadly accepted metrics (including the widely used BLEU),
showing that it yields competitive performance with respect to both
character-based and word-based approaches.
Comment: ECML-PKDD 2019 (Camera ready version)
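The idea of combining a generation mechanism with a copy mechanism at the character level can be illustrated with a small sketch. It assumes a soft mixture in the style of pointer-generator networks, where a scalar `p_gen` weights the two distributions; the abstract only says the model alternates between the two mechanisms, so this mixing rule, the function `mix_generate_and_copy`, and all its parameters are hypothetical and not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_generate_and_copy(gen_scores, attn_scores, input_chars, alphabet, p_gen):
    """Blend a generation distribution with a copy distribution.

    gen_scores  : raw scores over `alphabet` (standard generation)
    attn_scores : raw attention scores over positions of `input_chars`
    p_gen       : assumed scalar in [0, 1]; 1 means generate only, 0 copy only
    """
    gen_probs = softmax(gen_scores)
    attn = softmax(attn_scores)
    # Start from the weighted generation distribution.
    mixed = {ch: p_gen * p for ch, p in zip(alphabet, gen_probs)}
    # Add copy mass: each input position contributes its attention weight
    # to the character found at that position.
    for ch, a in zip(input_chars, attn):
        mixed[ch] = mixed.get(ch, 0.0) + (1.0 - p_gen) * a
    return mixed


# Hypothetical toy example: the input contains "Z", which is outside the
# generation alphabet and can therefore only be produced by copying.
alphabet = ["a", "b", "c"]
probs = mix_generate_and_copy(
    gen_scores=[0.0, 0.0, 0.0],
    attn_scores=[4.0, 0.0, 0.0],
    input_chars=["Z", "a", "b"],
    alphabet=alphabet,
    p_gen=0.3,
)
best = max(probs, key=probs.get)
```

Because copy mass is added per input position, a character absent from the generation alphabet can still receive probability, which is the property that removes the need for delexicalization in this kind of setup.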
Extended Parallel Corpus for Amharic-English Machine Translation
This paper describes the acquisition, preprocessing, segmentation, and
alignment of an Amharic-English parallel corpus. It will be useful for machine
translation of an under-resourced language, Amharic. The corpus is larger than
previously compiled corpora; it is released for research purposes. We trained
neural machine translation and phrase-based statistical machine translation
models using the corpus. In the automatic evaluation, neural machine
translation models outperform phrase-based statistical machine translation
models.
Comment: Accepted to 2nd AfricanNLP workshop at EACL 202
Root-Weighted Tree Automata and their Applications to Tree Kernels
In this paper, we define a new kind of weighted tree automata where the
weights are only supported by final states. We show that these automata are
sequentializable and we study their closures under classical regular and
algebraic operations. We then use these automata to compute the subtree kernel
of two finite tree languages in an efficient way. Finally, we present some
perspectives involving root-weighted tree automata.
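To make the quantity being computed concrete, here is a naive sketch of a subtree kernel, under the assumption that the kernel of two trees counts pairs of identical complete subtrees and that the kernel of two finite tree languages sums this over all pairs of trees. It is a brute-force baseline, not the automata-based construction of the paper; the tuple encoding of trees and the function names are illustrative choices.

```python
from collections import Counter

# A tree is encoded as a tuple: (label, child_1, ..., child_n).
# A leaf is a 1-tuple such as ("a",). Tuples are hashable, so whole
# subtrees can be used directly as Counter keys.


def subtree_counts(tree, counts=None):
    """Count occurrences of every complete subtree (a node plus all descendants)."""
    if counts is None:
        counts = Counter()
    counts[tree] += 1
    for child in tree[1:]:
        subtree_counts(child, counts)
    return counts


def subtree_kernel(t1, t2):
    """Number of pairs (s1, s2) of identical subtrees, s1 in t1 and s2 in t2."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(n * c2[s] for s, n in c1.items())


def language_kernel(lang1, lang2):
    """Assumed extension to finite tree languages: sum over all tree pairs."""
    return sum(subtree_kernel(a, b) for a in lang1 for b in lang2)


t1 = ("f", ("a",), ("a",))
t2 = ("g", ("a",), ("f", ("a",), ("a",)))
k = subtree_kernel(t1, t2)
```

In this example `t1` contains the leaf `a` twice and `f(a, a)` once, while `t2` contains `a` three times and `f(a, a)` once, so the count is 2 * 3 + 1 * 1 = 7. The brute-force version enumerates every subtree explicitly, which is the cost an automata-based approach aims to avoid.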