Exact Hard Monotonic Attention for Character-Level Transduction
Many common character-level, string-to-string transduction tasks, e.g.,
grapheme-to-phoneme conversion and morphological inflection, consist almost
exclusively of monotonic transductions. However, neural sequence-to-sequence
models that use non-monotonic soft attention often outperform popular monotonic
models. In this work, we ask the following question: Is monotonicity really a
helpful inductive bias for these tasks? We develop a hard attention
sequence-to-sequence model that enforces strict monotonicity and learns a
latent alignment jointly while learning to transduce. With the help of dynamic
programming, we are able to compute the exact marginalization over all
monotonic alignments. Our models achieve state-of-the-art performance on
morphological inflection. Furthermore, we find strong performance on two other
character-level transduction tasks. Code is available at
https://github.com/shijie-wu/neural-transducer.
Comment: ACL 2019
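To make the dynamic program concrete, here is a minimal NumPy sketch of exact marginalization over monotonic alignments via the forward algorithm. The emission/transition parameterization and the uniform prior over the first alignment are illustrative assumptions, not the authors' exact model; a real implementation would also work in log space for numerical stability.

```python
import numpy as np

def monotonic_marginal(emit, trans):
    """Forward-algorithm marginalization over all monotonic alignments.

    emit[j, i]  -- p(y_j | x_i): probability of emitting target symbol j
                   while aligned to source position i
    trans[k, i] -- p(a_j = i | a_{j-1} = k): transition probability,
                   assumed zero for i < k so alignments never move backwards
    Returns p(y | x), the exact sum over exponentially many monotonic
    alignments, computed in O(|y| * |x|^2) time.
    """
    T, S = emit.shape
    alpha = np.zeros((T, S))
    alpha[0] = emit[0] / S          # assumed uniform prior over the first alignment
    for j in range(1, T):
        # alpha[j, i] = emit[j, i] * sum_k alpha[j-1, k] * trans[k, i]
        alpha[j] = emit[j] * (alpha[j - 1] @ trans)
    return alpha[-1].sum()
```

Monotonicity lives entirely in the structure of `trans`: making it upper triangular guarantees that every alignment path with nonzero probability moves strictly left to right.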
Hard Non-Monotonic Attention for Character-Level Transduction
Character-level string-to-string transduction is an important component of
various NLP tasks. The goal is to map an input string to an output string,
where the strings may be of different lengths and have characters taken from
different alphabets. Recent approaches have used sequence-to-sequence models
with an attention mechanism to learn which parts of the input string the model
should focus on during the generation of the output string. Both soft attention
and hard monotonic attention have been used, but hard non-monotonic attention
has only been used in other sequence modeling tasks such as image captioning
and has required a stochastic approximation to compute the gradient. In this
work, we introduce an exact, polynomial-time algorithm for marginalizing over
the exponential number of non-monotonic alignments between two strings, showing
that hard attention models can be viewed as neural reparameterizations of the
classical IBM Model 1. We compare soft and hard non-monotonic attention
experimentally and find that the exact algorithm significantly improves
performance over the stochastic approximation and outperforms soft attention.
Comment: Published in EMNLP 2018
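The IBM-Model-1 observation is what makes the marginal tractable: if each output position aligns to a source position independently, the sum over |x|^|y| alignments factorizes into a product of per-position sums. A hedged one-function sketch (variable names are mine, not the paper's):

```python
import numpy as np

def nonmonotonic_marginal(attn, emit):
    """Exact marginal over all |x|^|y| non-monotonic alignments.

    attn[j, i] -- p(a_j = i | x, y_<j): alignment (hard attention) weight
    emit[j, i] -- p(y_j | x_i, y_<j): emission given alignment to position i

    Because alignments are conditionally independent across output
    positions, the exponential sum collapses to a polynomial product:
        p(y | x) = prod_j sum_i attn[j, i] * emit[j, i]
    """
    return float(np.prod(np.sum(attn * emit, axis=1)))
```

Note the contrast with soft attention, which averages source representations before producing a single output distribution; here whole output distributions are averaged, which is exactly the latent-variable marginal and needs no stochastic gradient estimator.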
Applying the Transformer to Character-level Transduction
The transformer has been shown to outperform recurrent neural network-based
sequence-to-sequence models in various word-level NLP tasks. Yet for
character-level transduction tasks, e.g., morphological inflection generation
and historical text normalization, few works have managed to outperform
recurrent models with the transformer. In an empirical study, we uncover that,
in contrast to recurrent sequence-to-sequence models, the batch size plays a
crucial role in the performance of the transformer on character-level tasks,
and we show that with a large enough batch size, the transformer does indeed
outperform recurrent models. We also introduce a simple technique to handle
feature-guided character-level transduction that further improves performance.
With these insights, we achieve state-of-the-art performance on morphological
inflection and historical text normalization. We also show that the transformer
outperforms a strong baseline on two other character-level transduction tasks:
grapheme-to-phoneme conversion and transliteration.
Comment: EACL 2021
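The abstract does not spell out the feature-handling technique, so the following is only a plausible illustration: a common way to feed morphological features to a character-level transformer is to splice them into the source sequence as atomic special tokens alongside the individual characters.

```python
def encode_source(lemma, features):
    """Hypothetical input encoding for feature-guided transduction:
    feature tags become single special tokens, the lemma is split into
    characters. Illustrative only; not necessarily the paper's scheme.

    >>> encode_source("run", ["V", "PST"])
    ['<V>', '<PST>', 'r', 'u', 'n']
    """
    tags = [f"<{f}>" for f in features]  # each tag is one vocabulary item
    return tags + list(lemma)            # characters are individual tokens
```

The batch-size finding, by contrast, needs no code: the practical takeaway is simply that transformer results on these tasks are sensitive to batch size, so comparisons against recurrent baselines should tune it.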
On Biasing Transformer Attention Towards Monotonicity
Many sequence-to-sequence tasks in natural language processing are roughly
monotonic in the alignment between source and target sequence, and previous
work has facilitated or enforced learning of monotonic attention behavior via
specialized attention functions or pretraining. In this work, we introduce a
monotonicity loss function that is compatible with standard attention
mechanisms and test it on several sequence-to-sequence tasks:
grapheme-to-phoneme conversion, morphological inflection, transliteration, and
dialect normalization. Experiments show that we can achieve largely monotonic
behavior. Performance is mixed, with larger gains on top of RNN baselines.
General monotonicity does not benefit transformer multi-head attention;
however, we see isolated improvements when only a subset of heads is biased
towards monotonic behavior.
Comment: To be published in: Proceedings of the 2021 Conference of the North
American Chapter of the Association for Computational Linguistics: Human
Language Technologies (NAACL-HLT 2021)
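A monotonicity loss of the kind described can be sketched as a differentiable penalty on ordinary attention weights; this PyTorch snippet is my illustration of the idea, not necessarily the paper's exact formulation. It penalizes any leftward movement of the expected source position between consecutive target steps, and would be added to the usual cross-entropy with a tunable weight, possibly on only a subset of heads.

```python
import torch

def monotonicity_loss(attn):
    """attn: (tgt_len, src_len) attention distribution per target step
    (rows sum to 1). Returns a scalar penalty that is zero when the
    expected source position is non-decreasing across target steps.
    """
    positions = torch.arange(attn.size(1), dtype=attn.dtype)
    expected = attn @ positions              # expected source index per step
    backward = expected[:-1] - expected[1:]  # positive where attention moves left
    return torch.relu(backward).mean()
```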
Character-level and syntax-level models for low-resource and multilingual natural language processing
There are more than 7000 languages in the world, but only a small portion of them benefit from Natural Language Processing resources and models. Although languages generally present different characteristics, "cross-lingual bridges" can be exploited, such as transliteration signals and word alignment links. Such information, together with the availability of multiparallel corpora and the urge to overcome language barriers, motivates us to build models that represent more of the world's languages.
This thesis investigates cross-lingual links for improving the processing of low-resource languages with language-agnostic models at the character and syntax level. Specifically, we propose to (i) use orthographic similarities and transliteration between Named Entities and rare words in different languages to improve the construction of Bilingual Word Embeddings (BWEs) and named entity resources, and (ii) exploit multiparallel corpora for projecting labels from high- to low-resource languages, thereby gaining access to weakly supervised processing methods for the latter.
In the first publication, we describe our approach for improving the translation of rare words and named entities for the Bilingual Dictionary Induction (BDI) task, using orthography and transliteration information. In our second work, we tackle BDI by enriching BWEs with orthography embeddings and a number of other features, using our classification-based system to overcome script differences among languages. The third publication describes cheap cross-lingual signals that should be considered when building mapping approaches for BWEs, since they are simple to extract, effective for bootstrapping the mapping of BWEs, and overcome the failure of unsupervised methods. The fourth paper shows our approach for extracting a named entity resource for 1340 languages, including very low-resource languages from all major areas of linguistic diversity. We exploit parallel corpus statistics and transliteration models and obtain improved performance over prior work. Lastly, the fifth work models annotation projection as a graph-based label propagation problem for the part-of-speech tagging task. Part-of-speech models trained on our labeled sets outperform prior work for low-resource languages like Bambara (an African language spoken in Mali), Erzya (a Uralic language spoken in Russia's Republic of Mordovia), Manx (the Celtic language of the Isle of Man), and Yoruba (a Niger-Congo language spoken in Nigeria and surrounding countries).
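The label-propagation formulation in the fifth work can be illustrated generically: gold labels from high-resource nodes are clamped, and unlabeled nodes iteratively absorb the label mass of their neighbors over an alignment graph. A minimal sketch of the general technique, not the thesis' specific formulation:

```python
import numpy as np

def propagate_labels(adj, labels, labeled_mask, iters=50):
    """Graph-based label propagation for annotation projection.

    adj          -- (n, n) symmetric edge weights, e.g. word-alignment counts
    labels       -- (n, k) one-hot POS distributions; rows for unlabeled
                    nodes may start as zeros
    labeled_mask -- (n,) bool, True where gold (high-resource) labels exist
    Returns the predicted POS index for every node.
    """
    # Row-normalize the adjacency so each node averages its neighbors.
    row_sums = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(row_sums, 1e-12)
    y = labels.copy()
    for _ in range(iters):
        y = trans @ y                            # absorb neighbors' label mass
        y[labeled_mask] = labels[labeled_mask]   # clamp gold labels each round
    return y.argmax(axis=1)
```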
Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore
The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model and discuss directions for future work.
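Character-level BLEU, the metric reported above, is ordinary BLEU computed over character n-grams instead of word n-grams. A simplified single-reference sketch (my own, not the paper's scorer, which may differ in smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def char_bleu(hypothesis, reference, max_n=4):
    """BLEU over character n-grams for one hypothesis/reference pair,
    with clipped precisions and the standard brevity penalty."""
    hyp, ref = list(hypothesis), list(reference)
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-12) / total))
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return 100 * bp * math.exp(sum(log_precisions) / max_n)
```

For instance, `char_bleu("gàidhlig", "gàidhlig")` returns 100.0, and scores fall as the character n-gram overlap between the two spellings shrinks.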
Smart Compliance or How New Technologies Change Customer Identification Mechanisms in Banking
Modern banking undoubtedly engages with the world of high technologies. The potential of big data, artificial intelligence and blockchain technology is increasingly recognised by credit institutions as an opportunity to remain competitive against the fast-growing FinTech sector. Apart from their commercial application, modern technologies are an important factor in improving banks' customer identification systems. The article examines the possibilities for adopting smart technologies in different customer identification activities, outlining several perspectives for their future development.