Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system. Comment: Accepted at EMNLP 201
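The unfolding idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's construction: it assumes toy networks with a single tanh hidden layer and a linear output layer, so that averaging the outputs is exactly reproducible by one wider network. The function names (`forward`, `ensemble`, `unfold`, `random_net`) are hypothetical, and softmax outputs, deeper networks, and the shrinking step are not covered.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(net, x):
    """One tanh hidden layer followed by a linear output layer."""
    W1, b1, W2, b2 = net
    h = [math.tanh(a + b) for a, b in zip(matvec(W1, x), b1)]
    return [a + b for a, b in zip(matvec(W2, h), b2)]


def ensemble(nets, x):
    """Average the outputs of several networks with the same topology."""
    outs = [forward(n, x) for n in nets]
    return [sum(col) / len(nets) for col in zip(*outs)]


def unfold(nets):
    """Build one wide network whose output equals the ensemble average.

    Hidden layers are stacked (all members read the same input), and the
    output weights are concatenated and scaled by 1/K.
    """
    k = len(nets)
    W1 = [row for n in nets for row in n[0]]
    b1 = [b for n in nets for b in n[1]]
    n_out = len(nets[0][2])
    W2 = [[w / k for n in nets for w in n[2][o]] for o in range(n_out)]
    b2 = [sum(n[3][o] for n in nets) / k for o in range(n_out)]
    return (W1, b1, W2, b2)


def random_net(rng, n_in, n_hidden, n_out):
    """Create a toy network with random weights (illustration only)."""
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
    return (W1, b1, W2, b2)


if __name__ == "__main__":
    rng = random.Random(0)
    nets = [random_net(rng, 3, 4, 2) for _ in range(3)]
    x = [0.5, -1.0, 2.0]
    print(ensemble(nets, x))
    print(forward(unfold(nets), x))
```

In this sketch the unfolded network has three times the hidden units of a single member, which is why a subsequent shrinking step would be needed to recover the size of a single network.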
The Role of Surprise in Hindsight Bias – A Metacognitive Model of Reduced and Reversed Hindsight Bias
Hindsight bias is the well-researched phenomenon that people falsely believe that they would have correctly predicted the outcome of an event once it is known. In recent years, several authors have doubted the ubiquity of the effect and have reported a reversal under certain conditions. This article presents an integrative model of the role of surprise as one factor explaining the malleability of the hindsight bias. Three ways in which surprise influences the reconstruction of pre-outcome predictions are assumed: (1) surprise is used as a direct metacognitive heuristic to estimate the distance between outcome and prediction; (2) surprise triggers a deliberate sense-making process; and (3) surprise also biases this process by enhancing the retrieval of surprise-congruent information and expectancy-based hypothesis testing.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent contextual word embeddings like BERT report state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common, as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
Foreword: Tatars in Finland in the Transnational Context of the Baltic Sea Region
The Tatar diaspora in Finland has attracted researchers for over a century, but studies traditionally focus on topics such as origins and general Tatar history, religion, identity or language. One of the most important aspects of research on Tatars both historically and today, however, is the transnational context. Migrating from villages in Nizhny Novgorod province, often via the Russian capital Saint Petersburg at the end of the nineteenth century, the forming Tatar diaspora communities in the Baltic Sea region maintained, developed and extended their previous networks and also created new connections over national borders despite periods of political difficulties. New research about Tatars in the Baltic Sea region – with the focal point of the Tatars in Finland and their connections chiefly in Estonia, Russia and Sweden – was presented during a seminar called Tatars in Finland in the Transnational Context of the Baltic Sea Region at the University of Helsinki in October 2018. Scholars from Finland, Sweden, Russia, Estonia and Hungary spoke about the past and present of the diaspora. A result of the seminar, this special issue of Studia Orientalia Electronica is dedicated to new research on Tatars in a transnational context.
Folk Knowledge in Southern Siberia in the 1770s: Johan Peter Falck’s Ethnobiological Observations
The southern Siberian Turkic groups were mostly unknown to outsiders when the Swedish scientist Johan Peter Falck (1732–1774) visited their settlements in the early 1770s. Falck led one of the expeditions dispatched between 1768 and 1774 by the Russian Academy of Sciences to different parts of the Russian Empire. As a botanist, zoologist, ethnographer and linguist, during his journeys he recorded information not only about the environment but also about the peoples he met and their political and social organisation, as well as ethnographic data. Falck’s rich and detailed travelogue was published posthumously and soon forgotten, while the rich data remained unattended for almost two centuries. In recent years, mainly biologists have rediscovered the materials, yet ethnobiological data is also plentiful. Knowledge about the environment is crucial for survival, and the complex relationship between humans and their environment is often reflected in names given to living organisms and places or in perceptions of the surroundings. This article focuses on Siberian Turkic folk knowledge among the Chulym Tatars, Kacha, Soyan, and Teleut, based on the observations by Johan Peter Falck in the 1770s. Ethnobiological and linguistic materials are used in an effort to at least partly reconstruct the cognitive world in which these peoples lived and created their concepts of the environment. The article is a preliminary contribution to the study of historical ethnoecology and ethnobiology.