The Boundaries of Meaning: A Case Study in Neural Machine Translation
The success of deep learning in natural language processing raises intriguing questions about the nature of linguistic meaning and the ways in which it can be processed by natural and artificial systems. One such question has to do with subword segmentation algorithms widely employed in language modeling, machine translation, and other tasks since 2016. These algorithms often cut words into semantically opaque pieces, such as ‘period’, ‘on’, ‘t’, and ‘ist’ in ‘period|on|t|ist’. The system then represents the resulting segments in a dense vector space, which is expected to model grammatical relations among them. This representation may in turn be used to map ‘period|on|t|ist’ (English) to ‘par|od|ont|iste’ (French). Thus, instead of being modeled at the lexical level, translation is reformulated more generally as the task of learning the best bilingual mapping between the sequences of subword segments of two languages; and sometimes even between pure character sequences: ‘p|e|r|i|o|d|o|n|t|i|s|t’ → ‘p|a|r|o|d|o|n|t|i|s|t|e’. Such subword segmentations and alignments are at work in highly efficient end-to-end machine translation systems, despite their allegedly opaque nature. The computational value of such processes is unquestionable. But do they have any linguistic or philosophical plausibility? I attempt to cast light on this question by reviewing the relevant details of the subword segmentation algorithms and by relating them to important philosophical and linguistic debates, in the spirit of making artificial intelligence more transparent and explainable.
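To make the segmentation step concrete, the following is a minimal sketch of one widely used subword algorithm, byte-pair encoding (BPE). The abstract does not say which algorithm it reviews, so BPE is an assumption here, and the function names `learn_bpe`, `merge_pair`, and `segment` are hypothetical. The sketch learns merge rules from word frequencies and then segments a word into pieces joined by `|`, as in the examples above.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dictionary."""
    # Every word starts out as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment `word` by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return "|".join(symbols)


if __name__ == "__main__":
    # Toy frequencies, invented purely for illustration.
    toy_merges = learn_bpe({"abab": 5, "ab": 3}, num_merges=10)
    print(toy_merges)
    print(segment("abba", toy_merges))
```

With an empty merge list, `segment` falls back to the pure character sequence, which corresponds to the character-level case the abstract mentions. Because merges are chosen by frequency alone, the resulting pieces need not line up with morphemes.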
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent contextual word embeddings such as BERT achieve state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as derivation in a formal grammar.
EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
Recent Trends in Computational Intelligence
Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI to solve a variety of applications optimally, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies yields a strong and reliable prediction tool for handling real-life applications.
Modality Bridging and Unified Multimodal Understanding
Multimodal understanding is a vast realm of research that covers multiple disciplines. Hence, it requires a correct understanding of the goal in a generic multimodal understanding research study. The definition of modalities of interest is important since each modality requires its own considerations. On the other hand, it is important to understand whether these modalities should be complementary to each other or have significant overlap in terms of the information they carry. For example, most of the modalities in biological signals do not have significant overlap with each other, yet they can be used together to improve the range and accuracy of diagnoses. An extreme example of two modalities that have significant overlap is an instructional video and its corresponding instructions in detailed texts. In this study, we focus on multimedia, which includes image, video, audio, and text about real-world everyday events, mostly focused on human activities.
We narrow our study to the important direction of common space learning since we want to bridge between different modalities using the overlap that a given pair of modalities has. There are multiple applications which require a strong common space to perform well. We choose image-text grounding, video-audio autoencoding, video-conditioned text generation, and video-audio-text common space learning for semantic encoding. We examine multiple ideas in each direction and reach important conclusions. In image-text grounding, we learn that different levels of semantic representations are helpful to achieve a thorough common space that is representative of two modalities. In video-audio autoencoding, we observe that reconstruction objectives can help with a representative common space. Moreover, there is an inherent problem when dealing with multiple modalities at the same time, namely their different levels of granularity. For example, the sampling rate and granularity of video are much higher and more complicated than those of audio. Hence, it might be more helpful to find a more semantically abstracted common space which does not carry redundant details, especially considering the temporal aspect of video and audio modalities. In video-conditioned text generation, we examine the possibility of encoding a video sequence using a Transformer (and later decoding the captions using a Transformer decoder). We further explore the possibility of learning latent states for storing real-world concepts without supervision.
Using the observations from these three directions, we propose a unified pipeline based on the Transformer architecture to examine whether it is possible to train a (true) unified pipeline on raw multimodal data without supervision in an end-to-end fashion. This pipeline eliminates ad-hoc feature extraction methods and is independent of any previously trained network, making it simpler and easier to use. Furthermore, it utilizes only one architecture, which enables us to move towards even greater simplicity. Hence, we take an ambitious step forward and further unify this pipeline by sharing only one backbone among four major modalities: image, video, audio, and text. We show not only that it is possible to achieve this goal, but also the inherent benefits of such a pipeline. We propose a new research direction under multimodal understanding: Unified Multimodal Understanding. This study is the first to examine this idea, and it further pushes its limits by scaling up to multiple tasks, modalities, and datasets.
In a nutshell, we examine different possibilities for bridging between a pair of modalities in different applications, observe several limitations, and propose solutions for them. Using these observations, we provide a unified and strong pipeline for learning a common space which could be used for many applications. We show that our approaches perform well and significantly outperform the state of the art in different downstream tasks. We set a new baseline with competitive performance for our proposed research direction, Unified Multimodal Understanding.
On understanding character-level models for representing morphology
Morphology is the study of how words are composed of smaller units of meaning
(morphemes). It allows humans to create, memorize, and understand words in their
language. To process and understand human languages, we expect our computational
models to also learn morphology. Recent advances in neural network models provide
us with models that compose word representations from smaller units like word segments,
character n-grams, or characters. These so-called subword unit models do not
explicitly model morphology yet they achieve impressive performance across many
multilingual NLP tasks, especially on languages with complex morphological processes.
This thesis aims to shed light on the following questions: (1) What do subword
unit models learn about morphology? (2) Do we still need prior knowledge about
morphology? (3) How do subword unit models interact with morphological typology?
First, we systematically compare various subword unit models and study their performance
across language typologies. We show that models based on characters are
particularly effective because they learn orthographic regularities which are consistent
with morphology. To understand which aspects of morphology are not captured by
these models, we compare them with an oracle with access to explicit morphological
analysis. We show that in the case of dependency parsing, character-level models
are still poor in representing words with ambiguous analyses. We then demonstrate
how explicit modeling of morphology is helpful in such cases. Finally, we study how
character-level models perform in low-resource, cross-lingual NLP scenarios, and whether
they can facilitate cross-linguistic transfer of morphology across related languages.
While we show that cross-lingual character-level models can improve low-resource
NLP performance, our analysis suggests that it is mostly because of the structural
similarities between languages and we do not yet find any strong evidence of cross-linguistic
transfer of morphology. This thesis presents a careful, in-depth study and
analysis of character-level models and their relation to morphology, providing insights
and future research directions on building morphologically-aware computational NLP
models.
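As a rough illustration of how character-based subword units can pick up orthographic regularities that coincide with morphology, the sketch below extracts character n-grams from a word and lists the n-grams two related word forms share. The boundary markers `<` and `>` and the function names `char_ngrams` and `shared_ngrams` are assumptions made for this example; they are not taken from the thesis.

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Return the character n-grams of `word` for n in [n_min, n_max].

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    stay distinguishable from word-internal n-grams (an illustrative
    convention, assumed for this sketch).
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def shared_ngrams(word_a, word_b, n=3):
    """Return the sorted n-grams of length `n` that both words contain."""
    grams_a = set(char_ngrams(word_a, n, n))
    grams_b = set(char_ngrams(word_b, n, n))
    return sorted(grams_a & grams_b)


if __name__ == "__main__":
    print(char_ngrams("cat", 3, 3))
    print(shared_ngrams("walked", "walking"))
```

In this toy case the shared trigrams cover the common stem, while the n-grams unique to each form cover the differing endings. A model that composes word representations from such units can therefore relate the two forms without any explicit morphological analysis.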
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
When learning a new skill, you take advantage of your preexisting skills and
knowledge. For instance, if you are a skilled violinist, you will likely have
an easier time learning to play cello. Similarly, when learning a new language
you take advantage of the languages you already speak. For instance, if your
native language is Norwegian and you decide to learn Dutch, the lexical overlap
between these two languages will likely benefit your rate of language
acquisition. This thesis deals with the intersection of learning multiple tasks
and learning multiple languages in the context of Natural Language Processing
(NLP), which can be defined as the study of computational processing of human
language. Although these two types of learning may seem different on the
surface, we will see that they share many similarities.
The traditional approach in NLP is to consider a single task for a single
language at a time. However, recent advances allow for broadening this
approach, by considering data for multiple tasks and languages simultaneously.
This is an important approach to explore further as the key to improving the
reliability of NLP, especially for low-resource languages, is to take advantage
of all relevant data whenever possible. In doing so, the hope is that in the
long term, low-resource languages can benefit from the advances made in NLP
which are currently to a large extent reserved for high-resource languages.
This, in turn, may then have positive consequences for, e.g., language
preservation, as speakers of minority languages will be under less
pressure to use high-resource languages. In the short term, answering the
specific research questions posed should be of use to NLP researchers working
towards the same goal.
Comment: PhD thesis, University of Groninge
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. Much research has been conducted across a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years there have been attempts to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies for media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.