Handling Homographs in Neural Machine Translation
Homographs, words with different meanings but the same surface form, have
long caused difficulty for machine translation systems, as it is difficult to
select the correct translation based on the context. However, with the advent
of neural machine translation (NMT) systems, which can theoretically take into
account global sentential context, one may hypothesize that this problem has
been alleviated. In this paper, we first provide empirical evidence that
existing NMT systems in fact still have significant problems in properly
translating ambiguous words. We then proceed to describe methods, inspired by
the word sense disambiguation literature, that model the context of the input
word with context-aware word embeddings that help to differentiate the word
sense before feeding it into the encoder. Experiments on three language pairs
demonstrate that such models improve the performance of NMT systems both in
terms of BLEU score and in the accuracy of translating homographs. Comment: NAACL201
Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations
We present a framework for building unsupervised representations of entities
and their compositions, where each entity is viewed as a probability
distribution rather than a vector embedding. In particular, this distribution
is supported over the contexts which co-occur with the entity and are embedded
in a suitable low-dimensional space. This enables us to consider representation
learning from the perspective of Optimal Transport and take advantage of its
tools such as Wasserstein distance and barycenters. We elaborate how the method
can be applied for obtaining unsupervised representations of text and
illustrate the performance (quantitatively as well as qualitatively) on tasks
such as measuring sentence similarity, word entailment and similarity, where we
empirically observe significant gains (e.g., 4.1% relative improvement over
Sent2vec, GenSen).
The key benefits of the proposed approach include: (a) capturing uncertainty
and polysemy via modeling the entities as distributions, (b) utilizing the
underlying geometry of the particular task (with the ground cost), (c)
simultaneously providing interpretability with the notion of optimal transport
between contexts and (d) easy applicability on top of existing point embedding
methods. The code, as well as prebuilt histograms, are available under
https://github.com/context-mover/. Comment: AISTATS 2020. Also, accepted previously at ICLR 2019 DeepGenStruct
Worksho
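To make the optimal-transport view above concrete, the following is a minimal sketch of an entropy-regularised (Sinkhorn) transport computation between two context histograms. It is a generic textbook routine, not the authors' released code; the histograms `a` and `b`, the ground-cost matrix `cost`, and the names `sinkhorn` and `transport_cost` are all hypothetical and chosen only for illustration.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iters=200):
    """Approximate an optimal transport plan between histograms a and b.

    a, b : lists of non-negative weights, each summing to 1
    cost : cost[i][j] is the ground cost of moving mass from bin i to bin j
    reg  : entropic regularisation strength (assumed value, for illustration)
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of a transport plan under the given ground cost."""
    return sum(p * c for p_row, c_row in zip(plan, cost)
               for p, c in zip(p_row, c_row))


# Hypothetical example: three context clusters placed at positions 0, 1, 2,
# with squared distance between positions as the ground cost.
cost = [[float((i - j) ** 2) for j in range(3)] for i in range(3)]
a = [0.7, 0.2, 0.1]  # context histogram of one entity (made-up numbers)
b = [0.1, 0.2, 0.7]  # context histogram of another entity (made-up numbers)

plan = sinkhorn(a, b, cost)
distance = transport_cost(plan, cost)
```

The returned plan indicates how much mass moves between each pair of contexts, which is the kind of interpretability point (c) refers to. A barycenter computation would build on the same kernel and is omitted here.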
Continually improving grounded natural language understanding through human-robot dialog
As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person requests an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts heavy, green, and mug. To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them. To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often semantic forms represented as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017). Meanings of many language concepts are bound to the physical world.
Understanding object properties and categories, such as heavy, green, and mug requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects to gather sensory data about them. This data can be used to understand non-visual concepts like heavy and empty (e.g. "get the empty carton of milk from the fridge"), and assist with concepts that have both visual and non-visual expression (e.g. tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to get labels between concept words and actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease out polysemy and synonymy in concept words (Thomason and Mooney, 2017) such as light, which can refer to a weight or a color, the latter sense being synonymous with pale. Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018). Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016) using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.
Computer Science
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909
Machine Learning Advances for Practical Problems in Computer Vision
Convolutional neural networks (CNNs) have become the de facto standard for computer vision tasks, due to their unparalleled performance and versatility. Although deep learning removes the need for extensive hand-engineered features for every task, real-world applications of CNNs still often require considerable engineering effort to produce usable results. In this thesis, we explore solutions to problems that arise in practical applications of CNNs.
We address a rarely acknowledged weakness of CNN object detectors: the tendency to emit many excess detection boxes per object, which must be pruned by non-maximum suppression (NMS). This practice relies on the assumption that highly overlapping boxes are excess, which is problematic when objects occlude one another and overlapping detections are actually required. Therefore we propose a novel loss function that incentivises a CNN to emit exactly one detection per object, making NMS unnecessary.
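For readers unfamiliar with the pruning step criticised above, here is a minimal sketch of conventional greedy NMS. It illustrates the baseline practice only, not the loss function proposed in the thesis; the box format `(x1, y1, x2, y2)`, the threshold value, and the function names are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already-kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of surviving boxes: [0, 2]
```

Box 1 is discarded purely because of its overlap with box 0. If boxes 0 and 1 actually covered two different objects, one partly occluding the other, the second object would be lost, which is the failure case the paragraph above describes.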
Another common problem when deploying a CNN in the real world is domain shift: CNNs can be surprisingly vulnerable to sometimes quite subtle differences between the images they encounter at deployment and those they are trained on. We investigate the role that texture plays in domain shift, and propose a novel data augmentation technique using style transfer to train CNNs that are more robust against shifts in texture. We demonstrate that this technique results in better domain transfer on several datasets, without requiring any domain-specific knowledge.
In collaboration with AstraZeneca, we develop an embedding space for cellular images collected in a high throughput imaging screen as part of a drug discovery project. This uses a combination of techniques to embed the images in 2D space such that similar images are nearby, for the purpose of visualization and data exploration. The images are also clustered automatically, splitting the large dataset into a smaller number of clusters that display a common phenotype. This allows biologists to quickly triage the high throughput screen, selecting a small subset of promising phenotypes for further investigation.
Finally, we investigate an unusual form of domain bias that manifested in a real-world visual binary classification project for counterfeit detection. We confirm that CNNs are able to "cheat" the task by exploiting a strong correlation between class label and the specific camera that acquired the image, and show that this reliably occurs when the correlation is present. We also investigate the question of how exactly the CNN is able to infer camera type from image pixels, given that this is impossible for the human eye.
The contributions in this thesis are of practical value to deep learning practitioners working on a variety of problems in the field of computer vision.
Contextual Social Networking
The thesis centers around the multi-faceted research question of how contexts may
be detected and derived that can be used for new context aware Social Networking
services and for improving the usefulness of existing Social Networking services, giving
rise to the notion of Contextual Social Networking. In a first foundational part,
we characterize the closely related fields of Contextual-, Mobile-, and Decentralized
Social Networking using different methods and focusing on different detailed
aspects. A second part focuses on the question of how short-term and long-term
social contexts as especially interesting forms of context for Social Networking may
be derived. We focus on NLP based methods for the characterization of social relations
as a typical form of long-term social contexts and on Mobile Social Signal
Processing methods for deriving short-term social contexts on the basis of geometry
of interaction and audio. We furthermore investigate how personal social agents
may combine such social context elements on various levels of abstraction. The third
part discusses new and improved context aware Social Networking service concepts.
We investigate special forms of awareness services, new forms of social information
retrieval, social recommender systems, context aware privacy concepts and services
and platforms supporting Open Innovation and creative processes.
This version of the thesis does not contain the included publications, for reasons
of journal copyright. For the version with all included publications, contact:
Georg Groh, [email protected]
The central subject of this work is the multi-faceted question of how contexts can be detected and derived that can serve to create novel context-aware Social Networking services and to improve the usefulness of existing services. The (not yet completed) successful implementation of this program leads to a concept that may be called Contextual Social Networking. In a foundational first part, the closely related fields of Contextual Social Networking, Mobile Social Networking, and Decentralized Social Networking are examined and related to one another using different methods and focusing on different detailed aspects. A second part addresses the question of how short-term and long-term social contexts, as forms of context especially interesting for Social Networking, can be measured and derived. One focus is on NLP methods for characterizing social relations as a typical form of long-term social context. A further focus is on methods from Mobile Social Signal Processing for deriving meaningful short-term social contexts on the basis of interaction geometries and audio data. It is further investigated how personal social agents can combine context elements at different levels of abstraction. The third part covers novel and improved concepts for context-aware Social Networking services. Special forms of awareness services, new forms of social information retrieval, concepts for context-aware privacy management, and services and platforms supporting Open Innovation and creativity are investigated and presented. To avoid copyright infringements with respect to the journals and others, this version of the habilitation thesis does not contain the included publications.
Contact regarding the version with all included publications: Georg Groh, [email protected]
Towards structured neural spoken dialogue modelling.
195 p. In this thesis, we try to alleviate some of the weaknesses of the current approaches to dialogue modelling, one of the most challenging areas of Artificial Intelligence. We target three different types of dialogues (open-domain, task-oriented and coaching sessions), and use mainly machine learning algorithms to train dialogue models. One challenge of open-domain chatbots is their lack of response variety, which can be tackled using Generative Adversarial Networks (GANs). We present two methodological contributions in this regard. On the one hand, we develop a method to circumvent the non-differentiability of text-processing GANs. On the other hand, we extend the conventional task of discriminators, which often operate at a single response level, to the batch level. Meanwhile, two crucial aspects of task-oriented systems are their understanding capabilities (because they need to correctly interpret what the user is looking for and their constraints), and the dialogue strategy. We propose a simple yet powerful way to improve spoken understanding and adapt the dialogue strategy by explicitly processing the user's speech signal through audio-processing transformer neural networks. Finally, coaching dialogues share properties of open-domain and task-oriented dialogues. They are somewhat task-oriented, but there is no rush to complete the task, and it is more important to calmly converse to make the users aware of their own problems. In this context, we describe our collaboration in the EMPATHIC project, where a Virtual Coach capable of carrying out coaching dialogues about nutrition was built, using a modular Spoken Dialogue System. Second, we model such dialogues with an end-to-end system based on Transfer Learning
- …