162 research outputs found

    Preference Learning for Machine Translation

    Automatic translation of natural language is still (as of 2017) a long-standing but unmet promise. While advancing at a fast rate, the underlying methods are still far from being able to reliably capture the syntax or semantics of arbitrary natural-language utterances, let alone transfer the encoded meaning into a second language. However, it is possible to build useful translating machines when the target domain is well known and the machine is able to learn and adapt efficiently and promptly from new inputs. This is possible thanks to efficient and effective machine learning methods which can be applied to automatic translation. In this work we present and evaluate methods for three distinct scenarios: a) We develop algorithms that can learn from very large amounts of data by exploiting pairwise preferences defined over competing translations, which can be used to make a machine translation system robust to arbitrary texts from varied sources, but also enable it to learn effectively to adapt to new domains of data; b) We describe a method that is able to efficiently learn external models which adhere to fine-grained preferences that are extracted from a constricted selection of translated material, e.g. for adapting to users or groups of users in a computer-aided translation scenario; c) We develop methods for two machine translation paradigms, neural and traditional statistical machine translation, to directly adapt to user-defined preferences in an interactive post-editing scenario, learning precisely adapted machine translation systems. In all of these settings, we show that machine translation can be made significantly more useful by careful optimization via preference learning.

    A multilingual neural coaching model with enhanced long-term dialogue structure

    In this work we develop a fully data-driven conversational agent capable of carrying out motivational coaching sessions in Spanish, French, Norwegian, and English. Unlike the majority of coaching, and in general well-being related, conversational agents that can be found in the literature, ours is not designed by hand-crafted rules. Instead, we directly model the coaching strategy of professionals with end users. To this end, we gather a set of virtual coaching sessions through a Wizard of Oz platform, and apply state-of-the-art Natural Language Processing techniques. We employ a transfer learning approach, pretraining GPT-2 neural language models and fine-tuning them on our corpus. However, since these only take as input a local dialogue history, a simple fine-tuning procedure is not capable of modeling the long-term dialogue strategies that appear in coaching sessions. To alleviate this issue, we first propose to learn dialogue phase and scenario embeddings in the fine-tuning stage. These indicate to the model at which part of the dialogue it is and which kind of coaching session it is carrying out. Second, we develop a global deep learning system which controls the long-term structure of the dialogue. We also show that this global module can be used to visualize and interpret the decisions taken by the conversational agent, and that the learnt representations are comparable to dialogue acts. Automatic and human evaluation show that our proposals serve to improve the baseline models. 
Finally, interaction experiments with coaching experts indicate that the system is usable and gives rise to positive emotions in Spanish, French and English, while the results in Norwegian point out that there is still work to be done in fully data-driven approaches with very low-resource languages. This work has been partially funded by the Basque Government under grant PRE_2017_1_0357 and by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 769872.

    Recuperação multimodal e interativa de informação orientada por diversidade

    Advisor: Ricardo da Silva Torres. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Information retrieval methods, especially considering multimedia data, have evolved towards the integration of multiple sources of evidence in the analysis of the relevance of items considering a given user search task. In this context, for attenuating the semantic gap between low-level features extracted from the content of the digital objects and high-level semantic concepts (objects, categories, etc.) and making the systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the required user effort on providing relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work comprehensively covers the interactive information retrieval literature and also discusses recent advances, the great research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement workflows, which integrate multiple information from images, such as: visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking was effective on improving result relevance and also boosted diversity promotion methods.
Beyond that, with a thorough experimental analysis we have investigated several research questions related to the possibility of improving result diversity and keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results whereas it significantly enhances the overall session effectiveness, not only considering the relevance and diversity, but also how early the user is exposed to the same amount of relevant items and diversity.
Doctorate in Computer Science (Doutor em Ciência da Computação). Grants P-4388/2010 and 140977/2012-0 (CAPES, CNP).

    Ranking and Retrieval under Semantic Relevance

    This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed. Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering. Part III turns the focus to mentions and entities in text, while continuing the theme of ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of from a single document, when resolving coreferent entity mentions.

    Computer vision beyond the visible : image understanding through language

    In the past decade, deep neural networks have revolutionized computer vision. High-performing deep neural architectures trained for visual recognition tasks have pushed the field towards methods relying on learned image representations instead of hand-crafted ones, in pursuit of end-to-end learning methods to solve challenging tasks, ranging from long-standing ones such as image classification to newly emerging tasks like image captioning. As this thesis is framed in the context of the rapid evolution of computer vision, we present contributions that are aligned with three major changes in paradigm that the field has recently experienced, namely 1) the power of re-utilizing deep features from pre-trained neural networks for different tasks, 2) the advantage of formulating problems with end-to-end solutions given enough training data, and 3) the growing interest in describing visual data with natural language rather than pre-defined categorical label spaces, which can in turn enable visual understanding beyond scene recognition. The first part of the thesis is dedicated to the problem of visual instance search, where we particularly focus on obtaining meaningful and discriminative image representations which allow efficient and effective retrieval of similar images given a visual query. Contributions in this part of the thesis involve the construction of sparse Bag-of-Words image representations from convolutional features from a pre-trained image classification neural network, and an analysis of the advantages of fine-tuning a pre-trained object detection network using query images as training data. The second part of the thesis presents contributions to the problem of image-to-set prediction, understood as the task of predicting a variable-sized collection of unordered elements for an input image. 
We conduct a thorough analysis of current methods for multi-label image classification, which are able to solve the task in an end-to-end manner by simultaneously estimating both the label distribution and the set cardinality. Further, we extend the analysis of set prediction methods to semantic instance segmentation, and present an end-to-end recurrent model that is able to predict sets of objects (binary masks and categorical labels) in a sequential manner. Finally, the third part of the dissertation takes insights learned in the previous two parts in order to present deep learning solutions to connect images with natural language in the context of cooking recipes and food images. First, we propose a retrieval-based solution in which the written recipe and the image are encoded into compact representations that allow the retrieval of one given the other. Second, as an alternative to the retrieval approach, we propose a generative model to predict recipes directly from food images, which first predicts ingredients as sets and subsequently generates the rest of the recipe one word at a time by conditioning both on the image and the predicted ingredients.
Postprint (published version)