1,212 research outputs found

    Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

    While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-words loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and to exhibit competence in discourse-level decision-making.
    Comment: Appeared in the ACL 2017 proceedings as a long paper. Corrects a calculation mistake in the Table 1 E-bow & A-bow entries, resulting in higher scores.
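
    The sketch below is a minimal, self-contained illustration of the mechanism the abstract describes: a conditional VAE whose latent variable is drawn from a context-conditioned prior, trained with a reconstruction term, a KL term, and the bag-of-words auxiliary loss. All module names and sizes (CVAEDialog, the 32-dim latent, etc.) are illustrative assumptions, not the authors' code.

        # Minimal CVAE dialog sketch (assumed shapes/names, not the paper's code).
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class CVAEDialog(nn.Module):
            def __init__(self, vocab=1000, emb=64, hid=128, z_dim=32):
                super().__init__()
                self.embed = nn.Embedding(vocab, emb)
                self.ctx_enc = nn.GRU(emb, hid, batch_first=True)   # dialog context encoder
                self.resp_enc = nn.GRU(emb, hid, batch_first=True)  # response encoder (training only)
                self.prior = nn.Linear(hid, 2 * z_dim)              # p(z | context)
                self.recog = nn.Linear(2 * hid, 2 * z_dim)          # q(z | context, response)
                self.dec = nn.GRU(emb, hid + z_dim, batch_first=True)
                self.out = nn.Linear(hid + z_dim, vocab)
                self.bow = nn.Linear(hid + z_dim, vocab)            # bag-of-words predictor

            def forward(self, ctx, resp):
                c = self.ctx_enc(self.embed(ctx))[1].squeeze(0)
                r = self.resp_enc(self.embed(resp))[1].squeeze(0)
                p_mu, p_lv = self.prior(c).chunk(2, -1)
                q_mu, q_lv = self.recog(torch.cat([c, r], -1)).chunk(2, -1)
                z = q_mu + torch.randn_like(q_mu) * (0.5 * q_lv).exp()  # reparameterization
                h0 = torch.cat([c, z], -1)
                dec_out, _ = self.dec(self.embed(resp[:, :-1]), h0.unsqueeze(0))
                logits = self.out(dec_out)
                rec = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                      resp[:, 1:].reshape(-1))
                kl = 0.5 * (p_lv - q_lv + (q_lv.exp() + (q_mu - p_mu) ** 2) / p_lv.exp()
                            - 1).sum(-1).mean()
                # Bag-of-words loss: predict every response token from z, order-free.
                bow_lp = F.log_softmax(self.bow(h0), -1)
                bow = -bow_lp.gather(1, resp[:, 1:]).mean()
                return rec + kl + bow

        model = CVAEDialog()
        loss = model(torch.randint(0, 1000, (4, 10)), torch.randint(0, 1000, (4, 12)))

    At test time one would sample z from the prior network and decode greedily; diversity then comes from resampling z rather than from the decoder's search.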

    Induced Model Matching: How Restricted Models Can Help Larger Ones

    We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured model. This restricted model may be thought of as "side information", derived either from an auxiliary exhaustive dataset or from the same dataset by forcing the restriction. How can the restricted model be useful to the full model? We propose an approach for transferring the knowledge of the restricted model to the full model by aligning the full model's context-restricted performance with that of the restricted model. We call this methodology Induced Model Matching (IMM) and first illustrate its general applicability using logistic regression as a toy example. We then explore IMM's use in language modeling, the application that initially inspired it, where it offers an explicit foundation in contrast to the implicit use of restricted models in techniques such as noising. We demonstrate the methodology on both LSTM and transformer full models, using N-grams as restricted models. To further illustrate the potential of the principle whenever it is much cheaper to collect restricted rather than full information, we conclude with a simple RL example where POMDP policies can improve learned MDP policies via IMM.
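
    Below is a hedged, toy logistic-regression rendering of the matching idea: the full model's training loss is augmented with a term pulling its predictions toward the restricted model's predictions on the shared features. Approximating the induced restricted model by the full model's per-example output is a simplification made here for brevity; the data, loss weighting, and step sizes are invented.

        # Toy IMM-style sketch: the full model matches a trusted restricted model.
        import torch
        import torch.nn.functional as F

        torch.manual_seed(0)
        n, d_r, d_x = 512, 3, 5
        X_r = torch.randn(n, d_r)                             # restricted features
        X_f = torch.cat([X_r, torch.randn(n, d_x)], 1)        # full feature set
        y = (X_r @ torch.tensor([1., -2., 0.5]) + 0.1 * torch.randn(n) > 0).float()

        w_r = torch.zeros(d_r, requires_grad=True)            # restricted model
        for _ in range(300):                                  # fit it first
            F.binary_cross_entropy_with_logits(X_r @ w_r, y).backward()
            with torch.no_grad():
                w_r -= 0.5 * w_r.grad
                w_r.grad = None

        w_f = torch.zeros(d_r + d_x, requires_grad=True)      # full model
        q_r = torch.sigmoid(X_r @ w_r).detach()               # restricted predictions
        lam = 1.0                                             # matching weight
        for _ in range(300):
            p_f = torch.sigmoid(X_f @ w_f)
            loss = (F.binary_cross_entropy(p_f, y)            # usual data loss
                    + lam * F.binary_cross_entropy(p_f, q_r)) # IMM matching term
            loss.backward()
            with torch.no_grad():
                w_f -= 0.5 * w_f.grad
                w_f.grad = None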

    Blending Learning and Inference in Structured Prediction

    In this paper we derive an efficient algorithm for learning the parameters of structured predictors in general graphical models. The algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low-dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low-dimensional primal and dual programs. Unlike many existing approaches, inference-learning blending allows us to efficiently learn high-order graphical models, over regions of any size, and with a very large number of parameters. We demonstrate the effectiveness of our approach with state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding.
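
    The paper's contribution is a convergent primal-dual procedure; the toy below does not reproduce it, but it illustrates the core idea of blending: instead of running inference to convergence inside every learning step, a few cheap inference sweeps are interleaved with each parameter update. The chain MRF, the ICM inference, and the perceptron-style update are all stand-ins chosen for brevity.

        # Blending illustration: partial, warm-started inference inside learning.
        import torch

        torch.manual_seed(0)
        T, L = 8, 3                          # chain length, number of labels
        w_unary = 0.1 * torch.randn(L, 4)    # weights on 4 node features
        w_pair = torch.zeros(L, L)           # transition weights
        X = torch.randn(50, T, 4)            # toy dataset
        Y = torch.randint(0, L, (50, T))     # gold labelings
        y_hat = torch.randint(0, L, (50, T)) # warm-started assignments per example

        def icm_sweeps(x, y, sweeps=2):
            """A few iterated-conditional-modes sweeps: cheap, partial inference."""
            for _ in range(sweeps):
                for t in range(T):
                    score = w_unary @ x[t]
                    if t > 0:
                        score = score + w_pair[y[t - 1]]
                    if t < T - 1:
                        score = score + w_pair[:, y[t + 1]]
                    y[t] = score.argmax()
            return y

        lr = 0.05
        for epoch in range(5):
            for i in range(len(X)):
                y_hat[i] = icm_sweeps(X[i], y_hat[i])  # blended inference step
                for t in range(T):                     # perceptron-style update
                    if y_hat[i, t] != Y[i, t]:
                        w_unary[Y[i, t]] += lr * X[i, t]
                        w_unary[y_hat[i, t]] -= lr * X[i, t]
                    if t > 0:
                        w_pair[Y[i, t - 1], Y[i, t]] += lr
                        w_pair[y_hat[i, t - 1], y_hat[i, t]] -= lr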

    Structured local exponential models for machine translation

    This thesis proposes a synthesis and generalization of local exponential translation models, the subclass of feature-rich translation models which associate probability distributions with individual rewrite rules used by the translation system, such as synchronous context-free rules, or with other individual aspects of translation hypotheses such as word pairs or reordering events. Unlike other authors, we use these estimates to replace the traditional phrase models and lexical scores rather than to supplement them, thereby demonstrating that local exponential phrase models can be regarded as a generalization of standard methods not only in theoretical but also in practical terms. We further introduce a form of local translation model that combines features associated with the surface forms of rules and features associated with less specific representations -- including those based on lemmas, inflections, and reordering patterns -- such that surface-form estimates are recovered as a special case of the model. Crucially, the proposed approach allows parameters for the latter type of features to be estimated from training sets that include multiple source phrases, thereby overcoming an important training-set fragmentation problem that hampers previously proposed local translation models. These proposals are validated experimentally. Conditioning all phrase-based probabilities in a hierarchical phrase-based system on source-side contextual information produces significant performance improvements. Extending the contextually sensitive estimates with features modeling source-side morphology and reordering patterns yields consistent additional improvements, while further experiments show significant improvements obtained from modeling observed and unobserved inflections for a morphologically rich target language.
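
    To make "local exponential model" concrete, here is a minimal log-linear sketch: a softmax distribution over candidate target phrases for a single source phrase, whose feature set mixes a surface-form feature with coarser lemma- and context-level features in one shared parameter vector, so that surface-only estimation falls out as the special case where only the surface feature fires. The feature templates and the tiny German-English example are invented for illustration.

        # Local exponential (log-linear) rewrite model over candidate targets.
        import math
        from collections import defaultdict

        w = defaultdict(float)  # one weight per feature string

        def features(src, tgt, context):
            return [
                f"surface:{src}->{tgt}",               # phrase-table-style feature
                f"lemma:{src.lower()}->{tgt.lower()}", # coarser, shared across forms
                f"ctx:{context}:{tgt}",                # source-side context feature
            ]

        def prob(tgt, src, context, candidates):
            scores = {t: math.exp(sum(w[f] for f in features(src, t, context)))
                      for t in candidates}
            return scores[tgt] / sum(scores.values())

        def sgd_step(src, gold, context, candidates, lr=0.1):
            # Log-likelihood gradient: observed minus expected feature counts.
            probs = {t: prob(t, src, context, candidates) for t in candidates}
            for t in candidates:
                g = (1.0 if t == gold else 0.0) - probs[t]
                for f in features(src, t, context):
                    w[f] += lr * g

        cands = ["bank", "shore"]                   # candidates for German "Bank"
        for _ in range(50):
            sgd_step("Bank", "shore", "Fluss", cands)  # river context favors "shore"
        print(round(prob("shore", "Bank", "Fluss", cands), 3))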

    Fast text-only domain adaptation of RNN-transducer prediction network

    Adaptation of end-to-end speech recognition systems to new tasks is known to be challenging. A number of solutions have been proposed which apply external language models with various fusion methods, possibly in combination with two-pass decoding; TTS systems have also been used to generate adaptation data for the end-to-end models. In this paper we show that RNN-transducer models can be effectively adapted to new domains using only small amounts of textual data. By taking advantage of the model's inherent structure, in which the prediction network can be interpreted as a language model, we can apply fast adaptation to the model. Adapting the model avoids the need for complicated decoding-time fusions and external language models. With appropriate regularization, the prediction network can be adapted to new domains while still retaining good generalization capabilities. We show on multiple ASR evaluation tasks how this method can provide relative gains of 10-45% in target-task WER. We also share insights into how the RNN-transducer prediction network performs as a language model.
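
    A minimal sketch of the text-only recipe described above: the prediction network (plus a stand-in output layer in place of the joint network) is fine-tuned as a language model on in-domain text, with an L2 pull toward the seed weights as one possible regularizer. The module shapes, the optimizer settings, and the choice of regularizer are assumptions for illustration, not the paper's exact setup.

        # Text-only adaptation of an RNN-T prediction network, treated as an LM.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        vocab, emb, hid = 200, 32, 64
        embed = nn.Embedding(vocab, emb)
        pred_net = nn.LSTM(emb, hid, batch_first=True)   # RNN-T prediction network
        lm_head = nn.Linear(hid, vocab)                  # stand-in for the joint network

        seed = {n: p.detach().clone() for n, p in pred_net.named_parameters()}
        params = (list(embed.parameters()) + list(pred_net.parameters())
                  + list(lm_head.parameters()))
        opt = torch.optim.Adam(params, lr=1e-3)

        text = torch.randint(0, vocab, (16, 20))  # batch of in-domain token ids
        lam = 1e-2                                # regularization strength

        for step in range(100):
            h, _ = pred_net(embed(text[:, :-1]))
            ce = F.cross_entropy(lm_head(h).reshape(-1, vocab),
                                 text[:, 1:].reshape(-1))
            # Keep adapted weights close to the seed to retain generalization.
            reg = sum(((p - seed[n]) ** 2).sum()
                      for n, p in pred_net.named_parameters())
            loss = ce + lam * reg
            opt.zero_grad()
            loss.backward()
            opt.step()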