1,215 research outputs found

    Zero-Shot Cross-Lingual Transfer with Meta Learning

    Full text link
    Learning what to share between tasks has been a topic of great importance recently, as strategic sharing of knowledge has been shown to improve downstream task performance. This is particularly important for multilingual applications, as most languages in the world are under-resourced. Here, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial to the first. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning for a total of 15 languages. We improve upon the state-of-the-art for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA dataset). A comprehensive error analysis indicates that the correlation of typological features between languages can partly explain when parameter sharing learned via meta-learning is beneficial.Comment: Accepted as long paper in EMNLP2020 main conferenc

    SMDDH: Singleton Mention detection using Deep Learning in Hindi Text

    Full text link
    Mention detection is an important component of coreference resolution system, where mentions such as name, nominal, and pronominals are identified. These mentions can be purely coreferential mentions or singleton mentions (non-coreferential mentions). Coreferential mentions are those mentions in a text that refer to the same entities in a real world. Whereas, singleton mentions are mentioned only once in the text and do not participate in the coreference as they are not mentioned again in the following text. Filtering of these singleton mentions can substantially improve the performance of a coreference resolution process. This paper proposes a singleton mention detection module based on a fully connected network and a Convolutional neural network for Hindi text. This model utilizes a few hand-crafted features and context information, and word embedding for words. The coreference annotated Hindi dataset comprising of 3.6K sentences, and 78K tokens are used for the task. In terms of Precision, Recall, and F-measure, the experimental findings obtained are excellent

    t-Exponential Memory Networks for Question-Answering Machines

    Full text link
    Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describ- ing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances, by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists in treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors. Indeed, to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, multivariate t-exponential distributions are imposed. On this basis, we proceed to infer corresponding posteriors; these can be used for inference and prediction at test time, in a way that accounts for the uncertainty in the available sparse training data. Specifically, to allow for our approach to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach, using challenging language modeling benchmarks, and illustrate its superiority over existing state-of-the-art techniques
    corecore