
    MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

    We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains from community question answering forums in English. We investigate model performance on nine benchmarks of answer selection and question similarity tasks, and show that all 140 models transfer surprisingly well, with the large majority substantially outperforming common IR baselines. We also demonstrate that considering a broad selection of source domains is crucial for obtaining the best zero-shot transfer performance, which contrasts with the standard procedure of relying merely on the largest and most similar domains. In addition, we extensively study how to best combine multiple source domains. We propose to combine self-supervised with supervised multi-task learning on all available source domains. Our best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks. Fine-tuning our model with in-domain data results in additional large gains and achieves the new state of the art on all nine benchmarks. Comment: EMNLP 2020
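The abstract above relies on self-supervised training signals from forum data. A minimal sketch of one such signal, assuming a title-body objective with in-batch negatives (the similarity function and vectors below are toy stand-ins, not the paper's actual encoder):

```python
import math

def sim(u, v):
    """Toy similarity: dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def in_batch_ranking_loss(title_vecs, body_vecs):
    """Mean cross-entropy of matching each question title to its own
    body, treating the other bodies in the batch as negatives."""
    loss = 0.0
    for i, t in enumerate(title_vecs):
        scores = [sim(t, b) for b in body_vecs]
        m = max(scores)  # stabilize log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        loss += log_z - scores[i]
    return loss / len(title_vecs)
```

With aligned title-body pairs the loss is lower than with shuffled pairs, which is the signal a model trained this way exploits without any labels.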

    Improving QA Generalization by Concurrent Modeling of Multiple Biases

    Existing NLP datasets contain various biases that models can easily exploit to achieve high performance on the corresponding evaluation sets. However, models that focus on dataset-specific biases are limited in their ability to learn more generalizable knowledge about the task from more general data patterns. In this paper, we investigate the impact of debiasing methods on improving generalization and propose a general framework for improving performance on both in-domain and out-of-domain datasets by concurrently modeling multiple biases in the training data. Our framework weights each example based on the biases it contains and the strength of those biases in the training data. It then uses these weights in the training objective so that the model relies less on examples with high bias weights. We extensively evaluate our framework on extractive question answering with training data from various domains with multiple biases of different strengths. We perform the evaluations in two settings, in which the model is trained on a single domain or on multiple domains simultaneously, and show its effectiveness in both settings compared to state-of-the-art debiasing methods.
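The weighting scheme described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: `combined_weight` assumes each bias is represented by the probability a bias-only model assigns to the gold label, so examples that every bias predicts well are down-weighted in the loss:

```python
import math

def combined_weight(bias_probs):
    """bias_probs: for each known bias, the probability a bias-only
    model assigns to the gold label. Examples predictable from the
    biases alone get a weight near 0."""
    w = 1.0
    for p in bias_probs:
        w *= (1.0 - p)
    return w

def weighted_nll(model_probs, bias_probs_per_example):
    """Bias-weighted negative log-likelihood over a batch.
    model_probs: probability the main model assigns to the gold label."""
    total = 0.0
    for p_gold, bias_probs in zip(model_probs, bias_probs_per_example):
        total += combined_weight(bias_probs) * -math.log(p_gold)
    return total / len(model_probs)
```

A strongly biased example (bias probability 0.99) thus contributes almost nothing to the gradient, while an unbiased one keeps nearly its full loss.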

    Real-Time Summarization of Big Data Streams

    Events like natural disasters, riots, or protests trigger an increased information need for many people, because of regional closeness, social relations, or general interest. Due to the high volume of news articles created by different publishers during such events, it is nearly impossible for individual persons to process all information with the goal of staying up to date. Real-time summarization systems can help in such cases by providing updates on an event while the situation is still developing, without requiring readers to manually analyze large numbers of news articles. In this master thesis, a framework for real-time summarization is presented and multiple summarization systems based on this framework are introduced. Besides achieving good summarization quality, another focus of this work was to retain real-time properties, both in terms of summarization and in terms of computational performance. Starting from a simple approach defined as the Baseline, different improvements were made with the goal of creating an advanced system that performs similarly to other state-of-the-art temporal summarization systems. The best resulting system of this work is an adaptive approach that can change configurations and algorithms at runtime to automatically select the best method for summarizing each target event. The adaptive selection is performed by detecting the importance of an event based on its news coverage. The system also uses an approach that requires all information to be reported by multiple sources before it can be included in an update. The adaptive summarization system showed superior summarization quality compared to the Baseline system, and a comparison against a state-of-the-art temporal summarization system also favored the adaptive approach. At the same time, all real-time goals were achieved.
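The multi-source requirement mentioned above can be sketched as a simple gate. This is a hypothetical illustration (class and method names are invented, and real systems would fingerprint near-duplicate sentences rather than exact strings): a candidate update is held back until distinct publishers have reported the same information:

```python
from collections import defaultdict

class MultiSourceGate:
    """Emit an update only once `min_sources` distinct publishers
    have reported the same piece of information."""

    def __init__(self, min_sources=2):
        self.min_sources = min_sources
        self.seen = defaultdict(set)  # fingerprint -> publishers seen
        self.emitted = set()

    def offer(self, fingerprint, publisher):
        """Return True when the update should be pushed to users."""
        if fingerprint in self.emitted:
            return False  # already reported to users once
        self.seen[fingerprint].add(publisher)
        if len(self.seen[fingerprint]) >= self.min_sources:
            self.emitted.add(fingerprint)
            return True
        return False
```

Repeated reports from the same publisher do not count toward confirmation, and each piece of information is emitted at most once.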

    Representation Learning and Learning from Limited Labeled Data for Community Question Answering

    The amount of information published on the Internet is growing steadily. Accessing this vast knowledge more effectively is a fundamental goal of many tasks in natural language processing. In this thesis, we address this challenge from the perspective of community question answering by leveraging data from web forums and Q&A communities to find and identify answers for given questions automatically. More precisely, we are concerned with fundamental challenges that arise from this setting, broadly categorized into (1) obtaining better text representations and (2) dealing with scenarios where we have little or no labeled training data. We first study attention mechanisms for learning representations of questions and answers to compare them efficiently and effectively. A limitation of previous approaches is that they leverage question information when learning answer representations. This procedure of dependent encoding requires us to obtain separate answer representations for each question, which is inefficient. To remedy this, we propose a self-attentive model that does not suffer from this drawback. We show that our model achieves on-par or better performance on answer selection tasks compared to other approaches while allowing us to encode questions and answers independently. Due to the importance of attention mechanisms, we present a framework to effortlessly transform answer selection models into prototypical question answering systems for the interactive inspection and side-by-side comparison of attention weights. Besides purely monolingual approaches, we study how to transfer text representations across languages. A popular concept for obtaining universally re-usable representations is that of sentence embeddings. Previous work studied them either only monolingually, or cross-lingually on only a few individual datasets.
We go beyond this by studying universal cross-lingual sentence embeddings, which are re-usable across many different classification tasks and across languages. Our training-free approach generalizes the concept of average word embeddings by concatenating different kinds of word embeddings and by computing several generalized means. Due to its simplicity, we can effortlessly extend our approach to new languages by incorporating cross-lingual word embeddings. We show that our sentence embeddings outperform more complex techniques monolingually on nine tasks and achieve the best results cross-lingually for the transfer from English to German and French. We complement this by studying an orthogonal approach where we machine-translate the input from German to English and continue monolingually. We investigate the impact of a standard neural machine translation model on the performance of models for determining question similarity in programming and operating systems forums. We highlight that translation mistakes can have a substantial performance impact, and we mitigate this by adapting our machine translation models to these specialized domains using back-translation. In the second part, we study monolingual scenarios with (a) little labeled data, (b) only unlabeled data, or (c) no target dataset information. These are critical challenges in our setting, as there exist large numbers of web forums that contain only a few labeled question-answer pairs and no labeled similar questions. One approach to generalizing from small training data is to use simple models with few trainable layers. We present COALA, a shallow task-specific network architecture specialized in answer selection, containing only one trainable layer. This layer learns representations of word n-grams in questions and answers, which we compare and aggregate for scoring.
Our approach improves upon a more complex compare-aggregate architecture by 4.5 percentage points on average across six datasets with small training data. Moreover, it outperforms standard IR baselines with as few as 25 labeled instances. The standard method for training models to determine question similarity requires labeled question pairs, which do not exist for many forums. Therefore, we investigate alternatives such as self-supervised training with question title-body information, and we propose duplicate question generation. By leveraging larger amounts of unlabeled data, we show that both methods can achieve substantial improvements over adversarial domain transfer and outperform supervised in-domain training on two datasets. We find that duplicate question generation transfers well to unseen domains, and that we can leverage self-supervised training to obtain suitable answer selection models based on state-of-the-art pre-trained transformers. Finally, we argue that it can be prohibitive to train separate specialized models for each forum. It is desirable to obtain one model that generalizes well to several unseen scenarios. Towards this goal, we broadly study the zero-shot transfer capabilities of text matching models in community question answering. We train 140 models with self-supervised training signals on different forums and transfer them to nine evaluation datasets of question similarity and answer selection tasks. We find that the large majority of models generalize surprisingly well, and in six cases, all models outperform standard IR baselines. Our analyses reveal that considering a broad selection of source domains is crucial, because the best zero-shot transfer performance often correlates with neither domain similarity nor training data size. We investigate different combination techniques and propose incorporating self-supervised and supervised multi-task learning with data from all source forums.
Our best model for zero-shot transfer, MultiCQA, outperforms in-domain models on six datasets even though it has not seen target-domain data during training.
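The training-free sentence embeddings described in the abstract generalize averaging via concatenated generalized means. A minimal sketch, here using the limit cases p = -inf, 1, +inf (per-dimension minimum, average, and maximum); the vectors are toy values, whereas the thesis applies this to pretrained word embeddings:

```python
def power_mean_embedding(word_vectors):
    """Concatenate per-dimension min, average, and max of the word
    vectors, tripling the dimensionality of the sentence embedding."""
    dims = len(word_vectors[0])
    n = len(word_vectors)
    mins = [min(v[d] for v in word_vectors) for d in range(dims)]
    avgs = [sum(v[d] for v in word_vectors) / n for d in range(dims)]
    maxs = [max(v[d] for v in word_vectors) for d in range(dims)]
    return mins + avgs + maxs
```

Because the construction involves no training, extending it to a new language only requires swapping in cross-lingual word embeddings, as the abstract notes.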

    Representation Learning for Answer Selection with LSTM-Based Importance Weighting

    We present an approach to non-factoid answer selection with a separate component, based on a BiLSTM, that determines the importance of segments in the input. In contrast to other recently proposed attention-based models within the same area, we determine the importance while assuming the independence of questions and candidate answers. Experimental results show the effectiveness of our approach, which outperforms several state-of-the-art attention-based models on the recent non-factoid answer selection datasets InsuranceQA v1 and v2. We show that it is possible to perform effective importance weighting for answer selection without relying on the relatedness of questions and answers. The source code of our experiments is publicly available.
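The key property above is that importance weights are computed without looking at the question. A minimal sketch of that idea, with a per-token scorer standing in for the BiLSTM component (an illustrative simplification, not the paper's model): scores are normalized with a softmax and used to pool the token vectors:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_representation(token_vectors, token_scores):
    """Softmax-weighted sum of token vectors. Note that nothing here
    depends on the question, so answer representations can be
    precomputed and reused across queries."""
    weights = softmax(token_scores)
    dims = len(token_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, token_vectors))
            for d in range(dims)]
```

With uniform scores this reduces to plain average pooling; a strongly scored token dominates the representation.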

    End-to-End Non-Factoid Question Answering with an Interactive Visualization of Neural Attention Weights

    Advanced attention mechanisms are an important part of successful neural network approaches for non-factoid answer selection because they allow the models to focus on a few important segments within rather long answer texts. Analyzing attention mechanisms is thus crucial for understanding the strengths and weaknesses of particular models. We present an extensible, highly modular service architecture that enables the transformation of neural network models for non-factoid answer selection into fully featured end-to-end question answering systems. The primary objective of our system is to give researchers a way to interactively explore and compare attention-based neural networks for answer selection. Our interactive user interface helps researchers to better understand the capabilities of the different approaches and can aid qualitative analyses.