Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering
Community question answering (CQA) has gained increasing popularity in both
academia and industry in recent years. However, the redundancy and lengthiness
of crowdsourced answers limit the performance of answer selection and lead to
reading difficulties and misunderstandings for community users. To solve these
problems, we tackle the tasks of answer selection and answer summary generation
in CQA with a novel joint learning model. Specifically, we design a
question-driven pointer-generator network, which exploits the correlation
information between question-answer pairs to aid in attending the essential
information when generating answer summaries. Meanwhile, we leverage the answer
summaries to alleviate noise in original lengthy answers when ranking the
relevancy degrees of question-answer pairs. In addition, we construct a new
large-scale CQA corpus, WikiHowQA, which contains long answers for answer
selection as well as reference summaries for answer summarization. The
experimental results show that the joint learning method can effectively
address the answer redundancy issue in CQA and achieves state-of-the-art
results on both answer selection and text summarization tasks. Furthermore, the
proposed model is shown to have strong transfer ability and applicability to
resource-poor CQA tasks that lack reference answer summaries. (Accepted by AAAI 2020, oral.)
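As a rough illustration of the model family involved, the sketch below shows one way a question vector might enter the attention of a pointer-generator decoder. It is a minimal PyTorch sketch under assumed shapes and layer names, not the authors' implementation; the copy gate and the summary-aided ranking component are omitted.

```python
# Minimal sketch of a question-driven attention step for a pointer-generator
# decoder. The fusion scheme (adding a question term to additive attention),
# layer names, and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionDrivenAttention(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden, hidden, bias=False)  # answer encoder states
        self.w_dec = nn.Linear(hidden, hidden, bias=False)  # decoder state
        self.w_q = nn.Linear(hidden, hidden, bias=False)    # question summary vector
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state, q_vec):
        # enc_states: (B, T, H), dec_state: (B, H), q_vec: (B, H).
        # The question term biases attention toward question-relevant tokens.
        scores = self.v(torch.tanh(
            self.w_enc(enc_states)
            + self.w_dec(dec_state).unsqueeze(1)
            + self.w_q(q_vec).unsqueeze(1)
        )).squeeze(-1)                                       # (B, T)
        attn = F.softmax(scores, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (B, H)
        return attn, context

# A standard pointer-generator would then mix generation and copying:
# a p_gen gate interpolates the vocabulary distribution with attn.
```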
Text Summarization Across High and Low-Resource Settings
Natural language processing aims to build automated systems that can both understand and generate natural language text. As the amount of textual data available online has grown exponentially, so has the need for intelligent systems to comprehend it and present it to the world. As a result, automatic text summarization, the process by which a text's salient content is automatically distilled into a concise form, has become a necessary tool. Automatic text summarization approaches and applications vary based on the input summarized, which may consist of single or multiple documents of different genres. Furthermore, the desired output may consist of sentences or sub-sentential units chosen directly from the input, as in extractive summarization, or a fusion and paraphrase of the input documents, as in abstractive summarization. Despite differences across these use cases, certain themes are common to all of them: the role of large-scale data in training these models, the application of summarization models in real-world scenarios, and the need to adequately evaluate and compare summaries.
This dissertation presents novel data and modeling techniques for deep neural network-based summarization models trained in high-resource (thousands of supervised training examples) and low-resource (zero to hundreds of supervised training examples) settings, together with a comprehensive evaluation of model and metric progress in the field. We examine both Recurrent Neural Network (RNN)-based and Transformer-based models for extracting and generating summaries. To facilitate the training of large-scale networks, we introduce datasets for multi-document summarization (MDS) in pedagogical applications and for news summarization. While high-resource settings allow models to advance state-of-the-art performance, the failure of such models to adapt to settings outside those in which they were initially trained requires smarter use of labeled data and motivates work in low-resource summarization. To this end, we propose unsupervised learning techniques for extractive summarization in question answering, abstractive summarization on distantly supervised data for community question answering forums, and abstractive zero- and few-shot summarization across several domains. To measure the progress made along these axes, we revisit the evaluation of current summarization models.
In particular, this dissertation addresses the following research objectives. 1) High-resource summarization. We introduce datasets for multi-document summarization, focusing on pedagogical applications for NLP, news summarization, and Wikipedia topic summarization. These large-scale datasets allow models to achieve state-of-the-art performance compared to prior modeling techniques, and we introduce a novel model that reduces redundancy. However, we also examine how models trained on these large-scale datasets fare when applied to new settings, showing the need for more generalizable models. 2) Low-resource summarization. While high-resource summarization improves model performance, practical applications require data-efficient models. We propose a pipeline for creating synthetic training data for extractive question-answering models, a form of query-based extractive summarization with short-phrase summaries. In other work, we propose an automatic pipeline for training a multi-document answer summarizer on community question-answering forums without labeled data. Finally, we push the boundaries of abstractive summarization performance when little or no training data is available across several domains. 3) Automatic summarization evaluation. To understand the extent of progress made by recent modeling techniques, and to better understand current evaluation protocols, we compare summarization output quality using 12 metrics across 23 deep neural network models, propose better-motivated summarization evaluation guidelines, and point to open problems in summarization evaluation.
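For concreteness, the snippet below computes ROUGE, one standard metric of the kind compared in the evaluation study, using the open-source rouge-score package; the example texts are illustrative, and the dissertation's specific 12 metrics and 23 models are not reproduced here.

```python
# Scoring a candidate summary against a reference with ROUGE
# (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
candidate = "a cat was sitting on the mat"
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} recall={s.recall:.3f} f1={s.fmeasure:.3f}")
```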
Neural Methods for Answer Passage Retrieval over Sparse Collections
Recent advances in machine learning have allowed information retrieval (IR) techniques to move beyond handcrafted, domain-specific features. Specifically, deep neural models incorporate varying levels of features to learn whether a document answers the information need of a query. However, these neural models rely on a large number of parameters to learn a relation between a query and a relevant document. This reliance on many parameters, combined with optimization methods that proceed by small updates, necessitates numerous samples for the neural model to converge on an effective relevance function. This presents a significant obstacle in IR, as relevance judgements are often sparse or noisy and accompanied by a large class imbalance. This is especially true for short-text retrieval, where there is often only one relevant passage. The problem is exacerbated when training these artificial neural networks, as excessive negative sampling can result in poor performance. Thus, we approach this task through multiple avenues and examine their effectiveness on a non-factoid question answering (QA) task.
We first propose learning local embeddings specific to the relevance information of the collection to improve the performance of an upstream neural model. In doing so, we find significantly improved results over standard pre-trained embeddings, despite developing the embeddings on a small collection that would not be sufficient for a full language model. Leveraging this local representation, and inspired by recent work in machine translation, we introduce a hybrid embedding-based model that incorporates pre-trained embeddings while dynamically constructing local representations from character embeddings. The hybrid approach relies on pre-trained embeddings to achieve an effective retrieval model, and continually adjusts its character-level abstraction to fit a local representation.
We next consider methods to adapt neural models to multiple IR collections, reducing the collection-specific training required and alleviating the need to retrain a neural model's parameters for a new subdomain of a collection. First, we propose an adversarial retrieval model that achieves state-of-the-art performance on out-of-subdomain queries while maintaining in-domain performance. Second, we establish an informed negative sampling approach using a reinforcement learning agent. The agent is trained to directly maximize the performance of a neural IR model on a predefined IR metric by choosing the ranking function from which to sample negative documents. This policy-based sampling exposes the neural model to more of a collection and results in a more consistent neural retrieval model over multiple training instances.
Lastly, we move towards a universal retrieval function. We introduce a probe-based inspection of neural relevance models through the lens of standard natural language processing tasks and establish that while seemingly similar QA collections require the same basic abstract information, the final layers that determine relevance differ significantly. We then introduce Universal Retrieval Functions, a method to incorporate new collections using a library of previously trained linear relevance models and a common neural representation.
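As a hedged sketch of the informed negative sampling idea, the toy epsilon-greedy agent below chooses which ranking function to draw negatives from and is rewarded by an IR metric. The dissertation trains a reinforcement learning agent against a predefined IR metric; the bandit formulation, arm design, and reward here are simplifying assumptions.

```python
# Toy "informed negative sampling" agent: an epsilon-greedy bandit picks the
# ranker to sample negative passages from, rewarded by a validation IR metric.
import random

class NegativeSamplerBandit:
    def __init__(self, rankers, epsilon=0.1):
        self.rankers = rankers                 # e.g. [bm25_negatives, random_negatives]
        self.counts = [0] * len(rankers)
        self.values = [0.0] * len(rankers)
        self.epsilon = epsilon

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if random.random() < self.epsilon:
            return random.randrange(len(self.rankers))
        return max(range(len(self.rankers)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental mean of the observed reward (e.g. change in held-out MRR).
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Training loop (informal): choose an arm, sample negatives from that ranker,
# take a gradient step on the neural IR model, then feed the change in the
# held-out IR metric back via update().
```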
Learning to represent, categorise and rank in community question answering
The task of Question Answering (QA) is arguably one of the oldest tasks in Natural Language Processing, attracting high levels of interest from both industry and academia. However, most research has focused on factoid questions, e.g. Who is the president of Ireland? In contrast, research on answering non-factoid questions, such as manner, reason, difference and opinion questions, has been rather piecemeal.
This was largely due to the absence of available labelled data for the task. This is changing, however, with the growing popularity of Community Question Answering (CQA) websites, such as Quora, Yahoo! Answers and the Stack Exchange family of forums. These websites provide natural labelled data allowing us to apply machine learning techniques.
Most previous state-of-the-art approaches to the tasks of CQA-based question answering involved handcrafted features in combination with linear models. In this thesis we hypothesise that the use of handcrafted features can be avoided and the tasks can be approached with representation learning techniques, specifically deep learning.
In the first part of this thesis we give an overview of deep learning in natural language processing and empirically evaluate our hypothesis on the task of detecting semantically equivalent questions, i.e. predicting if two questions can be answered by the same answer.
In the second part of the thesis we address the task of answer ranking, i.e. determining how suitable an answer is for a given question. In order to determine the suitability of representation learning for the task of answer ranking, we provide a rigorous experimental evaluation of various neural architectures, based on feedforward, recurrent and convolutional neural networks, as well as their combinations.
This thesis shows that deep learning is a very suitable approach to CQA-based QA, achieving state-of-the-art results on the two tasks we addressed.
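To make the architecture family concrete, the following is a minimal convolutional question-answer matcher of the general kind evaluated in the thesis. The dimensions, max-pooling, and cosine scoring head are illustrative assumptions rather than the thesis's exact models.

```python
# Minimal convolutional QA matcher: encode question and answer separately,
# score the pair by cosine similarity for answer ranking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvQAMatcher(nn.Module):
    def __init__(self, vocab_size: int, emb: int = 100, filters: int = 128, width: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.conv = nn.Conv1d(emb, filters, width, padding=width // 2)

    def encode(self, tokens):
        # tokens: (B, T) -> max-pooled convolutional sentence vector (B, F).
        x = self.embed(tokens).transpose(1, 2)          # (B, E, T)
        return F.relu(self.conv(x)).max(dim=2).values

    def forward(self, q_tokens, a_tokens):
        # Cosine similarity serves as the relevance score for ranking answers.
        return F.cosine_similarity(self.encode(q_tokens), self.encode(a_tokens))
```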
Recommended from our members
Answer Similarity Grouping and Diversification in Question Answering Systems
The rise in popularity of mobile and voice search has led to a shift in IR from document to passage retrieval for non-factoid questions. Various datasets, such as MSMarco, as well as efficient retrieval models, have been developed to identify the single best answer passage for this task. However, such models do not specifically address questions that could have multiple or alternative answers. In this dissertation, we focus on this new research area, which involves studying answer passage relationships and how they can be applied to passage retrieval tasks.
We first create a high-quality dataset for the answer passage similarity task in the context of question answering. Manual annotation of passage pairs is performed to set the similarity labels, from which answer group information is automatically generated. We next investigate different types of representations that could be used to create effective clusters. We experiment with various unsupervised representations and show that distributional representations outperform term-based representations for this task. Next, weak supervision is leveraged to further improve cluster modeling performance. We use BERT as the underlying model for training and show the relative performance of various weak signals, such as GloVe and term-based language modeling, for this task. To apply these clusters to the answer passage retrieval task for multi-answer questions, we use a modified version of the Maximal Marginal Relevance (MMR) diversification model. We demonstrate that answers retrieved using this model are more diverse, i.e., they cover more answer types with low redundancy while maximizing relevance, relative to the baselines.
So far, we have used passage clustering as a means to identify the answer groups corresponding to a question and applied them in a question answering task. We extend this a step further by looking at related questions within a conversation. For this purpose, we expand the definition of Reciprocal Rank Fusion (RRF) and use it to identify pertinent history passages for such questions. Updated question rewrites generated using these passages are then used to improve the conversational search task. In addition to being the first work that looks at answer relationships, our specific contributions can be summarized as follows: (1) creation of new datasets with passage similarity and answer type information; (2) effective passage similarity clustering models using unsupervised representations and weak supervision methods; (3) application of the passage similarity/clustering information to a diversification framework; (4) identification of good response history candidates using answer passage clustering for the conversational search task.
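As a hedged illustration of the two building blocks named above, the sketch below gives standard MMR and RRF in plain Python. The dissertation's cluster-aware modification of MMR and its expanded RRF definition are not reproduced here, and the parameter values are illustrative.

```python
# Standard Maximal Marginal Relevance: greedily pick passages that are
# relevant to the query but dissimilar to those already selected.
def mmr(candidates, relevance, similarity, lam=0.7, k=10):
    """candidates: passage ids; relevance[d]: query relevance of d;
    similarity(a, b): passage-passage similarity in [0, 1]."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# Standard Reciprocal Rank Fusion over several rankings:
# fused(d) = sum over rankings r of 1 / (k0 + rank_r(d)).
def rrf(rankings, k0=60):
    fused = {}
    for ranking in rankings:                   # each ranking: ordered list of ids
        for rank, d in enumerate(ranking, start=1):
            fused[d] = fused.get(d, 0.0) + 1.0 / (k0 + rank)
    return sorted(fused, key=fused.get, reverse=True)
```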