1,344 research outputs found

    ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

    To bridge the gap between the capabilities of the state of the art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA. Comment: 11 pages, NAACL 201

    Answering Causal Questions and Developing Tool Support


    Aggregated search: a new information retrieval paradigm

    Traditional search engines return ranked lists of search results. It is up to the user to scroll through this list, scan the individual documents, and assemble the information that fulfills his/her information need. Aggregated search represents a new class of approaches in which information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and an overview of existing work. We start with work in related domains such as federated search, natural language generation, and question answering. Then we focus on more recent trends, namely cross-vertical aggregated search and relational aggregated search, which are already present in current Web search.

    Understanding and exploiting user intent in community question answering

    A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and domain-specific forums like StackOverflow. These services help users obtain information from a community: a user can post questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when a question cannot be answered directly by Web search engines because relevant online content is unavailable. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it along five dimensions, namely: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we are able to attain consistent and significant classification improvements over the state of the art in this area. In addition to the textual features, a variety of metadata features (such as the category in which the question was posted) are used to model a user's intent, which in turn helps the CQA service perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach that blends several language modelling techniques, namely, the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. Our second application is answer validation, where we present a two-stage model that first ranks similar questions using our proposed hybrid approach, and then validates whether the answer of the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search lists validation.

    Representation and Inference for Open-Domain Question Answering: Strength and Limits of two Italian Semantic Lexicons

    The research described in this thesis was devoted to building a prototype Question Answering system for the Italian language. The prototype was used as an environment for evaluating the usefulness of the information encoded in two computational semantic lexicons, ItalWordNet and SIMPLE-CLIPS. The aim is to highlight the strengths and limits of the representation of information offered by the two lexicons.

    Intent classification for a management conversational assistant

    Intent classification is an essential step in processing user input to a conversational assistant. This work investigates techniques for classifying the intent of chat messages exchanged within software development teams, with the aim of building an intent classifier for a management conversational assistant integrated into modern communication platforms used by developers. Experiments conducted using rule-based and common ML techniques have shown that a careful choice of classification features has a significant impact on performance; the best-performing model obtained a classification accuracy of 72%. A set of techniques for extracting useful features for text classification in the software engineering domain was also implemented and tested.