1,344 research outputs found
ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what users ask, we need large datasets of real user
questions that capture the various question phenomena users are interested in,
and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as compositionality, temporal reasoning, and
comparisons. ComQA questions come from the WikiAnswers community QA platform,
which typically contains questions that are not satisfactorily answerable by
existing search engine technology. Through a large crowdsourcing effort, we
clean the question dataset, group questions into paraphrase clusters, and
annotate clusters with their answers. ComQA contains 11,214 questions grouped
into 4,834 paraphrase clusters. We detail the process of constructing ComQA,
including the measures taken to ensure its high quality while making effective
use of crowdsourcing. We also present an extensive analysis of the dataset and
the results achieved by state-of-the-art systems on ComQA, demonstrating that
our dataset can be a driver of future research on QA.
Comment: 11 pages, NAACL 201
Aggregated search: a new information retrieval paradigm
Traditional search engines return ranked lists of search results. It is up to the user to scroll through this list, scan the individual documents, and assemble the information that fulfills his/her information need. Aggregated search represents a new class of approaches in which information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and an overview of existing work. We start with work in related domains such as federated search, natural language generation, and question answering. We then focus on more recent trends, namely cross-vertical aggregated search and relational aggregated search, which are already present in current Web search
Understanding and exploiting user intent in community question answering
A number of Community Question Answering (CQA) services have emerged
and proliferated in the last decade. Typical examples include Yahoo! Answers,
WikiAnswers, and also domain-specific forums like StackOverflow. These services
help users obtain information from a community: a user can post his or her questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when the question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently.
In this thesis, we analyse the intent of each question in CQA by classifying
it into five dimensions, namely: subjectivity, locality, navigationality, procedurality,
and causality. By making use of advanced machine learning techniques, such
as Co-Training and PU-Learning, we are able to attain consistent and significant
classification improvements over the state-of-the-art in this area. In addition to
the textual features, a variety of metadata features (such as the category in which
the question was posted) are used to model a user's intent, which in turn helps
the CQA service perform better at finding similar questions, identifying relevant
answers, and recommending the most relevant answerers.
We validate the usefulness of user intent in two different CQA tasks. Our
first application is question retrieval, where we present a hybrid approach which
blends several language modelling techniques, namely, the classic (query-likelihood)
language model, the state-of-the-art translation-based language model, and our
proposed intent-based language model. Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using
our proposed hybrid approach, and then validates whether the answer of the top
candidate can serve as an answer to a new question by leveraging sentiment
analysis, query quality assessment, and search lists validation
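The abstract does not say how the language models are blended, so the following is only a minimal sketch under stated assumptions: a query-likelihood score with Jelinek-Mercer smoothing is linearly interpolated with a simple intent-agreement signal. The function names, the `alpha` and `lam` weights, and the exact-match intent score are all hypothetical and are not taken from the thesis; the translation-based model is omitted.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.2):
    """Log P(query | doc) with Jelinek-Mercer smoothing.

    `lam` is the weight of the collection model (an assumed value).
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = coll_counts[term] / len(collection) if collection else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in both the document and the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query_tokens, query_intent, archive, alpha=0.8, lam=0.2):
    """Rank archived questions by a blend of text and intent evidence.

    Hypothetical blend: alpha * query-likelihood + (1 - alpha) * intent match,
    where intent match is 1.0 if the intent labels agree and 0.0 otherwise.
    """
    collection = [t for q in archive for t in q["tokens"]]
    scored = []
    for q in archive:
        ql = query_likelihood(query_tokens, q["tokens"], collection, lam)
        intent_match = 1.0 if q["intent"] == query_intent else 0.0
        scored.append((alpha * ql + (1 - alpha) * intent_match, q["id"]))
    return [qid for _, qid in sorted(scored, reverse=True)]


archive = [
    {"id": "a", "tokens": ["how", "to", "bake", "bread"], "intent": "procedural"},
    {"id": "b", "tokens": ["why", "is", "sky", "blue"], "intent": "causal"},
    {"id": "c", "tokens": ["how", "to", "bake", "cake"], "intent": "procedural"},
]
ranking = rank(["how", "bake", "bread"], "procedural", archive)
```

In this toy archive, the question sharing both the query terms and the intent label is ranked first; a real system would estimate the interpolation weights on held-out data.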
Representation and Inference for Open-Domain Question Answering: Strength and Limits of two Italian Semantic Lexicons
The research described in this thesis was devoted to building a prototype Question Answering system for the Italian language. The prototype was used as an environment for evaluating the usefulness of the information encoded in two computational semantic lexicons, ItalWordNet and SIMPLE-CLIPS. The aim is to highlight the strengths and limits of the representation of information proposed by the two lexicons
Intent classification for a management conversational assistant
Intent classification is an essential step in processing user input to a conversational assistant. This work investigates techniques for classifying the intent of chat messages exchanged within software development teams, with the aim of building an intent classifier for a management conversational assistant integrated into modern communication platforms used by developers. Experiments conducted using rule-based and common ML techniques have shown that a careful choice of classification features has a significant impact on performance, and the best-performing model obtained a classification accuracy of 72%. A set of techniques for extracting useful features for text classification in the software engineering domain was also implemented and tested