55 research outputs found
Word Embedding based Correlation Model for Question/Answer Matching
With the development of community-based question answering (Q&A) services,
large-scale Q&A archives have accumulated and become an important
information and knowledge resource on the web. Question and answer matching has
received much attention for its ability to reuse the knowledge stored in
these systems: it can enhance the user experience for recurrent
questions. In this paper, we try to improve the matching accuracy by overcoming
the lexical gap between question and answer pairs. A Word Embedding based
Correlation (WEC) model is proposed that integrates the advantages of both the
translation model and word embedding. Given a random pair of words, WEC can
score their co-occurrence probability in Q&A pairs, and it can also leverage the
continuity and smoothness of continuous space word representation to deal with
new pairs of words that are rare in the training parallel text. An experimental
study on the Yahoo! Answers and Baidu Zhidao datasets shows this new
method's promising potential.
Comment: 8 pages, 2 figures
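The idea of scoring word pairs through embeddings can be illustrated with a minimal sketch. It is not the paper's formulation: the bilinear form (cosine between a question word vector and a linearly mapped answer word vector), the max-then-average aggregation, and the names `word_correlation`, `pair_score`, `TOY_EMBEDDINGS`, and `IDENTITY` are all assumptions made for illustration, and the toy vectors are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def word_correlation(q_vec, a_vec, matrix):
    """Assumed word-level score: cosine between q_vec and the mapped a_vec.

    In a trained model the matrix would be learned from Q&A pairs; here it
    is supplied by the caller.
    """
    return cosine(q_vec, matvec(matrix, a_vec))


def pair_score(question, answer, embeddings, matrix):
    """Assumed sentence-level score: for each answer word take its best
    matching question word, then average over the answer words."""
    q_vecs = [embeddings[w] for w in question if w in embeddings]
    a_vecs = [embeddings[w] for w in answer if w in embeddings]
    if not q_vecs or not a_vecs:
        return 0.0
    best = [max(word_correlation(q, a, matrix) for q in q_vecs) for a in a_vecs]
    return sum(best) / len(best)


# Invented two-dimensional vectors, for illustration only.
TOY_EMBEDDINGS = {
    "fix": [1.0, 0.0],
    "repair": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

related = pair_score(["fix"], ["repair"], TOY_EMBEDDINGS, IDENTITY)
unrelated = pair_score(["fix"], ["banana"], TOY_EMBEDDINGS, IDENTITY)
```

Because the score is computed in a continuous space, a word pair never seen together in training still receives a graded score from its neighbours, which is the property the abstract attributes to word embeddings.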
Understanding and exploiting user intent in community question answering
A number of Community Question Answering (CQA) services have emerged
and proliferated in the last decade. Typical examples include Yahoo! Answers,
WikiAnswers, and also domain-specific forums like StackOverflow. These services
help users obtain information from a community: a user can post questions, which may then be answered by other users. Such a paradigm of information seeking is particularly appealing when a question cannot be answered directly by Web search engines due to the unavailability of relevant online content. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently.
In this thesis, we analyse the intent of each question in CQA by classifying
it into five dimensions, namely: subjectivity, locality, navigationality, procedurality,
and causality. By making use of advanced machine learning techniques, such
as Co-Training and PU-Learning, we are able to attain consistent and significant
classification improvements over the state-of-the-art in this area. In addition to
the textual features, a variety of metadata features (such as the category in which
the question was posted) are used to model a user's intent, which in turn helps
the CQA service to perform better in finding similar questions, identifying relevant
answers, and recommending the most relevant answerers.
We validate the usefulness of user intent in two different CQA tasks. Our
first application is question retrieval, where we present a hybrid approach which
blends several language modelling techniques, namely, the classic (query-likelihood)
language model, the state-of-the-art translation-based language model, and our
proposed intent-based language model. Our second application is answer validation, where we present a two-stage model which first ranks similar questions by using
our proposed hybrid approach, and then validates whether the answer of the top
candidate can serve as an answer to a new question by leveraging sentiment
analysis, query quality assessment, and search lists validation.
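The blending of language models for question retrieval can be sketched as a linear interpolation. This is only an assumed form: the abstract does not give the thesis's formula, so the mixture weights, the background smoothing term, the per-intent unigram model `intent_lm`, and all function names below are hypothetical.

```python
import math
from collections import Counter


def ml_prob(word, counts, length):
    """Maximum-likelihood unigram probability of word in a candidate question."""
    return counts[word] / length if length else 0.0


def translation_prob(word, counts, length, trans_table):
    """Translation-based probability: sum over candidate terms t of
    P(word | t) * P_ml(t | candidate). trans_table maps (word, t) to P(word | t)."""
    if not length:
        return 0.0
    return sum(
        trans_table.get((word, term), 0.0) * count / length
        for term, count in counts.items()
    )


def hybrid_score(query, candidate, collection_prob, trans_table, intent_lm,
                 weights=(0.4, 0.3, 0.2, 0.1)):
    """Log-probability of the query under an assumed four-way mixture.

    weights = (query-likelihood, translation, intent, background); the
    values are illustrative, not taken from the thesis. intent_lm is a
    hypothetical unigram model for the candidate's intent class.
    """
    w_ml, w_tr, w_intent, w_bg = weights
    counts = Counter(candidate)
    length = len(candidate)
    score = 0.0
    for word in query:
        p = (
            w_ml * ml_prob(word, counts, length)
            + w_tr * translation_prob(word, counts, length, trans_table)
            + w_intent * intent_lm.get(word, 0.0)
            + w_bg * collection_prob.get(word, 1e-9)  # floor keeps p > 0
        )
        score += math.log(p)
    return score


# Invented toy data, for illustration only.
query = ["fix", "laptop"]
on_topic = ["repair", "laptop", "screen"]
off_topic = ["bake", "banana", "bread"]
trans_table = {("fix", "repair"): 0.5}
collection_prob = {"fix": 0.01, "laptop": 0.01}

score_on = hybrid_score(query, on_topic, collection_prob, trans_table, {})
score_off = hybrid_score(query, off_topic, collection_prob, trans_table, {})
```

In this toy case the on-topic candidate ranks higher even though it never contains the word "fix", because the translation component credits "repair".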
Retrieving questions and answers in community-based question answering services
Ph.D, DOCTOR OF PHILOSOPHY
Beyond Question Answering: Understanding the Information Need of the User
Intelligent interaction between humans and computers has been a dream of artificial intelligence since the beginning of the digital era and one of the original motivations behind the creation of artificial intelligence. A key step towards achieving such an ambitious goal is to enable Question Answering (QA) systems to understand the information need of the user.
In this thesis, we attempt to improve the QA system's ability to understand the user's information need through three approaches. First, a clarification question generation method is proposed to help the user clarify the information need and bridge the information need gap between the QA system and the user. Next, a translation-based model is obtained from large archives of Community Question Answering data to model the information need behind a question and boost the performance of question recommendation. Finally, a fine-grained classification framework is proposed to enable systems to recommend answered questions based on information need satisfaction.
Applying semantic analysis to finding similar questions in community question answering systems
Master's, MASTER OF SCIENCE
FROM USER-GENERATED-CONTENT TO STRUCTURED KNOWLEDGE: EXPLORING MULTI-ASPECT SENTENCE REPRESENTATION AND PROTOTYPE HIERARCHY BASED CATEGORIZATION FOR ORGANIZATION OF TEXT COLLECTIONS
Ph.D, DOCTOR OF PHILOSOPHY