    Word Embedding based Correlation Model for Question/Answer Matching

    With the development of community-based question answering (Q&A) services, large-scale Q&A archives have accumulated and become an important information and knowledge resource on the web. Question and answer matching has attracted considerable attention because it allows the knowledge stored in these systems to be reused, which can improve the user experience for recurrent questions. In this paper, we try to improve matching accuracy by overcoming the lexical gap between questions and answers. We propose a Word Embedding based Correlation (WEC) model that integrates the advantages of both the translation model and word embeddings. Given an arbitrary pair of words, WEC scores their co-occurrence probability in Q&A pairs, and it can leverage the continuity and smoothness of continuous-space word representations to handle new word pairs that are rare in the parallel training text. An experimental study on the Yahoo! Answers and Baidu Zhidao datasets shows the promising potential of this new method.
    Comment: 8 pages, 2 figures
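    The abstract does not give WEC's equations. As a hedged illustration of the idea it describes (scoring any word pair from continuous vectors, so that a pair unseen in training still receives a sensible score), here is a minimal sketch. It assumes a bilinear correlation sigmoid(v_q^T M v_a) between a question-word embedding v_q and an answer-word embedding v_a, followed by a max-then-mean aggregation over the answer words. The matrix M, the sigmoid, the aggregation rule, and the toy embeddings are all assumptions for illustration, not the paper's published formulation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def bilinear(u: Vector, m: Matrix, v: Vector) -> float:
    """Return u^T M v for vectors u, v and a square matrix M."""
    return sum(u[i] * m[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def word_correlation(q_vec: Vector, a_vec: Vector, m: Matrix) -> float:
    """Score a (question word, answer word) pair in (0, 1).

    ASSUMPTION: a sigmoid over a bilinear form; M plays the role of a
    translation table, but expressed in embedding space.
    """
    return 1.0 / (1.0 + math.exp(-bilinear(q_vec, m, a_vec)))


def pair_score(question: Sequence[str], answer: Sequence[str],
               emb: Dict[str, Vector], m: Matrix) -> float:
    """Aggregate word-pair scores into one question/answer matching score.

    ASSUMPTION: for each answer word, keep its best-correlated question word,
    then average over answer words. Words without an embedding are skipped.
    """
    per_word = []
    for a in answer:
        if a not in emb:
            continue
        best = max((word_correlation(emb[q], emb[a], m) for q in question if q in emb),
                   default=None)
        if best is not None:
            per_word.append(best)
    return sum(per_word) / len(per_word) if per_word else 0.0


# Hypothetical 2-d embeddings and an identity M (untrained, for illustration).
emb = {"car": [1.0, 0.1], "automobile": [0.9, 0.2], "banana": [-1.0, 0.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(pair_score(["car"], ["automobile"], emb, identity))  # approx. 0.715
print(pair_score(["car"], ["banana"], emb, identity))      # approx. 0.269
```

    Because the score is computed from vectors rather than looked up in a co-occurrence table, a pair such as ("car", "automobile") scores high even if it never co-occurred in training, as long as the two embeddings are close. In a trained model, M would be learned from Q&A pairs rather than fixed to the identity.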

    Mining question-answer pairs from web forum: a survey of challenges and resolutions

    Internet forums, also known as discussion boards, are popular web applications. Members discuss issues and share ideas, forming a community within the board, and as a result generate a huge amount of content on different topics every day. Interest in information extraction and knowledge discovery from such sources has been increasing in the research community. However, several factors limit the potential for mining knowledge from forums. Problems that need to be addressed include: the lexical chasm (or lexical gap), which renders some Natural Language Processing (NLP) techniques less effective; informal tone, which creates noisy data; topic drift within discussions, which prevents focused mining; and asynchrony, which makes it difficult to establish post-reply relationships. This survey introduces these challenges within the framework of question answering. It describes the problems, points the reader to useful publications for further examination, and provides an overview of resolution strategies and findings relevant to the challenges.

    Ontologies across disciplines

    A Word Embedding based Method for Question Retrieval in Community Question Answering

    Community Question Answering (cQA) continues to gain momentum owing to the unceasing rise of user-generated content that dominates the web. cQA platforms enable people with different backgrounds to share knowledge by freely asking and answering questions. In this paper, we focus on question retrieval, which is deemed a key task in cQA. It aims to find similar archived questions given a new query, assuming that the answers to the similar questions should also answer the new one. This is known to be a challenging task due to the verbosity of natural language and the word mismatch between questions. Most traditional methods measure the similarity between questions using the bag-of-words (BOW) representation, which captures no semantic relations between words. In this paper, we rely on word representations to capture the semantic information of words in a language vector space. Questions are then ranked by the cosine similarity between their vector-based representations. Experiments conducted on large-scale cQA data show that our method gives promising results.
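    The abstract states only that questions are ranked by cosine similarity over vector-based word representations. A minimal sketch of that ranking step follows. It assumes each question's vector is the average of its word embeddings (an assumption; the paper may aggregate differently) and uses made-up 3-dimensional embeddings in place of trained ones. The helper names mean_vector, cosine, and rank_questions are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of known tokens (ASSUMPTION: simple averaging)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_questions(query: Sequence[str], archive: Sequence[Sequence[str]],
                   emb: Dict[str, Vector], dim: int) -> List[int]:
    """Return archive indices ordered from most to least similar to the query."""
    q = mean_vector(query, emb, dim)
    scores = [cosine(q, mean_vector(doc, emb, dim)) for doc in archive]
    return sorted(range(len(archive)), key=lambda i: scores[i], reverse=True)


# Hypothetical embeddings in which synonyms lie close together.
emb = {
    "fix": [1.0, 0.0, 0.0], "repair": [0.9, 0.1, 0.0],
    "laptop": [0.0, 1.0, 0.0], "notebook": [0.1, 0.9, 0.0],
    "bake": [0.0, 0.1, 0.9], "cake": [0.0, 0.0, 1.0],
}
archive = [["bake", "cake"], ["repair", "notebook"]]
print(rank_questions(["fix", "laptop"], archive, emb, dim=3))  # [1, 0]
```

    The query shares no tokens with the second archived question, so a bag-of-words cosine would score that pair 0, yet the averaged embeddings rank it first. This is the word-mismatch problem the abstract describes.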