32,879 research outputs found

    Using Word Embeddings to Retrieve Semantically Similar Questions in Community Question Answering

    Get PDF
    International audienceThis paper focuses on question retrieval which is a crucial and tricky task in Community Question Answering (cQA). Question retrieval aims at finding historical questions that are semantically equivalent to the queried ones, assuming that the answers to the similar questions should also answer the new ones. The major challenges are the lexical gap problem as well as the verboseness in natural language. Most existing methods measure the similarity between questions based on the bag-of-words (BOWs) representation capturing no semantics between words. In this paper, we rely on word embeddings and TF-IDF for a meaningful vector representation of the questions. The similarity between questions is measured using cosine similarity based on their vector-based word representations. Experiments carried out on a real world data set from Yahoo! Answers show that our method is competetive

    Enhancing Question Retrieval in Community Question Answering Using Word Embeddings

    Get PDF
    International audienceCommunity Question Answering (CQA) services have evolved into a popular way of online information seeking, where users can interact and exchange knowledge in the form of questions and answers. In this paper, we study the problem of finding historical questions that are semantically equivalent to the queried ones, assuming that the answers to the similar questions should also answer the new ones. The major challenge of question retrieval is the word mismatch problem between questions, as users can formulate the same question using different wording. Most existing methods measure the similarity between questions based on the bag-of-words (BOWs) representation capturing no semantics between words. Therefore, this study proposes to use word embeddings, which can capture semantic and syntactic information from contexts, to vectorize the questions. The questions are clustered using Kmeans to speed up the search and ranking tasks. The similarity between the questions is measured using cosine similarity based on their weighted continuous valued vectors. We run our experiments on real world data set from Yahoo! Answers in English and Arabic to show the efficiency and generality of our proposed method

    Semantic keyword search for expert witness discovery

    No full text
    In the last few years, there has been an increase in the amount of information stored in semantically enriched knowledge bases, represented in RDF format. These improve the accuracy of search results when the queries are semantically formal. However framing such queries is inappropriate for inexperience users because they require specialist knowledge of ontology and syntax. In this paper, we explore an approach that automates the process of converting a conventional keyword search into a semantically formal query in order to find an expert on a semantically enriched knowledge base. A case study on expert witness discovery for the resolution of a legal dispute is chosen as the domain of interest and a system named SKengine is implemented to illustrate the approach. As well as providing an easy user interface, our experiment shows that SKengine can retrieve expert witness information with higher precision and higher recall, compared with the other system, with the same interface, implemented by a vector model approach

    Semantic keyword search for expert witness discovery

    Get PDF
    In the last few years, there has been an increase in the amount of information stored in semantically enriched knowledge bases, represented in RDF format. These improve the accuracy of search results when the queries are semantically formal. However framing such queries is inappropriate for inexperience users because they require specialist knowledge of ontology and syntax. In this paper, we explore an approach that automates the process of converting a conventional keyword search into a semantically formal query in order to find an expert on a semantically enriched knowledge base. A case study on expert witness discovery for the resolution of a legal dispute is chosen as the domain of interest and a system named SKengine is implemented to illustrate the approach. As well as providing an easy user interface, our experiment shows that SKengine can retrieve expert witness information with higher precision and higher recall, compared with the other system, with the same interface, implemented by a vector model approach

    Piloting an Empirical Study on Measures for Workflow Similarity

    Get PDF
    Service discovery of state dependent services has to take workflow aspects into account. To increase the usability of a service discovery, the result list of services should be ordered with regard to the relevance of the services. Means of ordering a list of workflows due to their similarity with regard to a query are missing. This paper presents a pilot of an empirical study on the influence of different measures on workflow similarity. It turns out that, although preliminary, relations between different measures are indicated and that a similarity definition depends on the application scenario in which the service discovery is applied

    Word Embedding based Correlation Model for Question/Answer Matching

    Full text link
    With the development of community based question answering (Q&A) services, a large scale of Q&A archives have been accumulated and are an important information and knowledge resource on the web. Question and answer matching has been attached much importance to for its ability to reuse knowledge stored in these systems: it can be useful in enhancing user experience with recurrent questions. In this paper, we try to improve the matching accuracy by overcoming the lexical gap between question and answer pairs. A Word Embedding based Correlation (WEC) model is proposed by integrating advantages of both the translation model and word embedding, given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs and it can also leverage the continuity and smoothness of continuous space word representation to deal with new pairs of words that are rare in the training parallel text. An experimental study on Yahoo! Answers dataset and Baidu Zhidao dataset shows this new method's promising potential.Comment: 8 pages, 2 figure
    corecore