32,879 research outputs found
Using Word Embeddings to Retrieve Semantically Similar Questions in Community Question Answering
This paper focuses on question retrieval, a crucial and difficult task in Community Question Answering (cQA). Question retrieval aims to find historical questions that are semantically equivalent to the queried ones, on the assumption that the answers to similar questions should also answer the new ones. The major challenges are the lexical gap problem and the verbosity of natural language. Most existing methods measure the similarity between questions using a bag-of-words (BOW) representation, which captures no semantic relations between words. In this paper, we rely on word embeddings and TF-IDF to build a meaningful vector representation of the questions. The similarity between questions is measured as the cosine similarity of their vector representations. Experiments carried out on a real-world data set from Yahoo! Answers show that our method is competitive.
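The representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `EMBEDDINGS` table, the smoothed IDF formula, and the helper names `idf_table`, `question_vector`, and `cosine` are all assumptions made for illustration. Each question becomes a TF-IDF-weighted average of its word vectors, and questions are compared by cosine similarity.

```python
import math
from collections import Counter

# Toy 3-dimensional word embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "fix": [0.9, 0.1, 0.0],
    "repair": [0.8, 0.2, 0.1],
    "laptop": [0.1, 0.9, 0.2],
    "notebook": [0.2, 0.8, 0.3],
    "bake": [0.0, 0.1, 0.9],
    "bread": [0.1, 0.0, 0.8],
}

def idf_table(corpus):
    """Smoothed inverse document frequency per word (one common variant)."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def question_vector(tokens, idf):
    """TF-IDF-weighted average of the embeddings of the known words."""
    tf = Counter(t for t in tokens if t in EMBEDDINGS)
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, count in tf.items():
        weight = count * idf.get(word, 1.0)  # unseen words default to weight 1.0
        total += weight
        for i, x in enumerate(EMBEDDINGS[word]):
            vec[i] += weight * x
    return [x / total for x in vec] if total else vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Archived questions and a new query that shares no surface word with them.
archive = [["repair", "notebook"], ["bake", "bread"]]
idf = idf_table(archive)
query = ["fix", "laptop"]

query_vec = question_vector(query, idf)
scores = [cosine(query_vec, question_vector(q, idf)) for q in archive]
best = max(range(len(archive)), key=scores.__getitem__)
print(best, scores)
```

In this toy setup the query "fix laptop" ranks the archived question "repair notebook" first even though the two share no words, which is the lexical gap that a plain bag-of-words comparison cannot bridge.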
Enhancing Question Retrieval in Community Question Answering Using Word Embeddings
Community Question Answering (CQA) services have evolved into a popular way of seeking information online, where users can interact and exchange knowledge in the form of questions and answers. In this paper, we study the problem of finding historical questions that are semantically equivalent to the queried ones, on the assumption that the answers to similar questions should also answer the new ones. The major challenge of question retrieval is the word mismatch problem between questions, as users can formulate the same question using different wording. Most existing methods measure the similarity between questions using a bag-of-words (BOW) representation, which captures no semantic relations between words. Therefore, this study proposes to use word embeddings, which can capture semantic and syntactic information from contexts, to vectorize the questions. The questions are clustered using k-means to speed up the search and ranking tasks. The similarity between questions is measured as the cosine similarity of their weighted continuous-valued vectors. We run our experiments on a real-world data set from Yahoo! Answers in English and Arabic to show the efficiency and generality of our proposed method.
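The clustering step mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the two-dimensional question vectors are made up, the k-means initialization simply takes the first `k` vectors so the run is deterministic, and `kmeans` and `search` are hypothetical helper names. The idea shown is that a query is compared against cluster centroids first, and only the questions in the nearest cluster are ranked by cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sqdist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's algorithm.

    Assumption: the first k vectors seed the centroids, which keeps the
    example deterministic; practical code would use a better initialization.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels

def search(query_vec, vectors, centroids, labels):
    """Rank only the questions in the cluster nearest to the query."""
    nearest = min(range(len(centroids)),
                  key=lambda c: sqdist(query_vec, centroids[c]))
    candidates = [i for i, label in enumerate(labels) if label == nearest]
    return sorted(candidates,
                  key=lambda i: cosine(query_vec, vectors[i]),
                  reverse=True)

# Made-up question vectors forming two obvious groups.
question_vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
centroids, labels = kmeans(question_vectors, k=2)
ranked = search([1.0, 0.0], question_vectors, centroids, labels)
print(labels, ranked)
```

With `k` clusters of roughly equal size, a query is scored against about `k` centroids plus one cluster's worth of questions rather than the whole archive. The trade-off is that a relevant question assigned to a different cluster is never considered.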
Semantic keyword search for expert witness discovery
In the last few years, there has been an increase in the amount of information stored in semantically enriched knowledge bases, represented in RDF format. These improve the accuracy of search results when the queries are semantically formal. However, framing such queries is difficult for inexperienced users because it requires specialist knowledge of ontology and syntax. In this paper, we explore an approach that automates the process of converting a conventional keyword search into a semantically formal query in order to find an expert in a semantically enriched knowledge base. A case study on expert witness discovery for the resolution of a legal dispute is chosen as the domain of interest, and a system named SKengine is implemented to illustrate the approach. As well as providing an easy user interface, SKengine is shown in our experiment to retrieve expert witness information with higher precision and higher recall than a comparison system with the same interface that is implemented using a vector model approach.
Piloting an Empirical Study on Measures for Workflow Similarity
Service discovery of state-dependent services has to take workflow aspects into account. To increase the usability of service discovery, the result list of services should be ordered by the relevance of the services. Means of ordering a list of workflows by their similarity to a query are missing. This paper presents a pilot of an empirical study on the influence of different measures on workflow similarity. The results, although preliminary, indicate relations between different measures and show that the definition of similarity depends on the application scenario in which the service discovery is applied.
Word Embedding based Correlation Model for Question/Answer Matching
With the development of community-based question answering (Q&A) services,
large Q&A archives have accumulated and become an important information and
knowledge resource on the web. Question and answer matching has attracted much
attention for its ability to reuse knowledge stored in these systems: it can be
useful in enhancing user experience with recurrent questions. In this paper, we
try to improve the matching accuracy by overcoming the lexical gap between
question and answer pairs. A Word Embedding based Correlation (WEC) model is
proposed that integrates the advantages of both the translation model and word
embeddings. Given a random pair of words, WEC can score their co-occurrence
probability in Q&A pairs, and it can also leverage the continuity and
smoothness of continuous space word representations to deal with new pairs of
words that are rare in the training parallel text. An experimental study on a
Yahoo! Answers dataset and a Baidu Zhidao dataset shows the promising potential
of this new method.
Comment: 8 pages, 2 figures