2 research outputs found

    Automatic Discovery and Ranking of Synonyms for Search Keywords in the Web

    Search engines are an indispensable part of a web user's life. A vast majority of web users experience difficulties caused by keyword-based search engines, such as inaccurate results for queries and irrelevant URLs that are returned even though they contain the given keyword. Also, relevant URLs may be missed because they contain a synonym of the keyword rather than the keyword itself. This condition is known as the polysemy problem. To alleviate these problems, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for individual keywords by employing the relevance factor of the URLs associated with the synonyms. These candidate synonyms are then ranked using co-occurrence frequencies and various page count-based measures. One of the major advantages of our algorithm is that it is highly scalable, which makes it applicable to online data on the dynamic, domain-independent and unstructured World Wide Web. The experimental results show that the best results are obtained using the proposed algorithm with WebJaccard
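
    The abstract names WebJaccard as the best-performing page count-based measure but does not give its formula. As a rough illustration only, the sketch below uses the form in which WebJaccard is commonly defined in the web-similarity literature: the page count for the conjunctive query divided by the page counts for either term, with low co-occurrence counts treated as noise. The function names, the threshold value, and all page counts are hypothetical, and this is not claimed to be the ADRS implementation.

```python
def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard coefficient computed from search-engine page counts.

    count_p  -- pages returned for the keyword alone
    count_q  -- pages returned for the candidate synonym alone
    count_pq -- pages returned for the query "keyword AND candidate"
    min_cooccurrence -- assumed noise threshold (hypothetical value)
    """
    if count_pq <= min_cooccurrence:
        return 0.0  # too few co-occurrences to trust the score
    return count_pq / (count_p + count_q - count_pq)


def rank_candidates(keyword_count, candidates):
    """Sort candidate synonyms by descending WebJaccard score.

    candidates maps each candidate to (count_q, count_pq).
    """
    scored = [
        (name, web_jaccard(keyword_count, count_q, count_pq))
        for name, (count_q, count_pq) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up page counts for the keyword "car" (1000 pages).
candidates = {
    "automobile": (800, 400),
    "vehicle": (2000, 300),
    "banana": (5000, 3),
}
ranking = rank_candidates(1000, candidates)
print([name for name, _ in ranking])
```

    In practice the three counts would come from a search engine's reported hit counts, which are approximate, so the threshold guards against spurious matches.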

    Graph-based Cluster Analysis to Identify Similar Questions: A Design Science Approach

    Social question answering (SQA) services allow users to clarify their queries by asking questions and obtaining answers from other users. To enhance the responsiveness of such services, one can identify similar questions and, thereafter, return the answers available. However, identifying similar questions is difficult because of the complex language structure of user-generated questions. For this reason, we developed an approach to cluster similar questions based on a web of social relationships among the questions, the answers, the askers, and the answerers. To do so, we designed a graph-based cluster analysis using design science research guidelines. In evaluating the results, we found that the proposed graph-based cluster analysis is more promising than baseline methods
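
    The abstract does not describe the clustering procedure itself. To make the idea of grouping questions through social relationships concrete, the sketch below uses a deliberately simplified stand-in: two questions are linked when they share an asker or answerer, and clusters are the connected components of that graph, found with union-find. The data layout, the function name, and the linking rule are assumptions for illustration, not the authors' design.

```python
from collections import defaultdict


def cluster_questions(participation):
    """Group questions that are connected through shared users.

    participation maps a question id to the set of user ids involved
    in it (its asker and its answerers). Hypothetical data layout.
    """
    parent = {q: q for q in participation}

    def find(q):
        # Follow parent links to the root, halving the path as we go.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each question to the first question seen for the same user.
    first_question_of = {}
    for question, users in participation.items():
        for user in users:
            if user in first_question_of:
                union(first_question_of[user], question)
            else:
                first_question_of[user] = question

    clusters = defaultdict(set)
    for question in participation:
        clusters[find(question)].add(question)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example: q1, q2 and q4 are chained through bob and carol.
participation = {
    "q1": {"alice", "bob"},
    "q2": {"bob", "carol"},
    "q3": {"dave"},
    "q4": {"carol"},
}
print(cluster_questions(participation))
```

    A fuller treatment would weight edges by relationship type (asker, answerer, answer) and use a clustering method that can split large components, since connected components alone merge everything a prolific user touches.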