21 research outputs found

    A Comparative Study of Machine Learning Approaches- SVM and LS-SVM using a Web Search Engine Based Application

    Abstract: Semantic similarity is the notion by which a set of documents, or words within those documents, is assigned a weight based on meaning. Accurate measurement of such similarity plays an important role in Natural Language Processing and Information Retrieval tasks such as Query Expansion and Word Sense Disambiguation. Page counts and snippets retrieved by search engines help to measure the semantic similarity between two words. Different similarity scores are calculated for the conjunctive query of the two words. A lexical pattern extraction algorithm identifies patterns in the snippets. Two machine learning approaches, Support Vector Machine (SVM) and Latent Structural Support Vector Machine (LS-SVM), are used to measure the semantic similarity between two words by combining the similarity scores from page counts with clusters of patterns retrieved from the snippets. The similarity results from the two machines are then compared. SVM separates synonymous from non-synonymous words using a maximum-margin hyperplane. LS-SVM gives a much more accurate result by considering the latent values in the dataset.
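    The abstract does not spell out its lexical pattern extraction step, so the following is only a minimal sketch of one common way to do it: replace the two query words with placeholders `X` and `Y`, then keep the short word sequences that run from one placeholder to the other. The function name `extract_patterns`, the `max_len` window, and the sample snippets are all assumptions made for illustration, not the authors' algorithm.

```python
import re
from collections import Counter


def extract_patterns(snippet, word_a, word_b, max_len=5):
    """Return word sequences that start at one placeholder and end at the other.

    Hypothetical helper: the window size and tokenization are assumptions.
    """
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    word_a, word_b = word_a.lower(), word_b.lower()
    # Replace the query words with placeholders so patterns generalize across pairs.
    tokens = ["X" if t == word_a else "Y" if t == word_b else t for t in tokens]
    patterns = []
    for i, tok in enumerate(tokens):
        if tok not in ("X", "Y"):
            continue
        # Look ahead at most max_len - 1 tokens for the other placeholder.
        for j in range(i + 1, min(i + max_len, len(tokens))):
            if tokens[j] in ("X", "Y") and tokens[j] != tok:
                patterns.append(" ".join(tokens[i:j + 1]))
                break
    return patterns


# Made-up snippets standing in for search engine results.
snippets = [
    "Cricket is a sport played between two teams.",
    "Sport such as cricket needs a bat.",
]
pattern_counts = Counter(
    p for s in snippets for p in extract_patterns(s, "cricket", "sport")
)
print(pattern_counts)
```

    Pattern frequencies counted this way could then serve as features alongside the page-count scores; how the patterns are clustered is not described in the abstract and is left out here.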

    A graph theory-based online keywords model for image semantic extraction

    Image captions and keywords are semantic descriptions of the dominant visual content features in a targeted visual scene. Traditional image keyword extraction processes involve intensive data- and knowledge-level operations using computer vision and machine learning techniques. However, recent studies have shown that the gap between pixel-level processing and the semantic definition of an image is difficult to bridge by relying on visual features alone. In this paper, augmented image semantic information is introduced by harnessing the functions of online image search engines. A graphical model named the “Head-words Relationship Network” (HWRN) has been devised to tackle the aforementioned problems. The proposed algorithm starts by retrieving online images with visual features similar to those of the input image; the text content of their hosting webpages is then extracted, classified and analysed for semantic clues. The relationships among the “head-words” from relevant webpages can then be modelled and quantified using linguistic tools. Experiments on the prototype system have demonstrated the effectiveness of this novel approach. Performance evaluation against benchmark state-of-the-art approaches has also shown satisfactory results and promising future applications.

    A Review on Resemblance of User Profiles in Social Networks using Similarity Measures

    Online social networking is growing at a fast rate. There are large numbers of user profiles, and the resemblance between them can help recruiters select the best candidates for a job profile. Each similarity measure has its own applicability and is best suited to a particular type of attribute value; if these measures are combined, they can help find the best resemblance among user profiles, with a result that matches the actual result. This paper discusses past studies and how our research proposes a framework for finding that resemblance.

    Computing semantic similarity measure between words using web search engine

    Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute a supervised semantic similarity measure between words by combining the page count method and the web snippets method. In the page count method, four association measures are used to find the semantic similarity between words using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed Modified Pattern Extraction Algorithm outperforms existing methods, reaching a correlation value of 89.8 percent.
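    The abstract does not name its four association measures. The sketch below assumes they are the page-count variants of Jaccard, Overlap, Dice and pointwise mutual information that are widely used in this line of work; the function name, the cut-off `min_cooccurrence`, and the index size `n_docs` are illustrative assumptions rather than values taken from the paper.

```python
import math


def page_count_scores(h_p, h_q, h_pq, n_docs=10**10, min_cooccurrence=5):
    """Compute four page-count association scores for words P and Q.

    h_p, h_q : page counts for the queries "P" and "Q"
    h_pq     : page count for the conjunctive query "P AND Q"
    n_docs   : assumed number of pages indexed by the search engine
    """
    # Very small co-occurrence counts are treated as noise and scored as zero.
    if h_pq <= min_cooccurrence or h_p == 0 or h_q == 0:
        return {"jaccard": 0.0, "overlap": 0.0, "dice": 0.0, "pmi": 0.0}
    return {
        "jaccard": h_pq / (h_p + h_q - h_pq),
        "overlap": h_pq / min(h_p, h_q),
        "dice": 2 * h_pq / (h_p + h_q),
        "pmi": math.log2((h_pq / n_docs) / ((h_p / n_docs) * (h_q / n_docs))),
    }


# Made-up page counts, not real search engine output.
scores = page_count_scores(h_p=1000, h_q=500, h_pq=250)
print(scores)
```

    The four scores would form part of the feature vector passed to the SVM, next to the snippet pattern features.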

    Automatic Discovery and Ranking of Synonyms for Search Keywords in the Web

    Search engines are an indispensable part of a web user's life. A vast majority of web users experience difficulties caused by keyword-based search engines, such as inaccurate results for queries and irrelevant URLs even though the given keyword is present in them. Also, relevant URLs may be lost because they contain a synonym of the keyword rather than the keyword itself. This condition is known as the polysemy problem. To alleviate these problems, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for individual keywords by employing the relevance factor of the URLs associated with the synonyms. These candidate synonyms are then ranked using co-occurrence frequencies and various page count-based measures. One of the major advantages of our algorithm is that it is highly scalable, which makes it applicable to online data on the dynamic, domain-independent and unstructured World Wide Web. The experimental results show that the best results are obtained using the proposed algorithm with WebJaccard.
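    As a rough illustration of the ranking step only, the sketch below orders candidate synonyms by their WebJaccard score against the keyword. It leaves out the URL relevance factor and the co-occurrence frequencies mentioned in the abstract, and the helper names and page counts are invented for the example.

```python
def web_jaccard(h_k, h_c, h_kc):
    """WebJaccard from page counts: keyword, candidate, and both together."""
    union = h_k + h_c - h_kc
    return h_kc / union if union > 0 else 0.0


def rank_synonyms(keyword_count, candidates):
    """Sort candidates by WebJaccard, highest first.

    candidates maps a candidate word to (its page count, joint page count).
    """
    scored = [
        (word, web_jaccard(keyword_count, count, joint))
        for word, (count, joint) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical page counts for the keyword "car" and three candidates.
ranking = rank_synonyms(
    1000,
    {"vehicle": (2000, 500), "banana": (800, 10), "automobile": (400, 300)},
)
print([word for word, _ in ranking])  # ['automobile', 'vehicle', 'banana']
```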


    ABSTRACT Semantic similarity measures between words play an important role in community minin

    Evaluating semantic similarity in short texts using related context and DISCO

    Measuring the degree of semantic similarity between texts or concepts is a challenging and important task in several Information Retrieval and Natural Language Processing applications. Given the importance of the task, this article proposes a method for measuring the semantic similarity between a pair of sentences that uses the “Distributional Hypothesis” technique to retrieve, from the Web, contexts related to the training set. The related contexts are an important component in computing the semantic similarity between sentence pairs. The article presents the results obtained on a standard training set. The empirical evaluation shows that the proposed approach outperforms the baseline, as well as some previously proposed methods, on the standard training set.

    Web search engine based semantic similarity measure between words using pattern retrieval algorithm

    Semantic similarity measures play an important role in information retrieval, natural language processing and various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity measure between words by combining the page count method and the web snippets method. In the page count method, four association measures are used to find the semantic similarity between words using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed approach aims to improve correlation, precision, recall, and F-measure compared with existing methods. The proposed algorithm outperforms them, reaching a correlation value of 89.8%.

    Smart Image Search System Using Personalized Semantic Search Method

    Because of the huge amount of information now available on the internet, search technologies are widely used in various fields, and returning the most relevant results to users has become a major challenge. Traditional semantic search technologies aim to return the most relevant results, but they face two problems: the one-size-fits-all problem and low efficiency. The purpose of this research is to build a Smart Image Search System that uses a personalized semantic search method to solve those problems. The personalized semantic search method lets the search system avoid the one-size-fits-all issue and increases its efficiency. In the Smart Image Search System, the method gives users three search options: non-option search, general-option search, and private-option search. Each option addresses specific user needs in order to return the most relevant results, and together the options are adopted to solve the one-size-fits-all problem. In addition, based on the idea of semantic context, the personalized semantic method uses two approaches to increase search efficiency. First, it applies the Apache OpenNLP library to filter out useless words. Second, it uses searchers' actions, such as clicks and feedback, to adjust the associated words and their associated weights. The Smart Image Search System uses the associated words and weights to calculate the relevance of the search results, which makes it a self-improving system. The Smart Image Search System is implemented based on the presented methodology and design. From this research on semantic search technologies, we conclude that the Smart Image Search System can filter out useless words, fix the one-size-fits-all problem, and self-improve its relevancy.
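    The abstract gives no formulas for the associated weights, so the following is a purely hypothetical sketch of the feedback idea: each click strengthens the link between the query words and the tags of the clicked image, and relevance is the sum of those link weights. The class `AssociatedWords`, the `boost` parameter, and the small stop-word set (a stand-in for the Apache OpenNLP step) are all assumptions.

```python
# Stand-in for the "useless word" filtering that the abstract assigns to Apache OpenNLP.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and"}


def keywords(query):
    """Lowercase the query and drop stop words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


class AssociatedWords:
    """Hypothetical store of (query word, image tag) weights learned from clicks."""

    def __init__(self):
        self.weights = {}

    def record_click(self, query, image_tags, boost=1.0):
        # A click strengthens the link between each query word and each tag.
        for word in keywords(query):
            for tag in image_tags:
                key = (word, tag)
                self.weights[key] = self.weights.get(key, 0.0) + boost

    def relevance(self, query, image_tags):
        # Relevance is the total learned weight linking the query to the image.
        return sum(
            self.weights.get((word, tag), 0.0)
            for word in keywords(query)
            for tag in image_tags
        )


model = AssociatedWords()
model.record_click("the jaguar", ["cat", "animal"])
print(model.relevance("jaguar", ["cat"]), model.relevance("jaguar", ["car"]))
```

    After one recorded click, images tagged "cat" score higher for "jaguar" than images tagged "car", which is the self-improving behaviour the abstract describes in outline.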