105 research outputs found

    Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

    Because common clustering algorithms use the vector space model (VSM) to represent documents, they ignore conceptual relationships between related terms that do not literally co-occur. This article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology to overcome this problem. In general, ontology measures can be divided into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and broad-coverage taxonomy of WordNet as the thesaurus-based ontology. The corpus-based method, however, is rather complicated to handle in practical applications, so we propose a transformed latent semantic analysis (LSA) model as the corpus-based method. Moreover, two hybrid strategies that combine the various similarity measures are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based methods, clearly outperforms the same algorithm with other similarity measures. The improvements in clustering performance also demonstrate the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA.
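
The VSM limitation this abstract starts from can be illustrated with a minimal sketch. It is not the article's GA clustering or LSA model; it only shows that bag-of-words cosine similarity is zero for documents that use related but different words. The toy documents and the `cosine` helper are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only terms that literally occur in both documents contribute.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy documents (not from the article).
d1 = "the car needs fuel"
d2 = "an automobile requires petrol"  # same meaning as d1, no shared terms
d3 = "the car needs repair"           # different meaning, three shared terms

print(cosine(d1, d2))  # related terms that never co-occur literally
print(cosine(d1, d3))  # literal overlap dominates the score
```

An ontology-based or LSA-based similarity measure is meant to give `d1` and `d2` a non-zero score, which plain term matching cannot do.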

    Web Page Annotation Using Web Usage Mining and Domain Knowledge Ontology

    The WWW has grown tremendously, and users now rely almost entirely on the web for information. Search engines return result pages to the user, but not all of them are relevant, so the challenging task is to extract the relevant pages from the web and provide them to the user. Web usage mining (WUM) is an approach to extracting knowledge and using it for different purposes. In this paper, a new semantic approach based on WUM and a domain knowledge ontology is proposed. Preparing the ontology database is also a challenging task in this project.

    TAIP: an anytime algorithm for allocating student teams to internship programs

    In scenarios that require teamwork, we usually have at hand a variety of specific tasks, and we need to form a team to carry out each one. Here we target the problem of matching teams with tasks within the context of education, and specifically in the context of forming teams of students and allocating them to internship programs. First, we provide a formalization of the Team Allocation for Internship Programs Problem and show the computational hardness of solving it optimally. Thereafter, we propose TAIP, a heuristic algorithm that generates an initial team allocation, which it then attempts to improve in an iterative process. Moreover, we conduct a systematic evaluation to show that TAIP reaches optimality and outperforms CPLEX in terms of time. Comment: 10 pages, 7 figures

    A Review on Computing Semantic Similarity of Concepts in Knowledge Graphs

    Semantic similarity is a metric defined over a set of documents or terms, where the distance between them is based on the likeness of their meaning or semantic content, as opposed to similarity estimated from their syntactic representation (e.g. their string format). One drawback of conventional knowledge-based approaches (e.g. path or lch) is that any two concepts with the same path length have the same semantic similarity (the uniform distance problem). The aim is to propose a weighted path length (wpath) method that combines both path length and information content (IC) in measuring the semantic similarity between concepts. The IC of two concepts' least common subsumer (LCS) is used to weight their shortest path length, so that concept pairs with the same path length can have different semantic similarity scores if they have different LCSs.
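
The weighting idea can be sketched on a toy taxonomy. The abstract does not give the formula, so the form used here, sim = 1 / (1 + length * k^IC(LCS)) with 0 < k <= 1, is an assumption based on how wpath is commonly stated. The taxonomy, the IC values, and the value of `k` are hypothetical.

```python
# Hypothetical toy taxonomy: child -> parent ("entity" is the root).
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}

# Hypothetical information content values (more specific = higher IC).
IC = {"entity": 0.0, "animal": 1.0, "mammal": 2.0,
      "bird": 2.5, "dog": 3.0, "cat": 3.0}


def ancestors(concept):
    """Return [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs_and_length(c1, c2):
    """Least common subsumer and shortest path length (in edges) in the tree."""
    up1, up2 = ancestors(c1), ancestors(c2)
    for steps1, node in enumerate(up1):
        if node in up2:
            return node, steps1 + up2.index(node)
    raise ValueError("concepts share no ancestor")


def path_sim(c1, c2):
    """Plain path similarity: depends only on path length."""
    _, length = lcs_and_length(c1, c2)
    return 1.0 / (1.0 + length)


def wpath_sim(c1, c2, k=0.8):
    """Assumed wpath form: path length weighted by k ** IC(LCS)."""
    lcs, length = lcs_and_length(c1, c2)
    return 1.0 / (1.0 + length * k ** IC[lcs])


# Both pairs are two edges apart but have different LCSs.
for pair in [("dog", "cat"), ("mammal", "bird")]:
    print(pair, lcs_and_length(*pair), path_sim(*pair), wpath_sim(*pair))
```

In this sketch both pairs get the same `path_sim`, while `wpath_sim` ranks the pair whose LCS is more specific ("mammal") above the pair whose LCS is more general ("animal").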

    A Synonym Contextual-based Process for Handling Word Similarity in Malay Sentence

    In this paper, we describe a method for finding word similarity within a Malay sentence. The list of similar words produced is based on searching for the appropriate context within a Malay sentence. The context is determined by looking up rules in a rule-based phrase database. To implement this approach, we describe a working prototype application that can be used as a tool for improving written text in the Malay language, and that is especially well adapted to the requirements of teaching and learning this language in primary and secondary schools. The overall concept presented in this paper helps to identify clearly the basic components that should exist in the process, together with their specifications. It is also important to point out the possible drawbacks and constraints of the practical approach suggested.

    An overview of textual semantic similarity measures based on web intelligence

    Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations. For example, many of them fail to deal with terms not covered by synonym dictionaries, or cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use some kind of web intelligence to determine the degree of similarity between text expressions, and they implement a variety of paradigms, including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to replace the traditional techniques where necessary.
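
The co-occurrence paradigm mentioned above can be sketched with a Jaccard coefficient over search-engine page counts. This is one possible co-occurrence measure and is not necessarily one evaluated in the paper. The page counts are made-up numbers and `HITS` is a hypothetical stand-in for a search engine; no real search API is called.

```python
# Hypothetical page counts: how many pages contain each query.
# A real system would obtain these from a search engine.
HITS = {
    ("JFK",): 1000,
    ("John F. Kennedy",): 800,
    ("JFK", "John F. Kennedy"): 600,  # pages containing both terms
    ("banana",): 900,
    ("JFK", "banana"): 9,
}


def hits(*terms):
    """Look up the hypothetical page count for a term or a pair of terms."""
    key = tuple(terms)
    # Pairs are stored in one order only, so try the reverse order as well.
    return HITS.get(key, HITS.get(key[::-1], 0))


def web_jaccard(p, q):
    """Jaccard coefficient over page counts: |P and Q| / |P or Q|."""
    both = hits(p, q)
    either = hits(p) + hits(q) - both
    return both / either if either else 0.0


print(web_jaccard("JFK", "John F. Kennedy"))
print(web_jaccard("JFK", "banana"))
```

A measure of this kind needs no synonym dictionary, so it can score acronyms and proper nouns, provided the terms appear together on enough pages.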

    A Review on Identification of Contextual Similar Sentences

    The task of identifying contextual similar sentences plays a crucial role in various natural language processing applications such as information retrieval, paraphrase detection, and question answering systems. This paper presents a comprehensive review of the methodologies, techniques, and advancements in the identification of contextual similar sentences. Beginning with an overview of the importance and challenges associated with this task, the paper delves into the various approaches employed, including traditional similarity metrics, deep learning architectures, and transformer-based models. Furthermore, the review explores different datasets and evaluation metrics used to assess the performance of these methods. Additionally, the paper discusses recent trends, emerging research directions, and potential applications in the field. By synthesizing existing literature, this review aims to provide researchers and practitioners with insights into the state-of-the-art techniques and future avenues for advancing the identification of contextual similar sentences

    Using Semantic Technologies in Digital Libraries- A Roadmap to Quality Evaluation

    In digital libraries, semantic techniques are often deployed to reduce the expensive manual overhead for indexing documents, maintaining metadata, or caching for future search. However, using such techniques may cause a decrease in a collection’s quality due to their statistical nature. Since data quality is a major concern in digital libraries, it is important to be able to measure the (loss of) quality of metadata automatically generated by semantic techniques. In this paper we present a user study based on a typical semantic technique use