
    A proposal for annotation, semantic similarity and classification of textual documents

    The original publication is available at www.springerlink.com. In this paper, we present an approach for classifying documents based on the notion of semantic similarity and an effective representation of document content. The content of a document is annotated, and the resulting annotation is represented by a labeled tree whose nodes and edges correspond to concepts in a domain ontology. A reasoning process may be carried out on annotation trees, allowing documents to be compared with one another for classification or information retrieval purposes. An algorithm for classifying documents with respect to semantic similarity and a discussion conclude the paper.

    Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

    Because common clustering algorithms use the vector space model (VSM) to represent documents, the conceptual relationships between related terms that do not literally co-occur are ignored. To overcome this problem, this article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology. In general, ontology measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and broad-coverage taxonomy of WordNet as the thesaurus-based ontology. The corpus-based method, however, is rather complicated to handle in practical applications, so we propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, which combine the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based methods, outperforms the same algorithm used with other similarity measures. Moreover, the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA is demonstrated by improvements in clustering performance.
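    The limitation of the vector space model mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `cosine_similarity` and the example phrases are assumptions made here for illustration, and the vectors are plain term counts with no weighting. Two texts that use related but different words share no terms, so their similarity is zero.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example phrases: related in meaning, but with no shared words.
related = cosine_similarity("car engine repair", "automobile motor maintenance")
identical = cosine_similarity("car engine repair", "car engine repair")
print(related, identical)
```

    Under these assumptions the related pair scores zero while the identical pair scores approximately one, which is the gap that ontology-based similarity measures are meant to close.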

    A Large Scale Dataset for the Evaluation of Ontology Matching Systems

    Recently, the number of ontology matching techniques and systems has increased significantly. This makes the issue of their evaluation and comparison more pressing. One of the challenges of ontology matching evaluation is building large-scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with respect to the number of entities in these ontologies. This often makes the manual construction of evaluation datasets demanding to the point of being infeasible for large-scale matching tasks. In this paper we present an ontology matching evaluation dataset composed of thousands of matching tasks, called TaxME2. It was built semi-automatically out of the Google, Yahoo and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen state-of-the-art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely it is error-free, incremental, discriminative, monotonic, and hard for state-of-the-art ontology matching systems. The paper has been accepted for publication in "The Knowledge Engineering Review", Cambridge University Press (ISSN: 0269-8889, EISSN: 1469-8005).

    Knowledge Search within a Company-WIKI

    Using Wikis for knowledge management within a company is only of value if the stored information can be found easily. The fundamental characteristic of a Wiki, its easy and informal usage, results in large amounts of steadily changing, unstructured documents. The widely used full-text search often provides search results of insufficient accuracy. In this paper, we present an approach likely to improve search quality through the use of Semantic Web, Text Mining, and Case Based Reasoning (CBR) technologies. Search results are more precise and complete because, in contrast to full-text search, the proposed knowledge-based search operates on the semantic layer.

    ONTOLOGY BASED TECHNICAL SKILL SIMILARITY

    Online job boards have become a major platform for technical talent procurement and job search. These job portals have given rise to challenging matching and search problems. The core matching or search happens between the technical skills in the job requirements and the candidate's profile or keywords. Because the list of technical skills is extensive and a single skill can go by many names, direct keyword matching is not very effective. It produces substandard job matching or search results that miss a closely matching candidate who does not list the exact skills. It is therefore important to use a semantic similarity measure between skills to improve the relevance of the results. This paper proposes a semantic similarity measure between technical skills using a knowledge-based approach. The approach builds an ontology using DBpedia and uses it to derive a similarity score. Feature-based ontology similarity measures are used to derive a similarity score between two skills. The ontology also helps in resolving a base skill from its multiple representations. The paper discusses the implementation of the custom ontology, the similarity measuring system, and the performance of the system in comparing technical skills. The proposed approach performs better than the Resumatcher system in finding the similarity between skills.
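    As a rough illustration of what a feature-based similarity measure looks like, the sketch below uses Tversky's ratio model over sets of features. The abstract does not say which feature-based measure the paper uses, so this choice is an assumption, and the feature sets in `SKILL_FEATURES` are invented for the example rather than taken from DBpedia.

```python
def tversky_similarity(features_a, features_b, alpha=0.5, beta=0.5):
    """Tversky ratio model: shared features weighed against distinctive ones."""
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)
    only_b = len(features_b - features_a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# Hypothetical feature sets; a real system would derive them from an ontology.
SKILL_FEATURES = {
    "Python": {"programming language", "object-oriented", "dynamically typed"},
    "Java": {"programming language", "object-oriented", "statically typed"},
    "MySQL": {"database", "relational", "open source"},
}

python_java = tversky_similarity(SKILL_FEATURES["Python"], SKILL_FEATURES["Java"])
python_mysql = tversky_similarity(SKILL_FEATURES["Python"], SKILL_FEATURES["MySQL"])
print(python_java, python_mysql)
```

    With these toy features, the two programming languages share two of their three features and score well above zero, while the language and the database share none and score zero. With `alpha` and `beta` both set to 1 the measure reduces to the Jaccard index.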