73,514 research outputs found

    Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

    Get PDF
    As the common clustering algorithms use vector space model (VSM) to represent document, the conceptual relationships between related terms which do not co-occur literally are ignored. A genetic algorithm-based clustering technique, named GA clustering, in conjunction with ontology is proposed in this article to overcome this problem. In general, the ontology measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and the broad coverage taxonomy of Wordnet as the thesaurus-based ontology. However, the corpus-based method is rather complicated to handle in practical application. We propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, the combinations of the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based method, apparently outperforms that with other similarity measures. Moreover, the superiority of the GA clustering algorithm proposed over the commonly used k-means algorithm and the standard GA is demonstrated by the improvements of the clustering performance

    Discovering information flow using a high dimensional conceptual space

    Get PDF
    This paper presents an informational inference mechanism realized via the use of a high dimensional conceptual space. More specifically, we claim to have operationalized important aspects of G?rdenforss recent three-level cognitive model. The connectionist level is primed with the Hyperspace Analogue to Language (HAL) algorithm which produces vector representations for use at the conceptual level. We show how inference at the symbolic level can be implemented by employing Barwise and Seligmans theory of information flow. This article also features heuristics for enhancing HAL-based representations via the use of quality properties, determining concept inclusion and computing concept composition. The worth of these heuristics in underpinning informational inference are demonstrated via a series of experiments. These experiments, though small in scale, show that informational inference proposed in this article has a very different character to the semantic associations produced by the Minkowski distance metric and concept similarity computed via the cosine coefficient. In short, informational inference generally uncovers concepts that are carried, or, in some cases, implied by another concept, (or combination of concepts)

    Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

    Get PDF
    Subsequence matching has appeared to be an ideal approach for solving many problems related to the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can be seen as an input for various subsequence matching approaches. The variety of data types, specific tasks and their partial or full solutions is so wide that the choice, implementation and parametrization of a suitable solution for a given task might be complicated and time-consuming; a possibly fruitful combination of fragments from different research areas may not be obvious nor easy to realize. The leading authors of this field also mention the implementation bias that makes difficult a proper comparison of competing approaches. Therefore we present a new generic Subsequence Matching Framework (SMF) that tries to overcome the aforementioned problems by a uniform frame that simplifies and speeds up the design, development and evaluation of subsequence matching related systems. We identify several relatively separate subtasks solved differently over the literature and SMF enables to combine them in straightforward manner achieving new quality and efficiency. This framework can be used in many application domains and its components can be reused effectively. Its strictly modular architecture and openness enables also involvement of efficient solutions from different fields, for instance efficient metric-based indexes. This is an extended version of a paper published on DEXA 2012.Comment: This is an extended version of a paper published on DEXA 201
    • …
    corecore