43,586 research outputs found

    Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

    Get PDF
    This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; it is also suggested that the CAST and Yin–Chen methods, suggested recently for the clustering of gene expression patterns, may also prove effective for the clustering of 2D chemical structures

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Automatic histogram threshold using fuzzy measures

    Get PDF
    In this paper, an automatic histogram threshold approach based on a fuzziness measure is presented. This work is an improvement of an existing method. Using fuzzy logic concepts, the problems involved in finding the minimum of a criterion function are avoided. Similarity between gray levels is the key to find an optimal threshold. Two initial regions of gray levels, located at the boundaries of the histogram, are defined. Then, using an index of fuzziness, a similarity process is started to find the threshold point. A significant contrast between objects and background is assumed. Previous histogram equalization is used in small contrast images. No prior knowledge of the image is required.info:eu-repo/semantics/publishedVersio

    A new fuzzy set merging technique using inclusion-based fuzzy clustering

    Get PDF
    This paper proposes a new method of merging parameterized fuzzy sets based on clustering in the parameters space, taking into account the degree of inclusion of each fuzzy set in the cluster prototypes. The merger method is applied to fuzzy rule base simplification by automatically replacing the fuzzy sets corresponding to a given cluster with that pertaining to cluster prototype. The feasibility and the performance of the proposed method are studied using an application in mobile robot navigation. The results indicate that the proposed merging and rule base simplification approach leads to good navigation performance in the application considered and to fuzzy models that are interpretable by experts. In this paper, we concentrate mainly on fuzzy systems with Gaussian membership functions, but the general approach can also be applied to other parameterized fuzzy sets

    The Recommendation Architecture: Lessons from Large-Scale Electronic Systems Applied to Cognition

    Get PDF
    A fundamental approach of cognitive science is to understand cognitive systems by separating them into modules. Theoretical reasons are described which force any system which learns to perform a complex combination of real time functions into a modular architecture. Constraints on the way modules divide up functionality are also described. The architecture of such systems, including biological systems, is constrained into a form called the recommendation architecture, with a primary separation between clustering and competition. Clustering is a modular hierarchy which manages the interactions between functions on the basis of detection of functionally ambiguous repetition. Change to previously detected repetitions is limited in order to maintain a meaningful, although partially ambiguous context for all modules which make use of the previously defined repetitions. Competition interprets the repetition conditions detected by clustering as a range of alternative behavioural recommendations, and uses consequence feedback to learn to select the most appropriate recommendation. The requirements imposed by functional complexity result in very specific structures and processes which resemble those of brains. The design of an implemented electronic version of the recommendation architecture is described, and it is demonstrated that the system can heuristically define its own functionality, and learn without disrupting earlier learning. The recommendation architecture is compared with a range of alternative cognitive architectural proposals, and the conclusion reached that it has substantial potential both for understanding brains and for designing systems to perform cognitive functions

    Distributed Holistic Clustering on Linked Data

    Full text link
    Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. We here propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains
    corecore