2 research outputs found

    External Cluster Assessment

    Automated tools for knowledge discovery are frequently invoked in databases where objects already group into some known (i.e., external) classification scheme. In the context of unsupervised learning or clustering, such tools delve inside large databases looking for alternative classification schemes that are meaningful and novel. The information gained with new clusters can be assessed by looking at the degree of separation between each new cluster and its most similar class. Our approach models each cluster and class as a multivariate Gaussian distribution and estimates their degree of separation through an information-theoretic measure (i.e., through relative entropy, or Kullback-Leibler distance). The inherently large computational cost of this step is alleviated by first projecting all data onto the single dimension that best separates both distributions (using Fisher’s Linear Discriminant). We test our algorithm on a dataset of Martian surfaces, using the traditional division into geological units as external classes and the new, hydrology-inspired, automatically performed division as novel clusters. We find that the new partitioning constitutes a formally meaningful classification that deviates substantially from the traditional classification.
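The pipeline described in this abstract (Fisher projection followed by a one-dimensional relative entropy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic two-dimensional data, and the choice to measure KL from the cluster to the class (rather than the reverse or a symmetrized form) are all assumptions made here.

```python
import math
import random

def mean_vec(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def scatter(X, mu):
    """Unnormalized scatter matrix: sum of outer products of centered rows."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for x in X:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fisher_direction(A, B):
    """Fisher's linear discriminant: w = Sw^{-1} (mu_A - mu_B)."""
    mu_a, mu_b = mean_vec(A), mean_vec(B)
    Sa, Sb = scatter(A, mu_a), scatter(B, mu_b)
    Sw = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Sa, Sb)]
    return solve(Sw, [a - b for a, b in zip(mu_a, mu_b)])

def kl_gauss_1d(m_p, v_p, m_q, v_q):
    """KL(p || q) for 1-D Gaussians given means and variances."""
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)

def separation(cluster, klass):
    """Project both sets onto the Fisher direction, then compare in 1-D.
    Assumption: KL is taken from the cluster to the class."""
    w = fisher_direction(cluster, klass)

    def stats(X):
        z = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
        m = sum(z) / len(z)
        v = sum((zi - m) ** 2 for zi in z) / len(z)
        return m, v

    (m_c, v_c), (m_k, v_k) = stats(cluster), stats(klass)
    return kl_gauss_1d(m_c, v_c, m_k, v_k)

# Synthetic data (hypothetical): one class and two candidate clusters.
rng = random.Random(0)

def blob(cx, cy, n=200):
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(n)]

klass = blob(0.0, 0.0)
far = separation(blob(5.0, 5.0), klass)   # cluster well away from the class
near = separation(blob(0.1, 0.0), klass)  # cluster almost on top of the class
```

A cluster that nearly coincides with an existing class should score close to zero, while a cluster far from every class scores high, which is the sense in which the measure indicates novelty.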

    Piece-Wise Model Fitting Using Local Data Patterns, to appear

    Abstract. In this paper we propose a novel classification algorithm that fits models of different complexity on separate regions of the input space. The goal is to achieve a balance between global and local learning strategies by decomposing the classification task into simpler subproblems; each subproblem narrows the learning problem to a local region of high example density in the input space. Specifically, our proposed approach is to apply a clustering algorithm to every set of training examples that belong to the same class; each cluster becomes an intermediate concept that is learned by selecting a model with an (estimated) optimal degree of complexity. Experimental results on real-world domains show consistently good predictive accuracy with our piece-wise model fitting strategy.
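The decomposition step described in this abstract, clustering the examples of each class separately so that every cluster becomes an intermediate concept, can be sketched as below. This is only an illustration under stated assumptions: k-means with a fixed `k` stands in for the unspecified clustering algorithm, and a nearest-centroid rule stands in for the per-cluster model selection, which the abstract does not detail. All names and the synthetic XOR-style data are hypothetical.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [
            [sum(c) / len(g) for c in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def fit_piecewise(X, y, k=2):
    """Cluster each class on its own; each center is an intermediate
    concept tagged with the class it came from."""
    concepts = []
    for label in sorted(set(y)):
        members = [x for x, yi in zip(X, y) if yi == label]
        for center in kmeans(members, k):
            concepts.append((center, label))
    return concepts

def predict(concepts, x):
    """Assumption: nearest-centroid stands in for the per-cluster model."""
    return min(concepts, key=lambda c: dist2(x, c[0]))[1]

# XOR-style synthetic data: each class occupies two separate regions,
# so a single global prototype per class would not separate them.
rng = random.Random(1)

def blob(cx, cy, n=40):
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for _ in range(n)]

X = blob(0, 0) + blob(4, 4) + blob(0, 4) + blob(4, 0)
y = [0] * 80 + [1] * 80
concepts = fit_piecewise(X, y, k=2)
```

In a fuller version, each cluster would get its own learner whose complexity is chosen by some estimate of generalization error; the nearest-centroid rule here only shows how local regions map back to class labels.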