    Model-based learning for point pattern data

    This article proposes a framework for model-based learning from point pattern data using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection, and clustering, to point pattern data. Furthermore, tractable point pattern models, as well as solutions for learning and decision making from point pattern data, are developed.
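
    As a rough illustration of likelihood-based learning on point patterns, the sketch below classifies variable-length sets of 2-D points with a simple Poisson/IID-cluster likelihood: a Poisson model for the number of points and a kernel density estimate for their locations. The class structure, bandwidth, and cardinality model are illustrative assumptions, not the paper's exact construction.

        # Sketch: maximum-likelihood classification of point patterns under an
        # assumed Poisson / IID-cluster model (illustration only).
        import numpy as np
        from scipy.stats import poisson
        from sklearn.neighbors import KernelDensity

        class PointPatternClassifier:
            def __init__(self, bandwidth=0.5):
                self.bandwidth = bandwidth
                self.models = {}  # label -> (location density, mean cardinality)

            def fit(self, patterns, labels):
                for label in set(labels):
                    pats = [p for p, y in zip(patterns, labels) if y == label]
                    kde = KernelDensity(bandwidth=self.bandwidth).fit(np.vstack(pats))
                    self.models[label] = (kde, np.mean([len(p) for p in pats]))
                return self

            def log_likelihood(self, pattern, label):
                kde, mean_card = self.models[label]
                # Poisson term for the cardinality + log density of the point locations.
                return poisson.logpmf(len(pattern), mean_card) + kde.score(pattern)

            def predict(self, pattern):
                return max(self.models, key=lambda y: self.log_likelihood(pattern, y))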

    Probabilistic Sparse Subspace Clustering Using Delayed Association

    Discovering and clustering subspaces in high-dimensional data is a fundamental problem of machine learning with a wide range of applications in data mining, computer vision, and pattern recognition. Earlier methods divided the problem into two separate stages: finding the similarity matrix and finding the clusters. Similar to some recent works, we integrate these two steps using a joint optimization approach. We make the following contributions: (i) we estimate the reliability of the cluster assignment for each point before assigning the point to a subspace. We group the data points into "certain" and "uncertain" groups, with the assignment of the latter group delayed until its subspace association certainty improves. (ii) We demonstrate that delayed association is better suited for clustering subspaces that have ambiguities, i.e., when subspaces intersect or the data are contaminated with outliers or noise. (iii) We demonstrate experimentally that such delayed probabilistic association leads to a more accurate self-representation and more accurate final clusters. The proposed method has higher accuracy both for points that lie exclusively in one subspace and for those that lie on the intersection of subspaces. (iv) We show that delayed association leads to a large reduction in computational cost, since it allows for incremental spectral clustering.
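
    The snippet below is a much-simplified sketch of the certain/uncertain split described above, not the paper's joint optimization: it builds a sparse self-representation affinity, embeds it spectrally, derives soft assignments from a Gaussian mixture, and delays points whose assignment confidence falls below a threshold. The Lasso penalty, the mixture model, and the 0.9 threshold are placeholder choices.

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.manifold import spectral_embedding
        from sklearn.mixture import GaussianMixture

        def sparse_affinity(X, alpha=0.01):
            """Self-representation: express each column of X (one point) by the others."""
            n = X.shape[1]
            C = np.zeros((n, n))
            for i in range(n):
                idx = [j for j in range(n) if j != i]
                C[idx, i] = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, idx], X[:, i]).coef_
            return np.abs(C) + np.abs(C).T

        def delayed_association(X, n_clusters, certainty=0.9):
            emb = spectral_embedding(sparse_affinity(X), n_components=n_clusters, random_state=0)
            resp = GaussianMixture(n_components=n_clusters, random_state=0).fit(emb).predict_proba(emb)
            labels, certain = resp.argmax(axis=1), resp.max(axis=1) >= certainty
            # Delay the uncertain points, then attach them to the nearest certain centroid
            # (assumes every cluster already contains at least one certain point).
            centroids = np.array([emb[certain & (labels == k)].mean(axis=0) for k in range(n_clusters)])
            dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
            labels[~certain] = dists[~certain].argmin(axis=1)
            return labels, certain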

    Fuzzy c-Means Clustering for Pattern Recognition: A Case Study on Stock Data

    Fuzzy clustering is one of the five roles data mining experts use to transform large amounts of data into useful information, and one method that is often and widely used is Fuzzy c-Means (FCM) clustering. FCM is a data clustering technique in which each data point belongs to a cluster to a degree given by its membership value. This study aims to identify patterns in data samples, or data categories, using FCM clustering. The analyzed data are stocks on the Jakarta Stock Exchange (BEJ) in the Property and Real Estate sector (issuer group). The data mining process follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), with several stages: becoming familiar with the business process (Business Understanding), studying the data (Data Understanding), and then the Data Preparation, Modeling, Evaluation, and Deployment stages. In the modeling stage, the FCM model is used. FCM clustering can analyze data in large databases with many, complicated variables, especially to extract patterns from the data. A Fuzzy Inference System (FIS) was then built on the discovered pattern to map input data to output data using fuzzy logic. Keywords: Fuzzy c-Means Clustering, Pattern Recognition
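
    For reference, a minimal NumPy sketch of the FCM iteration used in the modeling stage is given below; the cluster count, fuzzifier m, and feature matrix are placeholders, not the study's actual settings.

        import numpy as np

        def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
            """X: (n_samples, n_features). Returns cluster centers and the membership matrix U."""
            rng = np.random.default_rng(seed)
            U = rng.random((X.shape[0], c))
            U /= U.sum(axis=1, keepdims=True)     # memberships of each point sum to one
            for _ in range(max_iter):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
                dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                inv = dist ** (-2.0 / (m - 1))    # standard FCM membership update
                U_new = inv / inv.sum(axis=1, keepdims=True)
                if np.abs(U_new - U).max() < tol:
                    return centers, U_new
                U = U_new
            return centers, U

        # Hypothetical usage on a table of issuer-level stock features:
        # X = np.loadtxt("property_sector_features.csv", delimiter=",")
        # centers, memberships = fuzzy_c_means(X, c=3)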

    Identifying structural changes with unsupervised machine learning methods

    Unsupervised machine learning methods are used to identify structural changes, with the melting point transition in classical molecular dynamics simulations as an example application of the approach. Dimensionality reduction and clustering methods are applied to instantaneous radial distributions of atomic configurations from classical molecular dynamics simulations of metallic systems over a large temperature range. Principal component analysis is used to dramatically reduce the dimensionality of the feature space across the samples using an orthogonal linear transformation that preserves the statistical variance of the data under the condition that the new feature space is linearly independent. From there, k-means clustering is used to partition the samples into solid and liquid phases through a criterion motivated by the geometry of the reduced feature space of the samples, allowing for an estimation of the melting point transition. This criterion is conceptually similar to how humans interpret the data but with far greater throughput, as the shapes of the radial distributions are different for each phase and easily distinguishable by humans. The transition temperature estimates derived from this machine learning approach are comparable to those of other methods on similarly small system sizes. These results show that machine learning approaches can be applied to structural changes in physical systems.
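
    A compact sketch of this pipeline, under assumed array shapes, an illustrative 99% variance cutoff, and a simple midpoint rule in place of the paper's geometric criterion, might look like the following.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans

        def estimate_melting_point(rdfs, temperatures):
            """rdfs: (n_snapshots, n_bins) radial distributions; temperatures: (n_snapshots,)."""
            Z = PCA(n_components=0.99).fit_transform(rdfs)             # reduced feature space
            labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
            solid = labels[np.argmin(temperatures)]                    # coldest snapshot defines the solid cluster
            t_solid = temperatures[labels == solid].max()
            t_liquid = temperatures[labels != solid].min()
            return 0.5 * (t_solid + t_liquid)                          # estimated transition temperature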

    Possibilistic clustering for shape recognition

    Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently, fuzzy clustering methods have shown a spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
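
    As a point of reference, the possibilistic membership update described above can be sketched as follows for point prototypes (not the shell prototypes used for curve detection); the scale parameters eta and the prototype initialization are assumptions for illustration.

        import numpy as np

        def pcm_memberships(X, centers, eta, m=2.0):
            """u_ij = 1 / (1 + (d_ij^2 / eta_j)^(1/(m-1))): memberships need not sum to one."""
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1)))

        def possibilistic_c_means(X, centers, m=2.0, n_iter=50):
            # Set each eta_j from the initial spread of the data around prototype j.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            eta = d2.mean(axis=0)
            for _ in range(n_iter):
                U = pcm_memberships(X, centers, eta, m)
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # prototype update
            return centers, U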