Model-based learning for point pattern data
This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data, derived from point process theory, enable principled yet conceptually transparent extensions of learning tasks such as classification, novelty detection and clustering to point pattern data. Furthermore, tractable point pattern models, as well as solutions for learning and decision making from point pattern data, are developed.
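As a rough illustration of what a likelihood for a point pattern can look like, the sketch below evaluates the log-density of a Poisson point process and uses it to classify a pattern. The abstract does not state which models the article develops, so the Poisson model, the Gaussian feature density, and every name and parameter value here are assumptions chosen for illustration only.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def poisson_pattern_loglik(points, rate, mean, std):
    """Log-likelihood of an unordered set of 1-D points under a Poisson
    point process with intensity rate * N(x; mean, std).

    log f(X) = -rate + |X| * log(rate) + sum_i log p(x_i)
    (constant factors that depend only on the reference measure are omitted;
    they are identical across classes and do not affect the comparison).
    """
    n = len(points)
    return -rate + n * math.log(rate) + sum(gaussian_logpdf(x, mean, std) for x in points)

# Hypothetical class models: (expected number of points, feature mean, feature std).
classes = {"A": (3.0, 0.0, 1.0), "B": (10.0, 5.0, 1.0)}

pattern = [0.1, -0.2, 0.05]
scores = {name: poisson_pattern_loglik(pattern, *params) for name, params in classes.items()}
predicted = max(scores, key=scores.get)  # maximum-likelihood class
print(scores, predicted)
```

The likelihood scores both the number of points in the pattern and their values, which is what lets the same formula compare patterns of different sizes.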
Probabilistic Sparse Subspace Clustering Using Delayed Association
Discovering and clustering subspaces in high-dimensional data is a
fundamental problem of machine learning with a wide range of applications in
data mining, computer vision, and pattern recognition. Earlier methods divided
the problem into two separate stages of finding the similarity matrix and
finding clusters. Similar to some recent works, we integrate these two steps
using a joint optimization approach. We make the following contributions: (i)
we estimate the reliability of the cluster assignment for each point before
assigning a point to a subspace. We divide the data points into two groups,
"certain" and "uncertain", with the assignment of the latter group delayed until
its subspace association certainty improves. (ii) We demonstrate that delayed
association is better suited for clustering subspaces that have ambiguities,
i.e. when subspaces intersect or data are contaminated with outliers/noise.
(iii) We demonstrate experimentally that such delayed probabilistic association
leads to a more accurate self-representation and final clusters. The proposed
method has higher accuracy both for points that exclusively lie in one
subspace and for those that lie on the intersection of subspaces. (iv) We show
that delayed association leads to a large reduction in computational cost, since
it allows for incremental spectral clustering.
Fuzzy c-Means Clustering for Pattern Recognition: A Case Study of Stock Data
Fuzzy clustering is one of the five roles used by data mining experts to transform large amounts of data into useful information, and Fuzzy c-Means (FCM) clustering is one of its most widely used methods. FCM is a data clustering technique in which each data point belongs to a cluster to a degree given by its membership value. This study aims to identify patterns in data samples or data categories using FCM clustering. The data analyzed are stock data from the Jakarta Stock Exchange (BEJ) for the Property and Real Estate sector (issuer group). The data mining process follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), which has several stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and finally Deployment. The FCM model is used in the Modeling stage. FCM-based data mining can analyze large, complicated databases with many variables, in particular to extract patterns from the data. A Fuzzy Inference System (FIS) was then built from the known patterns to simulate the mapping from input data to output data using fuzzy logic.
Keywords: Fuzzy c-Means Clustering, Pattern Recognitio
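The degree-of-membership idea described above can be sketched with the standard FCM alternating updates. This is a minimal NumPy sketch, not the study's implementation: the function name, the fuzzifier m = 2, the random initialisation, and the synthetic two-blob data are all assumptions, and no stock data from the study is used.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM: alternate between centre and membership updates.

    X: (n, d) data; c: number of clusters; m > 1: fuzzifier.
    Returns (centers, U), where U[i, k] is the membership of point i in cluster k.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to one
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic two-group data standing in for real observations.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(centers)
print(U[:3])
```

A hard assignment, when one is needed, can be read off as `U.argmax(axis=1)`, while the rows of `U` keep the graded memberships.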
Identifying structural changes with unsupervised machine learning methods
Unsupervised machine learning methods are used to identify structural changes
using the melting point transition in classical molecular dynamics simulations
as an example application of the approach. Dimensionality reduction and
clustering methods are applied to instantaneous radial distributions of atomic
configurations from classical molecular dynamics simulations of metallic
systems over a large temperature range. Principal component analysis is used to
dramatically reduce the dimensionality of the feature space across the samples
using an orthogonal linear transformation that preserves the statistical
variance of the data under the condition that the new feature space is linearly
independent. From there, k-means clustering is used to partition the samples
into solid and liquid phases through a criterion motivated by the geometry of
the reduced feature space of the samples, allowing for an estimation of the
melting point transition. This pattern criterion is conceptually similar to how
humans interpret the data but with far greater throughput, as the shapes of the
radial distributions are different for each phase and easily distinguishable by
humans. The transition temperature estimates derived from this machine learning
approach produce comparable results to other methods on similarly small system
sizes. These results show that machine learning approaches can be applied to
structural changes in physical systems.
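The two-step pipeline described above (principal component analysis followed by k-means) can be sketched in plain NumPy. This is only an illustration under stated assumptions: the synthetic sharp-peaked and broad-peaked curves stand in for radial distributions of the two phases, the function names are hypothetical, and the paper's own criterion for locating the transition temperature is not reproduced.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centres are k distinct data points."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-ins: a sharp peak ("solid-like") and a broad peak ("liquid-like").
rng = np.random.default_rng(0)
r = np.linspace(0.5, 5.0, 60)
sharp = np.exp(-((r - 2.0) ** 2) / 0.02)
broad = np.exp(-((r - 2.0) ** 2) / 0.5)
X = np.vstack([
    sharp + 0.02 * rng.normal(size=(20, 60)),
    broad + 0.02 * rng.normal(size=(20, 60)),
])

Z = pca_project(X, n_components=2)   # reduced feature space
labels, centers = kmeans(Z, k=2)     # partition into two phases
print(labels)
```

With samples ordered by temperature, the point at which the cluster label changes gives a crude indication of where the transition lies.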
Possibilistic clustering for shape recognition
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently, fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
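The contrast between the sum-to-one constraint and possibilistic memberships can be made concrete with a small sketch. It compares the standard FCM membership formula with the commonly cited possibilistic update u = 1 / (1 + (d^2 / eta)^(1/(m-1))) for a far-away noise point. The function names, the fixed scale parameter eta = 1, the fuzzifier m = 2, and the toy prototypes are assumptions; the abstract itself does not give the update equations.

```python
import numpy as np

def fcm_memberships(d2, m=2.0):
    """FCM memberships from squared distances d2 of shape (n, c); rows sum to one."""
    inv = np.fmax(d2, 1e-12) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def pcm_memberships(d2, eta, m=2.0):
    """Possibilistic memberships: each column is computed independently,
    so the rows are not forced to sum to one."""
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

# Two point prototypes and two test points: one near the first prototype,
# one far from both (a noise point).
prototypes = np.array([[-1.0, 0.0], [1.0, 0.0]])
points = np.array([[-0.9, 0.0], [0.0, 100.0]])
d2 = ((points[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)

eta = np.array([1.0, 1.0])  # hypothetical per-cluster scale parameters
print(fcm_memberships(d2))
print(pcm_memberships(d2, eta))
```

Under the sum-to-one constraint the distant noise point is split evenly between the two classes, whereas its possibilistic memberships are close to zero for both, which matches the "degree of belonging" reading described above.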