Machine learning for crystal identification and discovery
As computers get faster, researchers -- not hardware or algorithms -- become
the bottleneck in scientific discovery. Computational study of colloidal
self-assembly is one area that is keenly affected: even after computers
generate massive amounts of raw data, performing an exhaustive search to
determine what (if any) ordered structures occur in a large parameter space of
many simulations can be excruciating. We demonstrate how machine learning can
be applied to discover interesting areas of parameter space in colloidal
self-assembly. We create numerical fingerprints -- inspired by bond orientational
order diagrams -- of structures found in self-assembly studies and use these
descriptors to both find interesting regions in a phase diagram and identify
characteristic local environments in simulations in an automated manner for
simple and complex crystal structures. These methods allow analysis to keep
pace with the data-generation capacity of modern high-throughput computing
environments.
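The fingerprinting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' descriptor: it assumes a 2D point set rather than a 3D simulation, uses the magnitudes of the m-fold bond-orientational order parameters as the fingerprint, and picks neighbors with a fixed cutoff radius. The names bond_order_fingerprint, square_lattice, hexagonal_lattice, r_cut and max_order are hypothetical.

```python
import numpy as np


def bond_order_fingerprint(points, r_cut=1.2, max_order=8):
    """Per-particle fingerprint |psi_m| for m = 1..max_order (2D only).

    psi_m(i) is the mean of exp(1j * m * theta_ij) over the neighbors j of
    particle i, where theta_ij is the angle of the bond from i to j.
    Taking the magnitude makes each entry invariant to global rotation.
    This is an illustrative assumption, not the descriptor from the paper.
    """
    points = np.asarray(points, dtype=float)
    diffs = points[None, :, :] - points[:, None, :]   # diffs[i, j] = r_j - r_i
    dists = np.linalg.norm(diffs, axis=-1)
    neighbors = (dists > 0) & (dists < r_cut)         # boolean (N, N) mask
    angles = np.arctan2(diffs[..., 1], diffs[..., 0])
    orders = np.arange(1, max_order + 1)
    # phases[m, i, j] = exp(1j * m * theta_ij), zeroed for non-neighbors
    phases = np.exp(1j * orders[:, None, None] * angles[None]) * neighbors[None]
    counts = np.maximum(neighbors.sum(axis=1), 1)     # avoid division by zero
    psi = phases.sum(axis=2) / counts[None, :]
    return np.abs(psi).T                              # shape (N, max_order)


def square_lattice(n=5):
    """n x n square lattice with unit spacing."""
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.column_stack([i.ravel(), j.ravel()]).astype(float)


def hexagonal_lattice(n=5):
    """n x n patch of a hexagonal (triangular) lattice with unit spacing."""
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    x = i + 0.5 * j
    y = (np.sqrt(3) / 2) * j
    return np.column_stack([x.ravel(), y.ravel()])


if __name__ == "__main__":
    center = 12  # index of the interior particle (i=2, j=2) in a 5 x 5 patch
    for name, lattice in [("square", square_lattice()),
                          ("hexagonal", hexagonal_lattice())]:
        fp = bond_order_fingerprint(lattice)
        print(name, np.round(fp[center], 3))
```

In this sketch, an interior particle of the square lattice has a fingerprint dominated by the m = 4 entry, and one in the hexagonal lattice by the m = 6 entry. Vectors like these could then be passed to any clustering routine to group similar local environments.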
Deep unsupervised clustering with Gaussian mixture variational autoencoders
We study a variant of the variational autoencoder model with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the standard variational approach in these models is unsuited for unsupervised clustering, and mitigate this problem by leveraging a principled information-theoretic regularisation term known as consistency violation. Adding this term to the standard variational optimisation objective yields networks with both meaningful internal representations and well-defined clusters. We demonstrate the performance of this scheme on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and that the scheme achieves higher performance on unsupervised clustering classification than previous approaches.
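For orientation, a Gaussian mixture prior and the standard variational lower bound can be written in generic textbook form as below. The notation (K components, weights \pi_k, means \mu_k, covariances \Sigma_k, approximate posterior q) is an assumption for illustration and may differ from the paper's own model. The consistency-violation term is not shown because the abstract does not define it.

```latex
% Gaussian mixture prior over the latent code z (generic form)
p(z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(z \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1

% Standard variational lower bound on the log-likelihood
\log p(x) \ge \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
  - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
```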
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as a
module in practical expert systems to detect clusters that combine expert
knowledge with the true distribution of the data. Moreover, it can be used to
improve the results of less flexible clustering techniques, such as projection
pursuit clustering. The paper presents an extensive theoretical analysis of the
model and a fast algorithm for its optimization. Experimental results show that
C3L finds a high-quality clustering model, which can be applied to discover
meaningful groups in partially classified data.