8 research outputs found

    Persistence Diagram Cardinality and its Applications

    This dissertation studies persistence diagrams and their usefulness in machine learning. Persistence diagrams are summaries of underlying topological structure present within data. These diagrams are especially applicable for analyzing data whose shape is a relevant descriptor, as they provide unique information as sets instead of vectors. Although there are methods for vectorizing persistence diagrams, we focus instead on statistical learning schemes that use persistence diagrams directly. In particular, the cardinality of the diagrams proves itself a useful indicator, although this cardinality is variable at higher dimensions. To better understand and use the cardinality of persistence diagrams, we prove that the cardinality is bounded using both statistics and geometry. We also prove stability of a cardinality-based persistence diagram distance in a continuous fashion. These results are then applied to analyze persistence diagrams generated from structures of materials. We also develop a Bayesian framework for modeling the cardinality and spatial distributions of points within persistence diagrams using i.i.d. cluster point processes. From Gaussian mixture and binomial priors, we derive equations for the posterior cardinality and spatial distributions. We also introduce a distribution to account for noise typical of persistence diagrams. This framework provides a means of classifying biochemical networks present within cells using Bayes factors. We also provide a favorable comparison of the Bayesian classification of networks with several other methods.

    Representation of Molecular Structures with Persistent Homology Leads to the Discovery of Molecular Groups with Enhanced CO2 Binding

    Developing alternative strategies for efficient separation of CO2 and N2 is of general interest for the reduction of anthropogenic carbon emissions. In recent years, machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. The application of machine learning for chemical applications requires the conversion of molecular structures to a machine-readable format known as a molecular representation. The choice of such representations impacts the performance and outcomes of chemical machine learning methods. Herein, we present a new concise and size-consistent molecular representation derived from persistent homology, an applied branch of mathematics. We have demonstrated its applicability in a high-throughput computational screening of a large molecular database (GDB-9) with more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO2. The methodology and performance of the novel molecular fingerprinting method are presented, and the new chemically-driven persistence image representation is used to screen the GDB-9 database to suggest molecules and/or functional groups with enhanced properties.

    Author Correction: Representation of molecular structures with persistent homology for machine learning applications in chemistry

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    Representation of molecular structures with persistent homology for machine learning applications in chemistry

    The choice of molecular representations can severely impact the performance of machine-learning methods. Here the authors demonstrate a persistent-homology-based molecular representation through an active-learning approach for predicting CO2/N2 interaction energies at the density functional theory (DFT) level.

    Materials Fingerprinting Classification

    Significant progress in many classes of materials could be made with the availability of experimentally derived large datasets composed of atomic identities and three-dimensional coordinates. Methods for visualizing the local atomic structure, such as atom probe tomography (APT), which routinely generate datasets comprised of millions of atoms, are an important step in realizing this goal. However, state-of-the-art APT instruments generate noisy and sparse datasets that provide information about elemental type, but obscure atomic structures, thus limiting their subsequent value for materials discovery. The application of a materials fingerprinting process, a machine learning algorithm coupled with topological data analysis, provides an avenue by which heretofore unprecedented structural information can be extracted from an APT dataset. As a proof of concept, the material fingerprint is applied to high-entropy alloy APT datasets containing body-centered cubic (BCC) and face-centered cubic (FCC) crystal structures. A local atomic configuration centered on an arbitrary atom is assigned a topological descriptor, with which it can be characterized as a BCC or FCC lattice with near-perfect accuracy, despite the inherent noise in the dataset. This successful identification of a fingerprint is a crucial first step in the development of algorithms which can extract more nuanced information, such as chemical ordering, from existing datasets of complex materials.