228 research outputs found

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    Full text link
    By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, how to select appropriate training sets. An important part of this review are the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML our review is focused on the different approaches for supervised ML. In particular, we review the different method to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.Comment: Editorial changes (typos fixed, minor adjustments to figures

    Feature Selection and Classifier Development for Radio Frequency Device Identification

    Get PDF
    The proliferation of simple and low-cost devices, such as IEEE 802.15.4 ZigBee and Z-Wave, in Critical Infrastructure (CI) increases security concerns. Radio Frequency Distinct Native Attribute (RF-DNA) Fingerprinting facilitates biometric-like identification of electronic devices emissions from variances in device hardware. Developing reliable classifier models using RF-DNA fingerprints is thus important for device discrimination to enable reliable Device Classification (a one-to-many looks most like assessment) and Device ID Verification (a one-to-one looks how much like assessment). AFITs prior RF-DNA work focused on Multiple Discriminant Analysis/Maximum Likelihood (MDA/ML) and Generalized Relevance Learning Vector Quantized Improved (GRLVQI) classifiers. This work 1) introduces a new GRLVQI-Distance (GRLVQI-D) classifier that extends prior GRLVQI work by supporting alternative distance measures, 2) formalizes a framework for selecting competing distance measures for GRLVQI-D, 3) introducing response surface methods for optimizing GRLVQI and GRLVQI-D algorithm settings, 4) develops an MDA-based Loadings Fusion (MLF) Dimensional Reduction Analysis (DRA) method for improved classifier-based feature selection, 5) introduces the F-test as a DRA method for RF-DNA fingerprints, 6) provides a phenomenological understanding of test statistics and p-values, with KS-test and F-test statistic values being superior to p-values for DRA, and 7) introduces quantitative dimensionality assessment methods for DRA subset selection

    Smart Monitoring and Control in the Future Internet of Things

    Get PDF
    The Internet of Things (IoT) and related technologies have the promise of realizing pervasive and smart applications which, in turn, have the potential of improving the quality of life of people living in a connected world. According to the IoT vision, all things can cooperate amongst themselves and be managed from anywhere via the Internet, allowing tight integration between the physical and cyber worlds and thus improving efficiency, promoting usability, and opening up new application opportunities. Nowadays, IoT technologies have successfully been exploited in several domains, providing both social and economic benefits. The realization of the full potential of the next generation of the Internet of Things still needs further research efforts concerning, for instance, the identification of new architectures, methodologies, and infrastructures dealing with distributed and decentralized IoT systems; the integration of IoT with cognitive and social capabilities; the enhancement of the sensing–analysis–control cycle; the integration of consciousness and awareness in IoT environments; and the design of new algorithms and techniques for managing IoT big data. This Special Issue is devoted to advancements in technologies, methodologies, and applications for IoT, together with emerging standards and research topics which would lead to realization of the future Internet of Things

    Development of a Data-Driven Scientific Methodology : From Articles to Chemometric Data Products

    Get PDF
    Acknowledgements This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.Peer reviewedPostprin

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    Get PDF
    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    De novo sequencing of proteins by mass spectrometry

    Get PDF
    Introduction Proteins are crucial for every cellular activity and unraveling their sequence and structure is a crucial step to fully understand their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. Areas covered De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques, along with the various software are currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with potential and shortcomings using databases for interpretation of protein sequence data. Expert opinion As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de-novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as, unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy need to be addressed. Ideally, it should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification.publishe

    Data Hiding and Its Applications

    Get PDF
    Data hiding techniques have been widely used to provide copyright protection, data integrity, covert communication, non-repudiation, and authentication, among other applications. In the context of the increased dissemination and distribution of multimedia content over the internet, data hiding methods, such as digital watermarking and steganography, are becoming increasingly relevant in providing multimedia security. The goal of this book is to focus on the improvement of data hiding algorithms and their different applications (both traditional and emerging), bringing together researchers and practitioners from different research fields, including data hiding, signal processing, cryptography, and information theory, among others

    Computer Aided Synthesis Prediction to Enable Augmented Chemical Discovery and Chemical Space Exploration

    Get PDF
    The drug-like chemical space is estimated to be 10 to the power of 60 molecules, and the largest generated database (GDB) obtained by the Reymond group is 165 billion molecules with up to 17 heavy atoms. Furthermore, deep learning techniques to explore regions of chemical space are becoming more popular. However, the key to realizing the generated structures experimentally lies in chemical synthesis. The application of which was previously limited to manual planning or slow computer assisted synthesis planning (CASP) models. Despite the 60-year history of CASP few synthesis planning tools have been open-sourced to the community. In this thesis I co-led the development of and investigated one of the only fully open-source synthesis planning tools called AiZynthFinder, trained on both public and proprietary datasets consisting of up to 17.5 million reactions. This enables synthesis guided exploration of the chemical space in a high throughput manner, to bridge the gap between compound generation and experimental realisation. I firstly investigate both public and proprietary reaction data, and their influence on route finding capability. Furthermore, I develop metrics for assessment of retrosynthetic prediction, single-step retrosynthesis models, and automated template extraction workflows. This is supplemented by a comparison of the underlying datasets and their corresponding models. Given the prevalence of ring systems in the GDB and wider medicinal chemistry domain, I developed ‘Ring Breaker’ - a data-driven approach to enable the prediction of ring-forming reactions. I demonstrate its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. Additionally, I highlight its potential for incorporation into CASP tools, and outline methodological improvements that result in the improvement of route-finding capability. To tackle the challenge of model throughput, I report a machine learning (ML) based classifier called the retrosynthetic accessibility score (RAscore), to assess the likelihood of finding a synthetic route using AiZynthFinder. The RAscore computes at least 4,500 times faster than AiZynthFinder. Thus, opens the possibility of pre-screening millions of virtual molecules from enumerated databases or generative models for synthesis informed compound prioritization. Finally, I combine chemical library visualization with synthetic route prediction to facilitate experimental engagement with synthetic chemists. I enable the navigation of chemical property space by using interactive visualization to deliver associated synthetic data as endpoints. This aids in the prioritization of compounds. The ability to view synthetic route information alongside structural descriptors facilitates a feedback mechanism for the improvement of CASP tools and enables rapid hypothesis testing. I demonstrate the workflow as applied to the GDB databases to augment compound prioritization and synthetic route design
    • 

    corecore