28 research outputs found

    Big Data of Materials Science - Critical Role of the Descriptor

    Statistical learning of materials properties or functions so far starts with a largely silent, non-challenged step: the choice of the set of descriptive parameters (termed descriptor). However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, causality of the learned descriptor-property relation is uncertain. Thus, trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful. We analyse this issue and define requirements for a suited descriptor. For a classical example, the energy difference of zincblende/wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically. Comment: Accepted to Phys. Rev. Lett.

    Learning physical descriptors for materials science by compressed sensing

    The availability of big data in materials science offers new routes for analyzing materials properties and functions and achieving scientific understanding. Finding structure in these data that is not directly visible by standard tools and exploitation of the scientific information requires new and dedicated methodology based on approaches from statistical learning, compressed sensing, and other recent methods from applied mathematics, computer science, statistics, signal processing, and information science. In this paper, we explain and demonstrate a compressed-sensing based methodology for feature selection, specifically for discovering physical descriptors, i.e., physical parameters that describe the material and its properties of interest, and associated equations that explicitly and quantitatively describe those relevant properties. As a showcase application and proof of concept, we describe how to build a physical model for the quantitative prediction of the crystal structure of binary compound semiconductors.

    Function spaces with dominating mixed smoothness

    We study several techniques which are well known in the case of Besov and Triebel-Lizorkin spaces and extend them to spaces with dominating mixed smoothness. We use the ideas of Triebel to prove three important decomposition theorems. We deal with so-called atomic, subatomic and wavelet decompositions. All these theorems have much in common. Roughly speaking, they say that a function belongs to some function space if, and only if, it can be decomposed into the sum of products of coefficients and corresponding building blocks, where the coefficients belong to an appropriate sequence space. These decomposition theorems establish a very useful connection between function and sequence spaces. We use them in the study of the decay of entropy numbers of compact embeddings between two function spaces of dominating mixed smoothness, reducing this problem to the same question on the sequence space level. The considered scales cover many important specific spaces (Sobolev, Zygmund, Besov) and we get generalisations of respective assertions of Belinsky, Dinh Dung and Temlyakov.

    Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

    Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.