    Data driven discovery of materials properties.

    The rapid pace of today's industrial evolution is creating an urgent need to design new, cost-efficient materials that can satisfy both current and future demands. However, as the structural and functional complexity of materials increases, the ability to rationally design new materials with a precise set of properties has become increasingly challenging. This basic observation has motivated the application of machine learning techniques in the field, further encouraged by the launch of the Materials Genome Initiative (MGI) by the US government in 2011. In this work, we present a novel approach for applying machine learning techniques to materials science applications. Guided by knowledge from domain experts, our approach uses machine learning to accelerate the data-driven discovery of materials properties. Our objectives are twofold: (i) identify the optimal set of features that best describes a given predicted variable, and (ii) boost prediction accuracy by applying various regression algorithms. Ordinary Least Squares, Partial Least Squares, and Lasso regression, combined with carefully tuned feature selection techniques, are applied and tested to predict key properties of semiconductors for two types of applications. First, we propose to build a more robust prediction model for the band-gap energy (BG-E) of chalcopyrites, commonly used in the solar cell industry. Compared to the results reported in [1-3], our approach shows that learning and using only a subset of relevant features can improve prediction accuracy by about 40%. For the second application, we propose to determine the underlying factors responsible for Defect-Induced Magnetism (DIM) in Dilute Magnetic Semiconductors (DMS) through the analysis of a set of 30 features for different DMS systems. We show that 8 of these features are more likely to contribute to this property. Using only these features to predict the total magnetic moment of new candidate DMSs reduces the mean square error by about 90% compared to models trained using the whole set of features. Given the scarcity of available data sets for similar applications, this work aims not only to build robust models but also to establish a collaborative platform for future research.
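    The abstract does not specify an implementation, but the described workflow (feature selection combined with OLS, PLS, and Lasso regression, compared by prediction error) can be illustrated with a minimal scikit-learn sketch. The random stand-in data, the choice of SelectKBest with k=8, and the hyperparameter values below are assumptions for illustration only, not the authors' setup.

```python
# Minimal sketch (not the authors' code): feature selection + three regressors,
# compared by cross-validated mean squared error.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: descriptor matrix (n_samples x n_features), y: target property (e.g. band-gap energy).
# Random stand-in data here; in practice these would come from a materials dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.normal(size=200)

models = {
    "OLS":   LinearRegression(),
    "PLS":   PLSRegression(n_components=5),
    "Lasso": Lasso(alpha=0.01),
}

for name, model in models.items():
    # Keep only the k most relevant descriptors before regression, mirroring the
    # abstract's claim that a small relevant feature subset improves accuracy.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_regression, k=8), model)
    mse = -cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```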

    Confidence-Guided Data Augmentation for Deep Semi-Supervised Training

    We propose a new data augmentation technique for semi-supervised learning settings that emphasizes learning from the most challenging regions of the feature space. Starting with a fully supervised reference model, we first identify low-confidence predictions. These samples are then used to train a Variational AutoEncoder (VAE) that can generate an effectively unlimited number of additional images with a similar distribution. Finally, using the originally labeled data and the synthetically generated labeled and unlabeled data, we retrain a new model in a semi-supervised fashion. We perform experiments on two benchmark RGB datasets, CIFAR-100 and STL-10, and show that the proposed scheme improves classification performance in terms of accuracy and robustness, while yielding comparable or superior results with respect to existing fully supervised approaches.
    Comment: 7 pages
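    As a rough illustration of the first step described above (selecting the low-confidence samples that later feed the VAE), the following PyTorch sketch is an assumption-based reading of the abstract, not the paper's code. The function name, the confidence threshold, and the use of top-1 softmax probability as the confidence measure are all hypothetical choices.

```python
# Minimal sketch: filter the samples a trained reference classifier is least sure
# about, so they can later be used to train a VAE generator.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_low_confidence(model, dataloader, threshold=0.6, device="cpu"):
    """Return the images and labels whose top-1 softmax probability is below threshold."""
    model.eval()
    hard_images, hard_labels = [], []
    for images, labels in dataloader:
        logits = model(images.to(device))
        confidence, _ = F.softmax(logits, dim=1).max(dim=1)  # top-1 class probability
        mask = (confidence < threshold).cpu()                # "challenging" samples
        hard_images.append(images[mask])
        hard_labels.append(labels[mask])
    return torch.cat(hard_images), torch.cat(hard_labels)

# Hypothetical usage: hard_x, hard_y = select_low_confidence(ref_model, train_loader)
# A VAE trained on hard_x would then generate additional samples from the same
# region of feature space, and the classifier would be retrained semi-supervised
# on the original data plus the generated data.
```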