7 research outputs found

    Linking stability with molecular geometries of perovskites and lanthanide richness using machine learning methods

    Oxide perovskite materials of type ABO3 have a wide range of technological applications, such as catalysts in solid oxide fuel cells and light-absorbing materials in solar photovoltaics. These materials often exhibit differing structural and electrostatic properties depending on whether the A- and B-sites are lanthanide- or non-lanthanide-derived. However, experimental and/or computational verification of these differences is often difficult. In this paper, we therefore take a data-driven approach. Specifically, we run three analyses on the dataset of Li, Jacobs, and Morgan [2018a], applying advanced machine learning tools to perform nonparametric regressions and to produce data visualizations using latent factor analysis (LFA) and principal component analysis (PCA). We also implement a nonparametric feature screening step in our high-dimensional regression analysis, ensuring robustness in our results.
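    As a rough illustration of this kind of workflow (not the paper's actual code), the sketch below standardizes a featurized perovskite table, projects it onto two principal components for visualization, and fits a kernel-based nonparametric regression; the file name and the "stability" target column are hypothetical placeholders, and kernel ridge regression merely stands in for whichever nonparametric regressor the paper uses.

```python
# Minimal sketch (assumed, not the authors' code): PCA visualization plus a
# nonparametric regression on a featurized perovskite dataset.
# The file name and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

df = pd.read_csv("perovskite_features.csv")   # hypothetical file
y = df["stability"].values                    # hypothetical target column
X = df.drop(columns=["stability"]).values

# Standardize, then project onto the first two principal components;
# Z would feed a 2-D scatter plot of the feature space.
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print("PCA scores shape:", Z.shape)

# Kernel ridge regression as one example of a flexible, nonparametric fit,
# evaluated with 5-fold cross-validation.
model = KernelRidge(kernel="rbf", alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Mean CV R^2:", scores.mean())
```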

    New developments of dimension reduction

    Variable selection has become more crucial than before, since high-dimensional data are frequently seen in many research areas. Many model-based variable selection methods have been developed; however, their performance can be poor when the model is mis-specified. Sufficient dimension reduction (SDR; Li, 1991; Cook, 1998) provides a general framework for model-free variable selection methods. In this thesis, we first propose a novel model-free variable selection method that handles multi-population data by incorporating the grouping information. Theoretical properties of the proposed method are also presented. Simulation studies show that the new method significantly improves selection performance compared with methods that ignore the grouping information. In the second part of this dissertation, we apply a partial SDR method to conduct conditional model-free variable (feature) screening for ultra-high-dimensional data, when researchers have prior information regarding the importance of certain predictors based on experience or previous investigations. Compared with the state-of-the-art conditional screening method, conditional sure independence screening (CSIS; Barut, Fan and Verhasselt, 2016), our method greatly outperforms CSIS for nonlinear models. The sure screening consistency property of the proposed method is also established. --Abstract, page iv
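    To make the idea of conditional screening concrete, here is a minimal sketch of a sure-independence-style screen that conditions on predictors already known to be important; it is a generic illustration under assumed names and sizes, not the partial-SDR procedure or CSIS described in the abstract.

```python
# Minimal sketch of conditional marginal screening (illustrative only):
# regress y on the known-important predictors, then rank the remaining
# predictors by their absolute correlation with the residual.
import numpy as np
from sklearn.linear_model import LinearRegression

def conditional_screen(X, y, cond_idx, keep=20):
    """Return the indices of the `keep` top-ranked non-conditioning predictors."""
    X_cond = X[:, cond_idx]
    resid = y - LinearRegression().fit(X_cond, y).predict(X_cond)
    rest = [j for j in range(X.shape[1]) if j not in cond_idx]
    scores = {j: abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in rest}
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# Toy usage: 200 observations, 1000 predictors; predictors 0 and 1 are
# treated as the known-important conditioning set.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = X[:, 0] + 0.5 * X[:, 1] + np.sin(2 * X[:, 2]) + 0.5 * rng.standard_normal(200)
print(conditional_screen(X, y, cond_idx=[0, 1], keep=10))
```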

    High-dimensional statistics for complex data

    High-dimensional data analysis has become a popular research topic in recent years, due to the emergence of new applications in several fields of science underscoring the need to analyse massive data sets. One of the main challenges in analysing high-dimensional data concerns the interpretability of estimated models as well as the computational efficiency of the procedures adopted. This can be achieved through the identification of the relevant variables that really affect the phenomenon of interest, so that effective models can subsequently be constructed and applied to solve practical problems. The first two chapters of the thesis are devoted to studying high-dimensional statistics for variable selection. We first give a short but exhaustive review of the main techniques developed for the general problem of variable selection using nonparametric statistics. Finally, in Chapter 3 we present our proposal for a feature screening approach for non-additive models, developed by using conditional information in the estimation procedure... [edited by Author]
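    The sketch below shows one generic way to perform model-free screening that can detect non-additive (e.g. interaction) effects, ranking predictors by their sample distance correlation with the response; it is an assumed illustration of the general idea, not the procedure proposed in Chapter 3 of the thesis.

```python
# Minimal sketch of model-free screening via distance correlation, which
# captures nonlinear and non-additive dependence that Pearson correlation
# would miss. Illustrative only; all sizes and names are made up.
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

def screen(X, y, keep=10):
    """Keep the predictors with the largest distance correlation with y."""
    scores = [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:keep]

# Toy usage with a non-additive (interaction) signal in columns 0 and 1
# plus an additive signal in column 2.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 500))
y = X[:, 0] * X[:, 1] + X[:, 2] + 0.5 * rng.standard_normal(300)
print(screen(X, y, keep=5))
```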

    Nonparametric feature screening
