Linking stability with molecular geometries of perovskites and lanthanide richness using machine learning methods
Oxide perovskite materials of type ABO3 have a wide range of technological
applications, such as catalysts in solid oxide fuel cells and as
light-absorbing materials in solar photovoltaics. These materials often exhibit
differential structural and electrostatic properties through lanthanide or
non-lanthanide derived A- and B-sites. However, experimental and/or
computational verification of these differences is often difficult. In this
paper, we thus take a data-driven approach. Specifically, we run three analyses
using the dataset of Li, Jacobs, and Morgan [2018a], applying advanced machine
learning tools to perform nonparametric regressions and to produce data
visualizations using latent factor analysis (LFA) and principal component
analysis (PCA). We also implement a nonparametric feature screening step while
performing our high-dimensional regression analysis, ensuring robustness in our
results.
New developments of dimension reduction
Variable selection has become more crucial than before, since high-dimensional data are frequently seen in many research areas. Many model-based variable selection methods have been developed. However, their performance can be poor when the model is misspecified. Sufficient dimension reduction (SDR; Li 1991; Cook 1998) provides a general framework for model-free variable selection methods.
In this thesis, we first propose a novel model-free variable selection method for multi-population data that incorporates the grouping information. Theoretical properties of the proposed method are also presented. Simulation studies show that the new method significantly improves selection performance compared with methods that ignore the grouping information. In the second part of this dissertation, we apply the partial SDR method to conduct conditional model-free variable (feature) screening for ultra-high dimensional data, for cases where researchers have prior information regarding the importance of certain predictors based on experience or previous investigations. Compared with the state-of-the-art conditional screening method, conditional sure independence screening (CSIS; Barut, Fan and Verhasselt, 2016), our method performs substantially better for nonlinear models. The sure screening consistency property of our proposed method is also established. --Abstract, page iv
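The idea of conditional screening described above can be illustrated with a minimal sketch. This is not the partial SDR method of the thesis, nor CSIS as published: it is a simplified linear stand-in that assumes a single predictor is known in advance to be important and ranks the remaining predictors by the absolute partial correlation with the response after adjusting for it. The function names (`residualize`, `conditional_screen`) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def residualize(v, z):
    """Residuals of v after a least-squares fit on z with an intercept."""
    mv, mz = mean(v), mean(z)
    szz = sum((b - mz) ** 2 for b in z)
    slope = sum((a - mv) * (b - mz) for a, b in zip(v, z)) / szz
    return [(a - mv) - slope * (b - mz) for a, b in zip(v, z)]


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def conditional_screen(columns, y, cond_idx, keep):
    """Rank candidates by |partial correlation with y| given columns[cond_idx].

    Assumption: one conditioning predictor and a linear adjustment; a
    nonlinear model would need a different marginal utility.
    """
    z = columns[cond_idx]
    ry = residualize(y, z)
    scores = {}
    for j, col in enumerate(columns):
        if j == cond_idx:
            continue  # the conditioning predictor is kept by assumption
        scores[j] = abs(correlation(residualize(col, z), ry))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): predictor 0 is known a priori, predictor 3 is the
# additional signal the screening step should recover.
rng = random.Random(0)
n, p = 200, 30
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * columns[0][i] + 1.5 * columns[3][i] + rng.gauss(0, 0.5)
     for i in range(n)]

selected, scores = conditional_screen(columns, y, cond_idx=0, keep=5)
print(selected)
```

In this sketch the prior information enters only through `cond_idx`; the retained set would then be passed to a downstream model together with the conditioning predictor.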
High-dimensional statistics for complex data
2016 - 2017
High-dimensional data analysis has become a popular research topic in
recent years, due to the emergence of various new applications in several fields
of science underscoring the need for analysing massive data sets.
One of the main challenges in analysing high-dimensional data concerns the
interpretability of estimated models as well as the computational efficiency of
the procedures adopted. Both goals can be pursued by identifying the relevant variables that really affect the phenomenon of interest, so that
effective models can be subsequently constructed and applied to solve practical
problems. The first two chapters of the thesis are devoted to studying high-dimensional
statistics for variable selection. We first present a short but
thorough review of the main techniques developed for the general problem
of variable selection using nonparametric statistics. Lastly, in chapter 3 we
present our proposal for a feature screening approach for non-additive
models that uses conditional information in the estimation procedure... [edited by Author] XXX cicl