3 research outputs found

Advanced Methods for Discovering Genetic Markers Associated with High Dimensional Imaging Data

Author: Zhang Jingwen
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2019

Imaging genetic studies have been widely applied to discover genetic factors of inherited neuropsychiatric diseases. Despite the notable contribution of genome-wide association studies (GWAS) in neuroimaging research, it has always been difficult to efficiently perform association analysis on imaging phenotypes. There are several challenges arising from this topic, such as the large dimensionality of imaging data and genetic data, the potential spatial dependency of imaging phenotypes and the computational burden of the GWAS problem. All the aforementioned issues motivate us to investigate new statistical methods in neuroimaging genetic analysis. In the first project, we develop a hierarchical functional principal regression model (HFPRM) to simultaneously study diffusion tensor bundle statistics on multiple fiber tracts. Theoretically, the asymptotic distribution of the global test statistic on the common factors has been studied. Simulations are conducted to evaluate the finite sample performance of HFPRM. Finally, we apply our method to a GWAS of a neonate population to explore important genetic architecture in early human brain development. In the second project, we consider an association test between functional data acquired on a single curve and scalar variables in a varying coefficient model. We propose a functional projection regression model and an associated global test statistic to aggregate weak signals across the domain of functional data. Theoretically, we examine the asymptotic distribution of the global test statistic and provide a strategy to adaptively select the tuning parameter. Simulation experiments show that the proposed test outperforms existing state-of-the-art methods in functional statistical inference. We also apply the proposed method to a GWAS in the UK Biobank dataset. 
In the third project, we introduce an adaptive projection regression model (APRM) to perform statistical inference on high dimensional imaging responses in the presence of high correlations. Dimension reduction of the phenotypes is achieved through a linear projection regression model. We also implement an adaptive inference procedure to detect signals at multiple levels. Numerical simulations demonstrate that APRM outperforms many state-of-the-art methods in high dimensional inference. Finally, we apply APRM to a GWAS of volumetric data on 93 regions of interest in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.

Carolina Digital Repository

Simultaneous feature and feature group selection through hard thresholding

Author: Foucart S.
Gong P.
Liu J.
Lozano A. C.
Nocedal J.
Swirszcz G.
Xiang S.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

Author: Adams H.
Anderson R.
Berend Smit
Bergstra J.
Bishop C. M.
Caruana R.
Chen T.
Dacrema M. F.
Daniele Ongari
Forman G.
Gilmer J.
Goodfellow I.
Grünwald P. D.
Guyon I.
Géron A.
Hardt M.
Hastie T.
Hey A. J. G.
Hofer C. D.
Ioffe S.
James G.
Kevin Maik Jablonka
Maturana D.
Molnar C.
Montgomery D. C.
Noh H.
Pedregosa F.
Pettifor D. G.
Ramsundar B.
Saul N.
Seyed Mohamad Moosavi
Shafer G.
Shalev-Shwartz S.
Smit B.
Snoek J.
Srivastava N.
Sutton R. S.
Tibshirani T.
Tomek I.
Trickett C. A.
Tukey J. W.
Vishwakarma G.
Weinberger S.
Weisberg H. F.
Weyl H.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 08/06/2020

By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review is focused on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of problems that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

eScholarship - University of California