Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks
Experiments in particle physics produce enormous quantities of data that must
be analyzed and interpreted by teams of physicists. This analysis is often
exploratory, where scientists are unable to enumerate the possible types of
signal prior to performing the experiment. Thus, tools for summarizing,
clustering, visualizing and classifying high-dimensional data are essential. In
this work, we show that meaningful physical content can be revealed by
transforming the raw data into a learned high-level representation using deep
neural networks, with measurements taken at the Daya Bay Neutrino Experiment as
a case study. We further show how convolutional deep neural networks can
provide an effective classification filter with greater than 97% accuracy
across different classes of physics events, significantly better than other
machine learning approaches.
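The abstract does not spell out the network architecture, but the core idea is easy to sketch. Below is a minimal convolutional classifier in PyTorch; the 8x24 input grid (a flattened PMT charge map) and the five event classes are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch (not the authors' actual architecture) of a convolutional
# classifier over detector-event "images". The 1 x 8 x 24 input shape and the
# five classes are assumptions made for illustration.
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: PMT charge map
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 8x24 -> 4x12
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 4x12 -> 2x6
        )
        self.classifier = nn.Linear(32 * 2 * 6, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                # (N, 32, 2, 6)
        return self.classifier(x.flatten(1))

# Example: classify a batch of 4 simulated events.
model = EventClassifier()
logits = model(torch.randn(4, 1, 8, 24))
pred = logits.argmax(dim=1)                 # predicted class per event
```

The learned features feeding the final linear layer are exactly the kind of high-level representation the abstract describes using for clustering and visualization.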
High-Performance Statistical Computing in the Computing Environments of the 2020s
Technological advances in the past decade, hardware and software alike, have
made access to high-performance computing (HPC) easier than ever. We review
these advances from a statistical computing perspective. Cloud computing makes
access to supercomputers affordable. Deep learning software libraries make
programming statistical algorithms easy and enable users to write code once and
run it anywhere -- from a laptop to a workstation with multiple graphics
processing units (GPUs) or a supercomputer in a cloud. Highlighting how these
developments benefit statisticians, we review recent optimization algorithms
that are useful for high-dimensional models and can harness the power of HPC.
Code snippets are provided to demonstrate the ease of programming. We also
provide an easy-to-use distributed matrix data structure suitable for HPC.
Employing this data structure, we illustrate various statistical applications
including large-scale positron emission tomography and ℓ1-regularized Cox
regression. Our examples easily scale up to an 8-GPU workstation and a
720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of
type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000
single nucleotide polymorphisms using the HPC ℓ1-regularized Cox
regression. Fitting this half-million-variate model takes less than 45 minutes
and reconfirms known associations. To our knowledge, this is the first
demonstration of the feasibility of penalized regression of survival outcomes
at this scale. Comment: Accepted for publication in Statistical Science.
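As an illustration of the family of optimization algorithms the paper reviews, here is a minimal proximal-gradient (ISTA) sketch for an ℓ1-penalized problem. The smooth loss is least squares purely for brevity; for the paper's Cox model, the gradient of the negative partial log-likelihood would take its place. The function names are ours, not the authors' API.

```python
# A minimal sketch of proximal gradient descent for l1-penalized fitting:
# min_b 0.5*||y - X b||^2 + lam*||b||_1. Written against NumPy; the same
# update runs unchanged on GPU arrays (e.g., swapping in cupy), which is
# the "write once, run anywhere" point the abstract makes.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Proximal gradient descent with fixed step 1/L, L = ||X||_2^2."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)             # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Tiny example: sparse recovery with 10 features, 3 of them active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
b_true = np.zeros(10)
b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)
print(np.round(ista(X, y, lam=5.0), 2))      # near-sparse estimate
```

At Biobank scale the same update is applied with the design matrix held in a distributed structure across GPUs or cluster nodes, which is what makes the half-million-variate fit feasible.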
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics
Statistical methods for genomewide association studies (GWAS) continue to
improve. However, the increasing volume and variety of genetic and genomic data
make computational speed and ease of data manipulation mandatory in future
software. In our view, a collaborative effort of statistical geneticists is
required to develop open source software targeted to genetic epidemiology. Our
attempt to meet this need is called the OPENMENDEL project
(https://openmendel.github.io). It aims to (1) enable interactive and
reproducible analyses with informative intermediate results, (2) scale to big
data analytics, (3) embrace parallel and distributed computing, (4) adapt to
rapid hardware evolution, (5) allow cloud computing, (6) allow integration of
varied genetic data types, and (7) foster easy communication between
clinicians, geneticists, statisticians, and computer scientists. This article
reviews and makes recommendations to the genetic epidemiology community in the
context of the OPENMENDEL project. Comment: 16 pages, 2 figures, 2 tables.
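For readers unfamiliar with the underlying computation, the sketch below shows the simplest version of what GWAS software does at each SNP: a single-variant association test against a quantitative trait. It is a generic textbook illustration in Python, not OPENMENDEL's API (the project's packages are written in Julia; see https://openmendel.github.io).

```python
# A generic per-SNP association test: simple linear regression of a
# quantitative trait on genotype dosage, with a Wald t-test on the slope.
# This is a textbook sketch, not OPENMENDEL code.
import numpy as np
from scipy import stats

def single_snp_tests(G, y):
    """Test each SNP column of G against trait y.

    G: (n_subjects, n_snps) genotype dosages in {0, 1, 2}
    y: (n_subjects,) quantitative trait
    Returns a two-sided p-value per SNP.
    """
    n = len(y)
    Gc = G - G.mean(axis=0)                            # center genotypes
    yc = y - y.mean()                                  # center trait
    ss_g = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ yc / ss_g                            # per-SNP slope
    resid_var = ((yc ** 2).sum() - beta ** 2 * ss_g) / (n - 2)
    se = np.sqrt(resid_var / ss_g)                     # standard error of slope
    t = beta / se
    return 2 * stats.t.sf(np.abs(t), df=n - 2)

# Tiny synthetic example: 500 subjects, 1000 SNPs, one causal SNP.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(500, 1000)).astype(float)
y = 0.5 * G[:, 0] + rng.standard_normal(500)
p = single_snp_tests(G, y)
print(p[:3])                                           # SNP 0 should be tiny
```

Running millions of such tests over biobank-scale cohorts is what drives the project's emphasis on parallel, distributed, and cloud computing.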