Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks
Experiments in particle physics produce enormous quantities of data that must
be analyzed and interpreted by teams of physicists. This analysis is often
exploratory, where scientists are unable to enumerate the possible types of
signal prior to performing the experiment. Thus, tools for summarizing,
clustering, visualizing and classifying high-dimensional data are essential. In
this work, we show that meaningful physical content can be revealed by
transforming the raw data into a learned high-level representation using deep
neural networks, with measurements taken at the Daya Bay Neutrino Experiment as
a case study. We further show how convolutional deep neural networks can
provide an effective classification filter with greater than 97% accuracy
across different classes of physics events, significantly better than other
machine learning approaches.
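The abstract does not spell out the network architecture, but the core idea is easy to sketch. Below is a minimal convolutional classifier in PyTorch; the 8x24 input grid (a flattened PMT charge map) and the five event classes are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch (not the authors' actual architecture) of a convolutional
# classifier over detector-event "images". The 1 x 8 x 24 input shape and the
# five classes are assumptions made for illustration.
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: PMT charge map
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 8x24 -> 4x12
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 4x12 -> 2x6
        )
        self.classifier = nn.Linear(32 * 2 * 6, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                # (N, 32, 2, 6)
        return self.classifier(x.flatten(1))

# Example: classify a batch of 4 simulated events.
model = EventClassifier()
logits = model(torch.randn(4, 1, 8, 24))
pred = logits.argmax(dim=1)                 # predicted class per event
```

The learned features feeding the final linear layer are exactly the kind of high-level representation the abstract describes using for clustering and visualization.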
High-Performance Statistical Computing in the Computing Environments of the 2020s
Technological advances in the past decade, hardware and software alike, have
made access to high-performance computing (HPC) easier than ever. We review
these advances from a statistical computing perspective. Cloud computing makes
access to supercomputers affordable. Deep learning software libraries make
programming statistical algorithms easy and enable users to write code once and
run it anywhere -- from a laptop to a workstation with multiple graphics
processing units (GPUs) or a supercomputer in a cloud. Highlighting how these
developments benefit statisticians, we review recent optimization algorithms
that are useful for high-dimensional models and can harness the power of HPC.
Code snippets are provided to demonstrate the ease of programming. We also
provide an easy-to-use distributed matrix data structure suitable for HPC.
Employing this data structure, we illustrate various statistical applications
including large-scale positron emission tomography and ℓ1-regularized Cox
regression. Our examples easily scale up to an 8-GPU workstation and a
720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of
type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000
single nucleotide polymorphisms using the HPC ℓ1-regularized Cox
regression. Fitting this half-million-variate model takes less than 45 minutes
and reconfirms known associations. To our knowledge, this is the first
demonstration of the feasibility of penalized regression of survival outcomes
at this scale. Comment: Accepted for publication in Statistical Science.
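As an illustration of the family of optimization algorithms the paper reviews, here is a minimal proximal-gradient (ISTA) sketch for an ℓ1-penalized problem. The smooth loss is least squares purely for brevity; for the paper's Cox model, the gradient of the negative partial log-likelihood would take its place. The function names are ours, not the authors' API.

```python
# A minimal sketch of proximal gradient descent for l1-penalized fitting:
# min_b 0.5*||y - X b||^2 + lam*||b||_1. Written against NumPy; the same
# update runs unchanged on GPU arrays (e.g., swapping in cupy), which is
# the "write once, run anywhere" point the abstract makes.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Proximal gradient descent with fixed step 1/L, L = ||X||_2^2."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)             # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Tiny example: sparse recovery with 10 features, 3 of them active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
b_true = np.zeros(10)
b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)
print(np.round(ista(X, y, lam=5.0), 2))      # near-sparse estimate
```

At Biobank scale the same update is applied with the design matrix held in a distributed structure across GPUs or cluster nodes, which is what makes the half-million-variate fit feasible.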
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics
Statistical methods for genomewide association studies (GWAS) continue to
improve. However, the increasing volume and variety of genetic and genomic data
make computational speed and ease of data manipulation mandatory in future
software. In our view, a collaborative effort of statistical geneticists is
required to develop open source software targeted to genetic epidemiology. Our
attempt to meet this need is called the OPENMENDEL project
(https://openmendel.github.io). It aims to (1) enable interactive and
reproducible analyses with informative intermediate results, (2) scale to big
data analytics, (3) embrace parallel and distributed computing, (4) adapt to
rapid hardware evolution, (5) allow cloud computing, (6) allow integration of
varied genetic data types, and (7) foster easy communication between
clinicians, geneticists, statisticians, and computer scientists. This article
reviews and makes recommendations to the genetic epidemiology community in the
context of the OPENMENDEL project. Comment: 16 pages, 2 figures, 2 tables.
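For readers unfamiliar with the underlying computation, the sketch below shows the simplest version of what GWAS software does at each SNP: a single-variant association test against a quantitative trait. It is a generic textbook illustration in Python, not OPENMENDEL's API (the project's packages are written in Julia; see https://openmendel.github.io).

```python
# A generic per-SNP association test: simple linear regression of a
# quantitative trait on genotype dosage, with a Wald t-test on the slope.
# This is a textbook sketch, not OPENMENDEL code.
import numpy as np
from scipy import stats

def single_snp_tests(G, y):
    """Test each SNP column of G against trait y.

    G: (n_subjects, n_snps) genotype dosages in {0, 1, 2}
    y: (n_subjects,) quantitative trait
    Returns a two-sided p-value per SNP.
    """
    n = len(y)
    Gc = G - G.mean(axis=0)                            # center genotypes
    yc = y - y.mean()                                  # center trait
    ss_g = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ yc / ss_g                            # per-SNP slope
    resid_var = ((yc ** 2).sum() - beta ** 2 * ss_g) / (n - 2)
    se = np.sqrt(resid_var / ss_g)                     # standard error of slope
    t = beta / se
    return 2 * stats.t.sf(np.abs(t), df=n - 2)

# Tiny synthetic example: 500 subjects, 1000 SNPs, one causal SNP.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(500, 1000)).astype(float)
y = 0.5 * G[:, 0] + rng.standard_normal(500)
p = single_snp_tests(G, y)
print(p[:3])                                           # SNP 0 should be tiny
```

Running millions of such tests over biobank-scale cohorts is what drives the project's emphasis on parallel, distributed, and cloud computing.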