160 research outputs found
Exascale Deep Learning to Accelerate Cancer Research
Deep learning, through the use of neural networks, has demonstrated
remarkable ability to automate many routine tasks when presented with
sufficient data for training. The neural network architecture (e.g. number of
layers, types of layers, connections between layers, etc.) plays a critical
role in determining what, if anything, the neural network is able to learn from
the training data. The trend for neural network architectures, especially those
trained on ImageNet, has been to grow ever deeper and more complex. The result
has been ever-increasing accuracy on benchmark datasets at the cost of
increased computational demands. In this paper we demonstrate that neural
network architectures can be automatically generated, tailored for a specific
application, with dual objectives: accuracy of prediction and speed of
prediction. Using MENNDL--an HPC-enabled software stack for neural architecture
search--we generate a neural network with comparable accuracy to
state-of-the-art networks on a cancer pathology dataset that is also
faster at inference. The speedup in inference is necessary because of the
volume and velocity of cancer pathology data; specifically, the previous
state-of-the-art networks are too slow for individual researchers without
access to HPC systems to keep pace with the rate of data generation. Our new
model enables researchers with modest computational resources to analyze newly
generated data faster than it is collected.
Comment: Submitted to IEEE Big Data
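The dual-objective search described above can be illustrated with a simple Pareto-style filter over candidate architectures, keeping those that are not beaten on both accuracy and inference latency at once. The candidate names and numbers below are illustrative assumptions, not values from MENNDL, whose actual search is evolutionary and HPC-scale.

```python
# Sketch of dual-objective architecture selection: maximize accuracy,
# minimize inference latency. All candidates and figures are hypothetical.

def pareto_front(candidates):
    """Keep candidates not dominated on (accuracy up, latency down)."""
    front = []
    for name, acc, lat in candidates:
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for _, a, l in candidates
        )
        if not dominated:
            front.append((name, acc, lat))
    return front

candidates = [
    ("deep_resnet",  0.95, 120.0),  # accurate but slow (ms/image)
    ("searched_net", 0.94,  30.0),  # near-par accuracy, 4x faster
    ("tiny_net",     0.80,  10.0),  # fast but inaccurate
    ("bloated_net",  0.90, 150.0),  # dominated: slower AND less accurate
]

front = pareto_front(candidates)
print([name for name, _, _ in front])  # bloated_net is filtered out
```

A real search scores thousands of generated networks this way and keeps breeding from the front; the abstract's point is that a front member like the hypothetical "searched_net" can trade a sliver of accuracy for a large inference speedup.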
The Federal Big Data Research and Development Strategic Plan
This document was developed through the contributions of the NITRD Big Data SSG members and staff. A special thanks and appreciation to the core team of editors, writers, and reviewers: Lida Beninson (NSF), Quincy Brown (NSF), Elizabeth Burrows (NSF), Dana Hunter (NSF), Craig Jolley (USAID), Meredith Lee (DHS), Nishal Mohan (NSF), Chloe Poston (NSF), Renata Rawlings-Goss (NSF), Carly Robinson (DOE Science), Alejandro Suarez (NSF), Martin Wiener (NSF), and Fen Zhao (NSF).
A national Big Data innovation ecosystem is essential to enabling knowledge discovery from and confident action informed by the vast resource of new and diverse datasets that are rapidly becoming available in nearly every aspect of life. Big Data has the potential to radically improve the lives of all Americans. It is now possible to combine disparate, dynamic, and distributed datasets and enable everything from predicting the future behavior of complex systems to precise medical treatments, smart energy usage, and focused educational curricula. Government agency research and public-private partnerships, together with the education and training of future data scientists, will enable applications that directly benefit society and the economy of the Nation.
To derive the greatest benefits from the many rich sources of Big Data, the Administration announced a "Big Data Research and Development Initiative" on March 29, 2012. Dr. John P. Holdren, Assistant to the President for Science and Technology and Director of the Office of Science and Technology Policy, stated that the initiative "promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security."
The Federal Big Data Research and Development Strategic Plan (Plan) builds upon the promise and excitement of the myriad applications enabled by Big Data with the objective of guiding Federal agencies as they develop and expand their individual mission-driven programs and investments related to Big Data. The Plan is based on inputs from a series of Federal agency and public activities, and a shared vision: We envision a Big Data innovation ecosystem in which the ability to analyze, extract information from, and make decisions and discoveries based upon large, diverse, and real-time datasets enables new capabilities for Federal agencies and the Nation at large; accelerates the process of scientific discovery and innovation; leads to new fields of research and new areas of inquiry that would otherwise be impossible; educates the next generation of 21st century scientists and engineers; and promotes new economic growth.
The Plan is built around seven strategies that represent key areas of importance for Big Data research and development (R&D). Priorities listed within each strategy highlight the intended outcomes that can be addressed by the missions and research funding of NITRD agencies. These include advancing human understanding in all branches of science, medicine, and security; ensuring the Nation's continued leadership in research and development; and enhancing the Nation's ability to address pressing societal and environmental issues facing the Nation and the world through research and development.
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipEline for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100x to 1000x improvement over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened approximately 10^11 ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine.
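The scale claims above can be grounded with back-of-envelope arithmetic: screening on the order of 10^11 ligands is infeasible at traditional docking rates, which is why both scalable infrastructure and faster per-ligand methods matter. The per-node docking rate and node count below are illustrative assumptions, not figures reported for IMPECCABLE.

```python
# Back-of-envelope screening-time estimate. Rates and node counts are
# hypothetical; only the 10^11 ligand count and the 100x-1000x component
# speedups come from the abstract.

LIGANDS = 1e11
DOCKS_PER_NODE_HOUR = 1e4   # assumed traditional docking throughput
NODES = 4000                # assumed HPC allocation

node_hours = LIGANDS / DOCKS_PER_NODE_HOUR
days_traditional = node_hours / NODES / 24.0

# At the upper end of the reported component-level speedups (1000x),
# the same allocation finishes in a small fraction of the time.
days_accelerated = days_traditional / 1000.0

print(f"traditional: {days_traditional:.0f} days, "
      f"with 1000x speedup: {days_accelerated:.2f} days")
```

Under these assumed rates the brute-force campaign takes on the order of a hundred days on thousands of nodes, while the accelerated pipeline finishes in hours, which is the regime a pandemic response requires.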
Folding@home: achievements from over twenty years of citizen science herald the exascale era
Simulations of biomolecules have enormous potential to inform our
understanding of biology but require extremely demanding calculations. For over
twenty years, the Folding@home distributed computing project has pioneered a
massively parallel approach to biomolecular simulation, harnessing the
resources of citizen scientists across the globe. Here, we summarize the
scientific and technical advances this perspective has enabled. As the
project's name implies, the early years of Folding@home focused on driving
advances in our understanding of protein folding by developing statistical
methods for capturing long-timescale processes and facilitating insight into
complex dynamical processes. Success laid a foundation for broadening the scope
of Folding@home to address other functionally relevant conformational changes,
such as receptor signaling, enzyme dynamics, and ligand binding. Continued
algorithmic advances, hardware developments such as GPU-based computing, and
the growing scale of Folding@home have enabled the project to focus on new
areas where massively parallel sampling can be impactful. While previous work
sought to expand toward larger proteins with slower conformational changes, new
work focuses on large-scale comparative studies of different protein sequences
and chemical compounds to better understand biology and inform the development
of small molecule drugs. Progress on these fronts enabled the community to
pivot quickly in response to the COVID-19 pandemic, expanding to become the
world's first exascale computer and deploying this massive resource to provide
insight into the inner workings of the SARS-CoV-2 virus and aid the development
of new antivirals. This success provides a glimpse of what's to come as
exascale supercomputers come online, and Folding@home continues its work.
Comment: 24 pages, 6 figures
MetH: A family of high-resolution and variable-shape image challenges
High-resolution and variable-shape images have not yet been properly addressed by the AI community. The approach of down-sampling data often used with convolutional neural networks is sub-optimal for many tasks, and has too many drawbacks to be considered a sustainable alternative. In view of the increasing importance of problems that can benefit from exploiting high-resolution (HR) and variable-shape images, and with the goal of promoting research in that direction, we introduce a new family of datasets (MetH). The four proposed problems include two image classification tasks, one image regression task, and one super-resolution task. Each of these datasets contains thousands of art pieces captured in HR, variable-shape images, labeled by experts at the Metropolitan Museum of Art. We perform an analysis which shows how the proposed tasks go well beyond current public alternatives in both pixel size and aspect-ratio variance. At the same time, the performance obtained by popular architectures on these tasks shows that there is ample room for improvement. To underline the relevance of the contribution, we review the fields, both in AI and high-performance computing, that could benefit from the proposed challenges.
This work is partially supported by the Intel-BSC Exascale Lab agreement, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contracts 2017-SGR-1414).
Preprint
Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient, and slow. A major bottleneck is screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods (in this case, developed for linear accelerators) and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine
The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines
hundreds of thousands of AI-cores onto a single chip. Whilst this technology
has been designed for machine learning workloads, the significant amount of
available raw compute means that it is also a very interesting potential target
for accelerating traditional HPC computational codes. Many of these algorithms
are stencil-based, where update operations involve contributions from
neighbouring elements, and in this paper we explore the suitability of this
technology for such codes from the perspective of an early adopter of the
technology, compared to CPUs and GPUs. Using TensorFlow as the interface, we
explore the performance and demonstrate that, whilst there is still work to be
done around exposing the programming interface to users, performance of the WSE
is impressive as it outperforms four V100 GPUs by two and a half times and two
Intel Xeon Platinum CPUs by around 114 times in our experiments. There is
significant potential therefore for this technology to play an important role
in accelerating HPC codes on future exascale supercomputers.
Comment: This preprint has not undergone any post-submission improvements or corrections. Preprint of paper submitted to the Euro-Par DSL-HPC workshop
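The neighbour-contribution pattern this abstract describes can be made concrete with a minimal 1D Jacobi-style 3-point stencil: each interior point is updated from itself and its two neighbours. Plain Python is used here purely for illustration; the paper itself expresses such stencils as TensorFlow convolutions to target the WSE.

```python
# Minimal 3-point (1D Jacobi) stencil sweep. Each interior point becomes
# the average of itself and its two neighbours; boundary values are held
# fixed. This is the neighbour-access pattern that stencil-based HPC
# codes repeat over millions of grid points per time step.

def jacobi_sweep(u):
    """One stencil update over a 1D grid with fixed boundaries."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v

u = [0.0, 0.0, 9.0, 0.0, 0.0]
for _ in range(2):
    u = jacobi_sweep(u)
print(u)  # the initial spike diffuses outward
```

Because every point's update touches only adjacent points, the computation maps naturally onto a grid of cores with nearest-neighbour links, which is what makes the WSE's fabric of hundreds of thousands of cores attractive for such codes.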
- …