160 research outputs found

    Exascale Deep Learning to Accelerate Cancer Research

    Deep learning, through the use of neural networks, has demonstrated a remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever-increasing accuracy on benchmark datasets at the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL, an HPC-enabled software stack for neural architecture search, we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also 16× faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
    Comment: Submitted to IEEE Big Data
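
    As a rough illustration of the dual-objective search idea described above (a toy sketch, not MENNDL's actual algorithm), the snippet below scores candidate architectures by accuracy minus a latency penalty and evolves a small population. The dict encoding, the mutate rule, and the evaluate callable are placeholders for the expensive training and benchmarking that an HPC-scale search would distribute across many nodes.

    ```python
    import random

    def fitness(accuracy, latency_s, latency_weight=0.5):
        # Scalarised dual objective: reward accuracy, penalise slow inference.
        return accuracy - latency_weight * latency_s

    def mutate(arch):
        # Toy mutation over a dict-encoded architecture, e.g. {"layers": 8, "width": 64}.
        child = dict(arch)
        key = random.choice(list(child))
        child[key] = max(1, child[key] + random.choice([-1, 1]))
        return child

    def evolve(population, evaluate, generations=10, keep=4):
        # `evaluate(arch)` must return (accuracy, latency_seconds); in a real
        # search this is the costly train-and-benchmark step run on HPC nodes.
        for _ in range(generations):
            population.sort(key=lambda a: fitness(*evaluate(a)), reverse=True)
            parents = population[:keep]
            children = [mutate(random.choice(parents)) for _ in range(len(population) - keep)]
            population = parents + children
        return max(population, key=lambda a: fitness(*evaluate(a)))
    ```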

    The Federal Big Data Research and Development Strategic Plan

    This document was developed through the contributions of the NITRD Big Data SSG members and staff. A special thanks and appreciation to the core team of editors, writers, and reviewers: Lida Beninson (NSF), Quincy Brown (NSF), Elizabeth Burrows (NSF), Dana Hunter (NSF), Craig Jolley (USAID), Meredith Lee (DHS), Nishal Mohan (NSF), Chloe Poston (NSF), Renata Rawlings-Goss (NSF), Carly Robinson (DOE Science), Alejandro Suarez (NSF), Martin Wiener (NSF), and Fen Zhao (NSF).
    A national Big Data innovation ecosystem is essential to enabling knowledge discovery from and confident action informed by the vast resource of new and diverse datasets that are rapidly becoming available in nearly every aspect of life. Big Data has the potential to radically improve the lives of all Americans. It is now possible to combine disparate, dynamic, and distributed datasets and enable everything from predicting the future behavior of complex systems to precise medical treatments, smart energy usage, and focused educational curricula. Government agency research and public-private partnerships, together with the education and training of future data scientists, will enable applications that directly benefit society and the economy of the Nation. To derive the greatest benefits from the many, rich sources of Big Data, the Administration announced a “Big Data Research and Development Initiative” on March 29, 2012. Dr. John P. Holdren, Assistant to the President for Science and Technology and Director of the Office of Science and Technology Policy, stated that the initiative “promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.” The Federal Big Data Research and Development Strategic Plan (Plan) builds upon the promise and excitement of the myriad applications enabled by Big Data with the objective of guiding Federal agencies as they develop and expand their individual mission-driven programs and investments related to Big Data. The Plan is based on inputs from a series of Federal agency and public activities, and a shared vision: We envision a Big Data innovation ecosystem in which the ability to analyze, extract information from, and make decisions and discoveries based upon large, diverse, and real-time datasets enables new capabilities for Federal agencies and the Nation at large; accelerates the process of scientific discovery and innovation; leads to new fields of research and new areas of inquiry that would otherwise be impossible; educates the next generation of 21st century scientists and engineers; and promotes new economic growth. The Plan is built around seven strategies that represent key areas of importance for Big Data research and development (R&D). Priorities listed within each strategy highlight the intended outcomes that can be addressed by the missions and research funding of NITRD agencies. These include advancing human understanding in all branches of science, medicine, and security; ensuring the Nation’s continued leadership in research and development; and enhancing the Nation’s ability to address pressing societal and environmental issues facing the Nation and the world through research and development.

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipEline for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100× to 1000× improvement over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened ~10^11 ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine.
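
    A minimal sketch of the screening-funnel idea behind such a pipeline (interfaces assumed, not IMPECCABLE's actual code): a cheap machine-learning surrogate ranks the full ligand library, and only a small fraction advances to progressively more expensive physics-based stages.

    ```python
    def screen(ligands, ml_score, dock_score, binding_free_energy,
               keep_ml=0.001, keep_dock=0.1):
        # ml_score, dock_score and binding_free_energy are user-supplied callables
        # of increasing cost and fidelity (placeholders for real pipeline stages).
        ranked = sorted(ligands, key=ml_score, reverse=True)
        shortlist = ranked[:max(1, int(len(ranked) * keep_ml))]
        docked = sorted(shortlist, key=dock_score, reverse=True)
        refined = docked[:max(1, int(len(docked) * keep_dock))]
        # Lower (more negative) free energy = stronger predicted binding.
        return sorted(refined, key=binding_free_energy)
    ```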

    Folding@home: achievements from over twenty years of citizen science herald the exascale era

    Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances this perspective has enabled. As the project's name implies, the early years of Folding@home focused on driving advances in our understanding of protein folding by developing statistical methods for capturing long-timescale processes and facilitating insight into complex dynamical processes. Success laid a foundation for broadening the scope of Folding@home to address other functionally relevant conformational changes, such as receptor signaling, enzyme dynamics, and ligand binding. Continued algorithmic advances, hardware developments such as GPU-based computing, and the growing scale of Folding@home have enabled the project to focus on new areas where massively parallel sampling can be impactful. While previous work sought to expand toward larger proteins with slower conformational changes, new work focuses on large-scale comparative studies of different protein sequences and chemical compounds to better understand biology and inform the development of small-molecule drugs. Progress on these fronts enabled the community to pivot quickly in response to the COVID-19 pandemic, expanding to become the world's first exascale computer and deploying this massive resource to provide insight into the inner workings of the SARS-CoV-2 virus and aid the development of new antivirals. This success provides a glimpse of what's to come as exascale supercomputers come online, and Folding@home continues its work.
    Comment: 24 pages, 6 figures
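
    One of the statistical ideas alluded to above is combining many short, independently run trajectories into a single kinetic model. The sketch below (a simplification, not Folding@home's production tooling) estimates a Markov-state-model transition matrix from discretized trajectories; the state labels are assumed to come from a prior clustering of simulation frames.

    ```python
    import numpy as np

    def transition_matrix(trajectories, n_states, lag=1):
        # `trajectories`: list of 1-D integer arrays of state labels;
        # `lag`: lag time in frames.
        counts = np.zeros((n_states, n_states))
        for traj in trajectories:
            for i, j in zip(traj[:-lag], traj[lag:]):
                counts[i, j] += 1
        counts += 1e-12  # avoid division by zero for states never visited
        return counts / counts.sum(axis=1, keepdims=True)
    ```

    The leading eigenvalues of this matrix give the model's slowest implied timescales, which is how long-timescale behaviour can be inferred from many short simulations.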

    MetH: A family of high-resolution and variable-shape image challenges

    High-resolution and variable-shape images have not yet been properly addressed by the AI community. The approach of down-sampling data often used with convolutional neural networks is sub-optimal for many tasks, and has too many drawbacks to be considered a sustainable alternative. In light of the increasing importance of problems that can benefit from exploiting high-resolution (HR) and variable-shape images, and with the goal of promoting research in that direction, we introduce a new family of datasets (MetH). The four proposed problems include two image classification tasks, one image regression task, and one super-resolution task. Each of these datasets contains thousands of art pieces captured in HR, variable-shape images, labeled by experts at the Metropolitan Museum of Art. We perform an analysis which shows how the proposed tasks go well beyond current public alternatives in both pixel size and aspect-ratio variance. At the same time, the performance obtained by popular architectures on these tasks shows that there is ample room for improvement. To underline the relevance of the contribution, we review the fields, both in AI and high-performance computing, that could benefit from the proposed challenges.
    This work is partially supported by the Intel-BSC Exascale Lab agreement, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contracts 2017-SGR-1414). Preprint.
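
    To illustrate one alternative to global down-sampling that such high-resolution, variable-shape images invite (a generic sketch, not the paper's baseline code), the snippet below tiles an image of arbitrary size into fixed-size patches at native resolution.

    ```python
    import numpy as np

    def extract_patches(image, patch=224, stride=224):
        # `image`: (H, W, C) NumPy array of arbitrary H and W; returns
        # full-resolution crops. Border regions that do not fill a whole
        # patch are skipped in this simplified version.
        h, w = image.shape[:2]
        patches = []
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                patches.append(image[y:y + patch, x:x + patch])
        return patches
    ```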

    Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers

    The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations, which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
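
    As a toy illustration of how the two complementary scores might be combined (a hypothetical consensus scheme, not the authors' infrastructure), the sketch below merges a machine-learning ranking and a physics-based ranking so that compounds favoured by both rise to the top.

    ```python
    def consensus_rank(compounds, ml_scores, physics_scores):
        # Both dicts map compound -> score, higher = better (assumed convention).
        ml_rank = {c: r for r, c in enumerate(sorted(compounds, key=ml_scores.get, reverse=True))}
        ph_rank = {c: r for r, c in enumerate(sorted(compounds, key=physics_scores.get, reverse=True))}
        # A smaller summed rank means the compound is favoured by both methods.
        return sorted(compounds, key=lambda c: ml_rank[c] + ph_rank[c])
    ```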

    TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine

    The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw compute means that it is also a very interesting potential target for accelerating traditional HPC computational codes. Many of these algorithms are stencil-based, where update operations involve contributions from neighbouring elements, and in this paper we explore the suitability of this technology for such codes from the perspective of an early adopter, compared to CPUs and GPUs. Using TensorFlow as the interface, we explore the performance and demonstrate that, whilst there is still work to be done around exposing the programming interface to users, performance of the WSE is impressive: it outperforms four V100 GPUs by two and a half times and two Intel Xeon Platinum CPUs by around 114 times in our experiments. There is therefore significant potential for this technology to play an important role in accelerating HPC codes on future exascale supercomputers.
    Comment: This preprint has not undergone any post-submission improvements or corrections. Preprint of a paper submitted to the Euro-Par DSL-HPC workshop.
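
    To make the "TensorFlow as a DSL for stencils" idea concrete, here is a minimal sketch (not the paper's code) of a 5-point Jacobi stencil expressed as a TensorFlow convolution, the style of formulation that can then be mapped onto accelerators such as the WSE.

    ```python
    import tensorflow as tf

    def jacobi_step(grid):
        # `grid`: float32 tensor of shape [1, H, W, 1] holding the field and its
        # boundary values. Each interior point becomes the average of its four
        # neighbours; the boundary is re-inserted unchanged.
        kernel = tf.reshape(
            tf.constant([[0.0, 0.25, 0.0],
                         [0.25, 0.0, 0.25],
                         [0.0, 0.25, 0.0]], dtype=tf.float32),
            [3, 3, 1, 1])
        interior = tf.nn.conv2d(grid, kernel, strides=[1, 1, 1, 1], padding="VALID")
        padded = tf.pad(interior, [[0, 0], [1, 1], [1, 1], [0, 0]])
        mask = tf.pad(tf.ones_like(interior), [[0, 0], [1, 1], [1, 1], [0, 0]])
        return padded * mask + grid * (1.0 - mask)
    ```

    Iterating jacobi_step, for example inside a tf.function loop, gives the kind of repeated stencil sweep the paper benchmarks against CPU and GPU baselines.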
