Enabling On-Demand Database Computing with MIT SuperCloud Database Management System
The MIT SuperCloud database management system allows rapid creation and
flexible execution of a variety of the latest scientific databases, including
Apache Accumulo and SciDB. It is designed to let these databases run on a
High Performance Computing Cluster (HPCC) platform as seamlessly as any other
HPCC job. It migrates the databases to the resources assigned by the HPCC
scheduler and stores the database files centrally when they are not running.
It also supports snapshotting of databases, so researchers can experiment and
push the limits of the technology without risking data or productivity loss
if a database becomes unstable.

Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC)
conference 2015. arXiv admin note: text overlap with arXiv:1406.492
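As a rough illustration of the lifecycle this system manages (migrate database files to the scheduler-assigned node, run, then snapshot and return the files to central storage), here is a minimal Python sketch. It is not the SuperCloud API: the paths and the db_server command are hypothetical stand-ins.

```python
import shutil
import subprocess
from pathlib import Path

CENTRAL = Path("/central/db_store")  # hypothetical central storage
LOCAL = Path("/local/db_run")        # hypothetical node-local scratch

def start_database(name: str) -> subprocess.Popen:
    """Migrate database files to the scheduler-assigned node, then launch."""
    shutil.copytree(CENTRAL / name, LOCAL / name)  # assumes clean scratch
    return subprocess.Popen(["db_server", "--data-dir", str(LOCAL / name)])

def stop_database(name: str, proc: subprocess.Popen, snapshot: bool = True) -> None:
    """Shut the database down and return its files to central storage."""
    proc.terminate()
    proc.wait()
    if snapshot:
        # Keep a snapshot so experiments cannot cause permanent data loss.
        shutil.copytree(LOCAL / name, CENTRAL / f"{name}.snapshot",
                        dirs_exist_ok=True)
    shutil.copytree(LOCAL / name, CENTRAL / name, dirs_exist_ok=True)
    shutil.rmtree(LOCAL / name)
```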
Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in system memory. Existing hardware-based
detection and recovery techniques will be inadequate at the resulting high
memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define regions, called havens, that
provide fault protection for program objects, and we provide reliability for
these regions through a software-based parity protection mechanism. Our
approach enables critical program objects to be placed in havens. Unlike
algorithm-based fault tolerance techniques, the fault coverage provided by
our approach is application agnostic.

Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, USA
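To make the parity mechanism concrete, here is a minimal sketch of software parity protection over a region, assuming single-word faults at known locations; the Haven class and its interface are illustrative, not the paper's API.

```python
from functools import reduce

class Haven:
    """Toy memory region protected by one XOR parity word (illustrative)."""

    def __init__(self, words):
        self.words = list(words)  # the protected program objects
        self.parity = reduce(lambda a, b: a ^ b, self.words, 0)

    def write(self, i, value):
        # Update parity incrementally: XOR out the old word, XOR in the new.
        self.parity ^= self.words[i] ^ value
        self.words[i] = value

    def recover(self, i):
        # Reconstruct word i after a detected fault at a known location:
        # XORing the parity with all surviving words yields the lost word.
        others = (w for j, w in enumerate(self.words) if j != i)
        self.words[i] = reduce(lambda a, b: a ^ b, others, self.parity)
        return self.words[i]
```

A single parity word can reconstruct one lost word per region, and only when the fault location is known, which is why a scheme like this must be paired with fault detection.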
Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid
A Multigrid Full Approximation Storage algorithm for solving Deep Residual
Networks is developed to enable layer-parallel training of neural networks
and concurrent computational kernel execution on GPUs. This work demonstrates
a 10.2x speedup over traditional layer-wise model parallelism techniques
using the same number of compute units.

Comment: 7 pages, 6 figures, 27 citations. Accepted to 2020 IEEE High
Performance Extreme Computing Conference - Outstanding Paper Award
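The layer hierarchy such methods build on comes from viewing a residual network as a discretized evolution, x_{l+1} = x_l + F_l(x_l), so layers act like time steps that can be coarsened. The sketch below shows only the fine propagation and a coarsened propagation that a Full Approximation Storage cycle would iterate between; it is illustrative, not the paper's algorithm.

```python
import numpy as np

def fine_forward(x, layers):
    # Standard serial residual propagation: x_{l+1} = x_l + F_l(x_l).
    for F in layers:
        x = x + F(x)
    return x

def coarse_forward(x, layers):
    # Coarsened propagation: keep every other layer with a doubled step,
    # giving the cheap approximation a multigrid cycle corrects against.
    for F in layers[::2]:
        x = x + 2.0 * F(x)
    return x

# Toy residual blocks: fixed random linear maps with a tanh nonlinearity.
rng = np.random.default_rng(0)
layers = [lambda x, W=0.1 * rng.standard_normal((8, 8)): np.tanh(x @ W)
          for _ in range(16)]
x0 = rng.standard_normal(8)
print(np.linalg.norm(fine_forward(x0, layers) - coarse_forward(x0, layers)))
```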
Training Behavior of Sparse Neural Network Topologies
Improvements in the performance of deep neural networks have often come
through the design of larger and more complex networks. As a result, fast
memory is a significant limiting factor in our ability to improve network
performance. One approach to overcoming this limit is the design of sparse
neural networks, which can be both very large and efficiently trained. In this
paper we experiment with training on sparse neural network topologies. We test
pruning-based topologies, which are derived from an initially dense network
whose connections are pruned, as well as RadiX-Nets, a class of network
topologies with proven connectivity and sparsity properties. Results show that
sparse networks obtain accuracies comparable to those of dense networks, but
extreme levels of sparsity cause instability in training, which merits further
study.

Comment: 6 pages. Presented at the 2019 IEEE High Performance Extreme
Computing (HPEC) Conference. Received "Best Paper" award
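As a concrete example of a pruning-based topology, here is a minimal sketch of magnitude pruning: a binary mask keeps the largest-magnitude connections of a dense layer and is re-applied during training. The function name and the 90% sparsity level are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def prune_topology(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Derive a sparse topology from a dense layer by magnitude pruning.

    Returns a binary mask keeping the largest-magnitude fraction
    (1 - sparsity) of connections.
    """
    k = int(round((1.0 - sparsity) * weights.size))
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# During training, the mask is re-applied after every weight update so the
# pruned connections stay at zero:
W = np.random.default_rng(1).standard_normal((256, 256))
mask = prune_topology(W, sparsity=0.9)
W *= mask  # roughly 90% of connections removed
```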
Genetic Sequence Matching Using D4M Big Data Approaches
Recent technological advances in Next Generation Sequencing tools have led to
increasing speeds of DNA sample collection, preparation, and sequencing. One
instrument can produce over 600 Gb of genetic sequence data in a single run.
This creates new opportunities to efficiently handle the increasing workload.
We propose a new method of fast genetic sequence analysis using the Dynamic
Distributed Dimensional Data Model (D4M) - an associative array environment for
MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and
statistical properties, the method leverages big data techniques and the
implementation of an Apache Accumulo database to accelerate computations
one-hundred fold over other methods. Comparisons of the D4M method with the
current gold standard for sequence analysis, BLAST, show the two are comparable
in the alignments they find. This paper will present an overview of the D4M
genetic sequence algorithm and statistical comparisons with BLAST.

Comment: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC)
201
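The associative-array formulation can be pictured as follows: each sequence becomes a sparse row of k-mer counts, and one sparse matrix multiply scores every pair of sequences by their shared k-mers. This sketch is illustrative (plain SciPy rather than D4M or Accumulo), and the value of k is a free parameter here.

```python
from scipy.sparse import csr_matrix

def kmer_matrix(sequences, k):
    """Build a sparse sample-by-kmer count matrix, associative-array style."""
    vocab = {}
    rows, cols, vals = [], [], []
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - k + 1):
            col = vocab.setdefault(seq[j:j + k], len(vocab))
            rows.append(i)
            cols.append(col)
            vals.append(1)  # duplicate entries are summed into counts
    shape = (len(sequences), len(vocab))
    return csr_matrix((vals, (rows, cols)), shape=shape), vocab

# One sparse matrix multiply scores all sequence pairs at once:
seqs = ["ACGTACGTACGTAA", "ACGTACGTACGTCC", "TTTTGGGGCCCCAA"]
A, _ = kmer_matrix(seqs, k=8)
scores = (A @ A.T).toarray()  # scores[i, j] = shared k-mer score of i and j
print(scores)
```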
Fast and accurate object detection in high resolution 4K and 8K video using GPUs
Machine learning has achieved notable successes on computer vision tasks such
as object detection, but the traditionally used models work with relatively
low resolution images. The resolution of recording devices is gradually
increasing and there is a rising need for new methods of processing high
resolution data. We propose an attention pipeline method which uses two-stage
evaluation of each image or video frame under rough and refined resolution to
limit the total number of necessary evaluations. For both stages, we make use
of the fast object detection model YOLO v2. Our implementation distributes the
work across GPUs. We maintain high accuracy while reaching an average
performance of 3-6 fps on 4K video and 2 fps on 8K video.

Comment: 6 pages, 12 figures, Best Paper Finalist at IEEE High Performance
Extreme Computing Conference (HPEC) 2018; copyright 2018 IEEE; (DOI will be
filled in when known)
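A minimal sketch of such a two-stage attention pipeline appears below. The detect callable stands in for a detector such as YOLO v2 and is assumed to return (x0, y0, x1, y1) integer pixel boxes; the downscale factor and padding are illustrative values.

```python
import cv2  # assumed available for resizing frames

def attention_pipeline(frame, detect, downscale=4, pad=32):
    """Rough pass on a downscaled frame focuses attention; refined passes
    re-run the detector at full resolution on the attended crops."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (w // downscale, h // downscale))
    final = []
    for box in detect(small):
        # Map the rough box back to full resolution and pad it.
        x0, y0, x1, y1 = [int(c) * downscale for c in box]
        x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
        x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
        crop = frame[y0:y1, x0:x1]
        # Refined detection inside the crop, offset back to frame coordinates.
        final += [(bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)
                  for (bx0, by0, bx1, by1) in detect(crop)]
    return final
```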
Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data, containing 50 billion packets.
Utilizing a novel hypersparse neural network analysis of "video" streams of
this traffic using 10,000 processors in the MIT SuperCloud reveals a new
phenomenon: the importance of otherwise unseen leaf nodes and isolated links
in Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams, and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable; different network statistics and training
models can be incorporated with simple changes to the image filter functions.

Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High
Performance Extreme Computing (HPEC) 201
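The distribution named here has, in the usual Zipf-Mandelbrot form, probability p(d) proportional to 1/(d + delta)^alpha over degrees d, with exponent alpha and offset delta shifting the low-degree (leaf) end. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def zipf_mandelbrot(d, alpha, delta):
    """Two-parameter Zipf-Mandelbrot probability over the observed degrees d:
    p(d) ~ 1 / (d + delta)**alpha, normalized to sum to one. alpha is the
    power-law exponent; delta shifts the low-degree (leaf) end of the curve.
    """
    d = np.asarray(d, dtype=float)
    p = 1.0 / (d + delta) ** alpha
    return p / p.sum()

degrees = np.arange(1, 10**6 + 1)
p = zipf_mandelbrot(degrees, alpha=1.8, delta=0.5)  # illustrative values
print(p[:3], p.sum())
```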
Deploying AI Frameworks on Secure HPC Systems with Containers
The increasing interest from the research community and industry in using
Artificial Intelligence (AI) techniques to tackle "real world" problems
requires High Performance Computing (HPC) resources to efficiently compute and
scale complex algorithms across thousands of nodes. Unfortunately, typical data
scientists are not familiar with the unique requirements and characteristics of
HPC environments: they usually develop their applications with high-level
scripting languages or frameworks such as TensorFlow, and the installation
process often requires connecting to external systems to download open source
software during the build. HPC environments, on the other hand, are often based
on closed source applications that incorporate parallel and distributed
computing APIs such as MPI and OpenMP, while users have restricted
administrator privileges and face security restrictions such as having no
access to external systems. In this paper we discuss the issues associated with
the deployment of AI frameworks in a secure HPC environment and how we
successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.

Comment: 6 pages, 2 figures, 2019 IEEE High Performance Extreme Computing
Conference
A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs
Analysis of DNA samples is an important step in forensics, and the speed of
analysis can impact investigations. Comparison of DNA sequences is based on the
analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5
base pairs. Current forensic approaches use 20 STR loci for analysis. The use
of single nucleotide polymorphisms (SNPs) has utility for the analysis of
complex DNA mixtures. However, the use of tens of thousands of SNP loci poses
significant computational challenges because the forensic analysis scales with
the product of the loci count and the number of DNA samples to be analyzed. In
this paper, we discuss the implementation of a DNA sequence comparison
algorithm by re-casting the algorithm in terms of linear algebra primitives.
By developing an overloaded matrix multiplication approach to DNA comparisons,
we can leverage advances in GPU hardware and algorithms for Dense Generalized
Matrix-Multiply (DGEMM) to speed up DNA sample comparisons. We show that it is
possible to compare 2048 unknown DNA samples with 20 million known samples in
under 6 seconds using an NVIDIA K80 GPU.

Comment: Accepted for publication at the 2017 IEEE High Performance Extreme
Computing Conference
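The linear-algebra recasting can be pictured with a plain matrix product standing in for the paper's overloaded multiplication: encode each profile as a vector over SNP loci so an inner product scores agreement, then compare all pairs with one GEMM. The +1/-1 encoding and the sizes below are illustrative assumptions (small stand-ins for 2048 unknowns against 20 million knowns).

```python
import numpy as np

def compare_profiles(unknown: np.ndarray, known: np.ndarray) -> np.ndarray:
    """Score all unknown-vs-known profile pairs with one matrix multiply.

    Illustrative encoding: each profile is a vector over SNP loci with +1
    for a detected allele and -1 for its absence, so the inner product
    counts agreements minus disagreements. On a GPU this same product maps
    onto a dense GEMM call (e.g. via cuBLAS).
    """
    return unknown @ known.T  # (n_unknown, loci) x (loci, n_known)

rng = np.random.default_rng(2)
loci = 10_000  # stand-in for tens of thousands of SNP loci
unknown = rng.choice([-1.0, 1.0], size=(64, loci))
known = rng.choice([-1.0, 1.0], size=(1_000, loci))
scores = compare_profiles(unknown, known)  # higher score = more similar
```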