Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis
Interactive massively parallel computations are critical for machine learning
and data analysis. These computations are a staple of the MIT Lincoln
Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop
unique interactive supercomputing capabilities. Scaling interactive machine
learning frameworks, such as TensorFlow, and data analysis environments, such
as MATLAB/Octave, to tens of thousands of cores presents many technical
challenges - in particular, rapidly dispatching many tasks through a scheduler,
such as Slurm, and starting many instances of applications with thousands of
dependencies. Careful tuning of launches and prepositioning of applications
overcome these challenges and allow the launching of thousands of tasks in
seconds on a 40,000-core supercomputer. Specifically, this work demonstrates
launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave
processes in 40 seconds. These capabilities allow researchers to rapidly
explore novel machine learning architectures and data analysis algorithms.
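A minimal sketch of the launch pattern the abstract describes - dispatching many tasks through Slurm in a single scheduler call rather than one submission per task. The partition name, task count, and worker script below are hypothetical placeholders, not the LLSC's actual tooling:

```python
import subprocess

# Submit one Slurm job array instead of thousands of individual jobs,
# so the scheduler dispatches all tasks from a single sbatch call.
NUM_TASKS = 1024  # hypothetical task count

subprocess.run(
    [
        "sbatch",
        f"--array=0-{NUM_TASKS - 1}",  # one array element per task
        "--partition=normal",          # placeholder partition name
        "--wrap",
        # Each array element reads SLURM_ARRAY_TASK_ID at run time to
        # pick its work item; the application and its dependencies are
        # assumed to be prepositioned on node-local storage.
        "python worker.py $SLURM_ARRAY_TASK_ID",
    ],
    check=True,
)
```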
pPython Performance Study
pPython seeks to provide a parallel capability that delivers good speed-up
without sacrificing the ease of programming in Python, by implementing
partitioned global array semantics (PGAS) on top of a simple file-based
messaging library (PythonMPI) in pure Python. pPython follows an SPMD (single
program multiple data) model of computation. It runs on a single node (e.g., a
laptop) running Windows, Linux, or MacOS, or on any combination of
heterogeneous systems that support Python, including on a cluster through a
Slurm scheduler interface, so that pPython can be executed in a massively
parallel computing environment. Because of its unique file-based messaging
implementation, it is interesting to see how pPython's performance compares
with traditional socket-based MPI communication. In this paper, we present the
point-to-point and collective communication performance of pPython and compare
it with that of mpi4py with OpenMPI. For large messages, pPython demonstrates
performance comparable to mpi4py's.
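PythonMPI's internals are not detailed in the abstract; the following is a minimal sketch, under assumptions, of what file-based point-to-point messaging over a shared file system might look like. The message directory, file naming scheme, and helper names are all invented for illustration:

```python
import os
import pickle
import time

MSG_DIR = "/shared/msgs"  # assumed file system visible to all ranks
os.makedirs(MSG_DIR, exist_ok=True)

def send(obj, src, dest, tag):
    """Write the message atomically: dump to a temp file, then rename."""
    tmp = os.path.join(MSG_DIR, f".{src}_{dest}_{tag}.tmp")
    final = os.path.join(MSG_DIR, f"{src}_{dest}_{tag}.msg")
    with open(tmp, "wb") as f:
        pickle.dump(obj, f)
    os.rename(tmp, final)  # rename is atomic on POSIX file systems

def recv(src, dest, tag, poll_s=0.01):
    """Poll until the expected message file appears, then consume it."""
    path = os.path.join(MSG_DIR, f"{src}_{dest}_{tag}.msg")
    while not os.path.exists(path):
        time.sleep(poll_s)
    with open(path, "rb") as f:
        obj = pickle.load(f)
    os.remove(path)
    return obj
```

A polling receive like this trades latency for simplicity, which is consistent with the abstract's finding that the file-based approach is most competitive for large messages, where transfer time dominates the polling overhead.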
SamBaS: Sampling-Based Stochastic Block Partitioning
Community detection is a well-studied problem with applications in domains
ranging from networking to bioinformatics. Due to the rapid growth in the
volume of real-world data, there is growing interest in accelerating
contemporary community detection algorithms. However, the more accurate and
statistically robust methods tend to be hard to parallelize. One such method is
stochastic block partitioning (SBP) - a community detection algorithm that
works well on graphs with complex and heterogeneous community structure. In
this paper, we present a sampling-based SBP (SamBaS) for accelerating SBP on
sparse graphs. We characterize how various graph parameters affect the speedup
and result quality of community detection with SamBaS and quantify the
trade-offs therein. To evaluate SamBaS on real-world web graphs without known
ground-truth communities, we introduce partition quality score (PQS), an
evaluation metric that outperforms modularity in terms of correlation with F1
score. Overall, SamBaS achieves speedups of up to 10X while maintaining result
quality (and even improving it by over 150% on certain graphs, as measured by
F1 score).
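The abstract does not spell out SamBaS's sampling procedure, so the sketch below shows only the generic sampling pattern under assumptions: run the expensive partitioner on a sampled subgraph, then extend the result to the remaining vertices by a neighbor-majority vote. The function names and the propagation rule are illustrative, not the paper's method:

```python
import random
from collections import Counter

def sample_based_partition(adj, partition_fn, sample_frac=0.2, seed=0):
    """Illustrative sampling-accelerated community detection.

    adj: dict mapping each vertex to a set of its neighbors.
    partition_fn: any community detection routine (e.g., SBP) taking a
        sub-adjacency dict and returning {vertex: community}.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    sample = set(rng.sample(nodes, max(1, int(sample_frac * len(nodes)))))

    # Run the expensive algorithm only on the induced sampled subgraph.
    sub_adj = {v: adj[v] & sample for v in sample}
    labels = dict(partition_fn(sub_adj))

    # Extend: give each unsampled vertex the majority label among its
    # already-labelled neighbors; isolated vertices get fresh labels.
    next_label = max(labels.values(), default=-1) + 1
    for v in nodes:
        if v in labels:
            continue
        votes = Counter(labels[u] for u in adj[v] if u in labels)
        if votes:
            labels[v] = votes.most_common(1)[0][0]
        else:
            labels[v] = next_label
            next_label += 1
    return labels
```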
Hardware as a service - enabling dynamic, user-level bare metal provisioning of pools of data center resources.
We describe a “Hardware as a Service (HaaS)” tool for isolating pools of compute, storage, and networking resources. The goal of HaaS is to enable dynamic, flexible, user-level provisioning of pools of resources at the so-called “bare-metal” layer. It allows experimental or untrusted services to co-exist alongside trusted services. Because HaaS functions only as a resource isolation system, users are free to choose between different system scheduling and provisioning systems and to manage isolated resources as they see fit. We describe key HaaS use cases and features and show how HaaS can provide a valuable, and somewhat overlooked, layer in the software architecture of modern data center management. Documentation and source code for the HaaS software are available at https://github.com/CCI-MOC/haas. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation award #1347525, and several commercial partners of the Mass Open Cloud, who may be found at http://www.massopencloud.org.
A Virtual Reality Tool for Representing, Visualizing and Updating Deep Learning Models
Deep learning is ubiquitous, but its lack of transparency limits its impact
on several potential application areas. We demonstrate a virtual reality tool
for automating the process of assigning data inputs to different categories. A
dataset is represented as a cloud of points in virtual space. The user explores
the cloud through movement and uses hand gestures to categorise portions of the
cloud. This triggers gradual movements in the cloud: points of the same
category are attracted to each other, different groups are pushed apart, while
points are globally distributed in a way that utilises the entire space. The
space, time, and forces observed in virtual reality can be mapped to
well-defined machine learning concepts, namely the latent space, training
epochs, and backpropagation. Our tool illustrates how the inner workings of
deep neural networks can be made tangible and transparent. We expect this
approach to accelerate the autonomous development of deep learning applications
by end users in novel areas.
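The paper's exact force model is not given in the abstract; the sketch below implements one plausible version of the described dynamics - same-category attraction, cross-category repulsion, and a weak global spreading term - with hypothetical force constants:

```python
import numpy as np

def step(points, labels, attract=0.05, repel=0.01, spread=0.001):
    """One update of the point cloud.

    points: (n, d) array of point positions in the virtual space.
    labels: (n,) array of category assignments made by the user.
    """
    points = points.copy()
    cats = np.unique(labels)
    centroids = {c: points[labels == c].mean(axis=0) for c in cats}

    for c in cats:
        mask = labels == c
        # Same-category points drift toward their shared centroid.
        points[mask] += attract * (centroids[c] - points[mask])
        # Each point is pushed away from every other category's centroid.
        for other in cats:
            if other == c:
                continue
            diff = points[mask] - centroids[other]
            dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-9
            points[mask] += repel * diff / dist**2

    # Weak outward drift so the cloud utilises the entire space.
    points += spread * (points - points.mean(axis=0))
    return points
```

Iterating this update plays the role of training epochs in the analogy the abstract draws, with each step nudging the layout toward a configuration that separates the user's categories.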
Exact Distributed Stochastic Block Partitioning
Stochastic block partitioning (SBP) is a community detection algorithm that
is highly accurate even on graphs with a complex community structure, but its
inherently serial nature hinders its widespread adoption by the wider
scientific community. To make it practical to analyze large real-world graphs
with SBP, there is a growing need to parallelize and distribute the algorithm.
The current state-of-the-art distributed SBP algorithm is a divide-and-conquer
approach that limits communication between compute nodes until the end of
inference. This breaks computational dependencies, which causes convergence
issues as the number of compute nodes increases and when the graph is
sufficiently sparse. In this paper, we introduce EDiSt - an exact
distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes
periodically share community assignments during inference. This additional
communication allows EDiSt to scale out to a larger number of compute nodes
than the divide-and-conquer algorithm without suffering from convergence
issues, even on sparse graphs. We show that EDiSt provides speedups of up to
23.8X over the divide-and-conquer approach and up to 38.0X over shared-memory
parallel SBP when scaled out to 64 compute nodes.
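The abstract specifies only that compute nodes "periodically share community assignments during inference"; a minimal sketch of that communication pattern follows, using mpi4py as an assumed transport, with the local refinement step stubbed out and the graph size, vertex ownership, and synchronization period all hypothetical:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_VERTICES = 10_000  # hypothetical graph size
SYNC_EVERY = 10      # hypothetical synchronization period
assignments = np.zeros(N_VERTICES, dtype=np.int64)

# Each rank owns a contiguous slice of vertices (an assumed distribution).
lo = rank * N_VERTICES // size
hi = (rank + 1) * N_VERTICES // size

for it in range(100):
    # Stub for local SBP refinement over owned vertices; a real
    # implementation would propose and evaluate block moves here.
    assignments[lo:hi] = (assignments[lo:hi] + 1) % 4

    # Periodic exchange: every rank sees up-to-date community labels
    # for vertices it does not own, preserving the computational
    # dependencies that a divide-and-conquer split severs.
    if it % SYNC_EVERY == 0:
        gathered = comm.allgather(assignments[lo:hi].copy())
        assignments = np.concatenate(gathered)
```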
Processing of Crowdsourced Observations of Aircraft in a High Performance Computing Environment
As unmanned aircraft systems (UASs) continue to integrate into the U.S.
National Airspace System (NAS), there is a need to quantify the risk of
airborne collisions between unmanned and manned aircraft to support regulation
and standards development. Both regulators and standards developing
organizations have made extensive use of Monte Carlo collision risk analysis
simulations using probabilistic models of aircraft flight. We have previously
determined that the observations of manned aircraft by the OpenSky Network, a
community network of ground-based sensors, are appropriate to develop models of
the low altitude environment. This work describes the high performance
computing workflow designed and deployed at the Lincoln Laboratory
Supercomputing Center to process 3.9 billion observations of aircraft. We then
trained the aircraft models using more than 250,000 flight hours at 5,000 feet
above ground level or below. A key feature of the workflow is that all the
aircraft observations and supporting datasets are available as open source
technologies or have been released to the public domain.
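The LLSC workflow itself is not reproduced here; the sketch below illustrates, under assumptions, the kind of embarrassingly parallel filtering step the abstract implies - keeping observations at or below 5,000 feet above ground level - with a hypothetical file layout and column names:

```python
import glob
from multiprocessing import Pool

import pandas as pd

CEILING_FT = 5000  # keep observations at or below 5,000 ft AGL

def filter_file(path):
    """Load one file of aircraft observations, keep low-altitude rows.
    The column names are placeholders, not the OpenSky schema."""
    df = pd.read_csv(path)
    agl = df["altitude_ft"] - df["terrain_elevation_ft"]
    return df[agl <= CEILING_FT]

if __name__ == "__main__":
    files = glob.glob("observations/*.csv")  # hypothetical layout
    with Pool() as pool:                     # one worker per core
        parts = pool.map(filter_file, files)
    pd.concat(parts, ignore_index=True).to_parquet("low_altitude.parquet")
```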
Towards an Objective Metric for the Performance of Exact Triangle Count
The performance of graph algorithms is often measured in terms of the number
of traversed edges per second (TEPS). However, this performance metric is
inadequate for a graph operation such as exact triangle counting. In triangle
counting, execution times on graphs with a similar number of edges can be
distinctly different as demonstrated by results from the past Graph Challenge
entries. We discuss the need for an objective performance metric for graph
operations and the desired characteristics of such a metric, so that it more
accurately captures the interactions between the amount of work performed and
the capabilities of the hardware on which the code is executed. Using exact
triangle counting as an example, we derive a metric that captures how certain
techniques employed in many implementations improve performance. We demonstrate
that our proposed metric can be used to evaluate and compare multiple
approaches for triangle counting, using a SIMD approach as a case study against
a scalar baseline.
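The metric the paper derives is not quoted in the abstract. To illustrate the underlying idea - measuring the work actually performed rather than raw edge traversals - the sketch below counts triangles exactly via sorted-neighborhood intersection and tallies the comparisons executed, so throughput can be reported as comparisons per second. This instrumentation is an assumption for illustration, not the paper's metric:

```python
import time

def triangle_count(adj):
    """Exact triangle count via merge-style intersection of sorted
    adjacency lists, counting each triangle once (u < v < w).
    Returns (triangles, comparisons): the comparison tally measures
    work done, unlike a traversed-edges-per-second figure."""
    fwd = {u: sorted(w for w in nbrs if w > u) for u, nbrs in adj.items()}
    triangles = comparisons = 0
    for u, nbrs in fwd.items():
        for v in nbrs:
            a, b = fwd[u], fwd[v]
            i = j = 0
            while i < len(a) and j < len(b):
                comparisons += 1
                if a[i] == b[j]:
                    triangles += 1
                    i += 1
                    j += 1
                elif a[i] < b[j]:
                    i += 1
                else:
                    j += 1
    return triangles, comparisons

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
t0 = time.perf_counter()
tris, work = triangle_count(adj)
dt = time.perf_counter() - t0
print(f"{tris} triangles, {work} comparisons, {work / dt:.0f} cmp/s")
```

Two implementations with identical TEPS can perform very different numbers of comparisons here, which is precisely the gap between input size and work done that the abstract argues a metric should capture.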
Object Detection in High-Resolution Images Based on Pyramid-Block Processing
In this paper an algorithm for object detection in high-resolution images is proposed. The approach uses a multiscale image representation followed by block processing with overlap, applying a convolutional neural network to each block and merging the detected areas. The number of pyramid layers is limited by the size of the convolutional neural network's input layer and the resolution of the input image. Block splitting is performed on every pyramid layer except the highest one, and the overlap between blocks improves the correct classification of objects that are split into fragments across neighboring blocks. Detected areas are merged into one if they belong to the same class and their intersection-over-union value is high. Experimental results for the algorithm confirm that the approach improves the detection accuracy for small objects in high-resolution images.
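A minimal sketch of the block splitting with overlap and the class-aware intersection-over-union merge described above; the block size, overlap, and merge threshold are assumed values, and the per-block detector is left abstract:

```python
def split_blocks(width, height, block=512, overlap=64):
    """Yield (x0, y0, x1, y1) tiles covering the image, with adjacent
    tiles sharing `overlap` pixels so objects on tile borders are seen
    whole by at least one tile."""
    step = block - overlap
    for y in range(0, max(1, height - overlap), step):
        for x in range(0, max(1, width - overlap), step):
            yield x, y, min(x + block, width), min(y + block, height)

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_detections(detections, thresh=0.5):
    """Fuse same-class detections whose boxes overlap strongly, as
    happens when one object straddles two neighboring blocks.
    detections: list of ((x0, y0, x1, y1), class_id) pairs."""
    merged = []
    for box, cls in detections:
        for i, (mbox, mcls) in enumerate(merged):
            if cls == mcls and iou(box, mbox) >= thresh:
                fused = (min(box[0], mbox[0]), min(box[1], mbox[1]),
                         max(box[2], mbox[2]), max(box[3], mbox[3]))
                merged[i] = (fused, cls)
                break
        else:
            merged.append((box, cls))
    return merged
```

In the full pipeline, `split_blocks` would run on each pyramid layer except the highest, a detector would run per tile, and `merge_detections` would fuse the per-tile results back into image coordinates.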