
    Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

    Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges, in particular rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow thousands of tasks to be launched in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architectures and data analysis algorithms.
    Comment: 6 pages, 7 figures, IEEE High Performance Extreme Computing Conference 201
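
    The launch mechanics are not spelled out in the abstract; as a rough sketch of how thousands of tasks can be dispatched through Slurm in one submission, a job array amortizes scheduler overhead across all tasks (worker.py and the flag values below are placeholders, not the paper's actual configuration):

        # Hypothetical sketch: fan one Slurm submission out into thousands of tasks.
        import subprocess

        N_TASKS = 32000  # e.g., one TensorFlow process per array element

        subprocess.run(
            ["sbatch",
             f"--array=0-{N_TASKS - 1}",   # one array element per task
             "--ntasks=1",
             "--wrap", "python worker.py $SLURM_ARRAY_TASK_ID"],
            check=True,
        )

    Prepositioning, i.e. staging the application and its dependencies on node-local storage before launch, would address the other bottleneck the abstract mentions: starting many instances of applications with thousands of dependencies.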

    pPython Performance Study

    pPython seeks to provide a parallel capability that achieves good speed-up without sacrificing the ease of programming in Python, by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) written in pure Python. pPython follows the SPMD (single program multiple data) model of computation. It runs on a single node (e.g., a laptop) under Windows, Linux, or MacOS, or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface, so that pPython can be executed in a massively parallel computing environment. Because of its unique file-based messaging implementation, it is interesting to see what performance pPython can achieve compared to traditional socket-based MPI communication. In this paper, we present the point-to-point and collective communication performance of pPython and compare it with that obtained using mpi4py with OpenMPI. For large messages, pPython demonstrates performance comparable to mpi4py.
    Comment: arXiv admin note: substantial text overlap with arXiv:2208.1490
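
    The file-based messaging approach can be pictured with a toy send/receive pair over a shared filesystem; the names and file layout below are illustrative, not pPython's actual PythonMPI API:

        # Toy file-based point-to-point messaging over a shared filesystem.
        import os, pickle, time

        MSG_DIR = "/tmp/pmsg"  # directory visible to all ranks

        def send(dest_rank, tag, obj):
            os.makedirs(MSG_DIR, exist_ok=True)
            tmp = os.path.join(MSG_DIR, f".{dest_rank}_{tag}.tmp")
            final = os.path.join(MSG_DIR, f"{dest_rank}_{tag}.msg")
            with open(tmp, "wb") as f:
                pickle.dump(obj, f)
            os.rename(tmp, final)  # atomic rename marks the message as complete

        def recv(my_rank, tag, poll=0.01):
            path = os.path.join(MSG_DIR, f"{my_rank}_{tag}.msg")
            while not os.path.exists(path):  # poll until the sender's file appears
                time.sleep(poll)
            with open(path, "rb") as f:
                obj = pickle.load(f)
            os.remove(path)
            return obj

    In a scheme like this, the polling loop is where a latency gap with socket-based MPI would show up for small messages, while large messages are dominated by serialization and filesystem bandwidth, consistent with the comparable large-message performance reported above.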

    SamBaS: Sampling-Based Stochastic Block Partitioning

    Community detection is a well-studied problem with applications in domains ranging from networking to bioinformatics. Due to the rapid growth in the volume of real-world data, there is growing interest in accelerating contemporary community detection algorithms. However, the more accurate and statistically robust methods tend to be hard to parallelize. One such method is stochastic block partitioning (SBP), a community detection algorithm that works well on graphs with complex and heterogeneous community structure. In this paper, we present sampling-based SBP (SamBaS) for accelerating SBP on sparse graphs. We characterize how various graph parameters affect the speedup and result quality of community detection with SamBaS and quantify the trade-offs therein. To evaluate SamBaS on real-world web graphs without known ground-truth communities, we introduce the partition quality score (PQS), an evaluation metric that outperforms modularity in terms of correlation with F1 score. Overall, SamBaS achieves speedups of up to 10X while maintaining result quality, and even improves result quality by over 150%, relative to F1 score, on certain graphs.
    Comment: Updated to latest submitted version
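
    The core sampling idea can be sketched as follows (an illustration of sample-then-extend community detection in general, not the specific SamBaS algorithm): run the expensive detector on a sampled subgraph, then place each remaining vertex by majority vote over its already-labeled neighbors.

        # Illustrative sample-then-extend scheme for community detection.
        import random

        def sample_then_extend(adj, detect, sample_frac=0.3, seed=0):
            # adj: {node: [neighbors]}; detect: runs e.g. SBP on a subgraph
            rng = random.Random(seed)
            nodes = list(adj)
            sampled = set(rng.sample(nodes, int(sample_frac * len(nodes))))
            sub = {u: [v for v in adj[u] if v in sampled] for u in sampled}
            labels = detect(sub)  # expensive step, now on a much smaller graph
            for u in nodes:
                if u not in labels:
                    votes = [labels[v] for v in adj[u] if v in labels]
                    # fall back to a singleton community if no neighbor is labeled
                    labels[u] = max(set(votes), key=votes.count) if votes else u
            return labels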

    Hardware as a service - enabling dynamic, user-level bare metal provisioning of pools of data center resources.

    We describe a “Hardware as a Service” (HaaS) tool for isolating pools of compute, storage, and networking resources. The goal of HaaS is to enable dynamic, flexible, user-level provisioning of pools of resources at the so-called “bare-metal” layer. It allows experimental or untrusted services to co-exist alongside trusted services. Because HaaS functions only as a resource isolation system, users are free to choose between different system scheduling and provisioning systems and to manage isolated resources as they see fit. We describe key HaaS use cases and features, and we show how HaaS can provide a valuable, and somewhat overlooked, layer in the software architecture of modern data center management. Documentation and source code for the HaaS software are available at https://github.com/CCI-MOC/haas. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation award #1347525, and several commercial partners of the Mass Open Cloud, who may be found at http://www.massopencloud.org.
    http://www.ieee-hpec.org/2014/CD/index_htm_files/FinalPapers/116.pdf
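
    As a hypothetical illustration of what user-level, bare-metal pool management looks like from a client's perspective (the endpoint path and payload below are invented for this sketch; see the linked documentation for the real interface):

        # Hypothetical HaaS-style client: move a bare-metal node into a
        # project's isolated pool over a REST API.
        import requests

        BASE = "http://haas.example.org/api"  # placeholder endpoint

        def allocate_node(project, node):
            r = requests.post(f"{BASE}/project/{project}/connect_node",
                              json={"node": node})
            r.raise_for_status()  # the node now belongs to the project's pool

    Because HaaS only isolates resources, what runs on the allocated node (an OS image, a scheduler, an experimental service) is entirely up to the user.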

    A Virtual Reality Tool for Representing, Visualizing and Updating Deep Learning Models

    Deep learning is ubiquitous, but its lack of transparency limits its impact in several potential application areas. We demonstrate a virtual reality tool for automating the process of assigning data inputs to different categories. A dataset is represented as a cloud of points in virtual space. The user explores the cloud through movement and uses hand gestures to categorise portions of the cloud. This triggers gradual movements in the cloud: points of the same category are attracted to each other, different groups are pushed apart, and points are globally distributed in a way that utilises the entire space. The space, time, and forces observed in virtual reality map to well-defined machine learning concepts, namely the latent space, the training epochs, and backpropagation. Our tool illustrates how the inner workings of deep neural networks can be made tangible and transparent. We expect this approach to accelerate the autonomous development of deep learning applications by end users in novel areas.
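
    The described dynamics resemble a force-directed layout; a minimal sketch of one update step (illustrative only, not the tool's implementation):

        # One step of the described point-cloud dynamics: same-category points
        # attract, different categories repel, and a weak outward term spreads
        # the cloud over the available space.
        import numpy as np

        def step(points, labels, attract=0.05, repel=0.01, spread=0.001):
            # points: (N, 3) float array; labels: (N,) integer categories
            for i in range(len(points)):
                delta = points - points[i]                  # vectors to all points
                dist = np.linalg.norm(delta, axis=1)[:, None] + 1e-9
                unit = delta / dist
                same = (labels == labels[i])[:, None]
                pull = attract * (unit * same).sum(axis=0)  # intra-category attraction
                push = repel * (unit * ~same).sum(axis=0)   # inter-category repulsion
                points[i] += pull - push + spread * points[i]
            return points

    In the mapping the abstract describes, each such step plays the role of a training epoch acting on positions in a latent space.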

    Exact Distributed Stochastic Block Partitioning

    Stochastic block partitioning (SBP) is a community detection algorithm that is highly accurate even on graphs with a complex community structure, but its inherently serial nature hinders its widespread adoption by the wider scientific community. To make it practical to analyze large real-world graphs with SBP, there is a growing need to parallelize and distribute the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that limits communication between compute nodes until the end of inference. This breaks computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. In this paper, we introduce EDiSt, an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. This additional communication allows EDiSt to scale out to a larger number of compute nodes than the divide-and-conquer algorithm without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 23.8X over the divide-and-conquer approach and of up to 38.0X over shared-memory parallel SBP when scaled out to 64 compute nodes.
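
    A schematic of the periodic-sharing idea (not the paper's implementation; local_refine and merge stand in for the actual SBP inference and reconciliation steps):

        # Periodic exchange of community assignments across MPI ranks.
        from mpi4py import MPI

        comm = MPI.COMM_WORLD

        def distributed_sbp(local_refine, merge, labels, iters=100, share_every=10):
            for it in range(iters):
                labels = local_refine(labels)            # local SBP moves on this rank's shard
                if it % share_every == 0:
                    all_labels = comm.allgather(labels)  # periodic communication step
                    labels = merge(all_labels)           # reconcile assignments across ranks
            return labels

    The exchange preserves the computational dependencies that the divide-and-conquer approach breaks, at the cost of communication every share_every iterations.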

    Processing of Crowdsourced Observations of Aircraft in a High Performance Computing Environment

    As unmanned aircraft systems (UASs) continue to integrate into the U.S. National Airspace System (NAS), there is a need to quantify the risk of airborne collisions between unmanned and manned aircraft to support regulation and standards development. Both regulators and standards-developing organizations have made extensive use of Monte Carlo collision risk analysis simulations based on probabilistic models of aircraft flight. We previously determined that observations of manned aircraft by the OpenSky Network, a community network of ground-based sensors, are appropriate for developing models of the low-altitude environment. This work overviews the high performance computing workflow designed and deployed on the Lincoln Laboratory Supercomputing Center to process 3.9 billion observations of aircraft. We then trained the aircraft models using more than 250,000 flight hours at or below 5,000 feet above ground level. A key feature of the workflow is that all the aircraft observations and supporting datasets are either available as open source technologies or have been released to the public domain.
    Comment: 6 pages, 4 figures, 4 tables
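
    The map stage of such a workflow might look like the following (illustrative; the column name and file layout are assumptions, and only the 5,000 ft threshold comes from the text):

        # Parallel map step: filter raw observation shards to low altitudes.
        import glob
        from multiprocessing import Pool

        import pandas as pd

        MAX_ALT_FT_AGL = 5000  # threshold used for the low-altitude models

        def process_file(path):
            df = pd.read_csv(path)  # one shard of raw aircraft observations
            return df[df["altitude_ft_agl"] <= MAX_ALT_FT_AGL]

        if __name__ == "__main__":
            with Pool() as pool:
                shards = pool.map(process_file, glob.glob("observations/*.csv"))
            pd.concat(shards).to_csv("low_altitude.csv", index=False)

    On a supercomputer the same pattern would typically be fanned out across nodes through a scheduler rather than a single-node process pool.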

    Towards an Objective Metric for the Performance of Exact Triangle Count

    The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can be distinctly different, as demonstrated by results from past Graph Challenge entries. We discuss the need for an objective performance metric for graph operations and the desired characteristics of such a metric, in particular that it more accurately capture the interaction between the amount of work performed and the capabilities of the hardware on which the code is executed. Using exact triangle counting as an example, we derive a metric that captures how certain techniques employed in many implementations improve performance. We demonstrate that our proposed metric can be used to evaluate and compare multiple approaches for triangle counting, using a SIMD approach as a case study against a scalar baseline.
    Comment: 6 pages, 2020 IEEE High Performance Extreme Computing Conference (HPEC)
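
    For reference, TEPS is simply the edge count divided by execution time, which ignores how much intersection work a triangle-counting kernel actually performs. A standard exact kernel of the kind such a metric must account for is the merge-intersection of sorted adjacency lists (a reference sketch, not the paper's implementation):

        # Exact triangle count via merge-intersection of sorted adjacency lists.
        def triangle_count(adj):
            # adj: {u: sorted neighbor list}, undirected graph, no self-loops
            count = 0
            for u, nbrs in adj.items():
                for v in nbrs:
                    if v <= u:
                        continue  # consider each edge once, with u < v
                    a, b = adj[u], adj[v]
                    i = j = 0
                    while i < len(a) and j < len(b):
                        if a[i] == b[j]:
                            if a[i] > v:
                                count += 1  # triangle u < v < w counted once
                            i += 1; j += 1
                        elif a[i] < b[j]:
                            i += 1
                        else:
                            j += 1
            return count

    Two graphs with equal edge counts can drive very different numbers of iterations through this inner loop, which is why equal TEPS scores can hide unequal amounts of work.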

    Object Detection in High-Resolution Images Based on Pyramidal Block Processing

    An algorithm for object detection in high-resolution images is proposed. The approach uses a multiscale (pyramidal) image representation followed by block processing with overlap; object detection with a convolutional neural network is performed on each block. The number of pyramid layers is determined by the input image resolution and the input layer size of the convolutional neural network. Splitting into overlapping blocks is performed on every pyramid layer except the highest one; the overlap improves the classification of objects that are split into fragments across neighboring blocks. Detected areas are merged when they belong to the same class and their intersection-over-union is high. Experimental results confirm that the approach improves the detection accuracy of small objects in high-resolution images.
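
    The tiling and merge rule can be sketched as follows (illustrative; the block size, overlap fraction, and IoU threshold are placeholders, not the paper's values):

        # Overlapping block tiling plus the IoU test used to merge detections.
        def tiles(w, h, block=512, overlap=0.2):
            # Top-left corners of overlapping blocks covering a w x h image
            # (edge remainders are ignored in this simplified sketch).
            step = int(block * (1 - overlap))
            for x in range(0, max(w - block, 0) + 1, step):
                for y in range(0, max(h - block, 0) + 1, step):
                    yield x, y, block, block

        def iou(a, b):
            # a, b: boxes as (x, y, w, h); returns intersection-over-union.
            ax, ay, aw, ah = a
            bx, by, bw, bh = b
            ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
            iy = max(0, min(ay + ah, by + bh) - max(ay, by))
            inter = ix * iy
            return inter / (aw * ah + bw * bh - inter)

    Detections from neighboring blocks are merged when they share a class and their IoU exceeds a chosen threshold, which reassembles objects that were split across block boundaries.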