Using Containers to Create More Interactive Online Training and Education Materials
Containers are excellent hands-on learning environments for computing topics
because they are customizable, portable, and reproducible. The Cornell
University Center for Advanced Computing has developed the Cornell Virtual
Workshop in high performance computing topics for many years, and we have
always sought to make the materials as rich and interactive as possible. Toward
the goal of building a more hands-on experimental learning experience directly
into web-based online training environments, we developed the Cornell Container
Runner Service, which allows online content developers to build container-based
interactive edit and run commands directly into their web pages. Using
containers along with CCRS has the potential to increase learner engagement and
outcomes.
Comment: 10 pages, 3 figures, PEARC '20 conference paper
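The abstract gives no implementation details for CCRS; as a purely illustrative sketch of the general pattern it describes (executing a learner-edited command inside a disposable container from a web backend), one might use the Docker SDK for Python along these lines. The image name, resource limits, and function are hypothetical, not taken from CCRS itself.

```python
# Illustrative only: run a learner-submitted command in a throwaway
# container, in the spirit of "edit and run" training widgets.
# The image name and limits are hypothetical, not from CCRS.
import docker

def run_in_container(command: str) -> str:
    client = docker.from_env()
    output = client.containers.run(
        image="training-env:latest",   # hypothetical course image
        command=["/bin/sh", "-c", command],
        remove=True,                   # discard the container afterwards
        network_disabled=True,         # sandbox: no network access
        mem_limit="256m",              # sandbox: cap memory use
    )
    return output.decode()

if __name__ == "__main__":
    print(run_in_container("echo hello from a container"))
```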
Distributed workflows with Jupyter
The designers of a new coordination interface for enacting complex workflows have to tackle a dichotomy: choosing a language-independent or a language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and promote portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a single format, allow taking the best of both approaches, maintaining a clear separation between application and coordination layers while still providing a unified interface to both aspects. We advocate the potential of Jupyter Notebooks to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.
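The abstract does not show the Jw format; one way to picture its separation of coordination from business logic is per-cell metadata in the standard .ipynb JSON structure. The "workflow" keys below are hypothetical illustrations, not the actual Jupyter-workflow schema.

```python
# Illustrative only: business logic lives in the cell source, while a
# hypothetical "workflow" metadata block carries the coordination layer
# (execution target, dependencies). Not the real Jw schema.
import json

cell = {
    "cell_type": "code",
    "source": ["result = train_model(dataset)\n"],   # business logic
    "metadata": {                                     # coordination layer
        "workflow": {
            "target": "hpc-cluster",       # where this step should run
            "depends_on": ["preprocess"],  # upstream step in the DAG
        }
    },
    "outputs": [],
    "execution_count": None,
}
print(json.dumps(cell, indent=2))
```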
A Container-Based Workflow for Distributed Training of Deep Learning Algorithms in HPC Clusters
Deep learning has been postulated as a solution for numerous problems in different branches of science. Given the resource-intensive nature of these models, they often need to be executed on specialized hardware such as graphics processing units (GPUs) in a distributed manner. In the academic field, researchers typically access such resources through High Performance Computing (HPC) clusters. These infrastructures make the training of such models difficult due to their multi-user nature and limited user permissions. In addition, different HPC clusters may have different peculiarities that can complicate the research cycle (e.g., library dependencies). In this paper we develop a workflow and methodology for the distributed training of deep learning models in HPC clusters which provides researchers with a series of novel advantages. It relies on udocker as the containerization tool and on Horovod as the library for distributing the models across multiple GPUs. udocker does not need any special permissions, allowing researchers to run the entire workflow without relying on any administrator. Horovod ensures the efficient distribution of the training independently of the deep learning framework used. Additionally, due to containerization and specific features of the workflow, it provides researchers with a cluster-agnostic way of running their models. The experiments carried out show that the workflow offers good scalability in the distributed training of the models and that it easily adapts to different clusters.
Comment: Under review for Cluster Computing
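The abstract names the two building blocks but not their invocation; the following minimal sketch shows how udocker and Horovod are typically combined, with the unprivileged container commands in comments and a Horovod/PyTorch training skeleton below them. The image name, model, and learning-rate choice are hypothetical, not taken from the paper.

```python
# Minimal sketch of the two tools named in the abstract (image name,
# model, and hyperparameters are hypothetical).
#
# Unprivileged container setup with udocker (no root, no administrator):
#   udocker pull mydeepimage:latest
#   udocker create --name=dl mydeepimage:latest
#   udocker run dl python train.py    # launched once per process by MPI
#
# Inside the container, Horovod distributes training across GPUs:
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per GPU
torch.cuda.set_device(hvd.local_rank())      # pin this process to its GPU

model = torch.nn.Linear(128, 10).cuda()      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across all workers
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)
# Start every worker from the same initial state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

Because Horovod exposes the same wrapper pattern for TensorFlow, Keras, and PyTorch, the distribution step stays framework-agnostic, which is the property the abstract highlights.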
Real-time cortical simulations: energy and interconnect scaling on distributed systems
We profile the impact of computation and inter-processor communication on the energy consumption and scaling of cortical simulations approaching the real-time regime on distributed computing platforms. We also compare the speed and energy consumption of processor architectures typical of standard HPC and embedded platforms, and demonstrate the importance of low-latency interconnect design for both speed and energy consumption. The cost of cortical simulations is quantified using the Joule-per-synaptic-event metric on both architectures. Reaching efficient real-time performance in large-scale cortical simulations is of increasing relevance both for future bio-inspired artificial intelligence applications and for understanding the cognitive functions of the brain, a scientific quest that will require embedding large-scale simulations into highly complex virtual or real worlds. This work stands at the crossroads between the WaveScalES experiment in the Human Brain Project (HBP), which includes the objective of large-scale thalamo-cortical simulations of brain states and their transitions, and the ExaNeSt and EuroExa projects, which investigate the design of an ARM-based, low-power High Performance Computing (HPC) architecture with a dedicated interconnect scalable to millions of cores; simulations of the deep-sleep Slow Wave Activity (SWA) and Asynchronous aWake (AW) regimes expressed by thalamo-cortical models are among their benchmarks.
Comment: 8 pages, 8 figures, 4 tables, submitted after final publication in the PDP2019 proceedings, corrected final DOI. arXiv admin note: text overlap with arXiv:1812.04974, arXiv:1804.0344
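The abstract uses but does not define the Joule-per-synaptic-event metric; a natural reading, stated here as an assumption rather than the authors' exact definition, is total energy divided by total synaptic events:

```latex
% Assumed definition (not stated in the abstract): average power times
% wall-clock time, divided by the total number of synaptic events
% processed during the run.
\[
  \frac{\text{J}}{\text{syn. event}}
  = \frac{E_{\text{total}}}{N_{\text{events}}}
  = \frac{\bar{P}\, T_{\text{wall}}}{N_{\text{events}}}
\]
```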
Commodity single board computer clusters and their applications
Current commodity Single Board Computers (SBCs) are sufficiently powerful to run mainstream operating systems and workloads. Many of these boards may be linked together to create small, low-cost clusters that replicate some features of large data center clusters. The Raspberry Pi Foundation produces a series of SBCs with a price/performance ratio that makes SBC clusters viable, perhaps even expendable. These clusters are an enabler for Edge/Fog Compute, where processing is pushed out towards data sources, reducing bandwidth requirements and decentralizing the architecture. In this paper we investigate use cases driving the growth of SBC clusters, examine trends in future hardware developments, and discuss the potential of SBC clusters as a disruptive technology. Compared to traditional clusters, SBC clusters have a reduced footprint, low cost, and low power requirements. This enables different models of deployment, particularly outside traditional data center environments. We discuss the applicability of existing software and management infrastructure to support exotic deployment scenarios and anticipate the next generation of SBCs. We conclude that the SBC cluster is a new and distinct computational deployment paradigm, applicable to a wider range of scenarios than current clusters. It facilitates Internet of Things and Smart City systems and is potentially a game changer in pushing application logic out towards the network edge.
Portuguese SKA white book
No abstract available.