Democratizing Production-Scale Distributed Deep Learning
The interest and demand for training deep neural networks have been
experiencing rapid growth, spanning a wide range of applications in both
academia and industry. However, training them in a distributed fashion and at
scale remains difficult due to the complex ecosystem of tools and hardware involved. One
consequence is that the responsibility of orchestrating these complex
components is often left to one-off scripts and glue code customized for
specific problems. To address these shortcomings, we introduce Alchemist, an
internal service built at Apple from the ground up for easy, fast, and
scalable distributed training. We discuss its design,
implementation, and examples of running different flavors of distributed
training. We also present case studies of its internal adoption in the
development of autonomous systems, where training times have been reduced by
10x to keep up with the ever-growing data collection.
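Alchemist itself is internal to Apple and its interfaces are not public. As a minimal sketch of the kind of data-parallel job such a service would orchestrate, the following uses PyTorch's public torch.distributed API; the rendezvous environment variables are assumed to be set by the orchestrator before it launches one process per GPU.

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NOT Alchemist's API: a plain PyTorch data-parallel worker. The service
    # is assumed to export RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.
    dist.init_process_group(backend="nccl")
    device = dist.get_rank() % torch.cuda.device_count()

    model = torch.nn.Linear(128, 10).to(device)  # stand-in for a real network
    model = DDP(model, device_ids=[device])      # syncs gradients across workers
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                         # toy training loop
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                          # all-reduce happens here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()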
Aneka: A Software Platform for .NET-based Cloud Computing
Aneka is a platform for deploying Clouds and developing applications on top of
it. It provides a runtime environment and a set of APIs that allow developers
to build .NET applications that leverage their computation on either public or
private clouds. One of the key features of Aneka is its ability to support
multiple programming models, which are ways of expressing the execution logic of
applications by using specific abstractions. This is accomplished by creating a
customizable and extensible service-oriented runtime environment represented by
a collection of software containers connected together. By leveraging this
architecture, advanced services including resource reservation, persistence,
storage management, security, and performance monitoring have been implemented.
On top of this infrastructure, different programming models can be plugged in
to provide support for different scenarios, as demonstrated by engineering,
life science, and industry applications.
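Aneka's real APIs are .NET; purely as a language-neutral illustration of what a pluggable programming model looks like, here is a hypothetical Python sketch in which each model maps its own abstraction onto a shared runtime (all class names below are invented for this example):

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class ProgrammingModel(ABC):
    """A way of expressing application logic on top of the runtime."""
    @abstractmethod
    def execute(self, work_items):
        ...

class TaskModel(ProgrammingModel):
    """Bag-of-tasks model: independent work units submitted to the runtime."""
    def __init__(self, runtime):
        self.runtime = runtime

    def execute(self, work_items):
        return [self.runtime.submit(item) for item in work_items]

class LocalRuntime:
    """Stand-in for Aneka's collection of connected software containers."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn):
        return self.pool.submit(fn)

# Plug the task model into the runtime and run eight independent tasks.
model = TaskModel(LocalRuntime())
futures = model.execute([lambda i=i: i * i for i in range(8)])
print([f.result() for f in futures])  # [0, 1, 4, 9, 16, 25, 36, 49]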
On-Demand Virtual Research Environments using Microservices
The computational demands for scientific applications are continuously
increasing. The emergence of cloud computing has enabled on-demand resource
allocation. However, relying solely on infrastructure as a service does not
achieve the degree of flexibility required by the scientific community. Here we
present a microservice-oriented methodology, where scientific applications run
in a distributed orchestration platform as software containers, referred to as
on-demand virtual research environments. The methodology is vendor-agnostic,
and we provide an open source implementation that supports the major cloud
providers, offering scalable management of scientific pipelines. We demonstrate
applicability and scalability of our methodology in life science applications,
but the methodology is general and can be applied to other scientific domains.
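The paper's own implementation is open source and vendor-agnostic; the snippet below is not that implementation, only a minimal illustration of the underlying idea, launching one step of a scientific pipeline as a software container via the Docker SDK for Python (the image name and paths are placeholders):

import docker

client = docker.from_env()

# Run a containerized analysis tool against data mounted read-only from the
# host; image, command, and paths are illustrative placeholders.
logs = client.containers.run(
    image="biocontainers/samtools:v1.9_cv2",
    command="samtools flagstat /data/sample.bam",
    volumes={"/srv/experiment": {"bind": "/data", "mode": "ro"}},
    remove=True,  # clean up the container after the run
)
print(logs.decode())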
MaRe: a MapReduce-Oriented Framework for Processing Big Data with Application Containers
Background. Life science is increasingly driven by Big Data analytics, and
the MapReduce programming model has been proven successful for data-intensive
analyses. However, current MapReduce frameworks offer poor support for reusing
existing processing tools in bioinformatics pipelines. Further, these
frameworks do not have native support for application containers, which are
becoming popular in scientific data processing.
Results. Here we present MaRe, a programming model with an associated
open-source implementation, which introduces support for application containers
in MapReduce. MaRe is based on Apache Spark and Docker, the MapReduce framework
and container engine that have gathered the largest open-source communities in
their respective fields, thus providing interoperability with a cutting-edge
software ecosystem. We
demonstrate MaRe on two data-intensive applications in life science, showing
ease of use and scalability.
Conclusions. MaRe enables scalable data-intensive processing in life science
with MapReduce and application containers. When compared with current best
practices, which involve the use of workflow systems, MaRe has the advantage of
providing data locality, ingestion from heterogeneous storage systems, and
interactive processing. MaRe is generally applicable and available as
open-source software.
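MaRe's actual API is not reproduced here; the following PySpark sketch only conveys the core idea of MapReduce over application containers, piping each partition through a command-line tool running in Docker (the image and command are placeholders, and Docker is assumed to be available on every worker):

import subprocess
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("containerized-map").getOrCreate()
sc = spark.sparkContext

def count_in_container(lines):
    """Map phase: feed one partition's records to a Dockerized tool."""
    proc = subprocess.run(
        ["docker", "run", "--rm", "-i", "ubuntu:22.04", "wc", "-l"],
        input=("\n".join(lines) + "\n").encode(),
        capture_output=True, check=True,
    )
    yield int(proc.stdout.strip())

rdd = sc.parallelize(["record-%d" % i for i in range(10000)], numSlices=8)
total = rdd.mapPartitions(count_in_container).reduce(lambda a, b: a + b)
print(total)  # 10000: the reduce phase sums the per-partition counts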
NSML: Meet the MLaaS platform with a real-world case study
The boom of deep learning has led many industries and academic groups to
competitively introduce machine learning based approaches into their work.
However, existing machine learning frameworks fall short of adequately
supporting collaboration and management of both data and models. We propose
NSML, a machine learning as a service (MLaaS) platform, to meet these demands.
NSML makes it easy to launch machine learning work on an NSML cluster and
provides a collaborative environment that supports development at enterprise
scale. NSML users can also deploy their own commercial services on an NSML
cluster. In addition, NSML furnishes convenient visualization tools that assist
users in analyzing their work. To verify the usefulness and accessibility of
NSML, we performed experiments with common examples. Furthermore, we examined
the collaborative advantages of NSML through three competitions with real-world
use cases.
The ISTI Rapid Response on Exploring Cloud Computing 2018
This report describes eighteen projects that explored how commercial cloud
computing services can be utilized for scientific computation at national
laboratories. These demonstrations ranged from deploying proprietary software
in a cloud environment to leveraging established cloud-based analytics
workflows for processing scientific datasets. By and large, the projects were
successful and collectively they suggest that cloud computing can be a valuable
computational resource for scientific computation at national laboratories.
Multiple Workflows Scheduling in Multi-tenant Distributed Systems: A Taxonomy and Future Directions
A workflow is a general notion representing automated processes along with the
flow of data between them. The automation ensures that the processes are
executed in the proper order. This feature attracts users from various
backgrounds to build workflows. However, the computational requirements are
enormous, and investing in dedicated infrastructure for these workflows is not
always feasible. To cater to broader needs, multi-tenant platforms for
executing workflows began to be built. In this paper, we identify the problems
and challenges in multiple workflows scheduling that arise on these platforms.
We present a detailed taxonomy of existing solutions on the scheduling and
resource provisioning aspects, followed by a survey of relevant works in this
area. We lay out the open problems and challenges to spur research on multiple
workflows scheduling in multi-tenant distributed systems.
ECHO: An Adaptive Orchestration Platform for Hybrid Dataflows across Cloud and Edge
The Internet of Things (IoT) is offering unprecedented observational data
that are used for managing Smart City utilities. Edge and Fog gateway devices
are an integral part of IoT deployments to acquire real-time data and enact
controls. Recently, Edge computing is emerging as a first-class paradigm to
complement Cloud-centric analytics. But a key limitation is the lack of a
platform-as-a-service for applications spanning Edge and Cloud. Here, we
propose ECHO, an orchestration platform for dataflows across distributed
resources. ECHO's hybrid dataflow composition can operate on diverse data
models -- streams, micro-batches and files, and interface with native runtime
engines like TensorFlow and Storm to execute them. It manages the application's
lifecycle, including container-based deployment and a registry for state
management. ECHO can schedule the dataflow on different Edge, Fog and Cloud
resources, and also perform dynamic task migration between resources. We
validate the ECHO platform for executing video analytics and sensor streams for
Smart Traffic and Smart Utility applications on Raspberry Pi, NVidia TX1, ARM64
and Azure Cloud VM resources, and present our results.
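ECHO's concrete interfaces are not shown in the abstract; as a purely hypothetical sketch of the central idea, the snippet below composes a small dataflow whose tasks are pinned to different Edge and Cloud resources, which a scheduler could then deploy and migrate (all names are invented):

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    placement: str  # e.g. "edge:raspberry-pi" or "cloud:azure-vm"

@dataclass
class Dataflow:
    tasks: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (upstream, downstream) pairs

    def add(self, task):
        self.tasks.append(task)
        return task

    def connect(self, a, b):
        self.edges.append((a.name, b.name))

# A Smart Traffic style pipeline: capture on the edge, aggregate in the cloud.
flow = Dataflow()
cam = flow.add(Task("read_camera", placement="edge:raspberry-pi"))
det = flow.add(Task("detect_vehicles", placement="edge:nvidia-tx1"))
agg = flow.add(Task("aggregate_counts", placement="cloud:azure-vm"))
flow.connect(cam, det)
flow.connect(det, agg)
# A platform like ECHO would deploy each task as a container on its resource
# and could later migrate tasks between Edge, Fog, and Cloud.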
On Energy Efficiency and Performance Evaluation of SBC based Clusters: A Hadoop case study
Energy efficiency in data centers is a challenge that has garnered
researchers' interest. In this paper, we address the energy efficiency of a
small-scale data center by utilizing Single Board Computer (SBC) based
clusters. A compact design layout is presented to build two clusters of 20
nodes each. Extensive testing was carried out to analyze the performance of
these clusters using popular performance benchmarks for task execution time,
memory/storage utilization, network throughput and energy consumption. Further,
we investigate the cost of operating SBC based clusters by correlating energy
utilization for the execution time of various benchmarks using workloads of
different sizes. Results show that, although the low cost of a cluster built
with ARM-based SBCs is desirable, these clusters yield comparatively low
performance and energy efficiency due to limited onboard capabilities. It is,
however, possible to tweak Hadoop configuration parameters for an ARM-based
SBC cluster to utilize resources efficiently. We present a discussion on the
effectiveness of SBC-based clusters as a testbed for inexpensive and green
cloud computing research.
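The abstract mentions tweaking Hadoop parameters for ARM-based SBC clusters; as an illustration, the script below generates a mapred-site.xml with memory limits sized for boards with around 1 GB of RAM. The property names are standard Hadoop MapReduce settings, but the specific values are assumptions for this sketch, not the paper's measured configuration.

from xml.etree import ElementTree as ET

# Standard Hadoop property names; small illustrative values for SBC nodes.
PROPS = {
    "mapreduce.map.memory.mb": "256",
    "mapreduce.reduce.memory.mb": "256",
    "mapreduce.map.java.opts": "-Xmx204m",
    "mapreduce.reduce.java.opts": "-Xmx204m",
}

conf = ET.Element("configuration")
for name, value in PROPS.items():
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

ET.ElementTree(conf).write("mapred-site.xml", xml_declaration=True)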
Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud
Deep Learning system architects strive to design a balanced system where the
computational accelerator (FPGA, GPU, etc.) is not starved for data. Feeding
training data fast enough to effectively keep the accelerator utilization high
is difficult when utilizing dedicated hardware like GPUs. As accelerators are
getting faster, the storage media and data buses feeding the data have not kept
pace, and the ever-increasing size of training data further compounds the
problem. We describe the design and implementation of a distributed caching
system called Hoard that stripes the data across fast local disks of multiple
GPU nodes using a distributed file system that efficiently feeds the data to
ensure minimal degradation in GPU utilization due to I/O starvation. Hoard can
cache the data from a central storage system before the start of the job or
during the initial execution of the job and feeds the cached data for
subsequent epochs of the same job and for different invocations of the jobs
that share the same data requirements, e.g. hyper-parameter tuning. Hoard
exposes a POSIX file system interface so that existing deep learning frameworks
can take advantage of the cache without any modifications. We show that Hoard,
using two NVMe disks per node and a distributed file system for caching,
achieves a 2.1x speed-up over a 10 Gb/s NFS central storage system on a 16-GPU
(4 nodes, 4 GPUs per node) cluster for a challenging AlexNet ImageNet image
classification benchmark with a 150 GB input dataset. As a result of the
caching, Hoard eliminates the I/O bottlenecks introduced by the shared storage
and increases the utilization of the system by 2x compared to using the shared
storage without the cache.
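Because Hoard exposes a POSIX file system interface, a training job only needs to point its dataset path at the cache mount. The loader below is ordinary PyTorch/torchvision code with no Hoard-specific changes; the mount point is hypothetical.

import torch
from torchvision import datasets, transforms

DATA_ROOT = "/mnt/hoard/imagenet"  # hypothetical Hoard cache mount point

# The same loader works against NFS or the Hoard cache; only the path differs.
dataset = datasets.ImageFolder(
    DATA_ROOT,
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
loader = torch.utils.data.DataLoader(
    dataset, batch_size=256, shuffle=True,
    num_workers=8,    # parallel readers help keep the GPUs fed
    pin_memory=True,
)

for images, labels in loader:  # epochs after the first hit the local cache
    pass  # training step would go here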