67,519 research outputs found
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability
Heterogeneous computing, which combines devices with different architectures,
is rising in popularity, and promises increased performance combined with
reduced energy consumption. OpenCL has been proposed as a standard for
programing such systems, and offers functional portability. It does, however,
suffer from poor performance portability, code tuned for one device must be
re-tuned to achieve good performance on another device. In this paper, we use
machine learning-based auto-tuning to address this problem. Benchmarks are run
on a random subset of the entire tuning parameter configuration space, and the
results are used to build an artificial neural network based model. The model
can then be used to find interesting parts of the parameter space for further
search. We evaluate our method with different benchmarks, on several devices,
including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970
GPU. Our model achieves a mean relative error as low as 6.1%, and is able to
find configurations as little as 1.3% worse than the global minimum.Comment: This is a pre-print version an article to be published in the
Proceedings of the 2015 IEEE International Parallel and Distributed
Processing Symposium Workshops (IPDPSW). For personal use onl
Efficient Generation of Parallel Spin-images Using Dynamic Loop Scheduling
High performance computing (HPC) systems underwent a significant increase in
their processing capabilities. Modern HPC systems combine large numbers of
homogeneous and heterogeneous computing resources. Scalability is, therefore,
an essential aspect of scientific applications to efficiently exploit the
massive parallelism of modern HPC systems. This work introduces an efficient
version of the parallel spin-image algorithm (PSIA), called EPSIA. The PSIA is
a parallel version of the spin-image algorithm (SIA). The (P)SIA is used in
various domains, such as 3D object recognition, categorization, and 3D face
recognition. EPSIA refers to the extended version of the PSIA that integrates
various well-known dynamic loop scheduling (DLS) techniques. The present work:
(1) Proposes EPSIA, a novel flexible version of PSIA; (2) Showcases the
benefits of applying DLS techniques for optimizing the performance of the PSIA;
(3) Assesses the performance of the proposed EPSIA by conducting several
scalability experiments. The performance results are promising and show that
using well-known DLS techniques, the performance of the EPSIA outperforms the
performance of the PSIA by a factor of 1.2 and 2 for homogeneous and
heterogeneous computing resources, respectively
Distributed computing methodology for training neural networks in an image-guided diagnostic application
Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used
- âŠ