67,519 research outputs found

    Checkpointing as a Service in Heterogeneous Cloud Environments

    Get PDF
    A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerant mechanism for each application; and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201

    Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

    Full text link
    Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.Comment: This is a pre-print version an article to be published in the Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). For personal use onl

    Efficient Generation of Parallel Spin-images Using Dynamic Loop Scheduling

    Get PDF
    High performance computing (HPC) systems underwent a significant increase in their processing capabilities. Modern HPC systems combine large numbers of homogeneous and heterogeneous computing resources. Scalability is, therefore, an essential aspect of scientific applications to efficiently exploit the massive parallelism of modern HPC systems. This work introduces an efficient version of the parallel spin-image algorithm (PSIA), called EPSIA. The PSIA is a parallel version of the spin-image algorithm (SIA). The (P)SIA is used in various domains, such as 3D object recognition, categorization, and 3D face recognition. EPSIA refers to the extended version of the PSIA that integrates various well-known dynamic loop scheduling (DLS) techniques. The present work: (1) Proposes EPSIA, a novel flexible version of PSIA; (2) Showcases the benefits of applying DLS techniques for optimizing the performance of the PSIA; (3) Assesses the performance of the proposed EPSIA by conducting several scalability experiments. The performance results are promising and show that using well-known DLS techniques, the performance of the EPSIA outperforms the performance of the PSIA by a factor of 1.2 and 2 for homogeneous and heterogeneous computing resources, respectively

    Distributed computing methodology for training neural networks in an image-guided diagnostic application

    Get PDF
    Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used
    • 

    corecore