AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests
Recent improvements in programming languages, programming models, and frameworks have focused on shielding users from many programming issues. Among other features, recent programming frameworks provide simpler syntax, automatic memory management and garbage collection, easier code reuse through library packages, and easily configurable tools for deployment. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Moreover, the community has helped develop a large number of libraries and modules and has tuned them to obtain great performance.
However, there is still room for improvement in shielding users from having to deal directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. This parallelization can also include the construction of data blocks to increase task granularity and achieve good execution performance. Moreover, AutoParallel is based on sequential programming and only requires a small annotation in the form of a Python decorator, so that anyone with basic programming skills can scale an application up to hundreds of cores.
Comment: Accepted to the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018).
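To give an idea of the programming effort the abstract describes, the sketch below shows how such a single-decorator annotation might look on a matrix-multiplication loop nest; the decorator name, import path, and blocking behaviour are assumptions inferred from the description above, not the verified AutoParallel API.

```python
# Hypothetical sketch: annotating an affine loop nest so that a module
# like AutoParallel can derive a task-based parallelization of it.
# The import path and decorator name are assumptions, not verified API.
from pycompss.api.parallel import parallel  # assumed module location


@parallel()  # single annotation; the loop nest below stays sequential
def matmul(a, b, c, n):
    # Plain affine loop nest: the module analyzes it and replaces it
    # with tasks (optionally built into blocks for coarser granularity).
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
```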
Towards Automatic Application Migration to Clouds
Porting applications to clouds is one of the key challenges in the software industry. The available approaches for performing this task are essentially either services derived from alliances of major software vendors and cloud providers focusing on their own products, or small platform providers focusing on the most popular software stacks. For migrating other types of software, the options are limited to Infrastructure-as-a-Service (IaaS) solutions, which require considerable programming effort to adapt the software to a cloud provider's API. Moreover, if the software must be deployed on different providers, new integration procedures must be designed and implemented, which can quickly become unmanageable. This paper presents a solution that facilitates the migration of any application to the cloud by inferring the most suitable deployment model for the application and automatically deploying it on the available cloud providers.
Hyperparameter optimization using agents for large scale machine learning
Machine learning (ML) has become an essential tool for obtaining sound predictions in many aspects of everyday life. Hyperparameter optimization algorithms are a tool for building better ML models; they iteratively execute sets of trials, and these trials usually have different execution times. In this paper we optimize the grid search and random search with cross-validation provided by Dislib [1], an ML library for distributed computing built on top of the PyCOMPSs [2] programming model, taking inspiration from Maggy [3], an open-source framework based on Spark. The optimization uses agents and avoids trials having to wait for each other, achieving a speed-up of over 2.5x compared to the previous implementation.
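As a rough illustration of why removing the barrier between trials helps, the sketch below contrasts batch-synchronous evaluation with fully asynchronous trial scheduling using Python's standard concurrent.futures; it is a conceptual stand-in with a dummy scoring function, not the agent-based Dislib/PyCOMPSs implementation described here.

```python
# Conceptual sketch only: asynchronous trial evaluation so that short
# trials never wait for the longest trial in their batch. Uses the
# standard library, not the actual Dislib/PyCOMPSs agents.
from concurrent.futures import ProcessPoolExecutor, as_completed
from itertools import product


def run_trial(params):
    # Placeholder: train a model with `params` and return a
    # cross-validated score. A dummy deterministic value stands in.
    return sum(len(str(v)) for v in params.values()) * 0.1


def async_grid_search(grid, workers=8):
    combos = [dict(zip(grid, values)) for values in product(*grid.values())]
    best = None
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_trial, p): p for p in combos}
        # Results are consumed as soon as each trial finishes, so fast
        # trials are never blocked by slow ones in the same "batch".
        for fut in as_completed(futures):
            score = fut.result()
            if best is None or score > best[0]:
                best = (score, futures[fut])
    return best


if __name__ == "__main__":
    grid = {"lr": [0.1, 0.01, 0.001], "max_depth": [3, 5, 7]}
    print(async_grid_search(grid, workers=4))
```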
A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one
This paper tries to reduce the effort of learning, deploying, and integrating
several frameworks for the development of e-Science applications that combine
simulations with High-Performance Data Analytics (HPDA). We propose a way to
extend task-based management systems to support continuous input and output
data to enable the combination of task-based workflows and dataflows (Hybrid
Workflows from now on) using a single programming model. Hence, developers can
build complex Data Science workflows with different approaches depending on the
requirements. To illustrate the capabilities of Hybrid Workflows, we have built
a Distributed Stream Library and a fully functional prototype extending COMPSs,
a mature, general-purpose, task-based, parallel programming model. The library
can be easily integrated with existing task-based frameworks to provide support
for dataflows. Also, it provides a homogeneous, generic, and simple
representation of object and file streams in both Java and Python; enabling
complex workflows to handle any data type without dealing directly with the
streaming back-end.
Comment: Accepted in Future Generation Computer Systems (FGCS). Licensed under CC-BY-NC-ND.
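The sketch below illustrates how such a Hybrid Workflow might be written: one task continuously publishes objects to a stream while another consumes them, all under the same task-based model. The stream class, parameter directions, and method names are assumptions drawn from the description of the Distributed Stream Library, not a verified API.

```python
# Hedged sketch of a Hybrid Workflow: a producer task feeds an object
# stream that a consumer task polls, combining a dataflow with ordinary
# task-based execution. Class and method names (ObjectDistroStream,
# STREAM_IN/STREAM_OUT, publish/poll/close) are assumptions.
from pycompss.api.task import task
from pycompss.api.parameter import STREAM_IN, STREAM_OUT  # assumed names
from pycompss.streams.distro_stream import ObjectDistroStream  # assumed path


@task(stream=STREAM_OUT)
def simulate(stream, num_steps):
    # Task-based producer: publishes partial results as they are computed.
    for step in range(num_steps):
        stream.publish({"step": step, "value": step * step})
    stream.close()


@task(stream=STREAM_IN, returns=int)
def analyze(stream):
    # Dataflow-style consumer: processes elements as they arrive instead
    # of waiting for the producer task to finish.
    processed = 0
    while not stream.is_closed():
        for _item in stream.poll():
            processed += 1
    return processed


if __name__ == "__main__":
    from pycompss.api.api import compss_wait_on
    ods = ObjectDistroStream()
    simulate(ods, 100)
    total = compss_wait_on(analyze(ods))
```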
Extension of a task-based model to functional programming
Recently, efforts have been made to bring together the areas of high-performance computing (HPC) and massive data processing (Big Data). Traditional HPC frameworks, like COMPSs, are mostly task-based, while popular big-data environments, like Spark, are based on functional programming principles. The former are known for their good performance on regular, matrix-based computations; the latter, on the other hand, have often been considered more successful for fine-grained, data-parallel workloads. In this paper we present our experience with the integration of some dataflow techniques into COMPSs, a task-based framework, in an effort to bring together the best aspects of both worlds. We present our API, called DDF, which provides a new data abstraction that addresses the challenges of integrating Big Data application scenarios into COMPSs. DDF has a functional interface, similar to many Data Science tools, that allows us to use dynamic evaluation to adapt task execution at runtime. Besides the performance optimization it provides, the API facilitates the development of applications by experts in the application domain. In this paper we evaluate DDF's effectiveness by comparing the resulting programs to their original versions in COMPSs and Spark. The results show that DDF can improve COMPSs execution time and even outperform Spark in many use cases.
This work was partially supported by CAPES, CNPq, Fapemig and NIC.BR, and by the projects Atmosphere (H2020-EU.2.1.1 777154) and INCT-Cyber.
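For flavour, the fragment below sketches what a functional, chained DDF-style program might look like on top of a task-based runtime; the class name and the load_text / filter / count_rows methods are illustrative assumptions rather than the library's documented interface.

```python
# Illustrative sketch only: a functional, Spark-like chain of operations
# on top of a task-based runtime. The DDF class and the load_text /
# filter / count_rows methods are assumptions made for illustration and
# may differ from the documented interface.
from ddf_library.ddf import DDF  # assumed import path


def count_long_lines(path, min_len=80):
    # Each operation is declared functionally; the runtime is free to
    # evaluate the chain lazily and map it onto tasks behind the scenes.
    return (DDF()
            .load_text(path, num_of_parts=8)
            .filter('length(col_0) >= {}'.format(min_len))
            .count_rows())
```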
Service Orchestration on a Heterogeneous Cloud Federation
In recent years, cloud computing has emerged as a new way to obtain computing resources on demand in a very dynamic fashion, paying only for what is consumed. Nowadays, there are several hosting providers that follow this approach, offering resources with different capabilities, prices, and SLAs. Therefore, depending on the users' preferences and the application requirements, one resource provider may fit better than another. In this paper, we present an architecture for federating clouds that aggregates resources from different providers, decides which resources and providers best serve the users' interests, and coordinates the application deployment on the selected resources, giving the user the impression that a single cloud is being used.
Semantic resource allocation with historical data based predictions
One of the most important issues for service providers in cloud computing is delivering a good quality of service. This is achieved by adapting to a changing environment in which different failures can occur during the execution of services and tasks. Some of these failures can be predicted using information obtained from previous executions, and the resulting predictions help schedulers improve the allocation of resources to the different tasks. In this paper, we present a framework that uses semantically enhanced historical data to predict the behavior of tasks and resources in the system and to allocate resources according to these predictions.
A Study of Checkpointing in Large Scale Training of Deep Neural Networks
Deep learning (DL) applications are increasingly being deployed on HPC
systems, to leverage the massive parallelism and computing power of those
systems for DL model training. While significant effort has been put to
facilitate distributed training by DL frameworks, fault tolerance has been
largely ignored. In this work, we evaluate checkpoint-restart, a common fault
tolerance technique in HPC workloads. We perform experiments with three
state-of-the-art DL frameworks common in HPC (Chainer, PyTorch, and TensorFlow).
We evaluate the computational cost of checkpointing, file formats and file
sizes, the impact of scale, and deterministic checkpointing. Our evaluation
shows some critical differences in checkpoint mechanisms and exposes several
bottlenecks in existing checkpointing implementations. We provide discussion
points that can aid users in selecting a fault-tolerant framework to use in
HPC. We also provide takeaway points that framework developers can use to
facilitate better checkpointing of DL workloads in HPC.
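For context, checkpoint-restart in this setting means periodically serializing the model and optimizer state so that training can resume after a failure. The sketch below shows a common pattern in PyTorch, one of the evaluated frameworks; the checkpoint path and interval are arbitrary choices for illustration, not taken from the study.

```python
# Generic checkpoint-restart pattern for DL training in PyTorch (one of
# the frameworks evaluated above). The checkpoint path and interval are
# arbitrary illustrative choices.
import os
import torch


def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    # Serialize everything needed to resume: weights, optimizer state, epoch.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)


def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    # Restore state if a checkpoint exists; return the epoch to resume from.
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1


def train(model, optimizer, train_one_epoch, num_epochs, every=5):
    start = load_checkpoint(model, optimizer)
    for epoch in range(start, num_epochs):
        train_one_epoch(model, optimizer)
        # Checkpoint frequency trades checkpointing overhead against the
        # amount of work lost on failure.
        if epoch % every == 0:
            save_checkpoint(model, optimizer, epoch)
```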
Enabling System Wide Shared Memory for Performance Improvement in PyCOMPSs Applications
Python has been gaining traction for years in the world of scientific applications. However, the high-level abstraction it provides may not allow developers to use machines to their peak performance. To address this, multiple strategies, sometimes complementary, have been developed to enrich the software ecosystem, either by relying on additional libraries dedicated to efficient computation (e.g., NumPy) or by providing a framework to better use HPC-scale infrastructures (e.g., PyCOMPSs). In this paper, we present a Python extension based on SharedArray that enables the support of system-provided shared memory and its integration into the PyCOMPSs programming model, as an example of integration into a complex Python environment. We also evaluate the impact such a tool may have on performance in two types of distributed execution flows: one for linear algebra with a blocked matrix-multiplication application, and the other in the context of data clustering with a k-means application. We show that with very little modification of the original decorator of the task-based application (3 lines of code), the performance gain can rise above 40% for tasks relying heavily on data reuse in a distributed environment, especially when loading the data dominates the execution time.
This work was partly funded by the EXPERTISE project (http://www.msca-expertise.eu/), which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721865. BSC authors have also been supported by the Spanish Government through contracts SEV2015-0493 and TIN2015-65316-P, and by Generalitat de Catalunya through contract 2014-SGR-1051.
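To illustrate the mechanism the extension builds on, the sketch below shows the basic SharedArray pattern: one process creates a NumPy array backed by system shared memory, and other processes on the same node attach to it by name instead of receiving a copy. The segment name and array shape are arbitrary, and the PyCOMPSs decorator integration itself is not shown.

```python
# Basic SharedArray usage that the described PyCOMPSs extension builds
# on: a NumPy array placed in system-wide shared memory and attached by
# name from other processes on the same node, avoiding serialization
# and copies. Segment name and shape are arbitrary examples.
import numpy as np
import SharedArray as sa

SEGMENT = "shm://block_0_0"  # arbitrary example name

# Producer side: allocate the block directly in shared memory.
block = sa.create(SEGMENT, (1024, 1024), dtype=np.float64)
block[:] = np.random.rand(1024, 1024)

# Consumer side (possibly another process on the same node): attach to
# the existing segment by name; the data itself is not copied.
same_block = sa.attach(SEGMENT)
result = same_block.sum()

# Clean up the shared-memory segment once every process is done with it.
sa.delete(SEGMENT)
```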