A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer
International audience. Managing French electricity production to control cost while satisfying demand requires solving a stochastic optimization problem whose main sources of uncertainty are the demand load, electricity and fuel market prices, hydraulicity, and the availability of thermal production assets. Stochastic dynamic programming is an attractive solution, but it is both CPU- and memory-intensive. It requires parallelization to achieve speedup and size-up, and to handle a large number of stocks (N) and of uncertainty factors. This paper presents the distribution of an N-dimensional stochastic dynamic programming application on PC clusters and on the IBM Blue Gene/L supercomputer. This required parallelizing input and output file accesses from thousands of processors, load balancing an N-dimensional cube of data and computation that evolves at each time step, and computing Monte Carlo simulations that need data spread across many separate files managed by different processors. Finally, a successful experiment on a 7-stock problem using up to 8192 processors validates this distribution strategy.
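The backward recursion at the heart of stochastic dynamic programming can be sketched for a single stock. This is a hypothetical toy model, not the paper's electricity model: the stock level lives on an integer grid, and at each step the operator may release one unit, sold at a price drawn from equiprobable scenarios. With N stocks the same loop runs over (levels+1)^N grid points per time step, which is why the grid must be distributed.

```python
from typing import List


def solve_sdp(levels: int, horizon: int, prices: List[float]) -> List[List[float]]:
    """Return value[t][s]: expected optimal revenue from time t with stock level s.

    Hypothetical single-stock model: at each step, after observing the price,
    the operator either keeps the stock or releases one unit at that price.
    """
    # Terminal condition: stock left at the horizon is assumed to be worth nothing.
    value = [[0.0] * (levels + 1) for _ in range(horizon + 1)]
    for t in range(horizon - 1, -1, -1):          # backward in time
        for s in range(levels + 1):               # every grid point of the stock
            expected = 0.0
            for p in prices:                      # average over equiprobable scenarios
                best = value[t + 1][s]            # option 1: keep the stock
                if s > 0:                         # option 2: release one unit
                    best = max(best, p + value[t + 1][s - 1])
                expected += best / len(prices)
            value[t][s] = expected
    return value
```

For example, `solve_sdp(1, 2, [0.0, 10.0])[0][1]` is 7.5: with one unit and two steps left, it pays to sell now only if the price is high.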
AdaBoost Parallelization on PC Clusters with Virtual Shared Memory for Fast Feature Selection
©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. International audience. Feature selection is a key issue in many machine learning applications: many candidate features need to be tested, and the computation time required to do so is often very large. In this paper, we introduce a parallel version of the well-known AdaBoost algorithm to speed up and size up feature selection for binary classification tasks using large training datasets and a wide range of elementary features. This parallelization requires no modification to the AdaBoost algorithm and is designed for PC clusters using Java and the JavaSpace distributed framework. JavaSpace is a memory-sharing paradigm implemented on top of a virtual shared memory, which proves both efficient and easy to use. Results and performance are presented for a face detection system trained with the proposed parallel AdaBoost.
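Because AdaBoost itself is left unchanged, the expensive step that can be distributed is the search, in each boosting round, for the weak classifier with the lowest weighted error over all candidate features. A minimal sketch, assuming decision stumps as weak learners and using Python threads in place of JavaSpace workers (all names are hypothetical): each worker scans a slice of the features, and the master keeps the best result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (weighted error, feature, threshold, polarity)


def best_stump_in_slice(X: List[List[float]], y: List[int], w: List[float],
                        features: Sequence[int]) -> Stump:
    """Worker task: best decision stump over a subset of the features."""
    best: Stump = (float("inf"), -1, 0.0, 1)
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                # The stump predicts +1 when pol * (x - thr) >= 0, else -1.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (1 if pol * (row[f] - thr) >= 0 else -1) != yi)
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best


def parallel_best_stump(X: List[List[float]], y: List[int], w: List[float],
                        n_workers: int = 2) -> Stump:
    """Master: split features round-robin among workers, then reduce."""
    n_features = len(X[0])
    slices = [range(k, n_features, n_workers) for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda fs: best_stump_in_slice(X, y, w, fs), slices))
    return min(results)  # the smallest weighted error wins
```

The reduction only compares one small tuple per worker, so communication stays negligible compared with the per-feature scans.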
FT-GReLoSSS: a Skeletal-Based Approach towards Application Parallelization and Low-Overhead Fault Tolerance
International audience. FT-GReLoSSS (FTG) is a C++/MPI framework that eases the development of fault-tolerant parallel applications belonging to an SPMD family termed GReLoSSS. The originality of FTG is that it relies on the principles of the MoLOToF programming model to facilitate adding efficient checkpoint-based fault tolerance at the application level. The main features of MoLOToF are structured application development based on fault-tolerant "skeletons" and an emphasis on collaboration between the programmer, the framework and the underlying runtime middleware/environment. Together with the structured approach, these collaborations help reduce checkpoint sizes as well as checkpoint and recovery overhead at runtime. This paper introduces the main principles of MoLOToF and the design of the FTG framework. To assess both the framework's ease of use for a programmer and its fault tolerance efficiency, a series of benchmarks was conducted on up to 128 nodes of a multicore PC cluster. These benchmarks involved an existing parallel financial application for gas storage valuation, originally developed in collaboration with the EDF company, and a rewritten version that uses the FTG framework and its features. Experimental results show low overhead compared to existing system-level counterparts.
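The idea behind application-level checkpointing is that the application saves only the state it declares essential, at a well-defined point of its iteration structure, instead of dumping a whole process image. A minimal single-process sketch, assuming a toy iteration and JSON files; it is not FTG's C++/MPI API, and `run` and `demo` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def run(n_iters: int, ckpt_path: str, every: int = 10,
        crash_at: Optional[int] = None) -> float:
    """Toy iterative computation with application-level checkpoints."""
    if os.path.exists(ckpt_path):
        # Restart: reload only the state the application declared essential.
        with open(ckpt_path) as fh:
            state = json.load(fh)
    else:
        state = {"iter": 0, "x": 0.0}
    while state["iter"] < n_iters:
        if crash_at is not None and state["iter"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["x"] = state["x"] / 2.0 + 1.0     # one "time step" of the toy application
        state["iter"] += 1
        if state["iter"] % every == 0:
            # Write to a temporary file, then rename atomically, so a crash
            # mid-write cannot corrupt the previous checkpoint.
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp, ckpt_path)
    return state["x"]


def demo() -> Tuple[float, float, int]:
    """Crash a run at iteration 25, then restart it from its last checkpoint."""
    with tempfile.TemporaryDirectory() as d:
        reference = run(50, os.path.join(d, "ref.json"))
        ckpt = os.path.join(d, "app.json")
        try:
            run(50, ckpt, crash_at=25)   # fails after the iteration-20 checkpoint
        except RuntimeError:
            pass
        with open(ckpt) as fh:
            saved_iter = json.load(fh)["iter"]
        resumed = run(50, ckpt)          # restarts from iteration 20, not from 0
        return reference, resumed, saved_iter
```

The checkpoint here holds two numbers; keeping the saved state this small is what makes application-level checkpoints cheap compared with system-level snapshots.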
Resource Centered Computing delivering high parallel performance
International audience. Modern parallel programming requires a combination of different paradigms, expertise and tuning, corresponding to the different levels of today's hierarchical architectures. To cope with this inherent difficulty, ORWL (ordered read-write locks) provides a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation in terms of access to these resources during critical sections. Exclusive or shared access to the resources is granted through FIFOs with read-write semantics. ORWL partially replaces a classical runtime and offers a new API for resource-centric parallel programming. We successfully ran an ORWL benchmark application on different parallel architectures (a multicore CPU cluster, a NUMA machine, a CPU+GPU cluster). When processing large data, we achieved scalability and performance similar to a reference code built on top of MPI+OpenMP+CUDA. Integrating optimized kernels from scientific computing libraries (ATLAS and cuBLAS) was almost effortless, and we were able to increase performance by using both the CPU and GPU cores of our hybrid hierarchical cluster simultaneously. We aim to make ORWL a new easy-to-use and efficient programming model and toolbox for parallel developers.
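The FIFO-with-read-write-semantics rule can be illustrated with a small single-threaded model of one resource: requests are served strictly in arrival order, consecutive readers at the head of the queue share access, and a writer is always alone. This is a hypothetical sketch of the granting rule only; ORWL itself is a C library, and `OrderedRWResource` is not part of its API.

```python
from collections import deque
from typing import Deque, List, Tuple


class OrderedRWResource:
    """Hypothetical model of FIFO-ordered read-write access to one resource."""

    def __init__(self) -> None:
        self.queue: Deque[Tuple[str, str]] = deque()  # waiting (task, mode) requests
        self.active: List[str] = []                   # tasks currently granted access
        self.mode = "r"                               # mode of the current holders

    def request(self, task: str, mode: str) -> None:
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")
        self.queue.append((task, mode))
        self._grant()

    def release(self, task: str) -> None:
        self.active.remove(task)
        self._grant()

    def _grant(self) -> None:
        # Serve requests strictly in arrival order: no request overtakes another.
        while self.queue:
            task, mode = self.queue[0]
            if not self.active:
                self.mode = mode       # idle resource: the head of the FIFO goes next
            elif not (self.mode == "r" and mode == "r"):
                break                  # only readers may join readers
            self.queue.popleft()
            self.active.append(task)
            if mode == "w":
                break                  # a writer always holds the resource alone
```

For the request sequence w1, r1, r2, w2, access is granted as {w1}, then {r1, r2} together, then {w2}: the order is fixed by the FIFO, so no task can starve.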
Multi-Target Vectorization with MTPS C++ Generic Library
International audience. This article introduces Multi-Target Parallel Skeleton (MTPS), a C++ template library dedicated to vectorizing algorithms for different target architectures. The library provides skeletons describing the data structures and algorithms, which allow MTPS to generate code with optimized memory access patterns for the chosen architecture. MTPS currently supports x86-64 multicore CPUs and CUDA-enabled GPUs. On these architectures, performance close to hardware limits is observed.
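One classic example of an "optimized memory access pattern" is choosing between an Array of Structures (AoS) and a Structure of Arrays (SoA) layout. The sketch below, an illustration only (the abstract does not say which layouts MTPS generates for each target), computes the memory offsets touched when consecutive SIMD lanes or GPU threads load the same field: strided under AoS, contiguous under SoA. The sizes and function names are hypothetical.

```python
from typing import Callable, List

N_ELEMS, N_FIELDS = 8, 3   # e.g. 8 hypothetical particles with fields (x, y, z)


def aos_offset(i: int, field: int) -> int:
    """Array of Structures: the fields of element i are stored side by side."""
    return i * N_FIELDS + field


def soa_offset(i: int, field: int) -> int:
    """Structure of Arrays: one contiguous array per field."""
    return field * N_ELEMS + i


def lane_offsets(layout: Callable[[int, int], int], field: int, width: int = 4) -> List[int]:
    """Offsets touched when `width` SIMD lanes (or GPU threads) load one field."""
    return [layout(lane, field) for lane in range(width)]


def aos_to_soa(buf: List[float]) -> List[float]:
    """Re-lay out an AoS buffer as SoA, element by element."""
    return [buf[aos_offset(i, f)] for f in range(N_FIELDS) for i in range(N_ELEMS)]
```

Loading field 1 on four lanes touches offsets 1, 4, 7, 10 under AoS but 8, 9, 10, 11 under SoA; the contiguous pattern is what allows a single vector load on a CPU or a coalesced access on a GPU.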
parXXL: A Fine Grained Development Environment on Coarse Grained Architectures
URL: http://www.hpc2n.umu.se/para06/papers/paper_48.pdf. We present a new integrated environment for cellular computing and other fine-grained applications. It builds on previous work on cellular computing environments (the ParCeL family) and coarse-grained algorithms (the SSCRAP toolbox). It aims to be portable and efficient while offering a comfortable abstraction to developers of fine-grained programs. A first campaign of benchmarks shows promising results on clusters and mainframes.
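Running a fine-grained program on a coarse-grained machine typically means grouping many cells into one block per processor and exchanging only the block borders (halo cells) at each step. A minimal sketch of that mapping, assuming a 1D ring of cells and elementary rule 90 as the cell program; it does not reflect parXXL's actual API, and all names are hypothetical.

```python
from typing import List


def step_global(cells: List[int]) -> List[int]:
    """Reference: one step of elementary rule 90 on a ring (new = left XOR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]


def split_blocks(cells: List[int], p: int) -> List[List[int]]:
    """Give each of p coarse-grained processors a contiguous block (needs len >= p)."""
    q, r = divmod(len(cells), p)
    blocks, start = [], 0
    for k in range(p):
        size = q + (1 if k < r else 0)   # spread the remainder over the first r blocks
        blocks.append(cells[start:start + size])
        start += size
    return blocks


def step_blocked(blocks: List[List[int]]) -> List[List[int]]:
    """Update each block locally after receiving one halo cell from each neighbour."""
    p = len(blocks)
    new_blocks = []
    for k, block in enumerate(blocks):
        left = blocks[(k - 1) % p][-1]   # halo from the left neighbour
        right = blocks[(k + 1) % p][0]   # halo from the right neighbour
        padded = [left] + block + [right]
        new_blocks.append([padded[j - 1] ^ padded[j + 1] for j in range(1, len(block) + 1)])
    return new_blocks
```

Only two cells per block cross processor boundaries each step, however many cells a block holds, which is what makes fine-grained models affordable on coarse-grained hardware.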
Impact of Asynchronism on GPU Accelerated Parallel Iterative Computations
International audience. We study the impact of asynchronism on parallel iterative algorithms in the particular context of local clusters of workstations that include GPUs. The test application is a classical 3D advection-diffusion-reaction PDE problem. We propose an asynchronous version of a previously developed PDE solver that uses GPUs for the inner computations. The algorithm is tested on two kinds of clusters: a homogeneous one and a heterogeneous one (with different CPUs and GPUs).
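In an asynchronous iteration, each processor updates its own block using whatever values of the other blocks it last received, without waiting for them. A minimal single-process simulation, assuming a diagonally dominant tridiagonal system as a stand-in for the discretized PDE and two hypothetical workers, one updating three times less often (as on a heterogeneous cluster):

```python
from typing import List


def jacobi_rows(x: List[float], b: List[float], lo: int, hi: int) -> List[float]:
    """Jacobi update of rows lo..hi-1 of A x = b with A = tridiag(-1, 4, -1)."""
    n = len(x)
    out = []
    for i in range(lo, hi):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append((b[i] + left + right) / 4.0)
    return out


def residual(x: List[float], b: List[float]) -> float:
    """Max-norm of A x - b."""
    n = len(x)
    return max(abs(4.0 * x[i] - (x[i - 1] if i > 0 else 0.0)
                   - (x[i + 1] if i < n - 1 else 0.0) - b[i]) for i in range(n))


def async_solve(b: List[float], steps: int = 300, slow_period: int = 3) -> List[float]:
    n, mid = len(b), len(b) // 2
    x = [0.0] * n            # last values published by both workers
    snapshot = list(x)       # what the slow worker read when it started its update
    for t in range(1, steps + 1):
        # Fast worker: owns rows [0, mid), publishes every step, never waits.
        x[:mid] = jacobi_rows(x, b, 0, mid)
        if t % slow_period == 0:
            # Slow worker: owns rows [mid, n), works from a stale snapshot.
            x[mid:] = jacobi_rows(snapshot, b, mid, n)
            snapshot = list(x)   # it immediately starts its next update
    return x
```

Convergence despite stale data relies on the update being a contraction (here in the max-norm, since each diagonal entry dominates its row); this is the usual condition under which asynchronous iterations are guaranteed to converge.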
An Interactive Problem Modeller and PDE Solver, Distributed on Large Scale Architectures
URL: http://lifc.univ-fcomte.fr/dfma07/. International audience. This paper introduces a research project and a software environment to speed up and size up problem modeling with partial differential equations (PDEs). The PDEs are defined from a Mathematica interface and are automatically solved by a dedicated cellular automaton program generated to run on mainframes, clusters and Grids. Moreover, interactive graphical control of the running cellular automaton makes it possible to assess the relevance of the PDE model. This environment makes large-scale simulation usable in the early stages of research projects.
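An explicit finite-difference scheme is naturally a cellular automaton: each cell's next value depends only on itself and its neighbours. A minimal sketch for the 1D diffusion equation u_t = D u_xx on a ring, where lam = D * dt / dx^2; this stand-in PDE and the function names are assumptions, since the abstract does not show the generated programs.

```python
from typing import List


def diffusion_rule(left: float, centre: float, right: float, lam: float) -> float:
    """Local cell rule: explicit update of u_t = D u_xx with lam = D*dt/dx**2."""
    return centre + lam * (left - 2.0 * centre + right)


def run_ca(u: List[float], lam: float, steps: int) -> List[float]:
    if not 0.0 < lam <= 0.5:
        raise ValueError("the explicit scheme needs 0 < lam <= 0.5 for stability")
    n = len(u)
    for _ in range(steps):
        # Synchronous update: every cell reads the previous generation only.
        # u[i - 1] wraps around for i == 0, so the domain is periodic.
        u = [diffusion_rule(u[i - 1], u[i], u[(i + 1) % n], lam) for i in range(n)]
    return u
```

Starting from a single spike, the total quantity sum(u) is conserved on the ring while the spike spreads symmetrically, which gives a quick sanity check of a generated rule.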
European Option Pricing on a GPU Cluster
Presentation given at the 2nd JTE "GPGPU", University Paris-6, 4 December 2008, Paris, France. The aim of this presentation is to compare CPU and GPU clusters in terms of speed and energy consumption, using a financial application as a benchmark. After a brief introduction to the application field, we give details on the hardware and software architectures used. We then introduce a multiparadigm parallel algorithm mixing coarse- and fine-grained parallelism, and its implementations using MPI+OpenMP on a CPU cluster and MPI+CUDA on a GPU cluster. Finally, we compare computing and energy performance on different clusters.
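The abstract does not name the pricing method; assuming Monte Carlo simulation under the Black-Scholes model, the coarse-grained level can be sketched as independent batches of paths, one per node, followed by a single reduction. In an MPI+CUDA version each batch would itself be spread over GPU threads (the fine-grained level); here a plain loop stands in for both levels. Per-node seeds on separate `random.Random` instances are a simplification, not a guarantee of independent streams.

```python
import math
import random


def bs_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """Closed-form Black-Scholes price of a European call, used as a reference."""
    def cdf(z: float) -> float:
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


def partial_payoff_sum(seed: int, n_paths: int, s0: float, k: float,
                       r: float, sigma: float, t: float) -> float:
    """Work of one node: simulate terminal prices with its own generator."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(st - k, 0.0)
    return total


def mc_call(n_nodes: int, paths_per_node: int, s0: float, k: float,
            r: float, sigma: float, t: float) -> float:
    # Coarse grain: independent partial sums per node, then one reduction.
    sums = [partial_payoff_sum(1000 + node, paths_per_node, s0, k, r, sigma, t)
            for node in range(n_nodes)]
    return math.exp(-r * t) * sum(sums) / (n_nodes * paths_per_node)
```

With S0 = K = 100, r = 0.05, sigma = 0.2 and T = 1, the closed form gives about 10.4506, and the Monte Carlo estimate from 4 nodes of 50,000 paths each lands within a few standard errors of it.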
Support for the Parallelization of Connectionist Networks
Conference with proceedings and peer review. Parallel computer architectures are beginning to become accessible today, and they may prove valuable for developing connectionist models. After a brief presentation of parallelism, this article presents our approach to parallelizing artificial neural networks. We aim to develop a library of software tools that will eventually allow connectionists to benefit from parallelism without in-depth knowledge of the field. We therefore present the steps that guided our strategic choices, the current state of our thinking, and an example of a parallel implementation of an artificial neural network.
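The article does not describe its partitioning strategy; one common option for a dense layer is to split the output neurons among workers, so that each computes its share of the weighted sums independently and the results are gathered in neuron order. A minimal sketch under that assumption, with a tanh activation and hypothetical function names:

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def layer_slice(weights: Matrix, bias: List[float], x: List[float],
                rows: Sequence[int]) -> List[float]:
    """Outputs of the neurons in `rows`: one worker's share of the layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(weights[j], x)) + bias[j])
            for j in rows]


def partitioned_layer(weights: Matrix, bias: List[float], x: List[float],
                      n_workers: int) -> List[float]:
    """Split the output neurons into contiguous chunks, one per worker."""
    n_out = len(weights)
    chunk = math.ceil(n_out / n_workers)
    parts = [layer_slice(weights, bias, x, range(s, min(s + chunk, n_out)))
             for s in range(0, n_out, chunk)]
    return [y for part in parts for y in part]  # gather in neuron order
```

Each worker needs the full input vector but only its own rows of the weight matrix, so the weights never move between workers; only inputs are broadcast and outputs gathered.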
- …