Search CORE

1,265,558 research outputs found

Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

Author: Givelberg Edward
Publication venue
Publication date: 21/07/2014
Field of study

We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a high-level language alternative to multithreading, MPI and many other languages, environments and tools currently used for parallel computations. It implements natural object-based parallelism using only minimal syntax extension of existing languages, such as C++ and Python, and has therefore the potential to lead to widespread adoption of parallel programming. We implemented a prototype system for running processes using C++ with MPI and used it to compute a large three-dimensional Fourier transform on a computer cluster built of commodity hardware components. Three-dimensional Fourier transform is a prototype of a data-intensive application with a complex data-access pattern. The process-oriented code is only a few hundred lines long, and attains very high data throughput by achieving massive parallelism and maximizing hardware utilization.Comment: 20 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Enhancing Job Scheduling of an Atmospheric Intensive Data Application

Author: Caragnano G.
Goga K.
Mossucca L.
Ruiu Pietro
Terzo Olivier
Publication venue: International Journal On Advances in Telecommunications, IARIA
Publication date: 01/01/2012
Field of study

Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Spark deployment and performance evaluation on the MareNostrum supercomputer

Author: Ayguadé Parra Eduard
Becerra Fontal Yolanda
Carrera Pérez David
Girona Turell Sergi
Gounaris Anastasios
Labarta Mancho Jesús José
Torres Viñals Jordi
Tous Liesa Rubén
Tripiana Carlos
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations including parallelism, storage and networking alternatives, and we discuss several aspects in executing Big Data workloads on a computing system that is based on the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive application on large clusters emphasizing on parallelism configurations.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

UPCommons (Universitat Politècnica de Catalunya)

A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics

Author: Berriman G. B.
Deelman E.
Good J. C.
Livny M.
Singh G.
Publication venue: 'Astronomical Society of the Pacific Conference Series'
Publication date: 01/01/2009
Field of study

Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data that have sizes of 1, 2 and 4 square degrees, and a 2MASS all-sky mosaic. These are examples of mosaics commonly generated by astronomers. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that by provisioning the right amount of storage and compute resources cost can be significantly reduced with no significant impact on application performance

Caltech Authors

Elementary transformation analysis for Array-OL

Author: Feautrier Paul
Publication venue
Publication date: 01/01/2007
Field of study

Array-OL is a high-level specification language dedicated to the definition of intensive signal processing applications. Several tools exist for implementing an Array-OL specification as a data parallel program. While Array-OL can be used directly, it is often convenient to be able to deduce part of the specification from a sequential version of the application. This paper proposes such an analysis and examines its feasibility and its limits

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot