Search CORE

611,005 research outputs found

ACOTES project: Advanced compiler technologies for embedded streaming

Author: Albert Cohen
Alex Ramírez
Andrea Ornstein
Antoniu Pop
Ayal Zaks
Cupertino Miranda
Cédric Bastoul
David Ródenas
Dorit Nuzman
E. Blossom
E.A. Lee
Eduard Ayguadé
Erven Rohou
Harm Munk
Ira Rosen
J. Hoogerbrugge
Konrad Trifunović
Louis-Noël Pouchet
M. Gschwind
M. Wolfe
Marc Duranton
Marco Cornero
Menno Lindwer
Mohammed Fellahi
Paul Carpenter
Philippe Dumont
R. Allen
R.G. Scarborough
Razya Ladelsky
Roger Ferrer
S. Campanoni
Sebastian Pop
Uzi Shvadron
Xavier Martorell
Zbigniew Chamski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Streaming applications are built of data-driven, computational components, consuming and producing unbounded data streams. Streaming oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task, having to carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on the quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of Advanced Compiler Technologies that we developed to support Embedded Streaming.Peer ReviewedPostprint (published version

HAL-CentraleSupelec

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

The University of Manchester - Institutional Repository

HAL-Rennes 1

The symmetric-Toeplitz linear system problem in parallel

Author: C.V. Loan
D.J. Evans
E. Anderson
G. Heinig
I. Gohberg
I. Gohberg
K. Gallivan
L. Blackford
P. Alonso
T. Kailath
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

[EN] Many algorithms exist that exploit the special structure of Toeplitz matrices for solving linear systems. Nevertheless, these algorithms are difficult to parallelize due to its lower computational cost and the great dependency of the operations involved that produces a great communication cost. The foundation of the parallel algorithm presented in this paper consists of transforming the Toeplitz matrix into a another structured matrix called Cauchy¿like. The particular properties of Cauchy¿like matrices are exploited in order to obtain two levels of parallelism that makes possible to highly reduce the execution time. The experimental results were obtained in a cluster of PC¿s.Supported by Spanish MCYT and FEDER under Grant TIC 2003-08238-C02-02Alonso-Jordá, P.; Vidal Maciá, AM. (2005). The symmetric-Toeplitz linear system problem in parallel. Computational Science -- ICCS 2005,Pt 1, Proceedings. 3514:220-228. https://doi.org/10.1007/11428831_28S2202283514Sweet, D.R.: The use of linear-time systolic algorithms for the solution of toeplitz problems. k Technical Report JCU-CS-91/1, Department of Computer Science, James Cook University, Tue, 23 April 1996 15, 17, 55 GMT (1991)Evans, D.J., Oka, G.: Parallel solution of symmetric positive definite Toeplitz systems. Parallel Algorithms and Applications 12, 297–303 (1998)Gohberg, I., Koltracht, I., Averbuch, A., Shoham, B.: Timing analysis of a parallel algorithm for Toeplitz matrices on a MIMD parallel machine. Parallel Computing 17, 563–577 (1991)Gallivan, K., Thirumalai, S., Dooren, P.V.: On solving block toeplitz systems using a block schur algorithm. In: Proceedings of the 23rd International Conference on Parallel Processing, Boca Raton, FL, USA, vol. 3, pp. 274–281. CRC Press, Boca Raton (1994)Thirumalai, S.: High performance algorithms to solve Toeplitz and block Toeplitz systems. Ph.d. th., Grad. College of the U. of Illinois at Urbana–Champaign (1996)Alonso, P., Badía, J.M., Vidal, A.M.: Parallel algorithms for the solution of toeplitz systems of linear equations. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds.) PPAM 2004. LNCS, vol. 3019, pp. 969–976. Springer, Heidelberg (2004)Anderson, E., et al.: LAPACK Users’ Guide. SIAM, Philadelphia (1995)Blackford, L., et al.: ScaLAPACK Users’ Guide. SIAM, Philadelphia (1997)Alonso, P., Badía, J.M., González, A., Vidal, A.M.: Parallel design of multichannel inverse filters for audio reproduction. In: Parallel and Distributed Computing and Systems, IASTED, Marina del Rey, CA, USA, vol. II, pp. 719–724 (2003)Loan, C.V.: Computational Frameworks for the Fast Fourier Transform. SIAM Press, Philadelphia (1992)Heinig, G.: Inversion of generalized Cauchy matrices and other classes of structured matrices. Linear Algebra and Signal Proc., IMA, Math. Appl. 69, 95–114 (1994)Gohberg, I., Kailath, T., Olshevsky, V.: Fast Gaussian elimination with partial pivoting for matrices with displacement structure. Mathematics of Computation 64, 1557–1576 (1995)Alonso, P., Vidal, A.M.: An efficient and stable parallel solution for symmetric toeplitz linear systems. TR DSIC-II/2005, DSIC–Univ. Polit. Valencia (2005)Kailath, T., Sayed, A.H.: Displacement structure: Theory and applications. SIAM Review 37, 297–386 (1995

Crossref

RiuNet

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

Author: C Coarfa
DC Ster van der
F Furano
I Antcheva
I Vukotic
J Heidemann
J Martin
JC Anderson
K Jackson
R Battle
R Buyya
T White
Publication venue
Publication date: 15/10/2014
Field of study

Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multi-streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hyper Text Transport Protocol (HTTP) to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between the HTTP protocol and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks where we compare the performance of davix against a HPC specific protocol for a data analysis use case.Comment: Presented at: Very large Data Bases (VLDB) 2014, Hangzho

arXiv.org e-Print Archive

Crossref

FooPar: A Functional Object Oriented Parallel Framework in Scala

Author: A Grama
A Grama
E Dekel
F Darema
GL Taboada
K Hwang
M Odersky
MJ Quinn
R Loogen
SJ Thompson
V Kumar
Publication venue
Publication date: 13/06/2013
Field of study

We present FooPar, an extension for highly efficient Parallel Computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar is designed modular and supports easy access to different communication backends for distributed memory architectures as well as high performance math libraries. In this article we use it to parallelize matrix matrix multiplication and show its scalability by a isoefficiency analysis. In addition, results based on a empirical analysis on two supercomputers are given. We achieve close-to-optimal performance wrt. theoretical peak performance. Based on this result we conclude that FooPar allows to fully access Scala's design features without suffering from performance drops when compared to implementations purely based on C and MPI

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Multiple strategy process migration

Author: De Paoli Damien.
Publication venue: Deakin University, Faculty of Science and Technology, School of Computing and Mathematics
Publication date: 01/01/1996
Field of study

The future of computing lies with distributed systems, i.e. a network of workstations controlled by a modern distributed operating system. By supporting load balancing and parallel execution, the overall performance of a distributed system can be improved dramatically. Process migration, the act of moving a running process from a highly loaded machine to a lightly loaded machine, could be used to support load balancing, parallel execution, reliability etc. This thesis identifies the problems past process migration facilities have had and determines the possible differing strategies that can be used to resolve these problems. The result of this analysis has led to a new design philosophy. This philosophy requires the design of a process migration facility and the design of an operating system to be conducted in parallel. Modern distributed operating systems follow the microkernel and client/server paradigms. Applying these design paradigms, in conjunction with the requirements of both process migration and a distributed operating system, results in a system where each resource is controlled by a separate server process. However, a process is a complex resource composed of simple resources such as data structures, an address space and communication state. For this reason, a process migration facility does not directly migrate the resources of a process. Instead, it requests the appropriate servers to transfer the resources. This novel solution yields a modular, high performance facility that is easy to create, debug and maintain. Furthermore, the design easily incorporates providing multiple migration strategies. In order to verify the validity of this design, a process migration facility was developed and tested within RHODOS (ResearcH Oriented Distributed Operating System). RHODOS is a modern microkernel and client/server based distributed operating system. In RHODOS, a process is composed of at least three separate resources: process state - maintained by a process manager, address space - maintained by a memory manager and communication state - maintained by an InterProcess Communication Manager (IPCM). The RHODOS multiple strategy migration manager utilises the services of the process, memory and IPC Managers to migrate the resources of a process. Performance testing of this facility indicates that this design is as fast or better than existing systems which use faster hardware. Furthermore, by studying the results of the performance test ing, the conditions under which a particular strategy should be employed have been identified. This thesis also addresses heterogeneous process migration. The current trend is to have islands of homogeneous workstations amid a sea of heterogeneity. From this situation and the current literature on the topic, heterogeneous process migration can be seen as too inefficient for general use. Instead, only homogeneous workstations should be used for process migration. This implies a need to locate homogeneous workstations. Entities called traders, which store and disseminate knowledge about the resources of several workstations, should be used to provide resource discovery. Resource discovery will enable the detection of homogeneous workstations to which processes can be migrated

Deakin Research Online

A runtime heuristic to selectively replicate tasks for application-specific reliability targets

Author: Labarta Mancho Jesús José
Subasi Omer
Unsal Osman Sabri
Yalcin Gulay
Zyulkyarov Ferad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification/recompilation of OS, compiler or application code. Our heuristic, we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that App FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.This work was supported by FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

Author: Chechina Natalia
Trinder Phil
Publication venue
Publication date: 01/01/2012
Field of study

Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the rst six months. The project aim is to scale the Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the e ectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

Enlighten