    ACOTES project: Advanced compiler technologies for embedded streaming

    Streaming applications are built from data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is challenging: the computation must be carefully partitioned and mapped to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of proposed solutions that aim to improve programmer productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. A more recent approach is the one developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach to streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaboration between industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the advanced compiler technologies that we developed to support embedded streaming. Peer Reviewed. Postprint (published version)

    The symmetric-Toeplitz linear system problem in parallel

    [EN] Many algorithms exist that exploit the special structure of Toeplitz matrices to solve linear systems. Nevertheless, these algorithms are difficult to parallelize because of their low computational cost and the strong dependencies among the operations involved, which produce a high communication cost. The parallel algorithm presented in this paper is based on transforming the Toeplitz matrix into another structured matrix, called Cauchy-like. The particular properties of Cauchy-like matrices are exploited to obtain two levels of parallelism, which make it possible to greatly reduce the execution time. The experimental results were obtained on a cluster of PCs. Supported by Spanish MCYT and FEDER under Grant TIC 2003-08238-C02-02. Alonso-Jordá, P.; Vidal Maciá, AM. (2005). The symmetric-Toeplitz linear system problem in parallel. Computational Science -- ICCS 2005, Pt 1, Proceedings. 3514:220-228. https://doi.org/10.1007/11428831_28

    Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

    Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multi-streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between HTTP and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks, in which we compare the performance of davix against an HPC-specific protocol for a data analysis use case. Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzhou

    FooPar: A Functional Object Oriented Parallel Framework in Scala

    We present FooPar, an extension for highly efficient parallel computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar has a modular design and supports easy access to different communication backends for distributed memory architectures as well as to high performance math libraries. In this article we use it to parallelize matrix-matrix multiplication and show its scalability through an isoefficiency analysis. In addition, we give results based on an empirical analysis on two supercomputers. We achieve close-to-optimal performance with respect to theoretical peak performance. Based on this result, we conclude that FooPar allows full access to Scala's design features without suffering performance drops compared to implementations based purely on C and MPI.

    Multiple strategy process migration

    The future of computing lies with distributed systems, i.e. a network of workstations controlled by a modern distributed operating system. By supporting load balancing and parallel execution, the overall performance of a distributed system can be improved dramatically. Process migration, the act of moving a running process from a highly loaded machine to a lightly loaded machine, could be used to support load balancing, parallel execution, reliability etc. This thesis identifies the problems past process migration facilities have had and determines the possible differing strategies that can be used to resolve these problems. The result of this analysis has led to a new design philosophy. This philosophy requires the design of a process migration facility and the design of an operating system to be conducted in parallel. Modern distributed operating systems follow the microkernel and client/server paradigms. Applying these design paradigms, in conjunction with the requirements of both process migration and a distributed operating system, results in a system where each resource is controlled by a separate server process. However, a process is a complex resource composed of simple resources such as data structures, an address space and communication state. For this reason, a process migration facility does not directly migrate the resources of a process. Instead, it requests the appropriate servers to transfer the resources. This novel solution yields a modular, high performance facility that is easy to create, debug and maintain. Furthermore, the design easily incorporates providing multiple migration strategies. In order to verify the validity of this design, a process migration facility was developed and tested within RHODOS (ResearcH Oriented Distributed Operating System). RHODOS is a modern microkernel and client/server based distributed operating system. 
In RHODOS, a process is composed of at least three separate resources: process state, maintained by a process manager; address space, maintained by a memory manager; and communication state, maintained by an InterProcess Communication Manager (IPCM). The RHODOS multiple strategy migration manager utilises the services of the process, memory and IPC managers to migrate the resources of a process. Performance testing of this facility indicates that this design is as fast as or faster than existing systems that use faster hardware. Furthermore, by studying the results of the performance testing, the conditions under which a particular strategy should be employed have been identified. This thesis also addresses heterogeneous process migration. The current trend is to have islands of homogeneous workstations amid a sea of heterogeneity. From this situation and the current literature on the topic, heterogeneous process migration can be seen as too inefficient for general use. Instead, only homogeneous workstations should be used for process migration. This implies a need to locate homogeneous workstations. Entities called traders, which store and disseminate knowledge about the resources of several workstations, should be used to provide resource discovery. Resource discovery will enable the detection of homogeneous workstations to which processes can be migrated.

    A runtime heuristic to selectively replicate tasks for application-specific reliability targets

    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated. This work was supported by FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P. Peer Reviewed. Postprint (author's final draft)

    RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

    Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project and describes the progress made in the first six months. The project aims to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; and developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene