Search CORE

22,738 research outputs found

Task parallelism and high-performance languages

Author: Foster I.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/01/1996
Field of study

The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages

CiteSeerX

UNT Digital Library

Identifying and evaluating parallel design activities using the design structure matrix

Author: Duffy A.H.B.
Kortabarria L.
Whitfield R.I.
Publication venue: 'The Design Society'
Publication date: 01/01/2005
Field of study

This paper describes an approach based upon the Design Structure Matrix (DSM) for identifying, evaluating and optimising one aspect of CE: activity parallelism. Concurrent Engineering (CE) has placed emphasis on the management of the product development process and one of its major benefits is the reduction in lead-time and product cost [1]. One approach that CE promotes for the reduction of lead-time is the simultaneous enactment of activities otherwise known as Simultaneous Engineering. Whilst activity parallelism may contribute to the reduction in lead-time and product cost, the effect of iteration is also recognised as a contributing factor on lead-time, and hence was also combined within the investigation. The paper describes how parallel activities may be identified within the DSM, before detailing how a process may be evaluated with respect to parallelism and iteration using the DSM. An optimisation algorithm is then utilised to establish a near-optimal sequence for the activities with respect to parallelism and iteration. DSM-based processes from previously published research are used to describe the development of the approach

University of Strathclyde Institutional Repository

A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

Author: Nabi Syed Waqar
Vanderbauwhede Wim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2016
Field of study

Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated light-weight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource-utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case-study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power-efficiency using our cost model based approach

Crossref

Enlighten

GPUs as Storage System Accelerators

Author: Al-Kiswany Samer
Gharaibeh Abdullah
Ripeanu Matei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/05/2012
Field of study

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

arXiv.org e-Print Archive

Crossref