Search CORE

25,419 research outputs found

Integrating Task and Data Parallelism

Author: Massingill Berna Linda
Publication venue
Publication date: 01/01/1993
Field of study

Many models of concurrency and concurrent programming have been proposed; most can be categorized as either task-parallel (based on functional decomposition) or data-parallel (based on data decomposition). Task-parallel models are most effective for expressing irregular computations; data-parallel models are most effective for expressing regular computations. Some computations, however, exhibit both regular and irregular aspects. For such computations, a better programming model is one that integrates task and data parallelism. This report describes one model of integrating task and data parallelism, some problem classes for which it is effective, and a prototype implementation

Caltech Authors

Caltech Theses and Dissertations

Integrating Task and Data Parallelism with the Collective Communication Archetype

Author: Chandy K. Mani
Manohar Rajit
Massingill Berna L.
Meiron Daniel I.
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1994
Field of study

A parallel program archetype aids in the development of reliable, efficient parallel applications with common computation/communication structures by providing stepwise refinement methods and code libraries specific to the structure. The methods and libraries help in transforming a sequential program into a parallel program via a sequence of refinement steps that help maintain correctness while refining the program to obtain the appropriate level of granularity for a target machine. The specific archetype discussed here deals with the integration of task and data parallelism by using collective (or group) communication. This archetype has been used to develop several applications

CiteSeerX

Caltech Authors

Automatic Detection of Performance Anomalies in Task-Parallel Programs

Author: Cohen Albert
Drach Nathalie
Drebes Andi
Heydemann Karine
Pop Antoniu
Publication venue
Publication date: 12/05/2014
Field of study

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained tasks. Task-parallel languages, such as OpenStream, X10, Habanero Java and C or StarSs, simplify the development of applications for new architectures, but tuning task-parallel applications remains a major challenge. Performance bottlenecks can occur at any level of the implementation, from the algorithmic level (e.g., lack of parallelism or over-synchronization), to interactions with the operating and runtime systems (e.g., data placement on NUMA architectures), to inefficient use of the hardware (e.g., frequent cache misses or misaligned memory accesses); detecting such issues and determining the exact cause is a difficult task. In previous work, we developed Aftermath, an interactive tool for trace-based performance analysis and debugging of task-parallel programs and run-time systems. In contrast to other trace-based analysis tools, such as Paraver or Vampir, Aftermath offers native support for tasks, i.e., visualization, statistics and analysis tools adapted for performance debugging at task granularity. However, the tool currently does not provide support for the automatic detection of performance bottlenecks and it is up to the user to investigate the relevant aspects of program execution by focusing the inspection on specific slices of a trace file. In this paper, we present ongoing work on two extensions that guide the user through this process.Comment: Presented at 1st Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014) (arXiv:1405.2281

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Task parallelism and high-performance languages

Author: Foster I.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/01/1996
Field of study

The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages

CiteSeerX

UNT Digital Library

Implementing PRISMA/DB in an OOPL

Author: Apers P.M.G.
Grefen P.W.P.J.
Kersten M.L.
Wilschut A.N.
Publication venue: Springer
Publication date: 01/01/1989
Field of study

PRISMA/DB is implemented in a parallel object-oriented language to gain insight in the usage of parallelism. This environment allows us to experiment with parallelism by simply changing the allocation of objects to the processors of the PRISMA machine. These objects are obtained by a strictly modular design of PRISMA/DB. Communication between the objects is required to cooperatively handle the various tasks, but it limits the potential for parallelism. From this approach, we hope to gain a better understanding of parallelism, which can be used to enhance the performance of PRISMA/DB.\ud The work reported in this document was conducted as part of the PRISMA project, a joint effort with Philips Research Eindhoven, partially supported by the Dutch "Stimuleringsprojectteam Informaticaonderzoek (SPIN)

University of Twente Research Information

Integrating Job Parallelism in Real-Time Scheduling Theory

Author: Baker
Baker
Chandra
Geist
Goossens
Gorlatch
Joël Goossens
Leiss
Liliana Cucu
Liu
Manimaran
Srinivasan
Sunderam
Sébastien Collette
Zomaya
Publication venue
Publication date: 01/01/2008
Field of study

We investigate the global scheduling of sporadic, implicit deadline, real-time task systems on multiprocessor platforms. We provide a task model which integrates job parallelism. We prove that the time-complexity of the feasibility problem of these systems is linear relatively to the number of (sporadic) tasks for a fixed number of processors. We propose a scheduling algorithm theoretically optimal (i.e., preemptions and migrations neglected). Moreover, we provide an exact feasibility utilization bound. Lastly, we propose a technique to limit the number of migrations and preemptions

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

DI-fusion

Recommended from our members

Hierarchical incremental class learning with reduced pattern training

Author: Bao C
Guan SU
Sun RT
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 20/09/2006
Field of study

Hierarchical Incremental Class Learning (HICL) is a new task decomposition method that addresses the pattern classification problem. HICL is proven to be a good classifier but closer examination reveals areas for potential improvement. This paper proposes a theoretical model to evaluate the performance of HICL and presents an approach to improve the classification accuracy of HICL by applying the concept of Reduced Pattern Training (RPT). The theoretical analysis shows that HICL can achieve better classification accuracy than Output Parallelism [1]. The procedure for RPT is described and compared with the original training procedure. RPT reduces systematically the size of the training data set based on the order of sub-networks built. The results from four benchmark classification problems show much promise for the improved model

Brunel University Research Archive

Task decomposition using pattern distributor

Author: Bao C
Guan SU
Neo T
Publication venue: Freund & Pettman
Publication date: 01/01/2004
Field of study

In this paper, we propose a new task decomposition method for multilayered feedforward neural networks, namely Task Decomposition with Pattern Distributor in order to shorten the training time and improve the generalization accuracy of a network under training. This new method uses the combination of modules (small-size feedforward network) in parallel and series, to produce the overall solution for a complex problem. Based on a “divide-and-conquer” technique, the original problem is decomposed into several simpler sub-problems by a pattern distributor module in the network, where each sub-problem is composed of the whole input vector and a fraction of the output vector of the original problem. These sub-problems are then solved by the corresponding groups of modules, where each group of modules is connected in series with the pattern distributor module and the modules in each group are connected in parallel. The design details and implementation of this new method are introduced in this paper. Several benchmark classification problems are used to test this new method. The analysis and experimental results show that this new method could reduce training time and improve generalization accuracy

Directory of Open Access Journals

Brunel University Research Archive

ScholarBank@NUS

Identifying and evaluating parallel design activities using the design structure matrix

Author: Duffy A.H.B.
Kortabarria L.
Whitfield R.I.
Publication venue: 'The Design Society'
Publication date: 01/01/2005
Field of study

This paper describes an approach based upon the Design Structure Matrix (DSM) for identifying, evaluating and optimising one aspect of CE: activity parallelism. Concurrent Engineering (CE) has placed emphasis on the management of the product development process and one of its major benefits is the reduction in lead-time and product cost [1]. One approach that CE promotes for the reduction of lead-time is the simultaneous enactment of activities otherwise known as Simultaneous Engineering. Whilst activity parallelism may contribute to the reduction in lead-time and product cost, the effect of iteration is also recognised as a contributing factor on lead-time, and hence was also combined within the investigation. The paper describes how parallel activities may be identified within the DSM, before detailing how a process may be evaluated with respect to parallelism and iteration using the DSM. An optimisation algorithm is then utilised to establish a near-optimal sequence for the activities with respect to parallelism and iteration. DSM-based processes from previously published research are used to describe the development of the approach

University of Strathclyde Institutional Repository