    Integrating Task and Data Parallelism

    Many models of concurrency and concurrent programming have been proposed; most can be categorized as either task-parallel (based on functional decomposition) or data-parallel (based on data decomposition). Task-parallel models are most effective for expressing irregular computations; data-parallel models are most effective for expressing regular computations. Some computations, however, exhibit both regular and irregular aspects. For such computations, a better programming model is one that integrates task and data parallelism. This report describes one model for integrating task and data parallelism, some problem classes for which it is effective, and a prototype implementation.
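    To make the distinction concrete, the sketch below pairs an irregular, task-level outer layer with a regular, data-parallel inner computation, in the spirit of the integrated model described above. It is a minimal Python illustration; the workload and the names (refine_region, stencil_sweep) are hypothetical, not taken from the report.

        # A minimal sketch, assuming a hypothetical mesh-refinement
        # workload: task parallelism across regions of irregular cost,
        # where each task internally runs a regular data-parallel sweep.
        from concurrent.futures import ProcessPoolExecutor

        def stencil_sweep(block):
            # Regular (data-parallel) part: the same operation applied
            # to every element of the block.
            return [0.25 * (x + 2.0) for x in block]

        def refine_region(region_id, block):
            # Irregular (task-parallel) part: regions need different
            # numbers of sweeps, so task costs are unpredictable.
            for _ in range(region_id % 3 + 1):
                block = stencil_sweep(block)
            return region_id, block

        if __name__ == "__main__":
            regions = {i: [float(j) for j in range(8)] for i in range(6)}
            with ProcessPoolExecutor() as pool:
                futures = [pool.submit(refine_region, rid, blk)
                           for rid, blk in regions.items()]
                results = dict(f.result() for f in futures)
            print(results[5])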

    Integrating Task and Data Parallelism with the Collective Communication Archetype

    A parallel program archetype aids in the development of reliable, efficient parallel applications with common computation/communication structures by providing stepwise refinement methods and code libraries specific to the structure. The methods and libraries assist in transforming a sequential program into a parallel program via a sequence of refinement steps that maintain correctness while tuning the program to the appropriate level of granularity for a target machine. The specific archetype discussed here deals with the integration of task and data parallelism by using collective (or group) communication. This archetype has been used to develop several applications.
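    The collective-communication structure the archetype captures (decompose the data, compute locally, recombine through a group operation) can be sketched with MPI collectives. The snippet below uses mpi4py purely as an illustration of that structure; it is not the archetype's own code library.

        # Run with e.g.: mpiexec -n 4 python collective_sketch.py
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # Root decomposes the data (data-parallel decomposition) ...
        chunks = None
        if rank == 0:
            chunks = [list(range(r * 4, (r + 1) * 4)) for r in range(size)]

        # ... a collective scatter hands one chunk to each task ...
        local = comm.scatter(chunks, root=0)
        partial = sum(x * x for x in local)

        # ... and a collective gather recombines the partial results.
        totals = comm.gather(partial, root=0)
        if rank == 0:
            print("sum of squares:", sum(totals))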

    Automatic Detection of Performance Anomalies in Task-Parallel Programs

    To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained tasks. Task-parallel languages, such as OpenStream, X10, Habanero Java and C, or StarSs, simplify the development of applications for new architectures, but tuning task-parallel applications remains a major challenge. Performance bottlenecks can occur at any level of the implementation, from the algorithmic level (e.g., lack of parallelism or over-synchronization), to interactions with the operating and runtime systems (e.g., data placement on NUMA architectures), to inefficient use of the hardware (e.g., frequent cache misses or misaligned memory accesses); detecting such issues and determining the exact cause is a difficult task. In previous work, we developed Aftermath, an interactive tool for trace-based performance analysis and debugging of task-parallel programs and run-time systems. In contrast to other trace-based analysis tools, such as Paraver or Vampir, Aftermath offers native support for tasks, i.e., visualization, statistics and analysis tools adapted for performance debugging at task granularity. However, the tool currently does not provide support for the automatic detection of performance bottlenecks, and it is up to the user to investigate the relevant aspects of program execution by focusing the inspection on specific slices of a trace file. In this paper, we present ongoing work on two extensions that guide the user through this process. (Comment: Presented at the 1st Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), arXiv:1405.2281.)
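    To give a flavour of what automatic detection can look like, the sketch below flags task instances whose duration is far above the median for their task type, one simple heuristic for spotting outliers such as tasks slowed by remote NUMA accesses. The trace format (task type, duration pairs) is a hypothetical stand-in for a real Aftermath trace, and the heuristic is not the paper's actual method.

        from collections import defaultdict
        from statistics import median

        def find_anomalies(trace, factor=4.0):
            # Group observed durations by task type, then flag any
            # instance running far longer than its type's median.
            by_type = defaultdict(list)
            for task_type, duration in trace:
                by_type[task_type].append(duration)
            return [(t, d)
                    for t, ds in by_type.items()
                    for d in ds
                    if len(ds) >= 3 and d > factor * median(ds)]

        trace = [("stencil", 101), ("stencil", 104), ("stencil", 980),
                 ("reduce", 50), ("reduce", 52), ("reduce", 49)]
        print(find_anomalies(trace))   # [('stencil', 980)]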

    Task parallelism and high-performance languages

    The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is how to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with, for example, a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.
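    The task-parallel model described here, explicit tasks that synchronize and communicate under programmer control, can be illustrated with two threads connected by a blocking queue standing in for Fortran M's typed channels. This is a Python sketch of the general idea only, not Fortran M itself.

        import threading
        import queue

        channel = queue.Queue()        # stands in for a typed channel

        def producer(n):
            # First task: creates work items and sends them over the channel.
            for i in range(n):
                channel.put(i * i)
            channel.put(None)          # end-of-stream marker

        def consumer():
            # Second task: receives and combines items; synchronization
            # happens implicitly through the blocking queue.
            total = 0
            while (item := channel.get()) is not None:
                total += item
            print("consumer total:", total)

        t1 = threading.Thread(target=producer, args=(10,))
        t2 = threading.Thread(target=consumer)
        t1.start(); t2.start()
        t1.join(); t2.join()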

    Implementing PRISMA/DB in an OOPL

    PRISMA/DB is implemented in a parallel object-oriented language to gain insight into the use of parallelism. This environment allows us to experiment with parallelism by simply changing the allocation of objects to the processors of the PRISMA machine. These objects are obtained by a strictly modular design of PRISMA/DB. Communication between the objects is required to cooperatively handle the various tasks, but it limits the potential for parallelism. From this approach, we hope to gain a better understanding of parallelism, which can be used to enhance the performance of PRISMA/DB. The work reported in this document was conducted as part of the PRISMA project, a joint effort with Philips Research Eindhoven, partially supported by the Dutch "Stimuleringsprojectteam Informaticaonderzoek" (SPIN).
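    The experiment described above, varying the allocation of objects to processors, can be caricatured in a few lines: given the message pattern between objects, count how many messages cross processor boundaries under a given allocation, since that remote communication is what limits parallelism. The object names and message pattern below are hypothetical, not PRISMA/DB's actual design.

        # Message pattern between hypothetical PRISMA/DB-style objects.
        messages = [("parser", "optimizer"), ("optimizer", "executor"),
                    ("executor", "storage"), ("executor", "storage")]

        def remote_traffic(allocation):
            # allocation maps object name -> processor id; messages
            # between objects on different processors are "remote".
            return sum(1 for a, b in messages if allocation[a] != allocation[b])

        colocated = {"parser": 0, "optimizer": 0, "executor": 1, "storage": 1}
        spread = {"parser": 0, "optimizer": 1, "executor": 2, "storage": 3}
        print(remote_traffic(colocated), remote_traffic(spread))  # 1 4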

    Integrating Job Parallelism in Real-Time Scheduling Theory

    We investigate the global scheduling of sporadic, implicit-deadline, real-time task systems on multiprocessor platforms. We provide a task model which integrates job parallelism. We prove that the time complexity of the feasibility problem for these systems is linear in the number of (sporadic) tasks for a fixed number of processors. We propose a scheduling algorithm that is theoretically optimal (i.e., when preemptions and migrations are neglected). Moreover, we provide an exact feasibility utilization bound. Lastly, we propose a technique to limit the number of migrations and preemptions.
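    As a rough illustration of a feasibility test that runs in time linear in the number of tasks, the sketch below performs a single-pass utilization check for sporadic, implicit-deadline tasks on m processors. It is a generic necessary condition, not the exact test or the exact bound proved in the paper.

        def utilization_feasible(tasks, m):
            # tasks: list of (wcet, period) pairs for sporadic,
            # implicit-deadline tasks. One pass over the list, so the
            # check is O(n) in the number of tasks.
            total = 0.0
            for wcet, period in tasks:
                total += wcet / period
            return total <= m

        print(utilization_feasible([(2, 10), (5, 20), (9, 30)], m=2))  # True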

    Task decomposition using pattern distributor

    In this paper, we propose a new task decomposition method for multilayered feedforward neural networks, namely Task Decomposition with Pattern Distributor, in order to shorten the training time and improve the generalization accuracy of a network under training. This new method uses a combination of modules (small feedforward networks) in parallel and in series to produce the overall solution for a complex problem. Based on a “divide-and-conquer” technique, the original problem is decomposed into several simpler sub-problems by a pattern distributor module in the network, where each sub-problem is composed of the whole input vector and a fraction of the output vector of the original problem. These sub-problems are then solved by the corresponding groups of modules, where each group of modules is connected in series with the pattern distributor module and the modules in each group are connected in parallel. The design details and implementation of this new method are introduced in this paper. Several benchmark classification problems are used to test this new method. The analysis and experimental results show that this new method can reduce training time and improve generalization accuracy.
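    The series/parallel structure can be sketched as follows: a distributor module first routes a pattern to one group (series connection), the modules inside that group each compute part of the result (parallel connection), and the group fills in only its fraction of the full output vector. The modules below are trivial stand-in functions rather than trained feedforward networks, so this shows the wiring only.

        def distributor(x):
            # Stand-in for the trained pattern-distributor module:
            # routes each input pattern to one group of modules.
            return 0 if sum(x) < 0.5 else 1

        groups = {
            0: [lambda x: [x[0], 0.0], lambda x: [0.0, x[1]]],
            1: [lambda x: [x[1], 0.0], lambda x: [0.0, x[0]]],
        }

        def predict(x):
            g = distributor(x)                  # series: distributor first
            parts = [m(x) for m in groups[g]]   # group's modules in parallel
            combined = [sum(vals) for vals in zip(*parts)]
            full = [0.0] * 4                    # whole output vector
            full[g * 2:(g + 1) * 2] = combined  # group's fraction only
            return full

        print(predict([0.1, 0.2]))   # group 0 fills outputs 0-1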

    Identifying and evaluating parallel design activities using the design structure matrix

    Concurrent Engineering (CE) has placed emphasis on the management of the product development process, and one of its major benefits is the reduction in lead-time and product cost [1]. This paper describes an approach based upon the Design Structure Matrix (DSM) for identifying, evaluating and optimising one aspect of CE: activity parallelism. One approach that CE promotes for the reduction of lead-time is the simultaneous enactment of activities, otherwise known as Simultaneous Engineering. Whilst activity parallelism may contribute to the reduction in lead-time and product cost, the effect of iteration is also recognised as a contributing factor to lead-time, and hence was also included in the investigation. The paper describes how parallel activities may be identified within the DSM, before detailing how a process may be evaluated with respect to parallelism and iteration using the DSM. An optimisation algorithm is then utilised to establish a near-optimal sequence for the activities with respect to parallelism and iteration. DSM-based processes from previously published research are used to describe the development of the approach.
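    Parallel activities can be read directly off a binary DSM: two activities can be enacted simultaneously when neither depends on an output of the other. The sketch below performs a simple level partition of a small hypothetical DSM, grouping activities into stages whose members can run in parallel; any remaining cyclic (iterative) block is emitted as a single stage. It illustrates the identification step only, not the paper's evaluation measures or optimisation algorithm.

        # dsm[i][j] == 1 means activity i needs an output of activity j.
        dsm = [
            [0, 0, 0, 0, 0],   # A: no inputs
            [1, 0, 0, 0, 0],   # B: needs A
            [1, 0, 0, 0, 0],   # C: needs A      -> B, C parallel
            [0, 1, 1, 0, 0],   # D: needs B, C
            [0, 0, 1, 0, 0],   # E: needs C      -> D, E parallel
        ]

        def parallel_stages(dsm):
            n, done, stages = len(dsm), set(), []
            while len(done) < n:
                stage = [i for i in range(n) if i not in done and
                         all(j in done for j in range(n) if dsm[i][j])]
                if not stage:   # remaining activities form a cycle
                    stages.append(sorted(set(range(n)) - done))
                    break
                stages.append(stage)
                done.update(stage)
            return stages

        print(parallel_stages(dsm))   # [[0], [1, 2], [3, 4]]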