3 research outputs found

    Support of Collective Effort Towards Performance Portability

    Performance portability, in the sense that a single source can run with good performance across a wide variation of parallel hardware platforms, is strongly desired by industry and actively being researched. However, evidence is mounting that performance portability cannot be realized at just the toolchain level, or just at the runtime level or just at the hardware abstraction level. This is a position paper, making a suggestion for how the groups involved can more efficiently solve the performance portability problem together. We don't propose a solution, at all, but rather a support system for the players to self organize and collectively find one. The support system is based on a new extendable virtualization mechanism called VMS (Virtualized Master-Slave), that fulfills the needs of an organizing principle, and provides focus that may increase research efficiency. The difficult work will be the on-going research efforts on parallel language design, compilers, source-to-source transform tools, binary optimization, run-time schedulers, and hardware support for parallelism. Although it doesn't in itself solve the problem, such an organizing principle may be a valuable step towards a solution - the problem may be too complex and require cooperation of too many real-world entities for a single-entity solution. We briefly review VMS, and illustrate how it could be used to give rise to an eco-system in which performance portability is collectively realized. To support the suggestion, we give measurements of the time to implement three parallelism-construct libraries, and performance numbers for them, along with measurements of the basic overhead of VMS.

    Architecture-Independent Request-Scheduling with Tight Waiting-Time Estimations

    In the course of the last few years, the user's interaction with parallel computer-systems has changed. A continuous growth in the number of interactive HPC applications can be observed. When considering partitionable MPP-systems with exclusive usage of the physically separated regions, issues like the average waiting-time become more dominant for the users than the total system-throughput. In this paper, we focus on the problem of scheduling an arbitrary mixture of resource-requests for batch and interactive applications in an architecture-independent manner. To help users plan their daily work, tight waiting-time estimations are indispensable. However, the resulting scheduling problem interferes with the problem of mapping requests onto certain MPP architectures to reduce their internal fragmentations. We will show that this conflict can be alleviated by a distributed prover-verifier methodology. At first, we will introduce the distributed resource-management software CCS with its archit..