General Routing Algorithms for Star Graphs

Authors: Palis Michael A; Rajasekaran Sanguthevar; Wei David S.L.
Publication venue: ScholarlyCommons
Publication date: 01/05/1989

In designing algorithms for a specific parallel architecture, a programmer has to cope with topological and cardinality variations. Both of these problems increase the programmer's effort. However, an ideal shared-memory abstract parallel model called the parallel random access machine (PRAM) [KRUS86, KRUS88], which avoids these problems and is also simple to program, has been proposed. Unfortunately, the PRAM does not seem to be realizable in present or even foreseeable technologies. On the other hand, a packet routing technique can be employed to simulate the PRAM on a feasible parallel architecture without significant loss of efficiency. The problem of routing is also important due to its intrinsic significance in distributed processing and its important role in simulations among parallel models. The routing problem is defined as follows: we are given a specific network and a set of packets of information, in which each packet is an (origin, destination) pair. Initially, the packets are placed on their origins, one per node. These packets must be routed in parallel to their own destinations such that at most one packet passes through any link of the network at any time and all packets arrive at their destinations as quickly as possible. We are interested in a special case of the general routing problem called permutation routing, in which the destinations form some permutation of the origins. A routing algorithm is said to be oblivious if the path taken by each packet depends only on its source and destination. An oblivious routing strategy is preferable since it leads to a simple control structure for the individual processing elements. Oblivious routing algorithms can also be used in a distributed environment. In this paper we are concerned only with oblivious routing strategies.
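To make the notion of an oblivious route concrete, here is a minimal sketch in Python. It is not taken from the paper: it assumes the usual definition of the star graph (nodes are permutations of n symbols, and each link swaps the symbol in position 0 with the symbol in some other position) and uses a simple greedy rule. The function names `star_neighbors` and `oblivious_path` are hypothetical. The path is computed from the source and destination alone, which is what makes the rule oblivious.

```python
def star_neighbors(node):
    """Neighbors of `node` in the star graph: swap position 0 with position i."""
    return [
        (node[i],) + node[1:i] + (node[0],) + node[i + 1:]
        for i in range(1, len(node))
    ]


def oblivious_path(src, dst):
    """Greedy path from src to dst that depends only on the pair (src, dst)."""
    # Relabel symbols so that dst plays the role of the identity permutation.
    rank = {sym: pos for pos, sym in enumerate(dst)}
    cur = [rank[sym] for sym in src]
    n = len(cur)
    path = [tuple(src)]
    while cur != list(range(n)):
        if cur[0] != 0:
            # The front symbol is misplaced: send it to its home position.
            i = cur[0]
        else:
            # The front symbol is home: swap it with the first misplaced one.
            i = next(j for j in range(1, n) if cur[j] != j)
        cur[0], cur[i] = cur[i], cur[0]
        # Translate back to the original symbol names before recording.
        path.append(tuple(dst[c] for c in cur))
    return path


if __name__ == "__main__":
    for hop in oblivious_path((2, 0, 1), (0, 1, 2)):
        print(hop)
```

This sketch routes a single packet; it does not model link contention, queueing, or the one-packet-per-link constraint from the problem definition above.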

Programming MPSoC platforms: Road works ahead

Authors: Bekooij Marco; Domer Rainer; Leupers Rainer; Nohl Achim; Soonhoi Ha; Vajda Andras
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2009

This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains means more than a radical change in computer architecture. Even more important from a SW developer's viewpoint, the classical sequential von Neumann programming model needs to be overcome at the same time. Efficient utilization of the MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful deployment of heterogeneous embedded MPSoC. On the other hand, at least for the coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code, which demands a more pragmatic, gradual tool and code replacement strategy.

University of Twente Research Information

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

Authors: Chai J.; Ren J.; Su H. Y.; Wen M.; Wu N.; Zhang C. Y.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/04/2012

This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages (two scans, coding, and lag packing) and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained on STORM and over 50 times on the GPU. The encoder implementation on STORM achieves real-time processing for 1080p @30fps, and the GPU-based version satisfies the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

Directory of Open Access Journals

Digital library of Brno University of Technology

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

Authors: D Wentzlaff; J Ross; JE Stone; M Baker
Publication venue: Springer Science and Business Media LLC
Publication date: 11/08/2016

There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance.

Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologie
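For readers unfamiliar with Cannon's algorithm, the following is a minimal sequential sketch in Python of its data movement on a 2D grid. It is an illustration only and not the authors' OpenCL + OpenSHMEM implementation: it assumes one matrix element per grid cell (a real implementation would hold a sub-block per core), and the function name `cannon_matmul` is hypothetical. Each cyclic shift stands in for the nearest-neighbor inter-core transfer that the abstract says OpenCL alone does not support.

```python
def cannon_matmul(A, B):
    """Simulate Cannon's algorithm on an n x n grid, one element per cell."""
    n = len(A)
    # Initial skew: row i of A is shifted left by i, column j of B up by j.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every cell multiplies the pair of operands it currently holds.
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]
        # Cyclic shift: A moves one cell left, B moves one cell up.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

After the initial skew, each cell only ever exchanges data with its left and upper neighbors, which is why the algorithm maps naturally onto a 2D mesh.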

arXiv.org e-Print Archive