General Routing Algorithms for Star Graphs
In designing algorithms for a specific parallel architecture, a programmer has to cope with topological and cardinality variations. Both of these problems increase the programmer's effort. However, an ideal shared-memory abstract parallel model called the parallel random access machine (PRAM) [KRUS86, KRUS88], which avoids these problems and is also simple to program, has been proposed. Unfortunately, the PRAM does not seem to be realizable with present or even foreseeable technologies. On the other hand, a packet routing technique can be employed to simulate the PRAM on a feasible parallel architecture without significant loss of efficiency. The problem of routing is also important due to its intrinsic significance in distributed processing and its important role in simulations among parallel models.
The routing problem is defined as follows. We are given a specific network and a set of packets of information, where each packet is an (origin, destination) pair. To start with, the packets are placed on their origins, one per node. These packets must be routed in parallel to their destinations such that at most one packet passes through any link of the network at any time and all packets arrive at their destinations as quickly as possible. We are interested in a special case of the general routing problem called permutation routing, in which the destinations form some permutation of the origins. A routing algorithm is said to be oblivious if the path taken by each packet depends only on its source and destination. An oblivious routing strategy is preferable since it leads to a simple control structure for the individual processing elements. Oblivious routing algorithms can also be used in a distributed environment. In this paper we are concerned only with oblivious routing strategies.
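To make the notion of an oblivious path concrete, here is a minimal sketch in Python. It assumes the usual definition of the star graph, in which nodes are permutations of n symbols and each link swaps the first symbol with the symbol in some other position. The function name `oblivious_star_path` and the greedy rule it uses are illustrative assumptions and are not taken from the paper; the sketch computes the path of a single packet and does not model link contention.

```python
def oblivious_star_path(src, dst):
    """Return a path in the star graph from permutation src to permutation dst.

    Assumption: a link swaps position 0 with some position j > 0.
    The path depends only on (src, dst), so the rule is oblivious.
    This is a common greedy rule, not necessarily the paper's algorithm.
    """
    cur = list(src)
    # Position each symbol must occupy at the destination.
    target_pos = {sym: i for i, sym in enumerate(dst)}
    path = [tuple(cur)]
    while tuple(cur) != tuple(dst):
        j = target_pos[cur[0]]
        if j == 0:
            # The first symbol is already home: swap it with the
            # first position that still holds a misplaced symbol.
            j = next(i for i in range(1, len(cur)) if cur[i] != dst[i])
        # Otherwise the swap sends the first symbol to its home position.
        cur[0], cur[j] = cur[j], cur[0]
        path.append(tuple(cur))
    return path


if __name__ == "__main__":
    # Every consecutive pair of nodes differs by one swap with position 0.
    print(oblivious_star_path((2, 3, 1), (1, 2, 3)))
```

Because the rule looks only at the packet's current node and its destination, every node can apply it locally without knowing where other packets are, which is the simple control structure the abstract refers to.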
Programming MPSoC platforms: Road works ahead
This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer's viewpoint, the classical sequential von Neumann programming model needs to be overcome at the same time. Efficient utilization of the MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful deployment of heterogeneous embedded MPSoC. On the other hand, at least for the coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code, which demands a more pragmatic, gradual tool and code replacement strategy.
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory access dependence, and the control dependence. The CAVLC pipeline is divided into three stages (two scans, coding, and lag packing) and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times is obtained on STORM and of over 50 times on the GPU. The encoder implementation on STORM achieves real-time processing for 1080p at 30 fps, and the GPU-based version satisfies the requirements for real-time 720p encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.
OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
There is interest in exploring hybrid OpenSHMEM + X programming models to
extend the applicability of the OpenSHMEM interface to more hardware
architectures. We present a hybrid OpenCL + OpenSHMEM programming model for
device-level programming for architectures like the Adapteva Epiphany many-core
RISC array processor. The Epiphany architecture comprises a 2D array of
low-power RISC cores with minimal uncore functionality connected by a 2D mesh
Network-on-Chip (NoC). The Epiphany architecture offers high computational
energy efficiency for integer and floating point calculations as well as
parallel scalability. The Epiphany-III is available as a coprocessor in
platforms that also utilize an ARM CPU host. OpenCL provides good functionality
for supporting a co-design programming model in which the host CPU offloads
parallel work to a coprocessor. However, the OpenCL memory model is
inconsistent with the Epiphany memory architecture and lacks support for
inter-core communication. We propose a hybrid programming model in which
OpenSHMEM provides a better solution by replacing the non-standard OpenCL
extensions introduced to achieve high performance with the Epiphany
architecture. We demonstrate the proposed programming model for matrix-matrix
multiplication based on Cannon's algorithm showing that the hybrid model
addresses the deficiencies of using OpenCL alone to achieve good benchmark
performance.
Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and
Related Technologie
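
For readers unfamiliar with Cannon's algorithm, the sketch below simulates its data movement in plain Python on a q x q grid in which each grid cell stands in for one core holding a single matrix element. This is an illustrative assumption only: it uses neither OpenCL nor OpenSHMEM, the function name `cannon_matmul` is hypothetical, and a real implementation would hold a sub-block per core and shift blocks between neighbouring cores over the mesh network.

```python
def cannon_matmul(a, b):
    """Multiply two q x q matrices using the shift pattern of Cannon's algorithm.

    Each grid cell (i, j) plays the role of one core. Cores are simulated
    sequentially here; on real hardware the shifts are neighbour-to-neighbour
    transfers and the multiply-accumulate steps run in parallel.
    """
    q = len(a)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    A = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Local multiply-accumulate on every simulated core.
        for i in range(q):
            for j in range(q):
                C[i][j] += A[i][j] * B[i][j]
        # Rotate A one step left and B one step up (with wraparound).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

After the skew, cell (i, j) holds a[i][k] and b[k][j] for the same index k, and each rotation advances k by one, so q steps accumulate the full inner product using only nearest-neighbour communication.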