10,362 research outputs found
MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor
This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented
Recommended from our members
The scheduling of sparse matrix-vector multiplication on a massively parallel dap computer
An efficient data structure is presented which supports general unstructured sparse matrix-vector multiplications on a Distributed Array of Processors (DAP). This approach seeks to reduce the inter-processor data movements and organises the operations in batches of massively parallel steps by a heuristic scheduling procedure performed on the host computer.
The resulting data structure is of particular relevance to iterative schemes for solving linear systems. Performance results for matrices taken from well known Linear Programming (LP) test problems are presented and analysed
Architectures for block Toeplitz systems
In this paper efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar only operations, multiplications/divisions and additions, and the local communication which enables the development of wavefront array architecture. Both the mean squared error and the total squared error formulations are described and a variety of implementations are given
A C++-embedded Domain-Specific Language for programming the MORA soft processor array
MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine and provides examples to illustrate the programming model and performance
Recommended from our members
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested
OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
There is interest in exploring hybrid OpenSHMEM + X programming models to
extend the applicability of the OpenSHMEM interface to more hardware
architectures. We present a hybrid OpenCL + OpenSHMEM programming model for
device-level programming for architectures like the Adapteva Epiphany many-core
RISC array processor. The Epiphany architecture comprises a 2D array of
low-power RISC cores with minimal uncore functionality connected by a 2D mesh
Network-on-Chip (NoC). The Epiphany architecture offers high computational
energy efficiency for integer and floating point calculations as well as
parallel scalability. The Epiphany-III is available as a coprocessor in
platforms that also utilize an ARM CPU host. OpenCL provides good functionality
for supporting a co-design programming model in which the host CPU offloads
parallel work to a coprocessor. However, the OpenCL memory model is
inconsistent with the Epiphany memory architecture and lacks support for
inter-core communication. We propose a hybrid programming model in which
OpenSHMEM provides a better solution by replacing the non-standard OpenCL
extensions introduced to achieve high performance with the Epiphany
architecture. We demonstrate the proposed programming model for matrix-matrix
multiplication based on Cannon's algorithm showing that the hybrid model
addresses the deficiencies of using OpenCL alone to achieve good benchmark
performance.Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and
Related Technologie
- âŠ