1,768 research outputs found

    Bidirectional fano algorithm for high throughput sequential decoding

    Get PDF

    MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor

    Get PDF
    This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented

    High-Level Design Space and Flexibility Exploration for Adaptive, Energy-Efficient WCDMA Channel Estimation Architectures

    Get PDF
    Due to the fast changing wireless communication standards coupled with strict performance constraints, the demand for flexible yet high-performance architectures is increasing. To tackle the flexibility requirement, software-defined radio (SDR) is emerging as an obvious solution, where the underlying hardware implementation is tuned via software layers to the varied standards depending on power-performance and quality requirements leading to adaptable, cognitive radio. In this paper, we conduct a case study for representatives of two complexity classes of WCDMA channel estimation algorithms and explore the effect of flexibility on energy efficiency using different implementation options. Furthermore, we propose new design guidelines for both highly specialized architectures and highly flexible architectures using high-level synthesis, to enable the required performance and flexibility to support multiple applications. Our experiments with various design points show that the resulting architectures meet the performance constraints of WCDMA and a wide range of options are offered for tuning such architectures depending on power/performance/area constraints of SDR

    Reconfigurable architectures for beyond 3G wireless communication systems

    Get PDF

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Get PDF
    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3TOp/s/W in a core area of 6.3mm2^2. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real time interactive demonstrations

    New Design Techniques for Dynamic Reconfigurable Architectures

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    A High-Throughput Energy-Efficient Implementation of Successive-Cancellation Decoder for Polar Codes Using Combinational Logic

    Get PDF
    This paper proposes a high-throughput energy-efficient Successive Cancellation (SC) decoder architecture for polar codes based on combinational logic. The proposed combinational architecture operates at relatively low clock frequencies compared to sequential circuits, but takes advantage of the high degree of parallelism inherent in such architectures to provide a favorable tradeoff between throughput and energy efficiency at short to medium block lengths. At longer block lengths, the paper proposes a hybrid-logic SC decoder that combines the advantageous aspects of the combinational decoder with the low-complexity nature of sequential-logic decoders. Performance characteristics on ASIC and FPGA are presented with a detailed power consumption analysis for combinational decoders. Finally, the paper presents an analysis of the complexity and delay of combinational decoders, and of the throughput gains obtained by hybrid-logic decoders with respect to purely synchronous architectures.Comment: 12 pages, 10 figures, 8 table
    corecore