
    Rapid Digital Architecture Design of Computationally Complex Algorithms

    Traditional digital design techniques can hardly keep up with the rising abundance of programmable circuitry found on recent Field-Programmable Gate Arrays. The novel Rapid Data Type-Agnostic Digital Design Methodology (RDAM) therefore elevates the design perspective of digital design engineers from the register-transfer level to the algorithmic level. It is founded on the capabilities of High-Level Synthesis tools. By consistently working with data type-agnostic source code, the RDAM brings significant simplifications to the fixed-point conversion of algorithms and to the design of complex-valued architectures. Signal processing applications from the field of Compressed Sensing illustrate the efficacy of the RDAM in the context of multi-user wireless communications. For instance, a complex-valued digital architecture of Orthogonal Matching Pursuit with rank-1 updating has been successfully implemented and tested.
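    As a minimal sketch of the data type-agnostic style the abstract refers to (my own illustration, not the authors' code), the atom-selection step of Orthogonal Matching Pursuit can be written once against a template type, so that moving from floating point to a fixed-point HLS type only changes the instantiation:

```cpp
// Hypothetical illustration of data type-agnostic source code: the OMP
// atom-selection step is templated on T, so switching from float simulation
// to a fixed-point HLS type (e.g. ap_fixed<18, 2> in Vivado HLS; the width
// and precision here are assumptions) does not touch the algorithm itself.
template <typename T, int M, int N>
int select_atom(const T dictionary[M][N],   // M dictionary atoms of length N
                const T residual[N]) {
    int best = 0;
    T best_score = T(0);
    for (int j = 0; j < M; ++j) {
        T acc = T(0);
        for (int i = 0; i < N; ++i)
            acc += dictionary[j][i] * residual[i];   // correlation with residual
        T score = (acc < T(0)) ? T(-acc) : acc;      // |correlation|, type-agnostic
        if (score > best_score) { best_score = score; best = j; }
    }
    return best;
}
// Usage: select_atom<float, 64, 16>(D, r) during algorithm exploration;
// re-instantiate with the fixed-point type for the synthesized architecture.
```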

    FPGA-Based Channel Coding Architectures for 5G Wireless Using High-Level Synthesis

    We propose strategies to achieve a high-throughput FPGA architecture for quasi-cyclic low-density parity-check codes based on circulant-1 identity matrix construction. By splitting the node processing operation in the min-sum approximation algorithm, we achieve pipelining in the layered decoding schedule without utilizing additional hardware resources. High-level synthesis compilation is used to design and develop the architecture on the FPGA hardware platform. To validate this architecture, an IEEE 802.11n compliant 608 Mb/s decoder is implemented on the Xilinx Kintex-7 FPGA using the LabVIEW FPGA Compiler in the LabVIEW Communication System Design Suite. Architecture scalability was leveraged to accomplish a 2.48 Gb/s decoder on a single Xilinx Kintex-7 FPGA. Further, we present rapidly prototyped experimentation of an IEEE 802.16 compliant hybrid automatic repeat request system based on the efficient decoder architecture developed. In spite of the mixed nature of the data processing (digital signal processing and finite-state machines), the LabVIEW FPGA Compiler significantly reduced the time needed to explore the system parameter space and to optimize the design in terms of error performance and resource utilization. Accelerated, in-hardware simulation achieved a 4x improvement in system throughput relative to a CPU-based implementation when measuring the error-rate performance of the system over large, realistic data sets.
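    For readers unfamiliar with the min-sum approximation mentioned above, the sketch below shows the check-node update in its generic textbook form (not the paper's pipelined, layered HLS architecture): each outgoing magnitude is the minimum of the other incoming magnitudes, so tracking the two smallest magnitudes and the product of signs suffices.

```cpp
// Generic min-sum check-node update (illustration only, assumed float LLRs).
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

void check_node_min_sum(const std::vector<float>& v2c,   // variable-to-check LLRs
                        std::vector<float>& c2v) {       // check-to-variable LLRs (output)
    float min1 = std::numeric_limits<float>::max();      // smallest magnitude
    float min2 = std::numeric_limits<float>::max();      // second smallest magnitude
    std::size_t min_idx = 0;
    int sign_prod = 1;
    for (std::size_t i = 0; i < v2c.size(); ++i) {
        float mag = std::fabs(v2c[i]);
        if (mag < min1) { min2 = min1; min1 = mag; min_idx = i; }
        else if (mag < min2) { min2 = mag; }
        if (v2c[i] < 0.0f) sign_prod = -sign_prod;
    }
    c2v.resize(v2c.size());
    for (std::size_t i = 0; i < v2c.size(); ++i) {
        int sign = (v2c[i] < 0.0f) ? -sign_prod : sign_prod;  // exclude own sign
        float mag = (i == min_idx) ? min2 : min1;             // exclude own magnitude
        c2v[i] = sign * mag;
    }
}
```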

    Toward High-Performance Implementation of 5G SCMA Algorithms

    The recent evolution of mobile communication systems toward a 5G network is associated with the search for new types of non-orthogonal modulations such as Sparse Code Multiple Access (SCMA). Such modulations are proposed in response to demands for increasing the number of connected users. SCMA is a non-orthogonal multiple access technique that offers improved Bit Error Rate (BER) performance and higher spectral efficiency than other comparable techniques, but these improvements come at the cost of complex decoders. There are many challenges in designing near-optimum, high-throughput SCMA decoders. This paper explores means to enhance the performance of SCMA decoders. To achieve this goal, various improvements to the Message Passing Algorithm (MPA) are proposed. They notably aim at adapting SCMA decoding to the Single Instruction Multiple Data (SIMD) paradigm. An approximate modeling of noise is performed to reduce the complexity of floating-point calculations. The effects of Forward Error Correction (FEC) codes such as polar, turbo, and LDPC codes, as well as different ways of accessing memory and improving the power efficiency of the modified MPAs, are investigated. The results show that the throughput of an SCMA decoder can be increased by 3.1 to 21 times compared to the original MPA on different computing platforms using the suggested improvements.
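    As a rough, hedged illustration of why MPA decoding maps well to SIMD, the sketch below shows a log-domain (max-log) resource-node update for one resource element shared by two hypothetical users with 4-point codebooks; the codebook size, user count, and noise model are assumptions for illustration, not the paper's optimized decoder.

```cpp
// Generic max-log resource-node update for an SCMA MPA detector (sketch).
#include <algorithm>
#include <array>
#include <complex>

constexpr int M = 4;  // codebook size per user (assumption)

// msg_in_b[j]: log-domain message for user B's symbol j arriving at this node.
// Returns updated log-domain messages toward user A's symbols.
std::array<float, M> resource_node_update(
        std::complex<float> y,                           // received sample
        const std::array<std::complex<float>, M>& cbA,   // user A codebook points
        const std::array<std::complex<float>, M>& cbB,   // user B codebook points
        const std::array<float, M>& msg_in_b,
        float noise_var) {
    std::array<float, M> msg_out_a;
    for (int i = 0; i < M; ++i) {
        float best = -1e30f;
        for (int j = 0; j < M; ++j) {                    // regular inner loop: SIMD-friendly
            std::complex<float> diff = y - cbA[i] - cbB[j];
            float metric = -std::norm(diff) / noise_var + msg_in_b[j];
            best = std::max(best, metric);               // max-log approximation
        }
        msg_out_a[i] = best;
    }
    return msg_out_a;
}
```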

    Optimising algorithm and hardware for deep neural networks on FPGAs

    This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs). The first contribution of this thesis is an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-dimensional and 3-dimensional CNNs. The accelerator is also designed with runtime adaptability, adopting different parallelism strategies for different convolutional layers at runtime. The second contribution is a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both the algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time cost of the co-optimisation process. The third contribution is an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture with the aim of exploiting the structured sparsity for BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.
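    To make the structured-sparsity idea concrete, here is one plausible reading sketched in C++ (the thesis names three categories of structured sparsity without detailing them here; block pruning and the block layout below are assumptions): when whole blocks of weights are pruned, a layer can skip them outright instead of testing individual elements.

```cpp
// Block-sparse fully connected layer: pruned blocks issue no multiply-accumulates.
#include <cstddef>
#include <vector>

struct BlockSparseLayer {
    std::size_t rows, cols, block;   // block = edge length of square weight blocks
    std::vector<float> weights;      // dense row-major storage, rows * cols
    std::vector<bool>  block_alive;  // one flag per (rows/block) * (cols/block) block
};

void forward(const BlockSparseLayer& L, const std::vector<float>& x, std::vector<float>& y) {
    y.assign(L.rows, 0.0f);
    const std::size_t bR = L.rows / L.block, bC = L.cols / L.block;
    for (std::size_t br = 0; br < bR; ++br)
        for (std::size_t bc = 0; bc < bC; ++bc) {
            if (!L.block_alive[br * bC + bc]) continue;   // skip pruned block entirely
            for (std::size_t i = 0; i < L.block; ++i)
                for (std::size_t j = 0; j < L.block; ++j) {
                    const std::size_t r = br * L.block + i, c = bc * L.block + j;
                    y[r] += L.weights[r * L.cols + c] * x[c];
                }
        }
}
```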

    Architectural Support for Medical Imaging

    Advancements in medical imaging research are continuously providing doctors with better diagnostic information, removing the need for unnecessary surgeries and increasing accuracy in predicting life-threatening conditions. However, newly developed techniques are currently limited by the capabilities of existing computer hardware, restricting them to expensive, custom-designed machines that only the largest hospital systems can afford or, even worse, precluding them entirely. Many of these issues stem from existing hardware being ill-suited for these types of algorithms and not designed with medical imaging in mind. In this thesis we discuss our efforts to motivate and democratize architectural support for advanced medical imaging tasks with MIRAQLE, a medical image reconstruction benchmark suite. In particular, MIRAQLE focuses on advanced image reconstruction techniques for 3D ultrasound, low-dose X-ray CT, and dynamic MRI. For each imaging modality we provide a detailed background and parallel implementations to enable future hardware development. In addition to providing baseline algorithms for these workloads, we also develop a unique analysis tool that provides image-quality feedback for each simulation. This allows hardware designers to explore acceptable image-quality trade-offs in algorithm-hardware co-design, potentially allowing for even more efficient solutions than hardware innovations alone could provide. We also motivate the need for such tools by discussing Sonic Millip3De, our low-power, highly parallel hardware for 3D ultrasound. Using Sonic Millip3De, we illustrate the orders-of-magnitude power-efficiency improvement that better medical imaging hardware can provide, especially when developed with hardware-software co-design. We also show validation of the design using a scaled-down FPGA proof of concept and discuss our further refinement of the hardware to support a wider range of applications and produce higher frame rates. Overall, with this thesis we hope to enable application-specific hardware support for the critical medical imaging tasks in MIRAQLE to make them practical for wide clinical use.

    Multiple Parallel Concatenated Gallager Codes: High Throughput Architecture Design and Implementation

    The design of advanced wireless communication systems has been one of the most important research areas in recent years. High-performance error correction schemes and high-speed data services are at the heart of these systems. Due to their excellent performance, Low-Density Parity-Check (LDPC) codes are good candidates for many new wireless communication standards. However, complexity, latency, scalability, and flexibility remain a challenge. This thesis investigates a new approach to coding and decoding LDPC codes based on Parallel Concatenated Gallager Codes (PCGCs) using multiple constituent codes. These are a class of concatenated codes built from the direct parallel concatenation of LDPC codes without interleavers. They are characterized by competitive BER performance while maintaining low complexity and flexibility. New methods for encoding and decoding are presented, together with BER simulation results showing the performance of these codes. Analysis in terms of the number of constituent codes is also carried out. Complexity analysis is performed, and preliminary implementation results are given based on a proposed high-throughput architecture.
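    A minimal sketch of the parallel-concatenation idea, assuming purely for illustration that each constituent code is represented by a binary parity matrix (real LDPC encoders work more efficiently from the sparse parity-check structure): the same information bits feed every constituent encoder with no interleaving, and the codeword is the information bits followed by each branch's parity bits.

```cpp
// Illustrative PCGC-style encoder: direct parallel concatenation, no interleavers.
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::vector<std::uint8_t>;    // 0/1 values
using ParityMatrix = std::vector<Bits>;    // one row per parity bit, row length = k

Bits encode_parity(const ParityMatrix& P, const Bits& info) {
    Bits parity(P.size(), 0);
    for (std::size_t r = 0; r < P.size(); ++r)
        for (std::size_t c = 0; c < info.size(); ++c)
            parity[r] ^= P[r][c] & info[c];              // GF(2) dot product
    return parity;
}

Bits pcgc_encode(const std::vector<ParityMatrix>& constituents, const Bits& info) {
    Bits codeword = info;                                // systematic part, shared by all branches
    for (const ParityMatrix& P : constituents) {         // same info bits, no interleaving
        Bits p = encode_parity(P, info);
        codeword.insert(codeword.end(), p.begin(), p.end());
    }
    return codeword;
}
```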