69 research outputs found

    Low power techniques and architectures for multicarrier wireless receivers

    Get PDF

    Study of hardware and software optimizations of SPEA2 on hybrid FPGAs

    Get PDF
    Traditional radar technology consists of multiple platforms, each designed to process only a single mission objective, such as Ground Moving Target Indication (GMTI), Airborne Moving Target Indication (AMTI) or Synthetic Aperture Radar (SAR). This is no longer considered a cost effective solution, thus leading to the increased need for a single radar platform which can perform multiple radar missions. Many algorithms have been developed to specifically address multi-objective design problems. One such approach, the Strength Pareto Evolutionary Algorithm 2 (SPEA2), applies the concept of evolution through a Genetic Algorithm (GA) to the design of simultaneous orthogonal waveforms. The objectives of the various radar missions are often conflicting. The goal of SPEA2 is to find the best waveform suite in the Pareto sense. Preliminary results of this algorithm applied to a scaled down multi-objective mission scenario have been promising. One setback of the use of this algorithm is its abundant computational complexity. Even in a scaled down simulation, performance does not meet expectations. This thesis investigated a hardware and software optimization of SPEA2 applied to simultaneous multi-mission waveform design, using hybrid FPGAs. Hybrid FPGAs contain a combination of a single or multiple embedded processors and reconfigurable hardware. The algorithm was first implemented in C on a PC, then profiled and analyzed. The C code was translated to run on an embedded PowerPC 405 processing core on a Virtex4 FX (V4FX). The hardware fabric of the V4FX was utilized to offload the main bottleneck of the algorithm from the PowerPC 405 core to hardware for speedup, while various software optimizations were also implemented, in an effort to improve performance. Performance results from the V4FX implementation were not ideal. Thus, many suggestions for futur

    Extensible FlexRay communication controller for FPGA-based automotive systems

    Get PDF
    Modern vehicles incorporate an increasing number of distributed compute nodes, resulting in the need for faster and more reliable in-vehicle networks. Time-triggered protocols such as FlexRay have been gaining ground as the standard for high-speed reliable communications in the automotive industry, marking a shift away from the event-triggered medium access used in controller area networks (CANs). These new standards enable the higher levels of determinism and reliability demanded from next-generation safety-critical applications. Advanced applications can benefit from tight coupling of the embedded computing units with the communication interface, thereby providing functionality beyond the FlexRay standard. Such an approach is highly suited to implementation on reconfigurable architectures. This paper describes a field-programmable gate array (FPGA)-based communication controller (CC) that features configurable extensions to provide functionality that is unavailable with standard implementations or off-the-shelf devices. It is implemented and verified on a Xilinx Spartan 6 FPGA, integrated with both a logic-based hardware ECU and a fully fledged processor-based electronic control unit (ECU). Results show that the platform-centric implementation generates a highly efficient core in terms of power, performance, and resource utilization. We demonstrate that the flexible extensions help enable advanced applications that integrate features such as fault tolerance, timeliness, and security, with practical case studies. This tight integration between the controller, computational functions, and flexible extensions on the controller enables enhancements that open the door for exciting applications in future vehicles

    Hardware compilation of deep neural networks: an overview

    Get PDF
    Deploying a deep neural network model on a reconfigurable platform, such as an FPGA, is challenging due to the enormous design spaces of both network models and hardware design. A neural network model has various layer types, connection patterns and data representations, and the corresponding implementation can be customised with different architectural and modular parameters. Rather than manually exploring this design space, it is more effective to automate optimisation throughout an end-to-end compilation process. This paper provides an overview of recent literature proposing novel approaches to achieve this aim. We organise materials to mirror a typical compilation flow: front end, platform-independent optimisation and back end. Design templates for neural network accelerators are studied with a specific focus on their derivation methodologies. We also review previous work on network compilation and optimisation for other hardware platforms to gain inspiration regarding FPGA implementation. Finally, we propose some future directions for related research

    Multi-core architectures with coarse-grained dynamically reconfigurable processors for broadband wireless access technologies

    Get PDF
    Broadband Wireless Access technologies have significant market potential, especially the WiMAX protocol which can deliver data rates of tens of Mbps. Strong demand for high performance WiMAX solutions is forcing designers to seek help from multi-core processors that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, coarse-grained dynamically reconfigurable processors are proving to be strong candidates for processing cores used in future high performance multi-core processor systems. This thesis investigates multi-core architectures with a newly emerging dynamically reconfigurable processor – RICA, targeting WiMAX physical layer applications. A novel master-slave multi-core architecture is proposed, using RICA processing cores. A SystemC based simulator, called MRPSIM, is devised to model this multi-core architecture. This simulator provides fast simulation speed and timing accuracy, offers flexible architectural options to configure the multi-core architecture, and enables the analysis and investigation of multi-core architectures. Meanwhile a profiling-driven mapping methodology is developed to partition the WiMAX application into multiple tasks as well as schedule and map these tasks onto the multi-core architecture, aiming to reduce the overall system execution time. Both the MRPSIM simulator and the mapping methodology are seamlessly integrated with the existing RICA tool flow. Based on the proposed master-slave multi-core architecture, a series of diverse homogeneous and heterogeneous multi-core solutions are designed for different fixed WiMAX physical layer profiles. Implemented in ANSI C and executed on the MRPSIM simulator, these multi-core solutions contain different numbers of cores, combine various memory architectures and task partitioning schemes, and deliver high throughputs at relatively low area costs. Meanwhile a design space exploration methodology is developed to search the design space for multi-core systems to find suitable solutions under certain system constraints. Finally, laying a foundation for future multithreading exploration on the proposed multi-core architecture, this thesis investigates the porting of a real-time operating system – Micro C/OS-II to a single RICA processor. A multitasking version of WiMAX is implemented on a single RICA processor with the operating system support

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Mapping Framework for Heterogeneous Reconfigurable Architectures:Combining Temporal Partitioning and Multiprocessor Scheduling

    Get PDF

    Methodology for complex dataflow application development

    Get PDF
    This thesis addresses problems inherent to the development of complex applications for reconfig- urable systems. Many projects fail to complete or take much longer than originally estimated by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic develop- ment of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identi- fied bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications and both achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to hetero- geneous physical memories with special attention to die boundaries is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte-Carlo based simulation of dose accumu- lation in human tissue is accelerated using the proposed methodology to meet the real time requirements of adaptive radiotherapy.Open Acces

    Evolvable hardware platform for fault-tolerant reconfigurable sensor electronics

    Get PDF
    corecore