52 research outputs found

    Hardware and software optimization of fourier transform infrared spectrometry on hybrid-FPGAs

    With the increasing complexity of today’s spacecraft, there is a concern that the on-board flight computer may be overburdened with processing tasks. Currently available processors used by NASA are struggling to meet the requirements of scientific experiments [1, 2]. A new computational platform will soon be needed to meet the increasing demands of future space missions. Recently developed hybrid field-programmable gate arrays (FPGAs) offer the versatility of running diverse software applications on embedded processors while also taking advantage of reconfigurable hardware resources, all in the same chip package. These tightly coupled HW/SW systems consume less power than general-purpose single-board computers (SBCs) and promise performance previously impossible with traditional processors and reconfigurable devices. This thesis takes an existing floating-point-intensive data processing algorithm, used for on-board spacecraft Fourier transform infrared (FTIR) spectrometry, ports it to the embedded PowerPC 405 (PPC405) processor, and evaluates system performance after applying different hardware and software optimizations and architectural configurations of the hybrid FPGA. The hardware optimizations include Xilinx’s floating-point unit (FPU) for efficient single-precision floating-point calculations and a dedicated single-precision dot-product co-processor assembled from basic floating-point operator cores. The software optimizations include a non-ANSI single-precision math library as well as IBM’s PowerPC performance libraries recompiled for double-precision arithmetic only. The outcome of this thesis is a fully functional, optimized FTIR spectrometry algorithm implemented on a hybrid FPGA. The computational and power performance of this system is evaluated and compared to a general-purpose SBC currently used for spacecraft data processing. Suggestions for future work, including a dual-processor concept, are given.

    FPGA based technical solutions for high throughput data processing and encryption for 5G communication: A review

    Field-programmable gate array (FPGA) devices are well suited to high-speed processing applications, given their flexibility, parallel processing capability, and power efficiency. In this review paper, we first present an overview of the key applications of FPGA-based platforms in 5G networks and systems that exploit the improved performance offered by such devices. The main applications considered are FPGA-based implementations of cloud radio access network (C-RAN) accelerators, network function virtualization (NFV)-based network slicers, cognitive radio systems, and multiple-input multiple-output (MIMO) channel characterizers, all of which can benefit from the high processing rate, power efficiency, and flexibility of FPGAs. Furthermore, we discuss implementations of encryption/decryption algorithms on the Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA platform, then introduce our high-speed and lightweight implementation of the well-known AES-128 algorithm, developed on the same FPGA platform, and compare it with similar solutions already published in the literature. The comparison results indicate that our AES-128 implementation enables efficient hardware usage for a given data rate (up to 28.16 Gbit/s), resulting in higher efficiency (8.64 Mbps/slice) than the other solutions considered. Finally, applications of the ZCU102 platform for high-speed processing are explored, such as image and signal processing, visual recognition, and hardware resource management.
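The two AES-128 figures quoted above can be related by simple arithmetic. The following is a rough sketch only: it assumes that the peak data rate and the per-slice efficiency describe the same design point and that 1 Gbit/s equals 1000 Mbit/s. The resulting slice count is an inference, not a number stated in the abstract.

```python
# Figures quoted in the abstract.
throughput_gbps = 28.16             # peak data rate, Gbit/s
efficiency_mbps_per_slice = 8.64    # throughput per FPGA slice, Mbit/s

# Assumption: decimal units, 1 Gbit/s = 1000 Mbit/s.
throughput_mbps = throughput_gbps * 1000

# Assumption: both figures refer to the same implementation, so
# efficiency = throughput / slices, hence slices = throughput / efficiency.
implied_slices = throughput_mbps / efficiency_mbps_per_slice

print(f"Implied slice count: {implied_slices:.0f}")
```

If the two figures were measured at different configurations, the implied count would not apply.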

    Hardware Synchronization for Embedded Multi-Core Processors

    Abstract: Multi-core processors are about to conquer embedded systems. The question is not whether they are coming but how the architectures of the microcontrollers should look, given the strict requirements in the field. In this paper we present the step from one to multiple cores, establishing coherence and consistency for different types of shared memory by hardware means. Support for point-to-point synchronization between the processor cores is also realized by implementing different hardware barriers. The practical examinations focus on the logical first step from single- to dual-core systems, using an FPGA development board with two hard PowerPC processor cores. Best- and worst-case results, together with intensive benchmarking of all implemented synchronization primitives, show the expected superiority of the hardware solutions. It is also shown that dual-ported memory outperforms single-ported memory if the multiple cores exploit inherent parallelism by locking shared memory more intelligently using an address-sensitive method.

    Golden-Finger and Back-Door: Two HW/SW Mechanisms for Accelerating Multicore Computer Systems

    The continuously growing requirements of high-performance computing lead computer systems to adopt more processors to improve parallelism and throughput. Although multiple processing cores are implemented in a computer system, the complicated hardware communication mechanism between processors decreases the performance of the overall system. In addition, the unsuitable process scheduling mechanism of a conventional operating system cannot fully utilize the computation power of the additional processors. Accordingly, this paper provides two mechanisms, one in hardware and one in software, to overcome these challenges. On the software side, we propose a tool, called Golden-Finger, that dynamically adjusts the scheduling policy of the process scheduler in Linux. This software mechanism can improve the performance of a specified process by letting it occupy a processor exclusively. On the hardware side, we design an effective hardware mechanism, called Back-Door, that enables communication between two independent processors that cannot otherwise operate together, such as the dual PowerPC 405 cores in the Xilinx ML310 system. The experimental results reveal that the two mechanisms can obtain significant performance enhancements.

    Study of hardware and software optimizations of SPEA2 on hybrid FPGAs

    Traditional radar technology consists of multiple platforms, each designed to process only a single mission objective, such as Ground Moving Target Indication (GMTI), Airborne Moving Target Indication (AMTI) or Synthetic Aperture Radar (SAR). This is no longer considered a cost-effective solution, leading to an increased need for a single radar platform that can perform multiple radar missions. Many algorithms have been developed specifically to address multi-objective design problems. One such approach, the Strength Pareto Evolutionary Algorithm 2 (SPEA2), applies the concept of evolution through a Genetic Algorithm (GA) to the design of simultaneous orthogonal waveforms. The objectives of the various radar missions are often conflicting. The goal of SPEA2 is to find the best waveform suite in the Pareto sense. Preliminary results of this algorithm applied to a scaled-down multi-objective mission scenario have been promising. One drawback of this algorithm is its high computational complexity. Even in a scaled-down simulation, performance does not meet expectations. This thesis investigated hardware and software optimization of SPEA2 applied to simultaneous multi-mission waveform design, using hybrid FPGAs. Hybrid FPGAs contain a combination of one or more embedded processors and reconfigurable hardware. The algorithm was first implemented in C on a PC, then profiled and analyzed. The C code was ported to run on an embedded PowerPC 405 processing core on a Virtex4 FX (V4FX). The hardware fabric of the V4FX was used to offload the main bottleneck of the algorithm from the PowerPC 405 core to hardware for speedup, while various software optimizations were also implemented in an effort to improve performance. Performance results from the V4FX implementation were not ideal. Thus, many suggestions for futur
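To make "best in the Pareto sense" concrete, here is a minimal sketch of Pareto dominance and the strength-based raw fitness used by SPEA2. It is not the thesis code: the objective vectors are made-up numbers, all objectives are assumed to be minimized, and the density estimate and archive truncation of full SPEA2 are omitted.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in one.

    Assumption: every objective is minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def raw_fitness(points: List[Sequence[float]]) -> List[int]:
    """SPEA2 raw fitness: sum of the strengths of all dominators.

    The strength of an individual is the number of individuals it dominates.
    Non-dominated individuals end up with a raw fitness of 0.
    """
    n = len(points)
    strength = [
        sum(dominates(points[i], points[j]) for j in range(n)) for i in range(n)
    ]
    return [
        sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
        for i in range(n)
    ]


# Hypothetical two-objective population (values are illustrative only).
population = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
fitness = raw_fitness(population)
pareto_front = [p for p, f in zip(population, fitness) if f == 0]

print(fitness)
print(pareto_front)
```

The pairwise dominance checks grow quadratically with population size, which gives a sense of why the abstract describes the algorithm as computationally demanding.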

    Modeling and automated synthesis of reconfigurable interfaces

    Stefan Ihmor. Paderborn, Univ., Diss., 200

    FPGA structures for high speed and low overhead dynamic circuit specialization

    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function by implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired, hence the name “field programmable”. FPGAs are useful in small-volume digital electronic products because the design of a custom digital chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) make it possible to configure part of the configuration memory with partial bitstreams at run-time. The reconfiguration is achieved by swapping partial bitstreams into the configuration memory via a configuration interface called the Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA that an embedded processor uses to access the configuration memory internally. The reconfiguration technique adds the flexibility to use specialized circuits that are more compact and more efficient than their bulky counterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients.
To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. 
Studying how DCS behaves, and how its overhead changes across the three FPGA platforms, forms the non-trivial basis for determining the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead of DCS, the reconfiguration time. These structures not only reduce the reconfiguration time but also help curb the power-hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types, depending on the level of parameterization: the conventional VCGRA, the partially parameterized VCGRA and the fully parameterized VCGRA. I have designed two variants of VCGRA grids for HPC image processing applications, namely the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose a parallel memory structure to drastically improve the reconfiguration time of DCS. However, this improvement comes with a significant overhead in hardware resources, which will need to be solved in future research on commercial FPGA configuration memory architectures.
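The idea that configuration bits are Boolean functions of the parameters can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the tool flow described in the dissertation: the circuit `(x0 AND p) XOR x1`, the function names, and the 2-input LUT model are all hypothetical, and the Boolean functions are evaluated by calling the generic circuit directly rather than being generated offline.

```python
from itertools import product
from typing import List


def generic_circuit(x0: int, x1: int, p: int) -> int:
    """Hypothetical 3-input function; p is the slowly changing parameter."""
    return (x0 & p) ^ x1


def specialize(p: int) -> List[int]:
    """Compute the 4 configuration bits of a 2-input LUT for a given p.

    Each bit is a Boolean function of p alone, because the regular inputs
    (x0, x1) are fixed by the bit's position: index = 2 * x1 + x0.
    """
    return [generic_circuit(x0, x1, p) for x1, x0 in product((0, 1), repeat=2)]


def lut_eval(config: List[int], x0: int, x1: int) -> int:
    """Model of the specialized hardware: a plain table lookup."""
    return config[(x1 << 1) | x0]


# A parameter change triggers re-specialization, standing in for reconfiguration.
for p in (0, 1):
    config = specialize(p)
    print(f"p={p}: LUT configuration bits = {config}")
```

In this model the specialized LUT needs only two inputs instead of three, which mirrors the resource saving the abstract attributes to treating parameters as constants. The price is that `specialize` must run again on every parameter change.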

    Operating System Interfaces: Bridging the Gap Between CPU and FPGA Accelerators

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Gigascale Systems Research Center / C8559_SA4241-79952_UC-Berkele

    ROACH accelerated BLAST

    Includes abstract. Includes bibliographical references (p. 115-118). Reconfigurable computing has, in recent years, been taking great strides toward becoming part of mainstream computing, largely due to the rapid growth in the size of FPGAs and their ability to adapt to certain complex applications efficiently. This dissertation investigates the reuse of application-specific hardware developed for radio astronomy in accelerating a popular bioinformatics algorithm.