812 research outputs found
Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture
Processing power is a key technical challenge holding back the development of a high-performance software defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceed the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring needed flexibility. A reconfigurable processor architecture is proposed that utilizes PR in reconfigurable computers to achieve a more sophisticated SDR. The proposed processor contains run-time swappable blocks whose parameters and interconnects are programmable. The architecture is analyzed for performance and flexibility and compared with available alternate technologies. For a sample QPSK algorithm, hardware performance gains of at least 44x are seen over modern desktop processors and DSPs while most of their flexibility and extensibility is maintained
Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS
Recommended from our members
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number ofimage
and signal processing application areas such as consumer electronics, instrumentation,
medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA
devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the
work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of
cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM
is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed
Active FPGA Security through Decoy Circuits
Field Programmable Gate Arrays (FPGAs) based on Static Random Access Memory (SRAM) are vulnerable to tampering attacks such as readback and cloning attacks. Such attacks enable the reverse engineering of the design programmed into an FPGA. To counter such attacks, measures that protect the design with low performance penalties should be employed. This research proposes a method which employs the addition of active decoy circuits to protect SRAM FPGAs from reverse engineering. The effects of the protection method on security, execution time, power consumption, and FPGA resource usage are quantified. The method significantly increases the security of the design with only minor increases in execution time, power consumption, and resource usage. For the circuits used to characterize the method, security increased to more than one million times the original values, while execution time increased to at most 1.2 times, dynamic power consumption increased to at most two times, and look-up table usage increased to at most seven times the original values. These are reasonable penalties given the size and security of the modified circuits. The proposed design protection method also extends to FPGAs based on other technologies and to Application-Specific Integrated Circuits (ASICs). In addition to the design methodology proposed, a new classification of tampering attacks and countermeasures is presented
A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives
Graph-related applications have experienced significant growth in academia
and industry, driven by the powerful representation capabilities of graph.
However, efficiently executing these applications faces various challenges,
such as load imbalance, random memory access, etc. To address these challenges,
researchers have proposed various acceleration systems, including software
frameworks and hardware accelerators, all of which incorporate graph
pre-processing (GPP). GPP serves as a preparatory step before the formal
execution of applications, involving techniques such as sampling, reorder, etc.
However, GPP execution often remains overlooked, as the primary focus is
directed towards enhancing graph applications themselves. This oversight is
concerning, especially considering the explosive growth of real-world graph
data, where GPP becomes essential and even dominates system running overhead.
Furthermore, GPP methods exhibit significant variations across devices and
applications due to high customization. Unfortunately, no comprehensive work
systematically summarizes GPP. To address this gap and foster a better
understanding of GPP, we present a comprehensive survey dedicated to this area.
We propose a double-level taxonomy of GPP, considering both algorithmic and
hardware perspectives. Through listing relavent works, we illustrate our
taxonomy and conduct a thorough analysis and summary of diverse GPP techniques.
Lastly, we discuss challenges in GPP and potential future directions
Blind adaptive equalization for QAM signals: New algorithms and FPGA implementation.
Adaptive equalizers remove signal distortion attributed to intersymbol interference in band-limited channels. The tap coefficients of adaptive equalizers are time-varying and can be adapted using several methods. When these do not include the transmission of a training sequence, it is referred to as blind equalization. The radius-adjusted approach is a method to achieve blind equalizer tap adaptation based on the equalizer output radius for quadrature amplitude modulation (QAM) signals. Static circular contours are defined around an estimated symbol in a QAM constellation, which create regions that correspond to fixed step sizes and weighting factors. The equalizer tap adjustment consists of a linearly weighted sum of adaptation criteria that is scaled by a variable step size. This approach is the basis of two new algorithms: the radius-adjusted modified multitmodulus algorithm (RMMA) and the radius-adjusted multimodulus decision-directed algorithm (RMDA). An extension of the radius-adjusted approach is the selective update method, which is a computationally-efficient method for equalization. The selective update method employs a stop-and-go strategy based on the equalizer output radius to selectively update the equalizer tap coefficients, thereby, reducing the number of computations in steady-state operation. (Abstract shortened by UMI.) Source: Masters Abstracts International, Volume: 45-01, page: 0401. Thesis (M.A.Sc.)--University of Windsor (Canada), 2006
- …