6,782 research outputs found
Predictive control using an FPGA with application to aircraft control
Alternative and more efficient computational methods can extend the applicability of MPC to systems with tight real-time requirements. This paper presents a “system-on-a-chip” MPC system, implemented on a field programmable gate array (FPGA), consisting of a sparse structure-exploiting primal dual interior point (PDIP) QP solver for MPC reference tracking and a fast gradient QP solver for steady-state target calculation. A parallel reduced precision iterative solver is used to accelerate the solution of the set of linear equations forming the computational bottleneck of the PDIP algorithm. A numerical study of the effect of reducing the number of iterations highlights the effectiveness of the approach. The system is demonstrated with an FPGA-inthe-loop testbench controlling a nonlinear simulation of a large airliner. This study considers many more manipulated inputs than any previous FPGA-based MPC implementation to date, yet the implementation comfortably fits into a mid-range FPGA, and the controller compares well in terms of solution quality and latency to state-of-the-art QP solvers running on a standard PC
A Monolithic Time Stretcher for Precision Time Recording
Identifying light mesons which contain only up/down quarks (pions) from those
containing a strange quark (kaons) over the typical meter length scales of a
particle physics detector requires instrumentation capable of measuring flight
times with a resolution on the order of 20ps. In the last few years a large
number of inexpensive, multi-channel Time-to-Digital Converter (TDC) chips have
become available. These devices typically have timing resolution performance in
the hundreds of ps regime. A technique is presented that is a monolithic
version of ``time stretcher'' solution adopted for the Belle Time-Of-Flight
system to address this gap between resolution need and intrinsic multi-hit TDC
performance.Comment: 9 pages, 15 figures, minor corrections made, to appear as JINST_008
Combined Integer and Floating Point Multiplication Architecture(CIFM) for FPGAs and Its Reversible Logic Implementation
In this paper, the authors propose the idea of a combined integer and
floating point multiplier(CIFM) for FPGAs. The authors propose the replacement
of existing 18x18 dedicated multipliers in FPGAs with dedicated 24x24
multipliers designed with small 4x4 bit multipliers. It is also proposed that
for every dedicated 24x24 bit multiplier block designed with 4x4 bit
multipliers, four redundant 4x4 multiplier should be provided to enforce the
feature of self repairability (to recover from the faults). In the proposed
CIFM reconfigurability at run time is also provided resulting in low power. The
major source of motivation for providing the dedicated 24x24 bit multiplier
stems from the fact that single precision floating point multiplier requires
24x24 bit integer multiplier for mantissa multiplication. A reconfigurable,
self-repairable 24x24 bit multiplier (implemented with 4x4 bit multiply
modules) will ideally suit this purpose, making FPGAs more suitable for integer
as well floating point operations. A dedicated 4x4 bit multiplier is also
proposed in this paper. Moreover, in the recent years, reversible logic has
emerged as a promising technology having its applications in low power CMOS,
quantum computing, nanotechnology, and optical computing. It is not possible to
realize quantum computing without reversible logic. Thus, this paper also paper
provides the reversible logic implementation of the proposed CIFM. The
reversible CIFM designed and proposed here will form the basis of the
completely reversible FPGAs.Comment: Published in the proceedings of the The 49th IEEE International
Midwest Symposium on Circuits and Systems (MWSCAS 2006), Puerto Rico, August
2006. Nominated for the Student Paper Award(12 papers are nominated for
Student paper Award among all submissions
Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS
Floating-Point Matrix Product on FPGA
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.---- Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE
Normalizing or not normalizing? An open question for floating-point arithmetic in embedded systems
Emerging embedded applications lack of a specific standard when they require floating-point arithmetic. In this situation they use the IEEE-754 standard or ad hoc variations of it. However, this standard was not designed for this purpose. This paper aims to open a debate to define a new extension of the standard to cover embedded applications. In this work, we only focus on the impact of not performing normalization. We show how eliminating the condition of normalized numbers, implementation costs can be dramatically reduced, at the expense of a moderate loss of accuracy. Several architectures to implement addition and multiplication for non-normalized numbers are proposed and analyzed. We show that a combined architecture (adder-multiplier) can halve the area and power consumption of its counterpart IEEE-754 architecture. This saving comes at the cost of reducing an average of about 10 dBs the Signal-to-Noise Ratio for the tested algorithms. We think these results should encourage researchers to perform further investigation in this issue.Universidad de Málaga. Campus de Excelencia Internacional AndalucĂa Tech
Fast HUB Floating-point Adder for FPGA
Several previous publications have shown the area
and delay reduction when implementing real number computation
using HUB formats for both floating-point and fixed-point.
In this paper, we present a HUB floating-point adder for FPGA
which greatly improves the speed of previous proposed HUB
designs for these devices. Our architecture is based on the double
path technique which reduces the execution time since each
path works in parallel. We also deal with the implementation of
unbiased rounding in the proposed adder. Experimental results
are presented showing the goodness of the new HUB adder for
FPGA.TIN2016- 80920-R, JA2012 P12-TIC-1692, JA2012 P12-TIC-147
Computer Architectures to Close the Loop in Real-time Optimization
© 2015 IEEE.Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of each of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, due to the fact that the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade-off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We will concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm and argue that the choice of one informs the other
- …