    Digital VLSI Implementation of Piecewise-Affine Controllers Based on Lattice Approach

    This paper presents a small, fast, low-power solution for piecewise-affine (PWA) controllers. To achieve this goal, a digital architecture for very-large-scale integration (VLSI) circuits is proposed. The implementation is based on the simplest lattice form, which eliminates the point location problem of other PWA representations and is able to provide continuous PWA controllers defined over generic partitions of the input domain. The architecture is parameterized in terms of number of inputs, outputs, signal resolution, and features of the controller to be generated. The design flows for field-programmable gate arrays and application-specific integrated circuits are detailed. Several application examples of explicit model predictive controllers (such as an adaptive cruise control and the control of a buck-boost dc-dc converter) are included to illustrate the performance of the VLSI solution obtained with the proposed lattice-based architecture.
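
    The lattice representation named above admits a compact software model: in one common lattice form, a continuous PWA function is a minimum over rows of maxima of affine terms, so evaluating it never requires locating the polytope that contains the query point. The Python sketch below illustrates that evaluation only; the affine terms and index sets are illustrative assumptions and do not model the paper's digital VLSI architecture.

```python
import numpy as np

def lattice_pwa(x, A, b, index_sets):
    """Evaluate a continuous PWA function given in (min-of-max) lattice form:

        f(x) = min_i  max_{j in index_sets[i]} ( A[j] @ x + b[j] )

    No point-location step is needed: every affine term is evaluated and
    the lattice structure selects the active one.
    """
    affine = A @ x + b                       # all affine terms at once
    return min(max(affine[j] for j in js) for js in index_sets)

# Illustrative one-input example: f(x) = |x| built from two affine terms.
A = np.array([[1.0], [-1.0]])
b = np.array([0.0, 0.0])
index_sets = [[0, 1]]                        # single lattice row: max(x, -x)
print(lattice_pwa(np.array([-2.5]), A, b, index_sets))   # 2.5
```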

    High-Performance Architecture for Binary-Tree-Based Finite State Machines

    A binary-tree-based finite state machine (BT-FSM) is a state machine with a 1-bit input signal whose state transition graph is a binary tree. BT-FSMs are useful in application areas where searching in a binary tree is required, such as computer networks, compression, automatic control, or cryptography. This paper presents a new architecture for implementing BT-FSMs which is based on the finite virtual state machine (FVSM) model. The proposed architecture has been compared with the general FVSM and conventional approaches by using both synthetic test benches and very large BT-FSMs obtained from a real application. In the synthetic test benches, the average speed improvement of the proposed architecture with respect to the best results of the other approaches is 41% (in some cases the speed is more than doubled). In the case of the real application, the average speed improvement is 155%.
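
    As a complement to the abstract, the toy Python model below shows what a BT-FSM computes: each state has two successors selected by a single input bit, so running the machine is a root-to-leaf walk in a binary tree (a Huffman-style decoder is a typical instance). The tree, output symbols, and restart-at-root behaviour are illustrative assumptions and say nothing about the FVSM-based hardware architecture itself.

```python
class BTFSM:
    """Toy software model of a binary-tree-based FSM (1-bit input)."""

    def __init__(self, children, actions):
        self.children = children   # internal state -> (next_if_0, next_if_1)
        self.actions = actions     # leaf state -> output symbol
        self.state = 0             # root state

    def step(self, bit):
        """Consume one input bit; emit a symbol when a leaf is reached."""
        self.state = self.children[self.state][bit]
        if self.state in self.actions:      # reached a leaf of the tree
            out = self.actions[self.state]
            self.state = 0                  # restart at the root
            return out
        return None

# Illustrative tree decoding the fixed 2-bit codes 00, 01, 10, 11.
children = {0: (1, 2), 1: (3, 4), 2: (5, 6)}
actions = {3: "00", 4: "01", 5: "10", 6: "11"}
fsm = BTFSM(children, actions)
print([fsm.step(b) for b in (0, 1, 1, 0)])  # [None, '01', None, '10']
```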

    Improved Generation of Identifiers, Secret Keys, and Random Numbers From SRAMs

    This paper presents a method to simultaneously improve the quality of the identifiers, secret keys, and random numbers that can be generated from the start-up values of standard static random access memories (SRAMs). The method is based on classifying memory cells after evaluating their start-up values at multiple measurements in a registration phase. The registration can be done without unplugging the device from its application context, and with no need for a complex laboratory setup. The method has been validated experimentally with standard low-power SRAM modules in two different application-specific integrated circuits (ASICs) fabricated with the 90-nm TSMC technology. The results show that with a simple registration the length of the identifiers can be reduced by 45%, the worst-case bit error probability (which defines the complexity of the error correcting code needed to recover a secret key) can be reduced by 64%, and the worst-case minimum entropy value is improved, thus reducing the number of bits that have to be processed to obtain full entropy by 81%. The method can be applied to standard digital designs by controlling the external power supply to the SRAM using software or by incorporating simple circuitry in the design. In the latter case, a module for implementing the method in an ASIC designed in the 90-nm TSMC technology occupies an active area of 42,025 µm².
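
    The registration phase described above amounts to classifying cells by the stability of their start-up value. The Python sketch below shows one plausible way to do that from repeated power-up measurements; the 90% stability threshold, the three-way split, and the simulated data are illustrative assumptions, not the paper's actual selection policy or silicon results.

```python
import numpy as np

def classify_cells(startups, stable_thresh=0.9):
    """Split SRAM cells into stable-0, stable-1 and unstable groups.

    startups : (n_measurements, n_cells) array of start-up bits (0/1).
    Stable cells are good candidates for identifiers and key bits;
    unstable cells are good candidates for random-number generation.
    """
    p_one = startups.mean(axis=0)                    # fraction of 1s per cell
    stable_one = np.flatnonzero(p_one >= stable_thresh)
    stable_zero = np.flatnonzero(p_one <= 1.0 - stable_thresh)
    unstable = np.setdiff1d(np.arange(startups.shape[1]),
                            np.concatenate([stable_zero, stable_one]))
    return stable_zero, stable_one, unstable

# Illustrative data: 20 simulated power-ups of 1024 cells with random bias.
rng = np.random.default_rng(0)
bias = rng.uniform(0.0, 1.0, 1024)                   # per-cell probability of 1
startups = (rng.uniform(size=(20, 1024)) < bias).astype(int)
zeros, ones, noisy = classify_cells(startups)
print(len(zeros), len(ones), len(noisy))
```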

    Loop Transformations for the Optimized Generation of Reconfigurable Hardware

    Current high-level design environments offer little support for implementing data-intensive applications on heterogeneous-memory systems; they rather focus on parallelism. This thesis addresses the memory hierarchy problem through high-level transformations of loop structures. The composition of long transformation sequences by combining shorter subsequences is studied, together with the influence of the order in which transformation steps are applied. Several methods are presented to estimate bounds on Ehrhart quasi-polynomials, which can be used to statically evaluate program properties such as memory usage. Since loop transformations influence not only the data access pattern but also the control complexity, we present a hardware loop controller architecture which supports hardware generation from the polyhedral representation used for loop transformations. The techniques are demonstrated by the semi-automatic generation of an FPGA implementation of an inverse discrete wavelet transform.
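
    To make the kind of transformation at stake concrete, the Python sketch below applies loop tiling (blocking) to a matrix product so that each sub-block fits in a small, fast memory, which is exactly the data-locality effect a memory hierarchy rewards. The kernel and tile size are an illustrative example of a classical loop transformation, not a transformation sequence taken from the thesis.

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    """C = A @ B with the i/j/k loops blocked into `tile`-sized chunks,
    so each block of A, B and C can live in a small local memory."""
    n, m = A.shape
    _, p = B.shape
    C = np.zeros((n, p))
    for ii in range(0, n, tile):             # tiles of rows of C
        for jj in range(0, p, tile):         # tiles of columns of C
            for kk in range(0, m, tile):     # tiles of the reduction dimension
                C[ii:ii + tile, jj:jj + tile] += (
                    A[ii:ii + tile, kk:kk + tile] @ B[kk:kk + tile, jj:jj + tile])
    return C

A = np.random.rand(100, 80)
B = np.random.rand(80, 60)
assert np.allclose(matmul_tiled(A, B), A @ B)
```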

    Accelerating Reconfigurable Financial Computing

    This thesis proposes novel approaches to the design, optimisation, and management of reconfigurable computer accelerators for financial computing. There are three contributions. First, we propose novel reconfigurable designs for derivative pricing using both Monte-Carlo and quadrature methods. These designs involve exploring techniques such as control variate optimisation for Monte-Carlo and multi-dimensional analysis for quadrature methods. Significant speedups and energy savings are achieved by our Field-Programmable Gate Array (FPGA) designs over both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) designs. Second, we propose a framework for distributing computing tasks on multi-accelerator heterogeneous clusters. In this framework, different computational devices, including FPGAs, GPUs, and CPUs, work collaboratively on the same financial problem based on a dynamic scheduling policy. The trade-off in speed and energy consumption of different accelerator allocations is investigated. Third, we propose a mixed-precision methodology for optimising Monte-Carlo designs and a reduced-precision methodology for optimising quadrature designs. These methodologies enable us to optimise the throughput of reconfigurable designs by using datapaths with minimised precision, while maintaining the same accuracy of results as in the original designs.
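
    The control variate technique mentioned for the Monte-Carlo designs can be illustrated in a few lines of Python. The sketch below prices a European call under Black-Scholes and uses the discounted terminal stock price, whose expectation equals the spot price under the risk-neutral measure, to cancel part of the sampling noise; the model, parameters, and payoff are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def call_price_cv(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                  n=100_000, seed=0):
    """European call by Monte-Carlo: plain vs. control-variate estimator."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.exp(-r * T) * np.maximum(ST - K, 0.0)   # plain samples
    control = np.exp(-r * T) * ST                       # known mean: S0
    beta = np.cov(payoff, control, ddof=1)[0, 1] / np.var(control, ddof=1)
    adjusted = payoff - beta * (control - S0)           # variance-reduced samples
    return payoff.mean(), adjusted.mean(), payoff.std() / adjusted.std()

plain, with_cv, noise_ratio = call_price_cv()
print(plain, with_cv, noise_ratio)   # the CV estimate is markedly less noisy
```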

    High-level automation of custom hardware design for high-performance computing

    This dissertation focuses on the efficient generation of custom processors from high-level language descriptions. Our work exploits compiler-based optimizations and transformations in tandem with high-level synthesis (HLS) to build high-performance custom processors. The goal is to offer a common multiplatform high-abstraction programming interface for heterogeneous compute systems where the benefits of custom reconfigurable (or fixed) processors can be exploited by application developers. The research presented in this dissertation supports the following thesis: in an increasingly heterogeneous compute environment it is important to leverage the compute capabilities of each heterogeneous processor efficiently. In the case of FPGA and ASIC accelerators this can be achieved through HLS-based flows that (i) extract parallelism at coarser than basic-block granularities, (ii) leverage common high-level parallel programming languages, and (iii) employ high-level source-to-source transformations to generate high-throughput custom processors. First, we propose a novel HLS flow that extracts instruction-level parallelism beyond the boundaries of basic blocks from C code. Subsequently, we describe FCUDA, an HLS-based framework for mapping fine-grained and coarse-grained parallelism from parallel CUDA kernels onto spatial parallelism. FCUDA provides a common programming model for acceleration on heterogeneous devices (i.e., GPUs and FPGAs). Moreover, the FCUDA framework balances multilevel granularity parallelism synthesis using efficient techniques that leverage fast and accurate estimation models (i.e., they do not rely on lengthy physical implementation tools). Finally, we describe an advanced source-to-source transformation framework for throughput-driven parallelism synthesis (TDPS), which appropriately restructures CUDA kernel code to maximize throughput on FPGA devices. We have integrated the TDPS framework into the FCUDA flow to enable automatic performance porting of CUDA kernels designed for the GPU architecture onto the FPGA architecture.
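
    The core idea behind an FCUDA-style flow can be sketched without any FPGA tooling: the implicit thread grid of a CUDA kernel is re-expressed as explicit loops, which an HLS tool can then unroll or pipeline into spatial parallelism. The Python sketch below only illustrates that loop view of a kernel launch; the kernel, grid sizes, and names are illustrative assumptions and this is not FCUDA's actual generated code.

```python
def saxpy_thread(tid, a, x, y):
    """What one CUDA thread with global index tid would compute."""
    y[tid] = a * x[tid] + y[tid]

def launch_as_loops(kernel, grid_dim, block_dim, *args):
    """Re-express a 1-D kernel launch as nested loops.  In an HLS flow the
    inner loop over the threads of a block is the natural candidate for
    unrolling into parallel datapaths, and the outer loop over blocks for
    pipelining."""
    for block in range(grid_dim):            # candidate for pipelining
        for thread in range(block_dim):      # candidate for unrolling
            kernel(block * block_dim + thread, *args)

a, x, y = 2.0, [1.0] * 8, [3.0] * 8
launch_as_loops(saxpy_thread, 2, 4, a, x, y)
print(y)   # [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
```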