8,491 research outputs found

    Computer Architectures to Close the Loop in Real-time Optimization

    Get PDF
    © 2015 IEEE.Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of each of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, due to the fact that the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade-off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We will concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm and argue that the choice of one informs the other

    A framework for FPGA functional units in high performance computing

    Get PDF
    FPGAs make it practical to speed up a program by defining hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make available the massive parallelism of the components. The FPGA operates as a coprocessor controlled by a conventional computer. An application that combines software with hardware in this way needs an interface between a communications port to the processor and the signals connected to the functional units. We present a framework that supports the design of such systems. The framework consists of a generic controller circuit defined in VHDL that can be configured by the user according to the needs of the functional units and the I/O channel. The controller contains a register file and a pipelined programmable register transfer machine, and it supports the design of both stateless and stateful functional units. Two examples are described: the implementation of a set of basic stateless arithmetic functional units, and the implementation of a stateful algorithm that exploits circuit parallelism

    A field programmable gate array based modular motion control platform

    Get PDF
    The expectations from motion control systems have been rising day by day. As the systems become more complex, conventional motion control systems can not achieve to meet all the specifications with optimized results. This creates the necessity of fundamental changes in the infrastructure of the system. Field programmable gate array (FPGA) technology enables the reconfiguration of the digital hardware, thus dissolving the necessity of infrastructural changes for minor manipulations in the hardware even if the system is deployed. An FPGA based hardware system shrinks the size of the hardware hence the cost. FPGAs also provide better power ratings for the systems as well as a more reliable system with improved performance. As a trade off, the development is rather more difficult than software based systems, which also affects the research and development time of the overall system. In this paper a level of abstraction is introduced in order to diminish the requirement of advanced hardware description language (HDL) knowledge for implementing motion control systems thoroughly on an FPGA. The intellectual property library consists of synthesizable hardware modules specifically implemented for motion control purposes. Other parts of a motion control system, like user interface and trajectory generation, are implemented as software functions in order to protect the modularity of the system. There are also several external hardware designs for interfacing and driving various types of actuators

    PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems

    Get PDF
    We have developed PGPG (Pipeline Generator for Programmable GRAPE), a software which generates the low-level design of the pipeline processor and communication software for FPGA-based computing engines (FBCEs). An FBCE typically consists of one or multiple FPGA (Field-Programmable Gate Array) chips and local memory. Here, the term "Field-Programmable" means that one can rewrite the logic implemented to the chip after the hardware is completed, and therefore a single FBCE can be used for calculation of various functions, for example pipeline processors for gravity, SPH interaction, or image processing. The main problem with FBCEs is that the user need to develop the detailed hardware design for the processor to be implemented to FPGA chips. In addition, she or he has to write the control logic for the processor, communication and data conversion library on the host processor, and application program which uses the developed processor. These require detailed knowledge of hardware design, a hardware description language such as VHDL, the operating system and the application, and amount of human work is huge. A relatively simple design would require 1 person-year or more. The PGPG software generates all necessary design descriptions, except for the application software itself, from a high-level design description of the pipeline processor in the PGPG language. The PGPG language is a simple language, specialized to the description of pipeline processors. Thus, the design of pipeline processor in PGPG language is much easier than the traditional design. For real applications such as the pipeline for gravitational interaction, the pipeline processor generated by PGPG achieved the performance similar to that of hand-written code. In this paper we present a detailed description of PGPG version 1.0.Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2

    A versatile sensor interface for programmable vision systems-on-chip

    Get PDF
    This paper describes an optical sensor interface designed for a programmable mixed-signal vision chip. This chip has been designed and manufactured in a standard 0.35μm n-well CMOS technology with one poly layer and five metal layers. It contains a digital shell for control and data interchange, and a central array of 128 × 128 identical cells, each cell corresponding to a pixel. Die size is 11.885 × 12.230mm2 and cell size is 75.7μm × 73.3μm. Each cell contains 198 transistors dedicated to functions like processing, storage, and sensing. The system is oriented to real-time, single-chip image acquisition and processing. Since each pixel performs the basic functions of sensing, processing and storage, data transferences are fully parallel (image-wide). The programmability of the processing functions enables the realization of complex image processing functions based on the sequential application of simpler operations. This paper provides a general overview of the system architecture and functionality, with special emphasis on the optical interface.European Commission IST-1999-19007Office of Naval Research (USA) N00014021088

    Application-Specific Number Representation

    No full text
    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application- specific number representations. Well-known number formats include fixed-point, floating- point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus produc- ing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues: ² Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. ² Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations

    Pipeline-Based Power Reduction in FPGA Applications

    Get PDF
    This paper shows how temporal parallelism has an important role in the power dissipation reduction in the FPGA field. Glitches propagation is blocked by the flip-flops or registers in the pipeline. Several multiplication structures are implemented over modern FPGAs, StratixII and Virtex4, comparing their results with and without pipeline and hardware duplication

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption
    corecore