622 research outputs found

    Simultaneous slack budgeting and retiming for synchronous circuits optimization

    Full text link
    Abstract- With the challenges of growing functionality and scaling chip size, the possible performance improvements should be considered in the earlier IC design stages, which gives more freedom to the later optimization. Potential slack as an effective metric of possible performance improvements is considered in this work which, as far as we known, is the first work that maximizes the potential slack by retiming for synchronous sequential circuit. A simultaneous slack budgeting and incremental retiming algorithm is proposed for maximizing potential slack. The overall slack budget is optimized by relocating the FFs iteratively with the MIS-based slack estimation. Compared with the potential slack of a well-known min-period retiming, our algorithm improves potential slack averagely 19.6 % without degrading the circuit performance in reasonable runtime. Furthermore, at the expense of a small amount of timing performance, 0.52 % and 2.08%, the potential slack is increased averagely by 19.89 % and 28.16 % separately, which give a hint of the tradeoff between the timing performance and the slack budget.

    Fast self-reconfigurable embedded system on Spartan-3

    Get PDF
    Many image-processing algorithms require several stages to be processed that cannot be resolved by embedded microprocessors in a reasonable time, due to their high-computational cost. A set of dedicated coprocessors can accelerate the resolution of these algorithms, alt hough the main drawback is the area needed for their implementation. The main advantage of a reconfigurable system is that several coprocessors designed to perform different operations can be mapped on the same area in a time-multiplexed way. This work presents the architecture of an embedded system composed of a microprocessor and a run-time reconfigurable coprocessor, mapped on Spartan-3, the low-cost family of Xilinx FPGAs. Designing reconfigurable systems on Spartan-3 requires much design effort, since unlike higher cost families of Xilinx FPGAs, this device does not officially support partial reconfiguration. In order to overcome this drawback, the paper also describes the main steps used in the design flow to obtain a successful design. The main goal of the presented architecture is to reduce the coprocessor reconfiguration time, as well as accelerate image-processing algorithms. The experimental results demonstrate significant improvement in both objectives. The reconfiguration rate nearly achieves 320 Mb/s which is far superior to th e previous related works.Peer ReviewedPostprint (published version

    Lexicographic path searches for FPGA routing

    Full text link
    This dissertation reports on studies of the application of lexicographic graph searches to solve problems in FPGA detailed routing. Our contributions include the derivation of iteration limits for scalar implementations of negotiation congestion for standard floating point types and the identification of pathological cases for path choice. In the study of the routability-driven detailed FPGA routing problem, we show universal detailed routability is NP-complete based on a related proof by Lee and Wong. We describe the design of a lexicographic composition operator of totally-ordered monoids as path cost metrics and show its optimality under an adapted A* search. Our new router, CornNC, based on lexicographic composition of congestion and wirelength, established a new minimum track count for the FPGA Place and Route Challenge. For the problem of long-path timing-driven FPGA detailed routing, we show that long-path budgeted detailed routability is NP-complete by reduction to universal detailed routability. We generalise the lexicographic composition to any finite length and verify its optimality under A* search. The application of the timing budget solution of Ghiasi et al. is used to solve the long-path timing budget problem for FPGA connections. Our delay-clamped spiral lexicographic composition design, SpiralRoute, ensures connection based budgets are always met, thus achieves timing closure when it successfully routes. For 113 test routing instances derived from standard benchmarks, SpiralRoute found 13 routable instances with timing closure that were unroutable by a scalar negotiated congestion router and achieved timing closure in another 27 cases when the scalar router did not, at the expense of increased runtime. We also study techniques to improve SpiralRoute runtimes, including a data structure of a trie augmented by data stacks for minimum element retrieval, and the technique of step tomonoid elimination in reducing the retrieval depth in a trie of stacks structure

    Efficient reconfigurable architectures for 3D medical image compression

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    Get PDF
    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed, such that both HW & SW adapt together to react to the unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with state-of-the-art demonstrates significant energy savings while meeting the performance constraint and keeping the video quality degradation unnoticeable

    An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices

    Get PDF
    Variable energy constraints affect the implementations of neural networks on battery-operated embedded systems. This paper describes a learning algorithm for randomization-based neural networks with hard-limit activation functions. The approach adopts a novel cost function that balances accuracy and network complexity during training. From an energyspecific perspective, the new learning strategy allows to adjust, dynamically and in real time, the number of operations during the network’s forward phase. The proposed learning scheme leads to efficient predictors supported by digital architectures. The resulting digital architecture can switch to approximate computing at run time, in compliance with the available energy budget. Experiments on 10 real-world prediction testbeds confirmed the effectiveness of the learning scheme. Additional tests on limited-resource devices supported the implementation efficiency of the overall design approac

    Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic

    Get PDF
    For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies. Contributions of this work: • Choose-Your-own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation • Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components • Detailed performance characterization of CYA – Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS) – Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimizatio

    A Multi-layer Fpga Framework Supporting Autonomous Runtime Partial Reconfiguration

    Get PDF
    Partial reconfiguration is a unique capability provided by several Field Programmable Gate Array (FPGA) vendors recently, which involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for Autonomous Runtime Partial Reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link the general logic function and area allocation information to the device related physical configuration level data by using mathematical data structure and physical constraints. In certain scenarios, configuration bit stream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed and power optimization techniques are developed based on the developed Xilinx prototype. Evaluations and analysis of these prototype and techniques are performed on a number of benchmark and hashing algorithm case studies. The results indicate that based on a variety of test benches, up to 70% reduction in the resource utilization, up to 50% improvement in power consumption, and up to 10 times increase in run-time performance are achieved using the developed architecture and approaches compared with Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for a FPGA fault tolerance case study is evaluated as a ultimate high-level application running on this architecture. It demonstrated that this is a hardware and software infrastructure that enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core that is instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead

    Towards Computational Efficiency of Next Generation Multimedia Systems

    Get PDF
    To address throughput demands of complex applications (like Multimedia), a next-generation system designer needs to co-design and co-optimize the hardware and software layers. Hardware/software knobs must be tuned in synergy to increase the throughput efficiency. This thesis provides such algorithmic and architectural solutions, while considering the new technology challenges (power-cap and memory aging). The goal is to maximize the throughput efficiency, under timing- and hardware-constraints
    corecore