19 research outputs found

    Variation-aware high-level DSP circuit design optimisation framework for FPGAs

    Get PDF
    The constant technology shrinking and the increasing demand for systems that operate under different power profiles with the maximum performance, have motivated the work in this thesis. Modern design tools that target FPGA devices take a conservative approach in the estimation of the maximum performance that can be achieved by a design when it is placed on a device, accounting for any variability in the fabrication process of the device. The work presented here takes a new view on the performance improvement of DSP designs by pushing them into the error-prone regime, as defined by the synthesis tools, and by investigating methodologies that reduce the impact of timing errors at the output of the system. In this work two novel error reduction techniques are proposed to address this problem. One is based on reduced-precision redundancy and the other on an error optimisation framework that uses information from a prior characterisation of the device. The first one is a generic architecture that is appended to existing arithmetic operators. The second defines the high-level parameters of the algorithm without using extra resources. Both of these methods allow to achieve graceful degradation whilst variation increases. A comparison of the new methods is laid against the existing methodologies, and conclusions drawn on the tradeoffs between their cost, in terms of resources and errors, and their benefits in terms of throughput. In some cases it is possible to double the performance of the design while still producing valid results.Open Acces

    Design of Intellectual Property-Based Hardware Blocks Integrable with Embedded RISC Processors

    Get PDF
    The main focus of this thesis is to research methods, architecture, and implementation of hardware acceleration for a Reduced Instruction Set Computer (RISC) platform. The target platform is a single-core general-purpose embedded processor (the COFFEE core) which was developed by our group at Tampere University of Technology. The COFFEE core alone cannot meet the requirements of the modern applications due to the lack of several components of which the Memory Management Unit (MMU) is one of the prominent ones. Since the MMU is one of the main requirements of today’s processors, COFFEE with no MMU was not able to run an operating system. In the design of the MMU, we employed two additional micro-Translation-Lookaside Buffers (TLBs) to speed up the translation process, as well as minimizing congestions of the data/instruction address translations with a unified TLB. The MMU is tightly-coupled with the COFFEE RISC core through the Peripheral Control Block (PCB) interface of the core. The hardware implementation, alongside some optimization techniques and post synthesis results are presented, as well.Another intention of this work is to prepare a reconfigurable platform to send and receive data packets of the next generation wireless communications. Hence, we will further discuss a recently emerged wireless modulation technique known as Non-Contiguous Orthogonal Frequency Division Multiplexing (NC-OFDM), a promising technique to alleviate spectrum scarcity problem. However, one of the primary concerns in such systems is the synchronization. To that end, we developed a reconfigurable hardware component to perform as a synchronizer. The developed module exploits Partial Reconfiguration (PR) feature in order to reconfigure itself. Eventually, we will come up with several architectural choices for systems with different limiting factors such as power consumption, operating frequency, and silicon area. The synchronizer can be loosely-coupled via one of the available co-processor slots of the target processor, the COFFEE RISC core.In addition, we are willing to improve the versatility of the COFFEE core even in industrial use cases. Hence, we developed a reconfigurable hardware component capable of operating in the Controller Area Network (CAN) protocol. In the first step of this implementation, we mainly concentrate on receiving, decoding, and extracting the data segment of a CAN-based packet. Moreover, this hardware block can reconfigure itself on-the-fly to operate on different data frames. More details regarding hardware implementation issues, as well as post synthesis results are also presented. The CAN module is loosely-coupled with the COFFEE RISC processor through one of the available co-processor block

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link

    Temperature aware power optimization for multicore floating-point units

    Full text link

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    Get PDF
    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions

    A computer-aided design for digital filter implementation

    Get PDF
    Imperial Users onl

    Ultra-low-power circuits and systems for wearable and implantable medical devices

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 219-231).Advances in circuits, sensors, and energy storage elements have opened up many new possibilities in the health industry. In the area of wearable devices, the miniaturization of electronics has spurred the rapid development of wearable vital signs, activity, and fitness monitors. Maximizing the time between battery recharge places stringent requirements on power consumption by the device. For implantable devices, the situation is exacerbated by the fact that energy storage capacity is limited by volume constraints, and frequent battery replacement via surgery is undesirable. In this case, the design of energy-efficient circuits and systems becomes even more crucial. This thesis explores the design of energy-efficient circuits and systems for two medical applications. The first half of the thesis focuses on the design and implementation of an ultra-low-power, mixed-signal front-end for a wearable ECG monitor in a 0.18pm CMOS process. A mixed-signal architecture together with analog circuit optimizations enable ultra-low-voltage operation at 0.6V which provides power savings through voltage scaling, and ensures compatibility with state-of-the-art DSPs. The fully-integrated front-end consumes just 2.9[mu]W, which is two orders of magnitude lower than commercially available parts. The second half of this thesis focuses on ultra-low-power system design and energy-efficient neural stimulation for a proof-of-concept fully-implantable cochlear implant. First, implantable acoustic sensing is demonstrated by sensing the motion of a human cadaveric middle ear with a piezoelectric sensor. Second, alternate energy-efficient electrical stimulation waveforms are investigated to reduce neural stimulation power when compared to the conventional rectangular waveform. The energy-optimal waveform is analyzed using a computational nerve fiber model, and validated with in-vivo ECAP recordings in the auditory nerve of two cats and with psychophysical tests in two human cochlear implant users. Preliminary human subject testing shows that charge and energy savings of 20-30% and 15-35% respectively are possible with alternative waveforms. A system-on-chip comprising the sensor interface, reconfigurable sound processor, and arbitrary-waveform neural stimulator is implemented in a 0.18[mu]m high-voltage CMOS process to demonstrate the feasibility of this system. The sensor interface and sound processor consume just 12[mu]W of power, representing just 2% of the overall system power which is dominated by stimulation. As a result, the energy savings from using alternative stimulation waveforms transfer directly to the system.by Marcus Yip.Ph.D

    Belle II Technical Design Report

    Full text link
    The Belle detector at the KEKB electron-positron collider has collected almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an upgrade of KEKB is under construction, to increase the luminosity by two orders of magnitude during a three-year shutdown, with an ultimate goal of 8E35 /cm^2 /s luminosity. To exploit the increased luminosity, an upgrade of the Belle detector has been proposed. A new international collaboration Belle-II, is being formed. The Technical Design Report presents physics motivation, basic methods of the accelerator upgrade, as well as key improvements of the detector.Comment: Edited by: Z. Dole\v{z}al and S. Un
    corecore