1,096 research outputs found

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    Demonstrating Analog Inference on the BrainScaleS-2 Mobile System

    Full text link
    We present the BrainScaleS-2 mobile system as a compact analog inference engine based on the BrainScaleS-2 ASIC and demonstrate its capabilities at classifying a medical electrocardiogram dataset. The analog network core of the ASIC is utilized to perform the multiply-accumulate operations of a convolutional deep neural network. At a system power consumption of 5.6W, we measure a total energy consumption of 192uJ for the ASIC and achieve a classification time of 276us per electrocardiographic patient sample. Patients with atrial fibrillation are correctly identified with a detection rate of (93.7±{\pm}0.7)% at (14.0±{\pm}1.0)% false positives. The system is directly applicable to edge inference applications due to its small size, power envelope, and flexible I/O capabilities. It has enabled the BrainScaleS-2 ASIC to be operated reliably outside a specialized lab setting. In future applications, the system allows for a combination of conventional machine learning layers with online learning in spiking neural networks on a single neuromorphic platform

    VERILOG DESIGN AND FPGA PROTOTYPE OF A NANOCONTROLLER SYSTEM

    Get PDF
    Many new fabrication technologies, from nanotechnology and MEMS to printed organic semiconductors, center on constructing arrays of large numbers of sensors, actuators, or other devices on a single substrate. The utility of such an array could be greatly enhanced if each device could be managed by a programmable controller and all of these controllers could coordinate their actions as a massively-parallel computer. Kentucky Architecture nanocontroller array with very low per controller circuit complexity can provide efficient control of nanotechnology devices. This thesis provides a detailed description of the control hierarchy of a digital system needed to build nanocontrollers suitable for controlling millions of devices on a single chip. A Verilog design and FPGA prototype of a nanocontroller system is provided to meet the constraints associated with a massively-parallel programmable controller system

    Energy-efficient hardware design based on high-level synthesis

    Get PDF
    This dissertation describes research activities broadly concerning the area of High-level synthesis (HLS), but more specifically, regarding the HLS-based design of energy-efficient hardware (HW) accelerators. HW accelerators, mostly implemented on FPGAs, are integral to the heterogeneous architectures employed in modern high performance computing (HPC) systems due to their ability to speed up the execution while dramatically reducing the energy consumption of computationally challenging portions of complex applications. Hence, the first activity was regarding an HLS-based approach to directly execute an OpenCL code on an FPGA instead of its traditional GPU-based counterpart. Modern FPGAs offer considerable computational capabilities while consuming significantly smaller power as compared to high-end GPUs. Several different implementations of the K-Nearest Neighbor algorithm were considered on both FPGA- and GPU-based platforms and their performance was compared. FPGAs were generally more energy-efficient than the GPUs in all the test cases. Eventually, we were also able to get a faster (in terms of execution time) FPGA implementation by using an FPGA-specific OpenCL coding style and utilizing suitable HLS directives. The second activity was targeted towards the development of a methodology complementing HLS to automatically derive power optimization directives (also known as "power intent") from a system-level design description and use it to drive the design steps after HLS, by producing a directive file written using the common power format (CPF) to achieve power shut-off (PSO) in case of an ASIC design. The proposed LP-HLS methodology reduces the design effort by enabling designers to infer low power information from the system-level description of a design rather than at the RTL. This methodology required a SystemC description of a generic power management module to describe the design context of a HW module also modeled in SystemC, along with the development of a tool to automatically produce the CPF file to accomplish PSO. Several test cases were considered to validate the proposed methodology and the results demonstrated its ability to correctly extract the low power information and apply it to achieve power optimization in the backend flow

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Rapid-Prototyping Emulation System for Embedded System Hardware Verification, using a SystemC Control System Environment and Reconfigurable Multimedia Hardware Development Platform

    Get PDF
    This paper describes research into the suitability of using SystemC for rapid prototyping of embedded systems. SystemC[1][2][3] communication interface protocols [4][5] are interfaced with a reconfigurable hardware system platform to provide a real-time emulation environment, allowing SystemC simulations to be directly translated into real-time solutions. The consequent Rapid Prototyping Emulation System Platform1, suitable for the implementation of consumer level multimedia systems, is described, including the system architecture, SystemC Controller model, the FPGA configured MicroBlaze CPU system and additional logic devices implemented on the Multimedia development board used for the hardware in the PESP, illustrated in the context of a typical application

    Introductory Chapter: ASIC Technologies and Design Techniques

    Get PDF

    Low-power CMOS digital-pixel Imagers for high-speed uncooled PbSe IR applications

    Get PDF
    This PhD dissertation describes the research and development of a new low-cost medium wavelength infrared MWIR monolithic imager technology for high-speed uncooled industrial applications. It takes the baton on the latest technological advances in the field of vapour phase deposition (VPD) PbSe-based medium wavelength IR (MWIR) detection accomplished by the industrial partner NIT S.L., adding fundamental knowledge on the investigation of novel VLSI analog and mixed-signal design techniques at circuit and system levels for the development of the readout integrated device attached to the detector. The work supports on the hypothesis that, by the use of the preceding design techniques, current standard inexpensive CMOS technologies fulfill all operational requirements of the VPD PbSe detector in terms of connectivity, reliability, functionality and scalability to integrate the device. The resulting monolithic PbSe-CMOS camera must consume very low power, operate at kHz frequencies, exhibit good uniformity and fit the CMOS read-out active pixels in the compact pitch of the focal plane, all while addressing the particular characteristics of the MWIR detector: high dark-to-signal ratios, large input parasitic capacitance values and remarkable mismatching in PbSe integration. In order to achieve these demands, this thesis proposes null inter-pixel crosstalk vision sensor architectures based on a digital-only focal plane array (FPA) of configurable pixel sensors. Each digital pixel sensor (DPS) cell is equipped with fast communication modules, self-biasing, offset cancellation, analog-to-digital converter (ADC) and fixed pattern noise (FPN) correction. In-pixel power consumption is minimized by the use of comprehensive MOSFET subthreshold operation. The main aim is to potentiate the integration of PbSe-based infra-red (IR)-image sensing technologies so as to widen its use, not only in distinct scenarios, but also at different stages of PbSe-CMOS integration maturity. For this purpose, we posit to investigate a comprehensive set of functional blocks distributed in two parallel approaches: • Frame-based “Smart” MWIR imaging based on new DPS circuit topologies with gain and offset FPN correction capabilities. This research line exploits the detector pitch to offer fully-digital programmability at pixel level and complete functionality with input parasitic capacitance compensation and internal frame memory. • Frame-free “Compact”-pitch MWIR vision based on a novel DPS lossless analog integrator and configurable temporal difference, combined with asynchronous communication protocols inside the focal plane. This strategy is conceived to allow extensive pitch compaction and readout speed increase by the suppression of in-pixel digital filtering, and the use of dynamic bandwidth allocation in each pixel of the FPA. In order make the electrical validation of first prototypes independent of the expensive PbSe deposition processes at wafer level, investigation is extended as well to the development of affordable sensor emulation strategies and integrated test platforms specifically oriented to image read-out integrated circuits. DPS cells, imagers and test chips have been fabricated and characterized in standard 0.15μm 1P6M, 0.35μm 2P4M and 2.5μm 2P1M CMOS technologies, all as part of research projects with industrial partnership. The research has led to the first high-speed uncooled frame-based IR quantum imager monolithically fabricated in a standard VLSI CMOS technology, and has given rise to the Tachyon series [1], a new line of commercial IR cameras used in real-time industrial, environmental and transportation control systems. The frame-free architectures investigated in this work represent a firm step forward to push further pixel pitch and system bandwidth up to the limits imposed by the evolving PbSe detector in future generations of the device.La present tesi doctoral descriu la recerca i el desenvolupament d'una nova tecnologia monolítica d'imatgeria infraroja de longitud d'ona mitja (MWIR), no refrigerada i de baix cost, per a usos industrials d'alta velocitat. El treball pren el relleu dels últims avenços assolits pel soci industrial NIT S.L. en el camp dels detectors MWIR de PbSe depositats en fase vapor (VPD), afegint-hi coneixement fonamental en la investigació de noves tècniques de disseny de circuits VLSI analògics i mixtes pel desenvolupament del dispositiu integrat de lectura unit al detector pixelat. Es parteix de la hipòtesi que, mitjançant l'ús de les esmentades tècniques de disseny, les tecnologies CMOS estàndard satisfan tots els requeriments operacionals del detector VPD PbSe respecte a connectivitat, fiabilitat, funcionalitat i escalabilitat per integrar de forma econòmica el dispositiu. La càmera PbSe-CMOS resultant ha de consumir molt baixa potència, operar a freqüències de kHz, exhibir bona uniformitat, i encabir els píxels actius CMOS de lectura en el pitch compacte del pla focal de la imatge, tot atenent a les particulars característiques del detector: altes relacions de corrent d'obscuritat a senyal, elevats valors de capacitat paràsita a l'entrada i dispersions importants en el procés de fabricació. Amb la finalitat de complir amb els requisits previs, es proposen arquitectures de sensors de visió de molt baix acoblament interpíxel basades en l'ús d'una matriu de pla focal (FPA) de píxels actius exclusivament digitals. Cada píxel sensor digital (DPS) està equipat amb mòduls de comunicació d'alta velocitat, autopolarització, cancel·lació de l'offset, conversió analògica-digital (ADC) i correcció del soroll de patró fixe (FPN). El consum en cada cel·la es minimitza fent un ús exhaustiu del MOSFET operant en subllindar. L'objectiu últim és potenciar la integració de les tecnologies de sensat d'imatge infraroja (IR) basades en PbSe per expandir-ne el seu ús, no només a diferents escenaris, sinó també en diferents estadis de maduresa de la integració PbSe-CMOS. En aquest sentit, es proposa investigar un conjunt complet de blocs funcionals distribuïts en dos enfocs paral·lels: - Dispositius d'imatgeria MWIR "Smart" basats en frames utilitzant noves topologies de circuit DPS amb correcció de l'FPN en guany i offset. Aquesta línia de recerca exprimeix el pitch del detector per oferir una programabilitat completament digital a nivell de píxel i plena funcionalitat amb compensació de la capacitat paràsita d'entrada i memòria interna de fotograma. - Dispositius de visió MWIR "Compact"-pitch "frame-free" en base a un novedós esquema d'integració analògica en el DPS i diferenciació temporal configurable, combinats amb protocols de comunicació asíncrons dins del pla focal. Aquesta estratègia es concep per permetre una alta compactació del pitch i un increment de la velocitat de lectura, mitjançant la supressió del filtrat digital intern i l'assignació dinàmica de l'ample de banda a cada píxel de l'FPA. Per tal d'independitzar la validació elèctrica dels primers prototips respecte a costosos processos de deposició del PbSe sensor a nivell d'oblia, la recerca s'amplia també al desenvolupament de noves estratègies d'emulació del detector d'IR i plataformes de test integrades especialment orientades a circuits integrats de lectura d'imatge. Cel·les DPS, dispositius d'imatge i xips de test s'han fabricat i caracteritzat, respectivament, en tecnologies CMOS estàndard 0.15 micres 1P6M, 0.35 micres 2P4M i 2.5 micres 2P1M, tots dins el marc de projectes de recerca amb socis industrials. Aquest treball ha conduït a la fabricació del primer dispositiu quàntic d'imatgeria IR d'alta velocitat, no refrigerat, basat en frames, i monolíticament fabricat en tecnologia VLSI CMOS estàndard, i ha donat lloc a Tachyon, una nova línia de càmeres IR comercials emprades en sistemes de control industrial, mediambiental i de transport en temps real.Postprint (published version

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc
    corecore