A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

Authors: Azarkhish Erfan; Benini Luca; Bonetti Andrea; Emery Stephane; Jokic Petar; Pons Marc
Publication date: 24/06/2021

Implementing embedded neural network processing at the edge requires efficient hardware acceleration that combines high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique used. This complicates the evaluation of optimizations for new accelerator designs and slows down research progress. This work surveys the neural network accelerator optimization approaches used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately. Reported optimizations range from memory savings of up to 10,000x to energy reductions of 33x, providing chip designers with an overview of design choices for implementing efficient low-power neural network accelerators.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Spatial Augmented Reality Using Structured Light Illumination

Author: Yu Ying
Publication venue: UKnowledge
Publication date: 01/01/2019

Spatial augmented reality is a kind of augmented reality technique that uses a projector to blend real objects with virtual content. Structured light illumination, a 3D shape measurement technique, also uses a projector as part of its system: the projector generates important cues that establish the correspondence between the 2D image coordinate system and the 3D world coordinate system. It is therefore appealing to build a system that provides the functionality of both spatial augmented reality and structured light illumination. In this dissertation, we present the hardware platforms we developed and their applications in spatial augmented reality and structured light illumination. First, we present a dual-projector structured light 3D scanning system in which two synchronized projectors operate simultaneously; it outperforms traditional single-projector structured light 3D scanning systems in the quality of its 3D reconstructions. Second, we introduce a modified dual-projector structured light 3D scanning system aimed at detecting and resolving multi-path interference. Third, we propose an augmented reality face paint system that detects a human face in a scene and paints it with any chosen color by projection. The system also incorporates a second camera to track 3D position using the principle of structured light illumination. Finally, a structured light 3D scanning system with its own built-in machine vision camera is presented as future work. So far, the standalone camera has been built from a bare CMOS sensor. With this customized camera, we can achieve high dynamic range imaging and better synchronization between the camera and projector.
However, the complete system, which includes an HDMI transmitter, a structured light pattern generator, and synchronization logic, has yet to be built because a well-designed high-speed PCB is still lacking.
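The abstract says the projector supplies cues that link 2D image coordinates to 3D world coordinates. As an illustration only, the sketch below shows one common way to encode that correspondence: binary-reflected Gray-code stripe patterns, one image per bit of the projector column index. The abstract does not say which coding scheme the dissertation uses, and all function names here are hypothetical. Each camera pixel's thresholded on/off sequence decodes to the projector column it sees; triangulating that column against calibrated camera and projector geometry (not shown) yields the 3D point.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert the binary-reflected Gray code by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def column_patterns(width: int, bits: int) -> list[list[int]]:
    """Return one stripe image per bit plane, most significant bit first.

    patterns[b][x] is 1 if projector column x is lit in image b."""
    return [[(gray_encode(x) >> (bits - 1 - b)) & 1 for x in range(width)]
            for b in range(bits)]


def decode_pixel(observed_bits: list[int]) -> int:
    """Recover the projector column seen by one camera pixel from the
    thresholded on/off values it recorded across the bit-plane images."""
    g = 0
    for bit in observed_bits:
        g = (g << 1) | bit
    return gray_decode(g)


# Usage: a 1024-column projector needs 10 bit-plane images.
patterns = column_patterns(width=1024, bits=10)
seen = [patterns[b][357] for b in range(10)]  # ideal, noise-free observation
print(decode_pixel(seen))  # 357
```

Gray code is a typical choice because adjacent columns differ in exactly one pattern, so a thresholding error at a stripe edge shifts the decoded column by at most one.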

University of Kentucky

Interconnect and Memory Design for Intelligent Mobile System

Author: Wang Jingcheng
Publication date: 01/01/2020

Technology scaling has driven transistors toward smaller area, higher performance, and lower power consumption, leading us into the era of mobile and edge computing. However, the benefits of technology scaling are diminishing today: wire delay and energy scale far behind those of logic, which makes communication more expensive than computation. Moreover, emerging data-centric algorithms such as deep learning place growing demands on SRAM capacity and bandwidth. The high access energy and large leakage of big on-chip SRAMs have become the main obstacle to realizing an energy-efficient, low-power smart sensor platform. This thesis presents several architecture and circuit solutions to enable intelligent mobile systems, including a voltage-scalable interconnect scheme, Compute-In-Memory (CIM), a low-power memory system for an edge deep learning processor, and an ultra-low-leakage stacked-voltage-domain SRAM for a low-power smart image signal processor (ISP). Four prototypes are implemented for demonstration and verification. The first two address the slow, energy-hungry global on-chip interconnect: the first prototype proposes a reconfigurable, self-timed, regenerator-based global interconnect scheme that achieves higher performance and energy efficiency across a wide voltage range, while the second presents a non-von Neumann architecture, a hybrid in-/near-memory Compute SRAM (CRAM), to address the locality issue. The next two works focus on low-power, low-leakage SRAM design for intelligent sensors. The third prototype is a low-power memory design for a deep learning processor with 270 KB of custom SRAM and a Non-Uniform Memory Access architecture. The fourth prototype is an ultra-low-leakage SRAM for a motion-triggered, low-power smart image sensor system, using voltage domain stacking and a novel array-swapping mechanism.
The work presented in this dissertation applies optimizations at both the architecture level (exploiting temporal and spatial locality) and the circuit level (circuit customization) to overcome the main challenges in building extremely energy-efficient, battery-powered intelligent mobile devices. The work is significant in the era of the Internet of Things (IoT) and AI, as the proposed solutions help mobile computing systems become ubiquitous and intelligent while achieving longer battery life.
PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155232/1/jiwang_1.pd
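The abstract describes CRAM only as a hybrid in-/near-memory Compute SRAM. A technique often used in compute SRAMs is bit-serial arithmetic on transposed data: each bitline column holds one word stored vertically, and one row-wide operation processes one bit of every column at once. The behavioral sketch below illustrates that general idea; it is an assumption for illustration, not CRAM's documented datapath, and the function names are hypothetical.

```python
def transpose_to_bit_rows(words: list[int], bits: int) -> list[list[int]]:
    """Store each word vertically: row b holds bit b (LSB first) of every column."""
    return [[(w >> b) & 1 for w in words] for b in range(bits)]


def bit_serial_add(a_rows: list[list[int]], b_rows: list[list[int]]) -> list[list[int]]:
    """Add two transposed operand arrays one bit row at a time.

    Each loop iteration models one row-wide operation in which every column
    computes its sum and carry bits in parallel, so latency grows with the
    word width rather than with the number of words."""
    columns = len(a_rows[0])
    carry = [0] * columns
    sum_rows = []
    for a_bits, b_bits in zip(a_rows, b_rows):
        sum_rows.append([a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry)])
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_bits, b_bits, carry)]
    sum_rows.append(carry)  # final carry becomes the extra MSB row
    return sum_rows


def bit_rows_to_words(rows: list[list[int]]) -> list[int]:
    """Read the transposed result back into ordinary integers."""
    return [sum(row[col] << b for b, row in enumerate(rows)) for col in range(len(rows[0]))]


a = [3, 200, 17, 255]
b = [5, 100, 250, 1]
result = bit_serial_add(transpose_to_bit_rows(a, 8), transpose_to_bit_rows(b, 8))
print(bit_rows_to_words(result))  # [8, 300, 267, 256]
```

Eight operand rows plus one carry row are processed here regardless of how many columns the array has, which is the parallelism in-memory computing aims to exploit.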

Deep Blue Documents at the University of Michigan

Complete spatial safety for C and C++ using CHERI capabilities

Author: Richardson Alexander
Publication venue: University of Cambridge
Publication date: 15/10/2019

Lack of memory safety in commonly used systems-level languages such as C and C++ results in a constant stream of new exploitable software vulnerabilities and exploit techniques. Many exploit mitigations have been proposed and deployed over the years, yet none address the root issue: lack of memory safety. Most C and C++ implementations assume a memory model based on a linear array of bytes rather than an object-centric view. Whilst more efficient on contemporary CPU architectures, linear addresses cannot encode the target object, thus permitting memory errors such as spatial safety violations (ignoring the bounds of an object). One promising mechanism to provide memory safety is CHERI (Capability Hardware Enhanced RISC Instructions), which extends existing processor architectures with capabilities that provide hardware-enforced checks for all accesses and can be used to prevent spatial memory violations. This dissertation prototypes and evaluates a pure-capability programming model (using CHERI capabilities for all pointers) to provide complete spatial memory protection for traditionally unsafe languages. As the first step towards memory safety, all language-visible pointers can be implemented as capabilities. I analyse the programmer-visible impact of this change and refine the pure-capability programming model to provide strong source-level compatibility with existing code. Second, to provide robust spatial safety, language-invisible pointers (mostly arising from program linkage), such as those used for function calls and global variable accesses, must also be protected. In doing so, I highlight trade-offs between performance and privilege minimization for implicit and programmer-visible pointers. Finally, I present CheriSH, a novel and highly compatible technique that protects against buffer overflows between fields of the same object, thereby ensuring that the CHERI spatial memory protection is complete.
I find that the byte-granular spatial safety provided by CHERI pure-capability code is not only stronger than most other approaches but also incurs almost negligible performance overheads in common cases (0.1% geometric mean) and a worst-case overhead of only 23.3% compared to the insecure MIPS baseline. Moreover, I show that the pure-capability programming model provides near-complete source-level compatibility with existing programs. I evaluate this by porting large, widely used open-source applications such as PostgreSQL and WebKit, which required only minimal changes: fewer than 0.1% of source lines. I conclude that pure-capability CHERI C/C++ is an eminently viable programming environment offering strong memory protection, good source-level compatibility, and low performance overheads.
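To make the idea of a bounds-carrying pointer concrete, the sketch below models a capability as a base, a length, and a cursor, with every access checked at byte granularity and bounds that can only be narrowed, never widened. It is a hypothetical Python model, not the CHERI ISA or the dissertation's code: real capabilities also carry permissions, a hardware validity tag, and compressed bounds, all omitted here. The field-narrowing step illustrates the kind of intra-object overflow CheriSH targets, not how CheriSH implements it.

```python
from dataclasses import dataclass, replace


class CapabilityFault(Exception):
    """Raised where CHERI hardware would trap on an out-of-bounds access."""


@dataclass(frozen=True)
class Capability:
    base: int    # lowest address the capability may access
    length: int  # number of accessible bytes
    cursor: int  # current address (the C-visible pointer value)

    def offset(self, delta: int) -> "Capability":
        # Pointer arithmetic moves only the cursor; the bounds stay attached.
        return replace(self, cursor=self.cursor + delta)

    def set_bounds(self, new_base: int, new_length: int) -> "Capability":
        # Monotonicity: a derived capability may never exceed its parent's bounds.
        if new_base < self.base or new_base + new_length > self.base + self.length:
            raise CapabilityFault("cannot widen bounds")
        return Capability(new_base, new_length, new_base)

    def check_access(self, size: int) -> int:
        # Every load or store of `size` bytes is checked against the bounds.
        if self.cursor < self.base or self.cursor + size > self.base + self.length:
            raise CapabilityFault(f"access of {size} bytes at {self.cursor:#x} out of bounds")
        return self.cursor


obj = Capability(base=0x1000, length=24, cursor=0x1000)  # hypothetical 24-byte struct
print(hex(obj.offset(16).check_access(8)))                # 0x1010: trailing 8-byte field
buf = obj.set_bounds(0x1000, 16)                          # narrow to a 16-byte array field
try:
    buf.offset(16).check_access(1)                        # one byte past the array
except CapabilityFault as exc:
    print("trap:", exc)
```

Without the narrowing step, the same one-byte overflow would stay inside the object's bounds and go undetected, which is why sub-object bounds matter for completeness.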

Apollo (Cambridge)

Developing Logic Synthesis Flow for NVDLA IP

Author: Lindqvist Aarno
Publication date: 19/05/2022

Modern digital devices require high computing performance; thus, there is strong market demand for SoCs. The most powerful SoCs are implemented as ASICs, since ASICs are the most cost-efficient technology when production volumes are high. An important step in the ASIC design process is logic synthesis, in which a dedicated software tool translates RTL code into a gate-level netlist. Logic synthesis is run multiple times alongside RTL development to meet the chip's target specifications. This thesis project used the NVDLA IP as a use case for logic synthesis. NVDLA is an open-source deep learning accelerator developed by NVIDIA. The design can execute CNNs efficiently. Each component in NVDLA can be configured independently, which makes it flexible and cost-effective, and the NVDLA software ecosystem offers extensive feature coverage. NVDLA is divided into five partitions according to their functionality, and each partition is an individual top-level synthesis hierarchy. The goal of this thesis is to develop a logic synthesis flow for NVDLA in the company's design environment. This was achieved by using the NVDLA design environment, the company's internal memory wrapper, and the Synopsys Design Compiler and IC Compiler II tools to run logic synthesis for TSMC 7 nm standard-cell technology. All RTL code and scripts used were downloaded from the NVDLA GitHub page. The memory wrapper, created with the company's memory wrapper tool, connects the NVDLA design to the RAM instances. Design Compiler was used to generate the initial netlist for the NVDLA partitions, and IC Compiler II was used to create an individual floorplan for each partition. The generated DEF file was used in a second synthesis pass to obtain the final logic synthesis results. The results demonstrate that the company design environment can be used to run synthesis for open-source IP blocks.
Furthermore, the developed flow provides a platform for using different kinds of open-source IP in an industrial development environment, since it can quickly generate synthesis results for 7 nm standard-cell technology.
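The two-pass flow described above (initial Design Compiler netlist, IC Compiler II floorplan, then a second synthesis pass driven by the floorplan DEF) can be sketched as a small generator that emits one Tcl script per partition. Everything here is an assumption for illustration, not the thesis's scripts: the partition module names are taken to be those of the public nv_full configuration and should be checked against the downloaded RTL, while the files `lib_setup.tcl`, `<top>_analyze.tcl`, and `<top>.def` are hypothetical placeholders for site-specific library setup, RTL/memory-wrapper analysis, and the ICC2 floorplan output.

```python
from typing import Dict, Optional

# Assumed partition modules of the public nv_full configuration; verify against
# the RTL actually downloaded from the NVDLA GitHub page.
PARTITIONS = [
    "NV_NVDLA_partition_a",
    "NV_NVDLA_partition_c",
    "NV_NVDLA_partition_m",
    "NV_NVDLA_partition_o",
    "NV_NVDLA_partition_p",
]


def dc_script(top: str, def_file: Optional[str] = None) -> str:
    """Return a Design Compiler Tcl script for one partition.

    First pass: def_file is None. Second pass: the floorplan DEF written by
    IC Compiler II is read back as physical constraints (topographical mode)."""
    lines = [
        "source lib_setup.tcl",       # hypothetical: sets target_library / link_library
        f"source {top}_analyze.tcl",  # hypothetical: analyze commands for RTL + memory wrappers
        f"elaborate {top}",
        f"current_design {top}",
        "link",
    ]
    if def_file is not None:
        lines.append(f"extract_physical_constraints {def_file}")
    lines += [
        "compile_ultra",
        f"write_file -format verilog -hierarchy -output {top}.vg",
        f"report_timing > {top}_timing.rpt",
        f"report_area > {top}_area.rpt",
    ]
    return "\n".join(lines) + "\n"


def flow_scripts(second_pass: bool) -> Dict[str, str]:
    """Generate one script per partition, since each is its own synthesis top."""
    return {
        top: dc_script(top, f"{top}.def" if second_pass else None)
        for top in PARTITIONS
    }


print(flow_scripts(second_pass=True)["NV_NVDLA_partition_p"])
```

Keeping each partition as a separate top lets the partitions be synthesized and floorplanned in parallel, which matches the per-partition hierarchy the abstract describes.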

Trepo - Institutional Repository of Tampere University

Experimental Investigation of a MAV-Scale Cyclocopter

Author: Shrestha Elena
Publication date: 01/01/2018

The development of an efficient, maneuverable, and gust-tolerant hovering concept with a multi-modal locomotion capability is key to the success of micro air vehicles (MAVs) operating in multiple mission scenarios. This research investigated the performance of two unconventional cycloidal-rotor-based (cyclocopter) configurations: (1) a twin-cyclocopter and (2) an all-terrain cyclocopter. The twin-cyclocopter configuration used two cycloidal rotors (cyclorotors) and a smaller horizontal, edge-wise nose rotor to counteract the torque produced by the cyclorotors. The all-terrain cyclocopter relied on four cyclorotors oriented in an H-configuration. The objectives of this research were to: (1) develop control strategies to enable level forward flight of a cyclocopter relying purely on thrust vectoring, (2) identify a flight dynamics model in forward flight, (3) experimentally evaluate gust tolerance strategies, and (4) determine the feasibility and performance of multi-modal locomotion with the cyclocopter configuration. The forward flight control strategy for the twin-cyclocopter used a unique combination of independent thrust vectoring and rotational speed control of the cyclorotors. Unlike conventional rotary-wing vehicles, the cyclocopter is propelled in forward flight by thrust vectoring instead of by pitching the entire fuselage. While this strategy enabled the vehicle to maintain a level attitude in forward flight, it was accompanied by significant yaw-roll control coupling and gyroscopic coupling. To understand these couplings and characterize the bare-airframe dynamics, a 6-DOF flight dynamics model of the cyclocopter was extracted using a time-domain system identification technique. Decoupling was achieved by simultaneously mixing roll and yaw inputs in the controller. After this control-mixing strategy was implemented in the closed-loop feedback system, the cyclocopter achieved level forward flight at speeds up to 5 m/s.
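The abstract says yaw-roll coupling was handled by simultaneously mixing roll and yaw inputs. A minimal sketch of one way to do this, assuming the coupling can be approximated as a static 2×2 linear map identified from flight data: the mixer applies the inverse of that map so each commanded axis produces only its intended response. The real couplings (including gyroscopic effects) are dynamic, so this static model is a simplification, and the coefficients below are hypothetical placeholders, not values from the thesis.

```python
from typing import Callable, Tuple


def make_mixer(k_roll_from_yaw: float, k_yaw_from_roll: float) -> Callable[[float, float], Tuple[float, float]]:
    """Return a function mapping desired (roll, yaw) responses to actuator inputs.

    Assumed static plant model:
        roll_response = u_roll + k_roll_from_yaw * u_yaw
        yaw_response  = k_yaw_from_roll * u_roll + u_yaw
    The mixer applies the inverse of this 2x2 matrix."""
    det = 1.0 - k_roll_from_yaw * k_yaw_from_roll
    if abs(det) < 1e-9:
        raise ValueError("coupling matrix is singular; axes cannot be decoupled")

    def mix(roll_cmd: float, yaw_cmd: float) -> Tuple[float, float]:
        u_roll = (roll_cmd - k_roll_from_yaw * yaw_cmd) / det
        u_yaw = (yaw_cmd - k_yaw_from_roll * roll_cmd) / det
        return u_roll, u_yaw

    return mix


mix = make_mixer(k_roll_from_yaw=0.3, k_yaw_from_roll=-0.2)  # hypothetical coefficients
u_roll, u_yaw = mix(1.0, 0.0)  # request pure roll
print(u_roll, u_yaw)           # a small yaw input is added to cancel the coupled yaw
```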
Thrust vectoring also proved critical for gust mitigation. Thrust vectoring input combined with flow feedback and position feedback improved gust tolerance up to 4 m/s for a twin-cyclocopter mounted on a 6-DOF test stand. Flow feedback relied on a dual-axis flow probe attached to differential pressure sensors, and position feedback was based on data recorded by the VICON motion capture system. The vehicle was also able to recover its initial position in crosswind scenarios tested at side-slip angles up to 30 degrees. Unlike existing multi-modal platforms, the all-terrain cyclocopter relied solely on its four cyclorotors, both as its main source of propulsion and as wheels. Aerial and aquatic modes used aerodynamic forces generated by modulating cyclorotor rotational speeds and thrust vectors, while terrestrial mode used motor torque. In aerial mode, the cyclorotors operated at 1550 rpm and consumed 232 W to sustain hover. In terrestrial mode, forward translation at 2 m/s required 28 W, an 88% reduction from the power required to hover. In aquatic mode, the cyclorotors operated at 348 rpm to achieve 1.3 m/s translation and consumed 19 W, a 92% reduction in power consumption. With a modest weight addition of only 200 grams for wheels and retractable landing gear, the versatile cyclocopter platform achieved sustained hover, efficient translation and rotational maneuvers on the ground, and aquatic locomotion.
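The two percentage reductions above are both measured against the 232 W hover power. A quick check using only the figures quoted in the abstract:

```python
HOVER_POWER_W = 232.0  # aerial mode, power to sustain hover (from the abstract)


def reduction_vs_hover(power_w: float) -> float:
    """Fractional power reduction relative to hover."""
    return 1.0 - power_w / HOVER_POWER_W


for mode, power in [("terrestrial, 2 m/s", 28.0), ("aquatic, 1.3 m/s", 19.0)]:
    print(f"{mode}: {reduction_vs_hover(power):.0%} less power than hover")
# terrestrial, 2 m/s: 88% less power than hover
# aquatic, 1.3 m/s: 92% less power than hover
```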

Digital Repository at the University of Maryland

Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

Author: Oboril Fabian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015

Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our lives. However, the increasing unreliability of devices fabricated in nanoscale technologies has emerged as a major threat to the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze, and mitigate aging at the microarchitecture level and above.

KITopen

Remote Control of a Ball-and-Beam System via the Ethernet Protocol

Author: Martín Fernández Pablo
Publication date: 01/01/2019

The "ball and beam" system is a classic nonlinear control exercise on which different control algorithms can be run, and in which mathematical design plays a central role in enabling the controller to achieve its objective. The main goal of this project is to serve as a learning platform for students of the School of Industrial Engineering, through which they can learn control methodologies, in particular the design of closed-loop controllers using a PID algorithm, in this case to balance a ball that slides along a beam on a physical model. Students will be able to modify the characteristic parameters of the PID controller while carrying out exercises and tests, either remotely over the Ethernet TCP/IP protocol, viewing the system output graphically and visually through a real-time webcam, or in person via a serial UART protocol. This project is also intended as an experience in which students can analyze the system from a mathematical model and, using the control theory taught at the School of Industrial Engineering, obtain the parameters of the PID controllers that are currently used in industry to control continuous processes.
Departamento de Ingeniería de Sistemas y Automática, Máster en Ingeniería Industrial
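The abstract centers on a closed-loop PID controller that balances a ball on a beam. The sketch below is a minimal simulation under assumptions not taken from the project: the standard model of a solid ball rolling without slipping, x'' = (5/7) g sin(θ), with the beam angle θ commanded directly (servo and linkage dynamics ignored), a ±0.25 rad angle limit, and hypothetical gains `kp`, `ki`, `kd`. Changing the gains and rerunning mirrors, in software, the parameter experiments students perform on the physical rig.

```python
import math

G = 9.81
K_BALL = 5.0 / 7.0 * G  # solid ball rolling without slipping: x'' = (5/7) g sin(theta)


def simulate(setpoint: float = 0.2, kp: float = 0.5, ki: float = 0.05, kd: float = 0.4,
             dt: float = 0.01, t_end: float = 30.0, max_angle: float = 0.25) -> list[float]:
    """Return the ball position history (m) under a discrete PID loop.

    Assumptions (hypothetical, not the project's values): the beam angle is
    commanded directly, the ball starts at rest at x = 0, a positive angle
    accelerates the ball toward +x, and the derivative acts on the measurement
    so a setpoint change does not cause a derivative kick."""
    x, v = 0.0, 0.0
    integral = 0.0
    prev_x = x
    history = []
    for _ in range(round(t_end / dt)):
        error = setpoint - x
        integral += error * dt
        derivative = -(x - prev_x) / dt  # rate of change of the error for a fixed setpoint
        prev_x = x
        theta = kp * error + ki * integral + kd * derivative
        theta = max(-max_angle, min(max_angle, theta))  # actuator limit
        # Semi-implicit Euler integration of the ball dynamics.
        v += K_BALL * math.sin(theta) * dt
        x += v * dt
        history.append(x)
    return history


history = simulate()
print(f"final position: {history[-1]:.3f} m")
```

The integral term removes steady-state error; with a purely proportional-derivative loop on this idealized model the ball would also settle, but any beam offset or friction in the real rig would leave a residual error.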

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Documental de la Universidad de Valladolid

Flight Computer for Real-Time Kinematic Data Acquisition in Micro Aircraft

Author: Campos Canizales Rubén Abisai
Publication date: 01/01/2016

Unmanned Aerial Vehicles (UAVs) require a computing system, commonly called a Flight Computer (FC), that acquires, processes, and stores air data, inertial data, dead-reckoning navigation data, and positioning data. In addition, the flight computer must include algorithms that allow it to act on the aircraft's control surfaces to ensure correct operation over a wide range of situations; this subsystem is known as the Autopilot [1]. This work presents the construction and testing of an FC for UAVs; in particular, the problem is addressed by integrating commercial off-the-shelf (COTS) hardware with in-house acquisition, processing, storage, and control algorithms.
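Among the data the flight computer handles is dead-reckoning navigation data. As a minimal illustration of that technique (not the thesis's algorithm, whose details the abstract does not give), the sketch below propagates a 2-D position from ground-speed and heading samples. The function name and the heading convention (radians, clockwise from north) are assumptions.

```python
import math
from typing import Iterable, Tuple


def dead_reckon(north: float, east: float,
                samples: Iterable[Tuple[float, float]], dt: float) -> Tuple[float, float]:
    """Propagate a 2-D position estimate (m) from speed and heading samples.

    samples: (ground_speed_m_s, heading_rad) pairs, with heading measured
    clockwise from north as a compass or inertial unit would report it."""
    for speed, heading in samples:
        north += speed * math.cos(heading) * dt
        east += speed * math.sin(heading) * dt
    return north, east


# 10 s due east at 10 m/s, sampled at 10 Hz
print(dead_reckon(0.0, 0.0, [(10.0, math.pi / 2)] * 100, dt=0.1))
```

Because each step adds the previous step's error, dead-reckoning drift grows without bound, which is why flight computers typically also record absolute positioning data to correct it.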

Repositorio Academico Digital UANL

A Low-Power Wireless Multichannel Microsystem for Reliable Neural Recording

Author: Borna Amir

This thesis reports on the development of a reliable, single-chip, multichannel wireless biotelemetry microsystem intended for extracellular neural recording from awake, mobile, small animal models. The inherently conflicting requirements of low power and reliability are addressed in the proposed microsystem at the architectural and circuit levels. By employing preliminary microsystems in various in vivo experiments, the system requirements for reliable neural recording were identified and then addressed at the architectural level through an analytical tool, signal path co-optimization. The 2.85 mm × 3.84 mm mixed-signal ASIC integrates a low-noise front-end, a programmable digital controller, an RF modulator, and an RF power amplifier (PA) operating in the 433 MHz ISM band on a single chip, and is fabricated in a 0.5 µm double-poly triple-metal n-well standard CMOS process. The proposed microsystem, incorporating the ASIC, is a 9-channel (8 neural, 1 audio), user-programmable, reliable wireless neural telemetry microsystem that weighs 2.2 g (including two 1.5 V batteries) and measures 2.2 × 1.1 × 0.5 cm³. The electrical characteristics of this microsystem are extensively characterized through benchtop tests. The transmitter consumes 5 mW and has a measured total input-referred voltage noise of 4.74 µVrms, 6.47 µVrms, and 8.27 µVrms at transmission distances of 3 m, 10 m, and 20 m, respectively. The measured inter-channel crosstalk is less than 3.5%, and battery life is about one hour. To compare wireless neural telemetry systems, a figure of merit (FoM) is defined as the reciprocal of the power spent on broadcasting one channel over a distance of one meter. The proposed microsystem's FoM is an order of magnitude larger than that of all other research and commercial systems.
The proposed biotelemetry system has been successfully used in two in vivo neural recording experiments: (i) from a freely roaming South American cockroach and (ii) from an awake, mobile rat.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/91542/1/aborna_1.pd
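The figure of merit above is the reciprocal of the power spent broadcasting one channel over one meter. Power per channel-meter is P / (N·d), so the FoM works out to N·d / P. The function below is a hypothetical helper that expresses this definition; the usage line plugs in numbers quoted in the abstract (9 channels, 20 m, 5 mW) purely to show the arithmetic, and the thesis may use a different channel count or distance convention for its reported FoM.

```python
def telemetry_fom(channels: int, distance_m: float, power_w: float) -> float:
    """Reciprocal of the power spent broadcasting one channel over one meter.

    Power per channel-meter is power_w / (channels * distance_m), so the FoM
    is channels * distance_m / power_w, in channel-meters per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return channels * distance_m / power_w


print(telemetry_fom(channels=9, distance_m=20.0, power_w=5e-3))
```

Under this definition, doubling either the channel count or the range at the same transmit power doubles the FoM, which is what lets systems with different channel counts and ranges be compared on one scale.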

Deep Blue Documents at the University of Michigan