383 research outputs found

From FPGA to ASIC: A RISC-V processor experience

Author: Rojas Morales Carlos
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019

This work documents a correct design flow using these tools on the Lagarto RISC-V processor, together with the RTL design considerations that must be taken into account when moving from an FPGA design to an ASIC design.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Low power architectures for streaming applications

Author: He Y.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

Author: Benini Luca
Conti Francesco
Schiavone Pasquale Davide
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Binary Neural Networks (BNNs) promise to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers autonomously or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6fJ per operation at 0.4V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2mJ per frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue.
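As a rough sanity check on the figures quoted in this abstract, energy per frame multiplied by frame rate gives an average power. The sketch below assumes the 2.2 mJ per frame and 8.9 fps figures refer to the same operating point; the function name is illustrative and does not come from the paper.

```python
def average_power_watts(energy_per_frame_j: float, frames_per_second: float) -> float:
    """Average power (W) = energy per frame (J) x frame rate (1/s)."""
    if energy_per_frame_j < 0 or frames_per_second < 0:
        raise ValueError("energy and frame rate must be non-negative")
    return energy_per_frame_j * frames_per_second


# Figures quoted in the abstract; 2.2 mJ is stated as an upper bound,
# so the result is an upper bound too (assuming both describe one run).
power_w = average_power_watts(2.2e-3, 8.9)
print(f"average power <= {power_w * 1e3:.2f} mW")
```

Under that assumption the ResNet-34 workload would stay below roughly 20 mW on average.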

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Advanced microarchitecture and circuit design techniques for on-chip memories in CMOS technology

Author: Hsu Steven K.
Publication venue: Oregon State University

In modern on-chip memories, an increasing demand for higher performance, lower power, reduced area, and improved robustness creates a rising need for advanced microarchitecture and circuit design techniques. Particularly in large-signal multi-ported register files, these advanced design techniques include: (i) multi-banked arrays, (ii) multi-frequency arrays, (iii) multi-bit width gating, (iv) multi-latency cycle times, (v) multi-threshold devices, and (vi) multi-strength keepers. In modern microprocessors, register files are important ingredients, but the increasing number of register file read/write ports and entries can produce a bottleneck. This thesis discusses various new techniques to address the challenges facing register file designers and to satisfy microprocessor requirements. The scalability of register files is a concern in modern microprocessors. As microprocessors become wider to exploit instruction-level parallelism, the number of read/write ports increases. In turn, this results in quadratic growth in register file area, decreasing frequency and increasing power consumption. Multi-banked and multi-frequency register files reduce area and power consumption by relieving the read/write port congestion. Multi-bit width register files reduce active power during read/write operations by gating the clock/wordline. Pipelined register files improve frequency by reducing logic depth, but require multiple cycles for read/write operations. Multi-latency register files contain variable access cycle times, which are dependent on the physical locality of the data. This improves overall microprocessor performance and recovers lost instructions per cycle. As instruction window size continues to expand in modern microprocessors, the resulting demand for additional register file entries requires increased use of wide-OR dynamic circuits. However, these circuits, primarily found in local/global bitlines, are susceptible to leakage noise.
In a multi-threshold process, a self-reverse bias technique exploits the use of leaky low-VTH devices, reducing bitline leakage and improving robustness. This circuit topology improves bitline delay through reduced keeper contention. Downsized keepers improve bitline delay in low leakage conditions; stronger keepers improve bitline robustness in high leakage conditions. Utilizing this concept, register files with multi-strength keepers enable robust operation across a wide range of process, voltage, and temperature. These various design techniques show excellent promise in improving performance, power, area, and robustness of multi-ported register files in modern microprocessors.
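The quadratic area growth mentioned in this abstract can be illustrated with a common first-order model, which is an assumption here and not taken from the thesis: each port adds one wordline track to the cell height and one bitline track to the cell width, so both dimensions grow linearly with the port count. The function name and the `base_tracks` parameter are hypothetical.

```python
def relative_cell_area(ports: int, base_tracks: int = 1) -> float:
    """Cell area relative to a single-ported cell.

    First-order assumption: cell width and height are each proportional to
    (base_tracks + ports), so area scales with the square of that sum.
    """
    if ports < 1:
        raise ValueError("a register file cell needs at least one port")
    side = base_tracks + ports          # tracks along one dimension
    reference_side = base_tracks + 1    # same quantity for one port
    return (side / reference_side) ** 2


for p in (1, 3, 7):
    print(f"{p} ports -> {relative_cell_area(p):.1f}x single-port area")
```

In this toy model, going from one to seven ports multiplies the cell area by sixteen, which is why banking and port reduction are attractive.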

ScholarsArchive@OSU

The impact of design techniques in the reduction of power consumption of SoCs Multimedia

Author: Yang Yun Ju, 1980-
Publication venue: [s.n.]
Publication date: 19/08/2018

Advisor: Guido Costa Souza de Araújo. Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação. Abstract: The semiconductor industry has always faced strong demands to solve the problem of heat dissipation and to reduce power consumption in electronic devices. This trend has intensified in recent years with the environmental sustainability movement. Correctly designing an electronic system for low power consumption is a problem with multiple levels of complexity and requires systematic approaches. Moreover, the adoption of any power-reduction technique is always tied to specific goals and has some impact on the project. Although designers know these impacts well in qualitative terms, the quantitative details are still unknown or are kept within companies' know-how. In this work, based on experimental results from an industrial SoC platform, we try to quantify the impacts of using low-power techniques. We relate the power reduction factor of each technique to its impact in terms of area, performance, and implementation and verification effort. In the absence of such data, which relates engineering effort to power consumption goals, uncertainties and delays in the project schedule are frequent. We hope that such guidelines can help project architects select the appropriate techniques to reduce power consumption within the limits of budget and project schedule. Master's degree, Computer Science (Mestre em Ciência da Computação).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Circuit Techniques for Low-Power and Secure Internet-of-Things Systems

Author: Yang Kaiyuan

The coming of the Internet of Things (IoT) is expected to connect the physical world to the cyber world through ubiquitous sensors, actuators and computers. The nature of these applications demands long battery life and strong data security. To connect billions of things in the world, the hardware platform for IoT systems must be optimized towards low power consumption, high energy efficiency and low cost. With these constraints, the security of IoT systems becomes an even more difficult problem than that of computer systems. A new holistic system design considering both hardware and software implementations is needed to face these new challenges. In this work, highly robust and low-cost true random number generators (TRNGs) and physically unclonable functions (PUFs) are designed and implemented as security primitives for secret key management in IoT systems. They provide three critical functions for crypto systems: runtime secret key generation, secure key storage and lightweight device authentication. To achieve robustness and simplicity, the concept of frequency collapse in a multi-mode oscillator is proposed, which can effectively amplify the desired random variable in CMOS devices (i.e. process variation or noise) and provide a runtime monitor of the output quality. A TRNG with a self-tuning loop to achieve robust operation across -40 to 120 degrees Celsius and 0.6 to 1V variations, a TRNG that can be fully synthesized with only standard cells and commercial placement and routing tools, and a PUF with runtime filtering to achieve robust authentication are designed based upon this concept and verified in several CMOS technology nodes. In addition, a 2-transistor sub-threshold amplifier based "weak" PUF is also presented for chip identification and key storage.
This PUF simultaneously achieves state-of-the-art 1.65% native unstable bits, 1.5fJ per bit energy efficiency, and 3.16% flipping bits across the -40 to 120 degrees Celsius range, while occupying an area of only 553 F² (F being the feature size) in 180nm CMOS. Secondly, the potential security threat of hardware Trojans is investigated, and a new Trojan attack using the analog behavior of digital processors is proposed as the first stealthy and controllable fabrication-time hardware attack. Hardware Trojans are an emerging concern arising from the globalization of the semiconductor supply chain, and can result in catastrophic attacks that are extremely difficult to find and protect against. Hardware Trojans proposed in previous works are based on either design-time code injection into the hardware description language or fabrication-time modification of processing steps. Defenses have been developed for both types of attacks. This work proposes a third type of attack, crossing the analog and digital domains, that combines the logical stealth and controllability of design-time attacks with physical "invisibility". The attack eludes activation by a diverse set of benchmarks and evades known defenses. Lastly, in addition to security-related circuits, physical sensors are also studied in this work as fundamental building blocks of IoT systems. Temperature sensing is one of the most desired functions for a wide range of IoT applications. A sub-threshold oscillator based digital temperature sensor utilizing the exponential temperature dependence of sub-threshold current is proposed and implemented. In 180nm CMOS, it achieves 0.22/0.19K inaccuracy and 73mK noise-limited resolution while adding only 8865 square micrometers of area and 75nW of power consumption to an existing IoT system. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138779/1/kaiyuan_1.pd

Deep Blue Documents at the University of Michigan

Xetal-Pro : an ultra-low energy and high throughput SIMD processor

Author: Abbo A.A.
Corporaal H.
He Y.
Kleihorst R.P.
Moreno Londono S.
Pu Y.
Publication date: 01/01/2010

This paper presents the Xetal-Pro SIMD processor, which is based on Xetal-II, one of the most computationally efficient (in terms of GOPS/Watt) processors available today. Xetal-Pro supports ultra-wide VDD scaling from nominal supply to the sub-threshold region. Although aggressive VDD scaling causes severe throughput degradation, this can be compensated by the massive parallelism of the Xetal family. The predecessor of Xetal-Pro, Xetal-II, includes a large on-chip frame memory (FM), which cannot operate reliably at ultra-low voltage. Therefore we investigate both different FM realizations and memory organization alternatives. We propose a hybrid memory architecture which reduces the non-local memory traffic and enables further VDD scaling. Compared to Xetal-II operating at nominal voltage, we could gain more than 10× energy reduction while still delivering a sufficiently high throughput of 0.69 GOPS (counting multiply and add operations only). This work gives new insight into the design of ultra-low energy SIMD processors, which are suitable for portable streaming applications.

Pure OAI Repository

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

Author: Benini Luca
Conti Francesco
Gautschi Michael
Gürkaynak Frank Kagan
Haugou Germain
Loi Igor
Mangard Stefan
Muehlberghuber Michael
Pullini Antonio
Rossi Davide
Schiavone Pasquale Davide
Schilling Robert
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/04/2017

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V, achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op. Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Papers.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Ultra low-power fault-tolerant SRAM design in 90nm CMOS technology

Author: Wang Kuande
Publication venue: University of Saskatchewan Library

With the growth of mobile, biomedical and space applications, digital systems with low power consumption are required. As a main part of digital systems, low-power memories are especially desired. Reducing the power supply voltage to the sub-threshold region is one of the effective approaches for ultra low-power applications. However, the reduced Static Noise Margin (SNM) of Static Random Access Memory (SRAM) imposes great challenges on sub-threshold SRAM design. The conventional 6-transistor SRAM cell does not function properly in the sub-threshold supply voltage range because it does not have enough noise margin for reliable operation. To achieve ultra low power in sub-threshold operation, previous research has demonstrated that a read and write decoupled scheme is a good solution to the reduced SNM problem. A Dual Interlocked Storage Cell (DICE) based SRAM cell was proposed to eliminate the drawback of the conventional DICE cell during read operation. This cell can mitigate single-event effects, improve stability and also maintain the low-power characteristic of sub-threshold SRAM. To make the proposed SRAM cell work under different power supply voltages from 0.3 V to 0.6 V, an improved replica sense scheme was applied to produce a reference control signal, with which the optimal read time could be achieved. In this thesis, a 2K × 8 bits SRAM test chip was designed, simulated and fabricated in 90nm CMOS technology provided by ST Microelectronics. Simulation results suggest that the operating frequency at VDD = 0.3 V is up to 4.7 MHz with a power dissipation of 6.0 µW, while it is 45.5 MHz at VDD = 0.6 V, dissipating 140 µW. However, the area occupied by a single cell is larger than that of a conventional SRAM cell due to the additional transistors used. The main contribution of this thesis project is a new design that can simultaneously solve the ultra low-power and radiation-tolerance problems in large-capacity memory design.
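The two operating points quoted in this abstract can be compared on an energy-per-cycle basis by dividing power by frequency. The sketch below reads the power figures as microwatts and assumes each power value was obtained at the maximum frequency quoted for the same supply voltage; the function and variable names are illustrative only.

```python
def energy_per_cycle_joules(power_w: float, frequency_hz: float) -> float:
    """Energy per clock cycle (J) = power (W) / frequency (Hz)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return power_w / frequency_hz


# Supply voltage (V) -> (power in W, frequency in Hz), as quoted in the abstract.
operating_points = {
    0.3: (6.0e-6, 4.7e6),
    0.6: (140e-6, 45.5e6),
}

energy_pj = {
    vdd: energy_per_cycle_joules(p, f) * 1e12
    for vdd, (p, f) in operating_points.items()
}
for vdd, e in energy_pj.items():
    print(f"VDD = {vdd} V: {e:.2f} pJ per cycle")
```

Under these assumptions the 0.3 V point costs roughly 1.3 pJ per cycle against roughly 3.1 pJ at 0.6 V, so the lower supply trades about a tenfold frequency loss for less than half the energy per cycle.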

eCommons@USASK

University of Saskatchewan Research Archive