854 research outputs found
Privacy Leakages in Approximate Adders
Approximate computing has recently emerged as a promising method to meet the
low power requirements of digital designs. The erroneous outputs produced in
approximate computing can be partially a function of each chip's process
variation. We show that, in such schemes, the erroneous outputs produced on
each chip instance can reveal the identity of the chip that performed the
computation, possibly jeopardizing user privacy. In this work, we perform
simulation experiments on 32-bit Ripple Carry Adders, Carry Lookahead Adders,
and Han-Carlson Adders running at over-scaled operating points. Our results
show that identification is possible, we contrast the identifiability of each
type of adder, and we quantify how success of identification varies with the
extent of over-scaling and noise. Our results are the first to show that
approximate digital computations may compromise privacy. Designers of future
approximate computing systems should be aware of the possible privacy leakages
and decide whether mitigation is warranted in their application.Comment: 2017 IEEE International Symposium on Circuits and Systems (ISCAS
Design and analysis of SRAMs for energy harvesting systems
PhD ThesisAt present, the battery is employed as a power source for wide varieties of microelectronic systems ranging from biomedical implants and sensor net-works to portable devices. However, the battery has several limitations and incurs many challenges for the majority of these systems. For instance, the design considerations of implantable devices concern about the battery from two aspects, the toxic materials it contains and its lifetime since replacing the battery means a surgical operation. Another challenge appears in wire-less sensor networks, where hundreds or thousands of nodes are scattered around the monitored environment and the battery of each node should be maintained and replaced regularly, nonetheless, the batteries in these nodes do not all run out at the same time.
Since the introduction of portable systems, the area of low power designs has witnessed extensive research, driven by the industrial needs, towards the aim of extending the lives of batteries. Coincidentally, the continuing innovations in the field of micro-generators made their outputs in the same range of several portable applications. This overlap creates a clear oppor-tunity to develop new generations of electronic systems that can be powered, or at least augmented, by energy harvesters. Such self-powered systems benefit applications where maintaining and replacing batteries are impossi-ble, inconvenient, costly, or hazardous, in addition to decreasing the adverse effects the battery has on the environment.
The main goal of this research study is to investigate energy harvesting aware design techniques for computational logic in order to enable the capa-
II
bility of working under non-deterministic energy sources. As a case study, the research concentrates on a vital part of all computational loads, SRAM, which occupies more than 90% of the chip area according to the ITRS re-ports.
Essentially, this research conducted experiments to find out the design met-ric of an SRAM that is the most vulnerable to unpredictable energy sources, which has been confirmed to be the timing. Accordingly, the study proposed a truly self-timed SRAM that is realized based on complete handshaking protocols in the 6T bit-cell regulated by a fully Speed Independent (SI) tim-ing circuitry. The study proved the functionality of the proposed design in real silicon. Finally, the project enhanced other performance metrics of the self-timed SRAM concentrating on the bit-line length and the minimum operational voltage by employing several additional design techniques.Umm Al-Qura University, the Ministry of Higher Education in the Kingdom of Saudi Arabia, and the Saudi Cultural Burea
Reliable Low-Power High Performance Spintronic Memories
Moores Gesetz folgend, ist es der Chipindustrie in den letzten fĂŒnf Jahrzehnten gelungen, ein
explosionsartiges Wachstum zu erreichen. Dies hatte ebenso einen exponentiellen Anstieg der
Nachfrage von Speicherkomponenten zur Folge, was wiederum zu speicherlastigen Chips in
den heutigen Computersystemen fĂŒhrt. Allerdings stellen traditionelle on-Chip Speichertech-
nologien wie Static Random Access Memories (SRAMs), Dynamic Random Access Memories
(DRAMs) und Flip-Flops eine Herausforderung in Bezug auf Skalierbarkeit, Verlustleistung
und ZuverlĂ€ssigkeit dar. Eben jene Herausforderungen und die ĂŒberwĂ€ltigende Nachfrage
nach höherer Performanz und Integrationsdichte des on-Chip Speichers motivieren Forscher,
nach neuen nichtflĂŒchtigen Speichertechnologien zu suchen. Aufkommende spintronische Spe-
ichertechnologien wie Spin Orbit Torque (SOT) und Spin Transfer Torque (STT) erhielten
in den letzten Jahren eine hohe Aufmerksamkeit, da sie eine Reihe an Vorteilen bieten. Dazu
gehören NichtflĂŒchtigkeit, Skalierbarkeit, hohe BestĂ€ndigkeit, CMOS KompatibilitĂ€t und Unan-
fĂ€lligkeit gegenĂŒber Soft-Errors. In der Spintronik reprĂ€sentiert der Spin eines Elektrons dessen
Information. Das Datum wird durch die Höhe des Widerstandes gespeichert, welche sich durch
das Anlegen eines polarisierten Stroms an das Speichermedium verÀndern lÀsst. Das Prob-
lem der statischen Leistung gehen die SpeichergerÀte sowohl durch deren verlustleistungsfreie
Eigenschaft, als auch durch ihr Standard- Aus/Sofort-Ein Verhalten an. Nichtsdestotrotz sind
noch andere Probleme, wie die hohe Zugriffslatenz und die Energieaufnahme zu lösen, bevor
sie eine verbreitete Anwendung finden können. Um diesen Problemen gerecht zu werden, sind
neue Computerparadigmen, -architekturen und -entwurfsphilosophien notwendig.
Die hohe Zugriffslatenz der Spintroniktechnologie ist auf eine vergleichsweise lange Schalt-
dauer zurĂŒckzufĂŒhren, welche die von konventionellem SRAM ĂŒbersteigt. Des Weiteren ist auf
Grund des stochastischen Schaltvorgangs der Speicherzelle und des Einflusses der Prozessvari-
ation ein nicht zu vernachlĂ€ssigender Zeitraum dafĂŒr erforderlich. In diesem Zeitraum wird ein
konstanter Schreibstrom durch die Bitzelle geleitet, um den Schaltvorgang zu gewÀhrleisten.
Dieser Vorgang verursacht eine hohe Energieaufnahme. FĂŒr die Leseoperation wird gleicher-
maĂen ein beachtliches Zeitfenster benötigt, ebenfalls bedingt durch den Einfluss der Prozess-
variation. Dem gegenĂŒber stehen diverse ZuverlĂ€ssigkeitsprobleme. Dazu gehören unter An-
derem die Leseintereferenz und andere Degenerationspobleme, wie das des Time Dependent Di-
electric Breakdowns (TDDB). Diese ZuverlÀssigkeitsprobleme sind wiederum auf die benötigten
lĂ€ngeren Schaltzeiten zurĂŒckzufĂŒhren, welche in der Folge auch einen ĂŒber lĂ€ngere Zeit an-
liegenden Lese- bzw. Schreibstrom implizieren. Es ist daher notwendig, sowohl die Energie, als
auch die Latenz zur Steigerung der ZuverlÀssigkeit zu reduzieren, um daraus einen potenziellen
Kandidaten fĂŒr ein on-Chip Speichersystem zu machen.
In dieser Dissertation werden wir Entwurfsstrategien vorstellen, welche das Ziel verfolgen,
die Herausforderungen des Cache-, Register- und Flip-Flop-Entwurfs anzugehen. Dies erre-
ichen wir unter Zuhilfenahme eines Cross-Layer Ansatzes. FĂŒr Caches entwickelten wir ver-
schiedene AnsÀtze auf Schaltkreisebene, welche sowohl auf der Speicherarchitekturebene, als
auch auf der Systemebene in Bezug auf Energieaufnahme, Performanzsteigerung und Zuver-
lĂ€ssigkeitverbesserung evaluiert werden. Wir entwickeln eine Selbstabschalttechnik, sowohl fĂŒr
die Lese-, als auch die Schreiboperation von Caches. Diese ist in der Lage, den Abschluss der
entsprechenden Operation dynamisch zu ermitteln. Nachdem der Abschluss erkannt wurde,
wird die Lese- bzw. Schreiboperation sofort gestoppt, um Energie zu sparen. ZusÀtzlich
limitiert die Selbstabschalttechnik die Dauer des Stromflusses durch die Speicherzelle, was
wiederum das Auftreten von TDDB und Leseinterferenz bei Schreib- bzw. Leseoperationen re-
duziert. Zur Verbesserung der Schreiblatenz heben wir den Schreibstrom an der Bitzelle an, um den magnetischen Schaltprozess zu beschleunigen. Um registerbankspezifische Anforderungen
zu berĂŒcksichtigen, haben wir zusĂ€tzlich eine Multiport-Speicherarchitektur entworfen, welche
eine einzigartige Eigenschaft der SOT-Zelle ausnutzt, um simultan Lese- und Schreiboperatio-
nen auszufĂŒhren. Es ist daher möglich Lese/Schreib- Konfilkte auf Bitzellen-Ebene zu lösen,
was sich wiederum in einer sehr viel einfacheren Multiport- Registerbankarchitektur nieder-
schlÀgt.
ZusÀtzlich zu den SpeicheransÀtzen haben wir ebenfalls zwei Flip-Flop-Architekturen vorgestellt.
Die erste ist eine nichtflĂŒchtige non-Shadow Flip-Flop-Architektur, welche die Speicherzelle als
aktive Komponente nutzt. Dies ermöglicht das sofortige An- und Ausschalten der Versorgungss-
pannung und ist daher besonders gut fĂŒr aggressives Powergating geeignet. Alles in Allem zeigt
der vorgestellte Flip-Flop-Entwurf eine Àhnliche Timing-Charakteristik wie die konventioneller
CMOS Flip-Flops auf. Jedoch erlaubt er zur selben Zeit eine signifikante Reduktion der statis-
chen Leistungsaufnahme im Vergleich zu nichtflĂŒchtigen Shadow- Flip-Flops. Die zweite ist eine
fehlertolerante Flip-Flop-Architektur, welche sich unanfĂ€llig gegenĂŒber diversen Defekten und
Fehlern verhĂ€lt. Die LeistungsfĂ€higkeit aller vorgestellten Techniken wird durch ausfĂŒhrliche
Simulationen auf Schaltkreisebene verdeutlicht, welche weiter durch detaillierte Evaluationen
auf Systemebene untermauert werden. Im Allgemeinen konnten wir verschiedene Techniken en-
twickeln, die erhebliche Verbesserungen in Bezug auf Performanz, Energie und ZuverlÀssigkeit
von spintronischen on-Chip Speichern, wie Caches, Register und Flip-Flops erreichen
On Real-Time AER 2-D Convolutions Hardware for Neuromorphic Spike-Based Cortical Processing
In this paper, a chip that performs real-time image
convolutions with programmable kernels of arbitrary shape is presented.
The chip is a first experimental prototype of reduced size
to validate the implemented circuits and system level techniques.
The convolution processing is based on the addressâevent-representation
(AER) technique, which is a spike-based biologically
inspired image and video representation technique that favors
communication bandwidth for pixels with more information. As
a first test prototype, a pixel array of 16x16 has been implemented
with programmable kernel size of up to 16x16. The
chip has been fabricated in a standard 0.35- m complimentary
metalâoxideâsemiconductor (CMOS) process. The technique also
allows to process larger size images by assembling 2-D arrays of
such chips. Pixel operation exploits low-power mixed analogâdigital
circuit techniques. Because of the low currents involved (down
to nanoamperes or even picoamperes), an important amount of
pixel area is devoted to mismatch calibration. The rest of the
chip uses digital circuit techniques, both synchronous and asynchronous.
The fabricated chip has been thoroughly tested, both at
the pixel level and at the system level. Specific computer interfaces
have been developed for generating AER streams from conventional
computers and feeding them as inputs to the convolution
chip, and for grabbing AER streams coming out of the convolution
chip and storing and analyzing them on computers. Extensive
experimental results are provided. At the end of this paper, we
provide discussions and results on scaling up the approach for
larger pixel arrays and multilayer cortical AER systems.Commission of the European Communities IST-2001-34124 (CAVIAR)Commission of the European Communities 216777 (NABAB)Ministerio de EducaciĂłn y Ciencia TIC-2000-0406-P4Ministerio de EducaciĂłn y Ciencia TIC-2003-08164-C03-01Ministerio de EducaciĂłn y Ciencia TEC2006-11730-C03-01Junta de AndalucĂa TIC-141
Stream Processor Development using Multi-Threshold NULL Convention Logic Asynchronous Design Methodology
Decreasing transistor feature size has led to an increase in the number of transistors in integrated circuits (IC), allowing for the implementation of more complex logic. However, such logic also requires more complex clock tree synthesis (CTS) to avoid timing violations as the clock must reach many more gates over larger areas. Thus, timing analysis requires significantly more computing power and designer involvement than in the past. For these reasons, IC designers have been pushed to nix conventional synchronous (SYNC) architecture and explore novel methodologies such as asynchronous, self-timed architecture. This dissertation evaluates the nominal active energy, voltage-scaled active energy, and leakage power dissipation across two cores of a stream processor: Smoothing Filter (SF) and Histogram Equalization (HEQ). Both cores were implemented in Multi-Threshold NULL Convention Logic (MTNCL) and clock-gated synchronous methodologies using a gate-level netlist to avoid any architectural discrepancies while guaranteeing impartial comparisons. MTNCL designs consumed more active energy than their synchronous counterparts due to the dual-rail encoding system; however, high-threshold-voltage (High-Vt) transistors used in MTNCL threshold gates reduced leakage power dissipation by up to 227%. During voltage-scaling simulations, MTNCL circuits showed a high level of robustness as the output results were logically valid across all voltage sweeps without any additional circuitry. SYNC circuits, however, needed extra logic, such as a DVS controller, to adjust the circuitâs speed when VDD changed. Although SYNC circuits still consumed less average energy, MTNCL circuit power gains accelerated when switching to lower voltage domains
Stream Processor Development using Multi-Threshold NULL Convention Logic Asynchronous Design Methodology
Decreasing transistor feature size has led to an increase in the number of transistors in integrated circuits (IC), allowing for the implementation of more complex logic. However, such logic also requires more complex clock tree synthesis (CTS) to avoid timing violations as the clock must reach many more gates over larger areas. Thus, timing analysis requires significantly more computing power and designer involvement than in the past. For these reasons, IC designers have been pushed to nix conventional synchronous (SYNC) architecture and explore novel methodologies such as asynchronous, self-timed architecture. This dissertation evaluates the nominal active energy, voltage-scaled active energy, and leakage power dissipation across two cores of a stream processor: Smoothing Filter (SF) and Histogram Equalization (HEQ). Both cores were implemented in Multi-Threshold NULL Convention Logic (MTNCL) and clock-gated synchronous methodologies using a gate-level netlist to avoid any architectural discrepancies while guaranteeing impartial comparisons. MTNCL designs consumed more active energy than their synchronous counterparts due to the dual-rail encoding system; however, high-threshold-voltage (High-Vt) transistors used in MTNCL threshold gates reduced leakage power dissipation by up to 227%. During voltage-scaling simulations, MTNCL circuits showed a high level of robustness as the output results were logically valid across all voltage sweeps without any additional circuitry. SYNC circuits, however, needed extra logic, such as a DVS controller, to adjust the circuitâs speed when VDD changed. Although SYNC circuits still consumed less average energy, MTNCL circuit power gains accelerated when switching to lower voltage domains
Non-Volatile Memory Adaptation in Asynchronous Microcontroller for Low Leakage Power and Fast Turn-on Time
This dissertation presents an MSP430 microcontroller implementation using Multi-Threshold NULL Convention Logic (MTNCL) methodology combined with an asynchronous non-volatile magnetic random-access-memory (RAM) to achieve low leakage power and fast turn-on. This asynchronous non-volatile RAM is designed with a Spin-Transfer Torque (STT) memory device model and CMOS transistors in a 65 nm technology. A self-timed Quasi-Delay-Insensitive 1 KB STT RAM is designed with an MTNCL interface and handshaking protocol. A replica methodology is implemented to handle write operation completion detection for long state-switching delays of the STT memory device. The MTNCL MSP430 core is integrated with the STT RAM to create a fully asynchronous non-volatile microcontroller.
The MSP430 architecture, the MTNCL design methodology, and the STT RAMâs low power property, along with STT RAMâs non-volatility yield multiple advantages in the MTNCL-STT RAM system for a variety of applications. For comparison, a baseline system with the same MTNCL core combined with an asynchronous CMOS RAM is designed and tested. Schematic simulation results demonstrate that the MTNCL-CMOS RAM system presents advantages in execution time and active energy over the MTNCL-STT RAM system; however, the MTNCL-STT RAM system presents unmatched advantages such as negligible leakage power, zero overhead memory power failure handling, and fast system turn-on
45-nm Radiation Hardened Cache Design
abstract: Circuits on smaller technology nodes become more vulnerable to radiation-induced upset. Since this is a major problem for electronic circuits used in space applications, designers have a variety of solutions in hand. Radiation hardening by design (RHBD) is an approach, where electronic components are designed to work properly in certain radiation environments without the use of special fabrication processes. This work focuses on the cache design for a high performance microprocessor. The design tries to mitigate radiation effects like SEE, on a commercial foundry 45 nm SOI process. The design has been ported from a previously done cache design at the 90 nm process node. The cache design is a 16 KB, 4 way set associative, write-through design that uses a no-write allocate policy. The cache has been tested to write and read at above 2 GHz at VDD = 0.9 V. Interleaved layout, parity protection, dual redundancy, and checking circuits are used in the design to achieve radiation hardness. High speed is accomplished through the use of dynamic circuits and short wiring routes wherever possible. Gated clocks and optimized wire connections are used to reduce power. Structured methodology is used to build up the entire cache.Dissertation/ThesisM.S. Electrical Engineering 201
Multi-port Memory Design for Advanced Computer Architectures
In this thesis, we describe and evaluate novel memory designs for multi-port on-chip and off-chip use in advanced computer architectures. We focus on combining multi-porting and evaluating the performance over a range of design parameters. Multi-porting is essential for caches and shared-data systems, especially multi-core System-on-chips (SOC). It can significantly increase the memory access throughput. We evaluate FinFET voltage-mode multi-port SRAM cells using different metrics including leakage current, static noise margin and read/write performance. Simulation results show that single-ended multi-port FinFET SRAMs with isolated read ports offer improved read stability and flexibility over classical double-ended structures at the expense of write performance. By increasing the size of the
access transistors, we show that the single-ended multi-port structures can achieve equivalent write performance to the classical double-ended multi-port structure for 9% area overhead. Moreover, compared with CMOS SRAM, FinFET SRAM has better stability and standby power. We also describe new methods for the design of FinFET current-mode multi-port
SRAM cells. Current-mode SRAMs avoid the full-swing of the bitline, reducing dynamic power and access time. However, that comes at the cost of voltage drop, which compromises
stability. The design proposed in this thesis utilizes the feature of Independent Gate (IG) mode FinFET, which can leverage threshold voltage by controlling the back gate voltage, to merge two transistors into one through high-Vt and low-Vt transistors. This design not only reduces the voltage drop, but it also reduces the area in multi-port current-mode SRAM design. For off-chip memory, we propose a novel two-port 1-read, 1-write (1R1W) phasechange memory (PCM) cell, which significantly reduces the probability of blocking at the bank levels. Different from the traditional PCM cell, the access transistors are at the top and connected to the bitline. We use Verilog-A to model the behavior of Ge2Sb2Te5 (GST: the storage component). We evaluate the performance of the two-port cell by transistor
sizing and voltage pumping. Simulation results show that pMOS transistor is more practical than nMOS transistor as the access device when both area and power are considered. The estimated area overhead is 1.7ïżœ, compared to single-port PCM cell. In brief, the contribution we make in this thesis is that we propose and evaluate three different kinds of multi-port memories that are favorable for advanced computer architectures
- âŠ