Search CORE

17 research outputs found

ENERGY-EFFICIENT AND SECURE HARDWARE FOR INTERNET OF THINGS (IoT) DEVICES

Author: Selvakumaran Dinesh Kumar
Publication venue: UKnowledge
Publication date: 01/01/2018
Field of study

Internet of Things (IoT) is a network of devices that are connected through the Internet to exchange the data for intelligent applications. Though IoT devices provide several advantages to improve the quality of life, they also present challenges related to security. The security issues related to IoT devices include leakage of information through Differential Power Analysis (DPA) based side channel attacks, authentication, piracy, etc. DPA is a type of side-channel attack where the attacker monitors the power consumption of the device to guess the secret key stored in it. There are several countermeasures to overcome DPA attacks. However, most of the existing countermeasures consume high power which makes them not suitable to implement in power constraint devices. IoT devices are battery operated, hence it is important to investigate the methods to design energy-efficient and secure IoT devices not susceptible to DPA attacks. In this research, we have explored the usefulness of a novel computing platform called adiabatic logic, low-leakage FinFET devices and Magnetic Tunnel Junction (MTJ) Logic-in-Memory (LiM) architecture to design energy-efficient and DPA secure hardware. Further, we have also explored the usefulness of adiabatic logic in the design of energy-efficient and reliable Physically Unclonable Function (PUF) circuits to overcome the authentication and piracy issues in IoT devices. Adiabatic logic is a low-power circuit design technique to design energy-efficient hardware. Adiabatic logic has reduced dynamic switching energy loss due to the recycling of charge to the power clock. As the first contribution of this dissertation, we have proposed a novel DPA-resistant adiabatic logic family called Energy-Efficient Secure Positive Feedback Adiabatic Logic (EE-SPFAL). EE-SPFAL based circuits are energy-efficient compared to the conventional CMOS based design because of recycling the charge after every clock cycle. Further, EE-SPFAL based circuits consume uniform power irrespective of input data transition which makes them resilience against DPA attacks. Scaling of CMOS transistors have served the industry for more than 50 years in providing integrated circuits that are denser, and cheaper along with its high performance, and low power. However, scaling of the transistors leads to increase in leakage current. Increase in leakage current reduces the energy-efficiency of the computing circuits,and increases their vulnerability to DPA attack. Hence, it is important to investigate the crypto circuits in low leakage devices such as FinFET to make them energy-efficient and DPA resistant. In this dissertation, we have proposed a novel FinFET based Secure Adiabatic Logic (FinSAL) family. FinSAL based designs utilize the low-leakage FinFET device along with adiabatic logic principles to improve energy-efficiency along with its resistance against DPA attack. Recently, Magnetic Tunnel Junction (MTJ)/CMOS based Logic-in-Memory (LiM) circuits have been explored to design low-power non-volatile hardware. Some of the advantages of MTJ device include non-volatility, near-zero leakage power, high integration density and easy compatibility with CMOS devices. However, the differences in power consumption between the switching of MTJ devices increase the vulnerability of Differential Power Analysis (DPA) based side-channel attack. Further, the MTJ/CMOS hybrid logic circuits which require frequent switching of MTJs are not very energy-efficient due to the significant energy required to switch the MTJ devices. In the third contribution of this dissertation, we have investigated a novel approach of building cryptographic hardware in MTJ/CMOS circuits using Look-Up Table (LUT) based method where the data stored in MTJs are constant during the entire encryption/decryption operation. Currently, high supply voltage is required in both writing and sensing operations of hybrid MTJ/CMOS based LiM circuits which consumes a considerable amount of energy. In order to meet the power budget in low-power devices, it is important to investigate the novel design techniques to design ultra-low-power MTJ/CMOS circuits. In the fourth contribution of this dissertation, we have proposed a novel energy-efficient Secure MTJ/CMOS Logic (SMCL) family. The proposed SMCL logic family consumes uniform power irrespective of data transition in MTJ and more energy-efficient compared to the state-of-art MTJ/ CMOS designs by using charge sharing technique. The other important contribution of this dissertation is the design of reliable Physical Unclonable Function (PUF). Physically Unclonable Function (PUF) are circuits which are used to generate secret keys to avoid the piracy and device authentication problems. However, existing PUFs consume high power and they suffer from the problem of generating unreliable bits. This dissertation have addressed this issue in PUFs by designing a novel adiabatic logic based PUF. The time ramp voltages in adiabatic PUF is utilized to improve the reliability of the PUF along with its energy-efficiency. Reliability of the adiabatic logic based PUF proposed in this dissertation is tested through simulation based temperature variations and supply voltage variations

University of Kentucky

A Survey on Low-Power Techniques with Emerging Technologies: From Devices to Systems

Author: Beigné Edith
De Micheli Giovanni
Gaillardon Pierre-Emmanuel
Lesecq Suzanne
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/04/2015
Field of study

Nowadays, power consumption is one of the main limitations of electronic systems. In this context, novel and emerging devices provide us with new opportunities to keep the trend to low-power design. In this survey paper, we present a transversal survey on energy efficient techniques ranging from devices to architectures. The actual trends of device research, with fully-depleted planar devices, tri-gate geometries and gate-all-around structures, allows us to reach an increasingly higher level of performance while reducing the associated power. In addition, beyond the simple device properties enhancements, emerging devices also lead to innovations at circuit and architectural levels. In particular, devices whose properties can be tuned through additional terminals enable a fine and dynamic control of device threshold. They also enable designers to realize logic gates and to implement power-related techniques in a compact way unreachable to standard technologies. These innovations reduce the power consumption at the gate level and unlock new means of actuation in architectural solutions like adaptive voltage and frequency scaling

Infoscience - École polytechnique fédérale de Lausanne

Near-Threshold Computing: Past, Present, and Future.

Author: Pinckney Nathaniel Ross
Publication venue
Publication date
Field of study

Transistor threshold voltages have stagnated in recent years, deviating from constant-voltage scaling theory and directly limiting supply voltage scaling. To overcome the resulting energy and power dissipation barriers, energy efficiency can be improved through aggressive voltage scaling, and there has been increased interest in operating at near-threshold computing (NTC) supply voltages. In this region sizable energy gains are achieved with moderate performance loss, some of which can be regained through parallelism. This thesis first provides a methodical definition of how near to threshold is "near threshold" and continues with an in-depth examination of NTC across past, present, and future CMOS technologies. By systematically defining near-threshold, the trends and tradeoffs are analyzed, lending insight in how best to design and optimize near-threshold systems. NTC works best for technologies that feature good circuit delay scalability, therefore technologies without strong short-channel effects. Early planar technologies (prior to 90nm or so) featured good circuit scalability (8x energy gains), but lacked area in which to add cores for parallelization. Recent planar nodes (32nm – 20nm) feature more area for cores but suffer from poor delay scalability, and so are not well-suited for NTC (4x energy gains). The switch to FinFET CMOS technology allows for a return to strong voltage scalability (8x gain), reversing trends seen in planar technologies, while dark silicon has created an opportunity to add cores for parallelization. Improved FinFET voltage scalability even allows for latency reduction of a single task, as long as the task is sufficiently parallelizable (< 10% serial code). Finally, we will look at a technique for fast voltage boosting, called Shortstop, in which a core's operating voltage is raised in 10s of cycles. Shortstop can be used to quickly respond to single-threaded performance demands of a near-threshold system by leveraging the innate parasitic inductance of a dedicated dirty supply rail, further improving energy efficiency. The technique is demonstrated in a wirebond implementation and is able to boost a core up to 1.8x faster than a header-based approach, while reducing supply droop by 2-7x. An improved flip-chip architecture is also proposed.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113600/1/npfet_1.pd

Deep Blue Documents at the University of Michigan

Computing with Spintronics: Circuits and architectures

Author: Venkatesan Rangharajan
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2014
Field of study

This thesis makes the following contributions towards the design of computing platforms with spintronic devices. 1) It explores the use of spintronic memories in the design of a domain-specific processor for an emerging class of data-intensive applications, namely recognition, mining and synthesis (RMS). Two different spintronic memory technologies — Domain Wall Memory (DWM) and STT-MRAM — are utilized to realize the different levels in the memory hierarchy of the domain-specific processor, based on their respective access characteristics. Architectural tradeoffs created by the use of spintronic memories are analyzed. The proposed design achieves 1.5X-4X improvements in energy-delay product compared to a CMOS baseline. 2) It describes the first attempt to use DWM in the cache hierarchy of general-purpose processors. DWM promises unparalleled density by packing several bits of data into each bit-cell. TapeCache, the proposed DWM-based cache architecture, utilizes suitable circuit and architectural optimizations to address two key challenges (i) the high energy and latency requirement of write operations and (ii) the need for shift operations to access the data stored in each DWM bit-cell. At the circuit level, DWM bit-cells that are tailored to the distinct design requirements of different levels in the cache hierarchy are proposed. At the architecture level, TapeCache proposes suitable cache organization and management policies to alleviate the performance impact of shift operations required to access data stored in DWM bit-cells. TapeCache achieves more than 7X improvements in both cache area and energy with virtually identical performance compared to an SRAM-based cache hierarchy. 3) It investigates the design of the on-chip memory hierarchy of general-purpose graphics processing units (GPGPUs)—massively parallel processors that are optimized for data-intensive high-throughput workloads—using DWM. STAG, a high density, energy-efficient Spintronic- Tape Architecture for GPGPU cache hierarchies is described. STAG utilizes different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy. To address the challenge of high access latencies due to shifts, STAG predicts upcoming cache accesses by leveraging unique characteristics of GPGPU architectures and workloads, and prefetches data that are both likely to be accessed and require large numbers of shift operations. STAG achieves 3.3X energy reduction and 12.1% performance improvement over CMOS SRAM under iso-area conditions. 4) While the potential of spintronic devices for memories is widely recognized, their utility in realizing logic is much less clear. The thesis presents Spintastic, a new paradigm that utilizes Stochastic Computing (SC) to realize spintronic logic. In SC, data is encoded in the form of pseudo-random bitstreams, such that the probability of a \u271\u27 in a bitstream corresponds to the numerical value that it represents. SC can enable compact, low-complexity logic implementations of various arithmetic functions. Spintastic establishes the synergy between stochastic computing and spin-based logic by demonstrating that they mutually alleviate each other\u27s limitations. On the one hand, various building blocks of SC, which incur significant overheads in CMOS implementations, can be efficiently realized by exploiting the physical characteristics of spin devices. On the other hand, the reduced logic complexity and low logic depth of SC circuits alleviates the shortcomings of spintronic logic. Based on this insight, the design of spin-based stochastic arithmetic circuits, bitstream generators, bitstream permuters and stochastic-to-binary converter circuits are presented. Spintastic achieves 7.1X energy reduction over CMOS implementations for a wide range of benchmarks from the image processing, signal processing, and RMS application domains. 5) In order to evaluate the proposed spintronic designs, the thesis describes various device-to-architecture modeling frameworks. Starting with devices models that are calibrated to measurements, the characteristics of spintronic devices are successively abstracted into circuit-level and architectural models, which are incorporated into suitable simulation frameworks. (Abstract shortened by UMI.

Purdue E-Pubs

Energy Aware Runtime Systems for Elastic Stream Processing Platforms

Author: Rexha Hergys
Publication venue: Åbo Akademi - Åbo Akademi University
Publication date: 01/01/2023
Field of study

Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oföränderlig tillväxt i prestandakrav hos processorer, började den flerkärniga processor-revolutionen för ungefär 20 år sedan. Denna revolution skedde till största del som en lösning till begränsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kärniga processorer. Den flerkärniga processor-revolutionen medförde inte enbart utmaningen gällande parallellprogrammering, m.a.o. förmågan att utveckla mjukvara som använder sig av alla delelement i de flerkärniga processorerna, men också utmaningen med programmering av heterogena plattformar. Frågeställningen ”på vilken processorelement skall en viss beräkning utföras?” är väl känt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmänna beräkningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkärniga processorer på olika system-on-chip plattformar, är heterogena parallella plattformar idag omfattande inom många domäner, från konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera beräkningarna på en passande hårdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process är mycket utmanande för heterogena flerkärniga arkitekturer, och kan medföra fördelar när det gäller energieffektivitet. Det största problemet är att de olika valmöjligheterna i designrymden kan växa exponentiellt. Enligt den nuvarande trenden som förespår ökad heterogeniska aspekter i processorerna är utmaningen att hitta den mest passande placeringen av beräkningarna på hårdvaran ännu en forskningsfråga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemaläggaren i Linux operativsystemet är inkapabel att hitta en effektiv placering av beräkningarna på den underliggande hårdvaran. Det enda mätsättet som används är processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. Användning av detta mätsätt vid resursallokering resulterar i slöseri med energi. Denna forskning härstammar från observationerna att dessa tillvägagångssätt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmänna målet att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) på heterogena flerkärniga hårdvaruplattformar. Ström-applikationer är numera mycket vanliga i mjukvarudomän. Deras distinkta karaktär är inläsning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sätt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus på energi- och prestandahantering

National Library of Finland DSpace Services

Energy efficient core designs for upcoming process technologies

Author: Gopi Reddy Bhargava Reddy
Publication venue
Publication date: 01/05/2019
Field of study

Energy efficiency has been a first order constraint in the design of micro processors for the last decade. As Moore's law sunsets, new technologies are being actively explored to extend the march in increasing the computational power and efficiency. It is essential for computer architects to understand the opportunities and challenges in utilizing the upcoming process technology trends in order to design the most efficient processors. In this work, we consider three process technology trends and propose core designs that are best suited for each of the technologies. The process technologies are expected to be viable over a span of timelines. We first consider the most popular method currently available to improve the energy efficiency, i.e. by lowering the operating voltage. We make key observations regarding the limiting factors in scaling down the operating voltage for general purpose high performance processors. Later, we propose our novel core design, ScalCore, one that can work in high performance mode at nominal Vdd, and in a very energy-efficient mode at low Vdd. The resulting core design can operate at much lower voltages providing higher parallel performance while consuming lower energy. While lowering Vdd improves the energy efficiency, CMOS devices are fundamentally limited in their low voltage operation. Therefore, we next consider an upcoming device technology -- Tunneling Field-Effect Transistors (TFETs), that is expected to supplement CMOS device technology in the near future. TFETs can attain much higher energy efficiency than CMOS at low voltages. However, their performance saturates at high voltages and, therefore, cannot entirely replace CMOS when high performance is needed. Ideally, we desire a core that is as energy-efficient as TFET and provides as much performance as CMOS. To reach this goal, we characterize the TFET device behavior for core design and judiciously integrate TFET units, CMOS units in a single core. The resulting core, called HetCore, can provide very high energy efficiency while limiting the slowdown when compared to a CMOS core. Finally, we analyze Monolithic 3D (M3D) integration technology that is widely considered to be the only way to integrate more transistors on a chip. We present the first analysis of the architectural implications of using M3D for core design and show how to partition the core across different layers. We also address one of the key challenges in realizing the technology, namely, the top layer performance degradation. We propose a critical path based partitioning for logic stages and asymmetric bit/port partitioning for storage stages. The result is a core that performs nearly as well as a core without any top layer slowdown. When compared to a 2D baseline design, an M3D core not only provides much higher performance, it also reduces the energy consumption at the same time. In summary, this thesis addresses one of the fundamental challenges in computer architecture -- overcoming the fact that CMOS is not scaling anymore. As we increase the computing power on a single chip, our ability to power the entire chip keeps decreasing. This thesis proposes three solutions aimed at solving this problem over different timelines. Across all our solutions, we improve energy efficiency without compromising the performance of the core. As a result, we are able to operate twice as many cores with in the same power budget as regular cores, significantly alleviating the problem of dark silicon

Illinois Digital Environment for Access to Learning and Scholarship Repository

Recommended from our members

Design and performance optimization of asynchronous networks-on-chip

Author: Jiang Weiwei
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems. In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style. The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions. First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains. Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow. Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel. In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement

Columbia University Academic Commons

Predicting power scalability in a reconfigurable platform

Author: Beckett P
Publication venue: RMIT University
Publication date: 01/01/2007
Field of study

This thesis focuses on the evolution of digital hardware systems. A reconfigurable platform is proposed and analysed based on thin-body, fully-depleted silicon-on-insulator Schottky-barrier transistors with metal gates and silicide source/drain (TBFDSBSOI). These offer the potential for simplified processing that will allow them to reach ultimate nanoscale gate dimensions. Technology CAD was used to show that the threshold voltage in TBFDSBSOI devices will be controllable by gate potentials that scale down with the channel dimensions while remaining within appropriate gate reliability limits. SPICE simulations determined that the magnitude of the threshold shift predicted by TCAD software would be sufficient to control the logic configuration of a simple, regular array of these TBFDSBSOI transistors as well as to constrain its overall subthreshold power growth. Using these devices, a reconfigurable platform is proposed based on a regular 6-input, 6-output NOR LUT block in which the logic and configuration functions of the array are mapped onto separate gates of the double-gate device. A new analytic model of the relationship between power (P), area (A) and performance (T) has been developed based on a simple VLSI complexity metric of the form ATσ = constant. As σ defines the performance “return” gained as a result of an increase in area, it also represents a bound on the architectural options available in power-scalable digital systems. This analytic model was used to determine that simple computing functions mapped to the reconfigurable platform will exhibit continuous power-area-performance scaling behavior. A number of simple arithmetic circuits were mapped to the array and their delay and subthreshold leakage analysed over a representative range of supply and threshold voltages, thus determining a worse-case range for the device/circuit-level parameters of the model. Finally, an architectural simulation was built in VHDL-AMS. The frequency scaling described by σ, combined with the device/circuit-level parameters predicts the overall power and performance scaling of parallel architectures mapped to the array

RMIT Research Repository

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

KITopen