36 research outputs found
Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures
Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, it is proposed to innovate Elastic Intermittent Computation (EIC) as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries, that leverage intrinsic-non-volatility to realize middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads vital to IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior with a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic which is adaptable at the gate-level to support variable duty cycle execution that is robust to brief and extended supply outages or unscheduled dropouts, and development of spin-based research synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit utilizing pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations
DIAC: Design Exploration of Intermittent-Aware Computing Realizing Batteryless Systems
Battery-powered IoT devices face challenges like cost, maintenance, and
environmental sustainability, prompting the emergence of batteryless
energy-harvesting systems that harness ambient sources. However, their
intermittent behavior can disrupt program execution and cause data loss,
leading to unpredictable outcomes. Despite exhaustive studies employing
conventional checkpoint methods and intricate programming paradigms to address
these pitfalls, this paper proposes an innovative systematic methodology,
namely DIAC. The DIAC synthesis procedure enhances the performance and
efficiency of intermittent computing systems, with a focus on maximizing
forward progress and minimizing the energy overhead imposed by distinct memory
arrays for backup. Then, a finite-state machine is delineated, encapsulating
the core operations of an IoT node, sense, compute, transmit, and sleep states.
First, we validate the robustness and functionalities of a DIAC-based design in
the presence of power disruptions. DIAC is then applied to a wide range of
benchmarks, including ISCAS-89, MCNS, and ITC-99. The simulation results
substantiate the power-delay-product (PDP) benefits. For example, results for
complex MCNC benchmarks indicate a PDP improvement of 61%, 56%, and 38% on
average compared to three alternative techniques, evaluated at 45 nm.Comment: 6 pages, will be appeared in Design, Automation and Test in Europe
Conference 202
Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks
Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, whereas these images are very complex and their analyses by pathologists require large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGBcolored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., \u3c8:8\u3e. For energy estimation on the used CNN model, we found that the energy used in RGB color mode with 32-bit is considerably higher than the other lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in the Internet of Things (IoT) systems that facilitate healthcare systems
Ultra-Fast, High-Performance 8x8 Approximate Multipliers by a New Multicolumn 3,3:2 Inexact Compressor and its Derivatives
Multiplier, as a key role in many different applications, is a
time-consuming, energy-intensive computation block. Approximate computing is a
practical design paradigm that attempts to improve hardware efficacy while
keeping computation quality satisfactory. A novel multicolumn 3,3:2 inexact
compressor is presented in this paper. It takes three partial products from two
adjacent columns each for rapid partial product reduction. The proposed inexact
compressor and its derivates enable us to design a high-speed approximate
multiplier. Then, another ultra-fast, high-efficient approximate multiplier is
achieved utilizing a systematic truncation strategy. The proposed multipliers
accumulate partial products in only two stages, one fewer stage than other
approximate multipliers in the literature. Implementation results by Synopsys
Design Compiler and 45 nm technology node demonstrates nearly 11.11% higher
speed for the second proposed design over the fastest existing approximate
multiplier. Furthermore, the new approximate multipliers are applied to the
image processing application of image sharpening, and their performance in this
application is highly satisfactory. It is shown in this paper that the error
pattern of an approximate multiplier, in addition to the mean error distance
and error rate, has a direct effect on the outcomes of the image processing
application.Comment: 21 Pages, 18 Figures, 6 Table
OISA: Architecting an Optical In-Sensor Accelerator for Efficient Visual Computing
Targeting vision applications at the edge, in this work, we systematically
explore and propose a high-performance and energy-efficient Optical In-Sensor
Accelerator architecture called OISA for the first time. Taking advantage of
the promising efficiency of photonic devices, the OISA intrinsically implements
a coarse-grained convolution operation on the input frames in an innovative
minimum-conversion fashion in low-bit-width neural networks. Such a design
remarkably reduces the power consumption of data conversion, transmission, and
processing in the conventional cloud-centric architecture as well as
recently-presented edge accelerators. Our device-to-architecture simulation
results on various image data-sets demonstrate acceptable accuracy while OISA
achieves 6.68 TOp/s/W efficiency. OISA reduces power consumption by a factor of
7.9 and 18.4 on average compared with existing electronic in-/near-sensor and
ASIC accelerators.Comment: 7 page
Nv-Clustering: Normally-Off Computing Using Non-Volatile Datapaths
With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a non-Volatile (NV) flip-flop design approach, referred to as NV Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are utilized to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality, based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks including ISCAS-89, MCNS, and ITC-99 computational circuits using a LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and power dissipation, area, and delay benefits. For instance, results for ISCAS-89 benchmarks indicate 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay as compared to alternative NV-FF based designs, as evaluated via SPICE simulation at the 45-nm technology node
Logic-Encrypted Synthesis For Energy-Harvesting-Powered Spintronic-Embedded Datapath Design
The objectives of advancing secure, intermittency-tolerant, and energy-aware logic datapaths are addressed herein by developing a spin-based design methodology and its corresponding synthesis steps. The approach selectively-inserts Non-Volatile (NV) Polymorphic Gates (PGs) to realize datapaths which are suitable for intrinsic operation in Energy-Harvesting-Powered (EHP) devices. Spin Hall Effect (SHE)-based Magnetic Tunnel (MTJs) are utilized to design NV-PGs, which are combined within a Flip-Flop (FF) circuit to develop a PG-FF realizing Boolean logic functions with inherent state-holding capability. The reconfigurability of PGs is leveraged for logic-encryption to enhance the security of the developed intermittency-resilient circuits, which are applied to ISCAS-89, MCNS, and ITC-99 benchmarks. The results obtained indicate that the PG-FF based design can achieve up to 7.1% and 13.6% improvements in terms of area and Power Delay Product (PDP), respectively, compared to NV-FF based methodologies that replace the CMOS-based FFs with NV-FFs. Further PDP improvements are achieved by using low-energy barrier SHE-MTJ devices within the PG-FF circuit. SHE-MTJs with 30kT energy exhibit 40.5% reduction in PDP at the cost of lower retention times in the range of minutes, which is still sufficient to achieve forward progress in EHP devices having more than hundreds of power-on and power-off cycles per minute