146 research outputs found
Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems
This thesis investigates the challenge of providing an abstracted yet sufficiently accurate reliability estimation for embedded on-chip systems. In addition, it proposes new techniques to protect the register files within processors against aging effects and soft errors, thereby increasing their reliability. It also introduces a novel thermal measurement setup that clearly captures infrared images of modern multi-core processors.
Evaluation of Features Extraction and Classification Techniques for Offline Handwritten Tifinagh Recognition
This paper presents a review of different feature extraction and classification methods for offline handwritten Amazigh character (Tifinagh) recognition. The feature extraction methods are discussed in terms of statistical, structural, global-transformation, and moment-based approaches. Although a number of techniques are available for feature extraction and classification, the choice of technique largely determines the recognition accuracy. A series of experiments was performed on the AMHCD database to evaluate the effectiveness of different feature extraction techniques combined with Hidden Markov Model, neural network, and Support Vector Machine classifiers. The statistical techniques give encouraging results.
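As a concrete illustration of the statistical family of features surveyed above, the sketch below computes zoning-based pixel densities. The zone grid size and the toy glyph are illustrative assumptions, not details taken from the paper.

```python
def zoning_features(image, zones=(2, 2)):
    """Split a binary image into a grid of zones and return the
    foreground-pixel density of each zone as a feature vector."""
    rows, cols = len(image), len(image[0])
    zr, zc = zones
    h, w = rows // zr, cols // zc
    features = []
    for i in range(zr):
        for j in range(zc):
            block = [image[r][c]
                     for r in range(i * h, (i + 1) * h)
                     for c in range(j * w, (j + 1) * w)]
            features.append(sum(block) / len(block))
    return features

# Example: a 4x4 glyph with ink only in the top-left quadrant.
img = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zoning_features(img))  # -> [1.0, 0.0, 0.0, 0.0]
```

The resulting low-dimensional vector is what a classifier such as an SVM or HMM would consume.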
Design automation of approximate circuits with runtime reconfigurable accuracy
Leveraging the inherent error tolerance of a vast and rapidly growing number of application domains, approximate computing arises as a design alternative that improves the efficiency of our computing systems by trading accuracy for energy savings. However, the required computational accuracy is not fixed: controlling the applied level of approximation dynamically at runtime is key to effectively optimizing energy while still containing and bounding the induced errors. In this paper, we propose and implement an automatic, circuit-independent design framework that generates approximate circuits with dynamically reconfigurable accuracy at runtime. The generated circuits feature varying accuracy levels, also supporting fully accurate execution. Extensive experimental evaluation, using an industry-strength flow and circuits, demonstrates that our generated approximate circuits reduce energy by up to 41% under a 2% error bound and by 17.5% on average under a pessimistic scenario that assumes a full-accuracy requirement for 33% of the runtime. To further demonstrate the efficiency of our framework, we considered two state-of-the-art technology libraries: a conventional 7nm FinFET and an emerging technology that boosts performance at the high cost of increased dynamic power.
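To make the idea of runtime-reconfigurable accuracy concrete, here is a minimal software sketch of one classic approximation knob, operand truncation, where the number of dropped low-order bits is selectable at call time. This is a generic illustration of accuracy/effort trading, not the circuit transformations the framework actually applies.

```python
def approx_add(a, b, dropped_bits=0):
    """Add two unsigned integers while ignoring the lowest
    `dropped_bits` bits of each operand (truncation-based
    approximation). dropped_bits=0 gives the exact sum, so the
    same 'circuit' also supports fully accurate execution."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)

print(approx_add(13, 7, dropped_bits=0))  # exact mode -> 20
print(approx_add(13, 7, dropped_bits=2))  # approximate mode -> 16
```

In hardware, the analogous mechanism gates off the low-order adder stages, saving switching energy while bounding the error to the dropped bit positions.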
Brain-Inspired Hyperdimensional Computing: How Thermal-Friendly for Edge Computing?
Brain-inspired hyperdimensional computing (HDC) is an emerging machine learning (ML) method. It is based on large vectors of binary or bipolar symbols and a few simple mathematical operations. The promise of HDC is a highly efficient implementation for embedded systems like wearables. While fast implementations have been presented, other constraints have not been considered for edge computing. In this work, we aim to answer how thermal-friendly HDC is for edge computing. Devices like smartwatches, smart glasses, or even mobile systems have a restrictive cooling budget due to their limited volume. Although HDC operations are simple, the vectors are large, resulting in a high number of CPU operations and thus a heavy load on the entire system, potentially causing temperature violations. In this work, the impact of HDC on the chip's temperature is investigated for the first time. We measure the temperature and power consumption of a commercial embedded system and compare HDC with a conventional CNN. We reveal that HDC causes up to 6.8°C higher temperatures and leads to up to 47% more CPU throttling. Even when both HDC and the CNN aim for the same throughput (i.e., perform a similar number of classifications per second), HDC still causes higher on-chip temperatures due to its larger power consumption.
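The "large vectors, simple operations" character of HDC that drives its CPU load can be seen in a minimal sketch of bipolar hypervector classification: class prototypes are built by elementwise majority (bundling), and queries are classified by cosine-like similarity. The dimensionality and the random vectors are illustrative; real deployments typically use ~10,000 dimensions, which is exactly why the operation count is so high.

```python
import random

random.seed(0)
D = 1000  # hypervector dimensionality (illustrative; often ~10k in practice)

def rand_hv():
    """Random bipolar hypervector of -1/+1 symbols."""
    return [random.choice((-1, 1)) for _ in range(D)]

def bundle(vectors):
    """Elementwise majority (sign of the sum) combines samples
    into a single class prototype."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def similarity(u, v):
    """Normalized dot product: 1.0 = identical, ~0.0 = unrelated."""
    return sum(x * y for x, y in zip(u, v)) / D

a, b, c = rand_hv(), rand_hv(), rand_hv()
proto = bundle([a, b])          # prototype of the class containing a and b
print(similarity(proto, a) > similarity(proto, c))  # member is closer -> True
```

Every classification touches all D elements of every prototype, so even though each operation is a trivial multiply-accumulate, the aggregate CPU load (and hence heat) grows with D and the number of classes.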
Energy Optimization in NCFET-based Processors
Energy consumption is a key optimization goal for all modern processors. Negative Capacitance Field-Effect Transistors (NCFETs) are a leading emerging technology that promises outstanding performance in addition to better energy efficiency. Thickness of the additional ferroelectric layer, frequency, and voltage are the key parameters in NCFET technology that impact the power and frequency of processors. However, their joint impact on energy optimization has not been investigated yet. In this work, we are the first to demonstrate that conventional (i.e., NCFET-unaware) dynamic voltage/frequency scaling (DVFS) techniques to minimize energy are sub-optimal when applied to NCFET-based processors. We further demonstrate that state-of-the-art NCFET-aware voltage scaling for power minimization is also sub-optimal when it comes to energy. This work provides the first NCFET-aware DVFS technique that optimizes the processor's energy through optimal runtime frequency/voltage selection. In NCFETs, energy-optimal frequency and voltage are dependent on the workload and technology parameters. Our NCFET-aware DVFS technique considers these effects to perform optimal voltage/frequency selection at runtime depending on workload characteristics. Results show up to 90% energy savings compared to conventional DVFS techniques. Compared to state-of-the-art NCFET-aware power management, our technique provides up to 72% energy savings along with 3.7x higher performance.
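The core DVFS trade-off the abstract describes (running faster costs more dynamic power but spends less time paying leakage) can be sketched as a table-driven operating-point selection. Note this uses a conventional CMOS energy model with assumed numbers; it omits the ferroelectric-thickness effects that make NCFET-aware selection differ from this baseline.

```python
C_EFF = 1e-9  # effective switched capacitance in farads (assumed value)

def energy(point, cycles):
    """Total energy to run `cycles` at operating point (f_hz, volt, leak_w)."""
    f, v, leak = point
    t = cycles / f               # execution time in seconds
    p_dyn = C_EFF * v * v * f    # dynamic power ~ C * V^2 * f
    return (p_dyn + leak) * t

def pick_energy_optimal(op_points, cycles):
    """Choose the operating point minimizing total energy for the workload."""
    return min(op_points, key=lambda p: energy(p, cycles))

points = [(1.0e9, 1.0, 0.5),   # 1 GHz @ 1.0 V, 0.5 W leakage
          (2.0e9, 1.2, 0.5)]   # 2 GHz @ 1.2 V, 0.5 W leakage
print(pick_energy_optimal(points, 1e9))  # -> (1000000000.0, 1.0, 0.5)
```

With these numbers the 1 GHz point wins (1.5 J vs. ~1.69 J); raise the leakage term and the balance tips toward the faster point, which is exactly the workload- and technology-dependence the paper's NCFET-aware technique exploits at runtime.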
Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs
Ternary content addressable memory (TCAM), widely used in network routers and high-associativity caches, is gaining popularity in machine learning and data-analytic applications. Ferroelectric FETs (FeFETs) are a promising candidate for implementing TCAM owing to their high ON/OFF ratio, non-volatility, and CMOS compatibility. However, conventional single-gate FeFETs (SG-FeFETs) suffer from relatively high write voltage, low endurance, and potential read disturbance, and face scaling challenges. Recently, a double-gate FeFET (DG-FeFET) has been proposed that outperforms SG-FeFETs in many aspects. This paper investigates TCAM design challenges specific to DG-FeFETs and introduces a novel 1.5T1Fe TCAM design based on DG-FeFETs. A 2-step search with early termination is employed to reduce the cell area and improve energy efficiency. A shared driver design is proposed to reduce the peripherals area. Detailed analysis and SPICE simulation show that the 1.5T1Fe DG-TCAM leads to superior search speed and energy efficiency. The 1.5T1Fe TCAM design can also be built with SG-FeFETs, achieving search latency and energy improvements compared with 2FeFET TCAM.
Comment: Accepted by Design Automation Conference (DAC) 202
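A TCAM's ternary match and the 2-step early-termination search can be modeled in a few lines of software. The split point and the 8-bit entries below are illustrative assumptions; the real design partitions the cell array in hardware.

```python
def tcam_match(stored, query):
    """Ternary match: each stored cell is '0', '1', or 'X' (don't care).
    The entry matches if every non-X cell equals the query bit."""
    return all(s in ('X', q) for s, q in zip(stored, query))

def two_step_search(table, query, split=4):
    """Step 1 matches only the first `split` bits of every entry;
    step 2 finishes only the survivors -- entries eliminated early
    never activate their remaining match circuitry."""
    survivors = [e for e in table if tcam_match(e[:split], query[:split])]
    return [e for e in survivors if tcam_match(e[split:], query[split:])]

table = ["10XX1010", "11XX0000", "0XXXXXXX"]
print(two_step_search(table, "10111010"))  # -> ['10XX1010']
```

In hardware the benefit comes from the early termination: most entries mismatch within the first few bits, so the bulk of each match line never needs to be evaluated, saving search energy.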
Unlocking efficiency in BNNs: global by local thresholding for analog-based HW accelerators
For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators, utilizing XNOR gates and additional interface circuits, have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase the inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the two paths is selectively triggered. Ideally, since inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA), which approximates the global thresholdings in BNNs with local thresholdings. This enables the use of the AP throughout most of the execution, which significantly increases the interface circuit efficiency. In our evaluations with two BNN architectures, using LTA reduces the area by 42x and 54x, the energy by 2.7x and 4.2x, and the latency by 3.8x and 1.15x, compared to state-of-the-art crossbar-based BNN accelerators.
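The global-versus-local thresholding idea can be sketched in software on a binarized dot product. The combination rule below (thresholding each weight chunk against its share of the global threshold, then taking a majority vote) is one illustrative way to localize the decision; the paper's exact LTA rule may differ.

```python
def xnor_popcount(w, x):
    """Binarized dot product: count positions where weight equals input."""
    return sum(1 for wi, xi in zip(w, x) if wi == xi)

def global_threshold(w, x, T):
    """Standard BNN activation: one threshold over the full popcount.
    In hardware this needs the full digital path (ADC + register)."""
    return 1 if xnor_popcount(w, x) >= T else 0

def local_threshold(w, x, T, chunks=4):
    """Approximate the global decision with per-chunk thresholds and a
    majority vote, so each chunk only needs a cheap analog comparator."""
    n = len(w) // chunks
    votes = sum(global_threshold(w[i * n:(i + 1) * n],
                                 x[i * n:(i + 1) * n],
                                 T / chunks)
                for i in range(chunks))
    return 1 if votes > chunks // 2 else 0

w = [1, 0, 1, 1, 0, 0, 1, 0]
print(global_threshold(w, w, T=4))  # perfect match -> 1
print(local_threshold(w, w, T=4))   # local approximation agrees -> 1
```

The approximation trades exactness on borderline inputs for the ability to resolve every chunk with a comparator instead of an ADC, which is where the reported area and energy savings come from.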