Energy-Efficient Inexact Speculative Adder with High Performance and Accuracy Control
Inexact and approximate circuit design is a promising approach to improve performance and energy efficiency in technology-scaled and low-power digital systems. Such a strategy is suitable for error-tolerant applications involving perceptive or statistical outputs. This paper presents a novel architecture of an Inexact Speculative Adder with optimized hardware efficiency and an advanced compensation technique with either error correction or error reduction. This general topology of speculative adders improves performance and enables precise accuracy control. A brief design methodology and comparative study of this speculative adder are also presented herein, demonstrating power savings up to 26% and energy-delay-area reductions up to 60% at equivalent accuracy compared to the state of the art.
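The carry-speculation idea behind such adders can be illustrated with a minimal behavioural sketch. It is not the architecture from the paper: the function name `speculative_add` and the parameters `block` and `lookback` are hypothetical, and the compensation stage (error correction or reduction) described in the abstract is deliberately left out. Each block guesses its carry-in from only a few lower bits, so the blocks could run in parallel in hardware, at the cost of occasional errors.

```python
def speculative_add(a: int, b: int, width: int = 16,
                    block: int = 4, lookback: int = 2) -> int:
    """Add two unsigned `width`-bit integers block by block.

    Assumptions (illustrative only): `width` is a multiple of `block`,
    `lookback <= block`, and the final carry-out is dropped.
    """
    block_mask = (1 << block) - 1
    spec_mask = (1 << lookback) - 1
    result = 0
    for lo in range(0, width, block):
        if lo == 0:
            carry_in = 0
        else:
            # Speculate the carry from the `lookback` bits just below this
            # block instead of waiting for the full carry chain.
            spec_lo = lo - lookback
            a_spec = (a >> spec_lo) & spec_mask
            b_spec = (b >> spec_lo) & spec_mask
            carry_in = (a_spec + b_spec) >> lookback
        block_sum = ((a >> lo) & block_mask) + ((b >> lo) & block_mask) + carry_in
        result |= (block_sum & block_mask) << lo
    return result


if __name__ == "__main__":
    # Carry generated inside the look-back window: the speculation is right.
    print(hex(speculative_add(0x000C, 0x0004)))
    # Carry that ripples from below the window: the speculation misses it.
    print(hex(speculative_add(0x00FF, 0x0001)))
```

In this sketch a wider `lookback` lowers the error rate but lengthens the speculation path, which is the accuracy/performance knob the abstract refers to.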
Poster: Energy-Efficient Inexact Speculative Adder with High Performance and Accuracy Control
Approximate circuit design is a promising approach to improve performance and energy efficiency beyond the boundaries of conventional digital circuits. Such a strategy is suitable for error-tolerant applications involving perceptive or statistical outputs. This work presents a novel architecture of an Inexact Speculative Adder for high-performance applications with optimized hardware and an advanced error compensation technique. This adder allows precise tuning of accuracy to match design specifications while optimizing performance and energy efficiency. It demonstrates power savings up to 26% and energy-delay-area reductions up to 60% compared to the state of the art.
Design and Evaluation of Approximate Logarithmic Multipliers for Low Power Error-Tolerant Applications
In this work, the designs of both non-iterative and iterative approximate logarithmic multipliers (LMs) are studied to further reduce power consumption and improve performance. Non-iterative approximate LMs (ALMs) that use three inexact mantissa adders are presented. The proposed iterative approximate logarithmic multipliers (IALMs) use a set-one adder in both mantissa adders during an iteration; they also use lower-part-or adders and approximate mirror adders for the final addition. Error analysis and simulation results are also provided; it is found that the proposed approximate LMs with an appropriate number of inexact bits achieve a higher accuracy and lower power consumption than conventional LMs using exact units. Compared with conventional LMs with exact units, the normalized mean error distance (NMED) of 16-bit approximate LMs is decreased by up to 18% and the power-delay product (PDP) is reduced by up to 37%. The proposed approximate LMs are also compared with previous approximate multipliers; it is found that the proposed approximate LMs are best suited for applications that allow larger errors but require lower energy and power consumption. Approximate Booth multipliers fit applications with less stringent power requirements that also require smaller errors. Case studies for error-tolerant computing applications are provided.
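For readers unfamiliar with logarithmic multiplication and the NMED metric, the following sketch may help. It assumes the conventional LM is Mitchell's classic algorithm (the abstract does not name it) and models only that exact-adder baseline, not the proposed designs with inexact mantissa adders. NMED is taken here as the mean error distance divided by the maximum exact product, a common definition; the function names are illustrative.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for unsigned integers with Mitchell's method.

    log2(x) is approximated as k + f, where k is the position of the
    leading one and f is the remaining fraction; the two logs are added
    and the antilog is approximated the same way.
    """
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    r1, r2 = a - (1 << k1), b - (1 << k2)   # mantissa remainders
    # (f1 + f2) scaled by 2^(k1 + k2), kept as an exact integer.
    frac = (r1 << k2) + (r2 << k1)
    one = 1 << (k1 + k2)
    if frac < one:                # f1 + f2 < 1
        return one + frac         # 2^(k1+k2) * (1 + f1 + f2)
    return 2 * frac               # 2^(k1+k2+1) * (f1 + f2)


def nmed(multiplier, bits: int) -> float:
    """Mean error distance over all input pairs, normalized by the
    largest exact product of two `bits`-bit unsigned operands."""
    n = 1 << bits
    max_out = (n - 1) ** 2
    total = sum(abs(multiplier(a, b) - a * b)
                for a in range(n) for b in range(n))
    return total / (n * n) / max_out


if __name__ == "__main__":
    print(mitchell_multiply(3, 3), 3 * 3)
    print(nmed(mitchell_multiply, 8))
```

Mitchell's approximation never overestimates the product, which is why published designs often add a compensation term or, as in the abstract above, tune the number of inexact bits.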
Electronic systems for the restoration of the sense of touch in upper limb prosthetics
In the last few years, research on active prosthetics for upper limbs focused
on improving the human functionalities and the control. New methods have
been proposed for measuring the user muscle activity and translating it into
the prosthesis control commands. Developing the feed-forward interface so
that the prosthesis better follows the intention of the user is an important
step towards improving the quality of life of people with limb amputation.
However, prosthesis users can neither feel if something or someone is
touching them over the prosthesis nor perceive the temperature or
roughness of objects. Looking at an object helps prosthesis users,
but they cannot detect anything otherwise: their sight gives them most
of the information. Therefore, to foster the prosthesis embodiment and utility,
it is necessary to have a prosthetic system that not only responds to the
control signals provided by the user, but also transmits back to the user
the information about the current state of the prosthesis.
This thesis presents an electronic skin system to close the loop in prostheses
towards the restoration of the sense of touch in prosthesis users. The
proposed electronic skin system includes advanced distributed sensing
(electronic skin), a system for (i) signal conditioning, (ii) data acquisition,
and (iii) data processing, and a stimulation system. The idea is to integrate
all these components into a myoelectric prosthesis.
Embedding the electronic system and the sensing materials is a critical issue
in the development of new prostheses. In particular, processing
the data originating from the electronic skin into low- or high-level information
is the key issue to be addressed by the embedded electronic system.
Recently, it has been shown that Machine Learning is a promising
approach to processing tactile sensor information. Many studies have
shown the effectiveness of Machine Learning in the classification of input
touch modalities.
More specifically, this thesis is focused on the stimulation system, allowing
the communication of a mechanical interaction from the electronic skin
to prosthesis users, and the dedicated implementation of algorithms for
processing tactile data originating from the electronic skin. On system
level, the thesis provides design of the experimental setup, experimental
protocol, and of algorithms to process tactile data. On architectural level,
the thesis proposes a design flow for the implementation of digital circuits
for both FPGA and integrated circuits, and techniques for the power
management of embedded systems for Machine Learning algorithms
Near/Sub-Threshold Circuits and Approximate Computing: The Perfect Combination for Ultra-Low-Power Systems
While sub/near-threshold design offers minimal power and energy consumption, such an approach strongly deteriorates circuit performance and robustness against PVT (process/voltage/temperature) variations, leading to very large speed penalties and large silicon areas. Inexact and approximate circuit design can address these issues by trading calculation accuracy for better silicon area, circuit speed and even better power consumption. This paper reviews and proposes improvements for two approximate computing techniques applicable to arithmetic circuits: gate-level pruning and carry speculation. A critical study is then carried out considering several error metrics, and for the first time, those techniques are combined to produce approximate adders showing even higher gains at similar error levels. It is then shown that those techniques can be applied to a sub-threshold library to mitigate the large variability.
Energy-efficient embedded machine learning algorithms for smart sensing systems
Embedded autonomous electronic systems are required in numerous application domains such as Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU often extracts information by employing sophisticated methods, e.g., Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging the trade-off between quality requirements on one side and computational complexity and latency on the other could reduce the system complexity without affecting the performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state-of-the-art solutions for embedded machine learning algorithms, 2) an energy-efficient embedded electronic system for the “electronic-skin” (e-skin) application. As such, this thesis exploits two main approaches:
1) Approximate Computing: In recent years, the approximate computing paradigm has become a major field of research since it is able to enhance the energy efficiency and performance of digital systems. “Approximate Computing” (AC) has turned out to be a practical approach to trade accuracy for better power, latency, and size. AC targets error-resilient applications and offers promising benefits by conserving some resources. Approximate results are usually acceptable for many applications, e.g., tactile data processing, image processing, and data mining; thus, it is highly recommended to take advantage of energy reduction with minimal variation in performance. In our work, we developed two approximate multipliers: 1) the first one is called the “META” multiplier and is based on the Error Tolerant Adder (ETA); 2) the second one is called the “Approximate Baugh-Wooley (BW)” multiplier, where the approximations are implemented in the generation of the partial products. We showed that the proposed approximate arithmetic circuits could achieve reductions in power consumption and time delay of around 80.4% and 24%, respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real-world applications, we explored the approximate multipliers in a case study, the e-skin application. The e-skin application is defined as multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. In particular, processing the data originating from the e-skin into low- or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach in processing tactile data when classifying input touch modalities.
In our work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in the main stages (i.e., signal and data processing stages) of the system. Based on the proposed methodology, we first implemented the approximate multipliers in the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We noticed that the FIR filter based on the Approximate BW multiplier (Approx-BW) outperforms state-of-the-art solutions, while respecting the trade-off between accuracy and power consumption, with an SNR degradation of 1.39 dB. Second, we implemented approximate adders and multipliers in the Coordinate Rotation Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits, respectively, since CORDIC and SVD account for a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction at the cost of less than 5% accuracy loss for the CORDIC and SVD circuits when scaling the number of approximated bits.
2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In our work, we exploited a novel computing platform embedding a Parallel Ultra Low Power processor (PULP), called “Mr. Wolf,” for the implementation of Machine Learning (ML) algorithms for touch modalities classification. First, we tested the ML algorithms at the software level; for RGB images as a case study and for the tactile dataset, we achieved accuracies of 97% and 83.5%, respectively. After validating the effectiveness of the ML algorithm at the software level, we performed the on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors for each class) in the on-chip memory. We evaluated the execution cycles for Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of parallelization. We presented a comparison with the popular low-power ARM Cortex-M4F microcontroller, which is usually employed in battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than on the ARM Cortex-M4F (STM32F40), consuming only 28 mW. The proposed platform achieves 15
7 better energy efficiency than the classification done on the STM32F40, consuming 81mJ per classification and 150 pJ per operation
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, the
landscape of the semiconductor field in the last 15 years has constituted power
as a first-class design concern. As a result, the community of computing
systems is forced to find alternative design approaches to facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at software, hardware, and architectural levels. Over
the last decade, there has been a plethora of approximation techniques in software
(programs, frameworks, compilers, runtimes, languages), hardware (circuits,
accelerators), and architectures (processors, memories). The current article is
Part I of our comprehensive survey on Approximate Computing, and it reviews its
motivation, terminology and principles, and it classifies and presents the
technical details of the state-of-the-art software and hardware approximation
techniques.
Comment: Under Review at ACM Computing Surveys
Investigation of reconfigurable-accuracy approximate adder designs for image processing applications
Ph. D. Thesis.
In the last decades, integrated circuits in CMOS technology have faced
progressive scaling challenges of both increased power density and
power dissipation. Meanwhile, the high-performance requirements of
current and future applications place rapidly growing demands on
computing resources such as power. This design conflict has driven
much effort to search for high-performance and energy-efficient
design approaches, such as approximate computing.
Approximate computing exploits the error resilience of
compute-intensive applications, such as image processing, to
implement approximation design techniques at different levels
of abstraction and scalability. The basic principle is to relax the
strict accuracy requirements in favour of a lower design complexity,
thereby achieving more computational performance (i.e., speed)
and energy savings. The adder is considered one
of the essential computational blocks in most applications.
As such, much effort has gone into exploring new designs of
efficient approximate adders.
This thesis presents an investigation into design enhancement,
novel approximate adder designs and implementation approaches.
The first approach introduces a modification to the error detection
technique of a popular configurable-accuracy approximate adder
design. The proposed lightweight error detection technique reduces
the number of gates required by the error detection circuit, thus
mitigating the design area overhead. Furthermore, in the error
correction process of the adder, we propose an extensive error
detection while activating more than one correction stage
concurrently. As a result, this ensures an optimum output accuracy
for the worst case of quality requirements.
In general, approximate (speculative) adder designs use the
segmentation technique to divide the adder into multiple short
sub-adders which operate in parallel. This limits the
long chains of carry propagation and results in better performance.
However, overlapping parts of the sub-adders to obtain better
carry speculation, and hence more accuracy, incurs a
large design area overhead. The second approach mitigates this
challenge by presenting a novel and simpler technique for dividing
the adder into a number of sub-adders. The new method uses what is
known as the carry-kill signal both for limiting the carry
propagation and for applying adder segmentation. Further, between
every two adjacent sub-adders, one AND gate and one XOR gate are
used for carry speculation and error (i.e., carry propagation)
detection, respectively. Thus, a significant reduction of the design
overhead is achieved, yet with acceptable levels of output
accuracy. In the third and final approach, simple logic OR gates are
used to build the approximate adder while compensating the operation
of the conventional full adders. The resulting approximate adder
design presents very low complexity, high speed, and low power
consumption. Furthermore, instead of adding an error recovery
circuit, short bit-length exact adders are used as correction stages
to control the general level of output quality (i.e., without error
detection overhead). At the final correction stage, the proposed
design operates the same as an exact adder.
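The accuracy/quality knob of the third approach can be pictured with a small behavioural sketch. This is only an illustration under stated assumptions, not the thesis design: the function name `or_based_add` and the parameter `exact_bits` are hypothetical, and the correction stages are modelled simply as the number of most significant bits that are added exactly while the remaining low bits are approximated by a bitwise OR.

```python
def or_based_add(a: int, b: int, width: int = 8, exact_bits: int = 0) -> int:
    """Approximate a + b for unsigned `width`-bit operands.

    The low (width - exact_bits) bits are approximated with a bitwise OR
    and no carry leaves them; the top `exact_bits` bits are added exactly.
    With exact_bits == width the result equals the exact sum.
    """
    approx_bits = width - exact_bits
    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask                      # OR replaces full adders
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return high | low


if __name__ == "__main__":
    for stage in (0, 4, 8):
        print(stage, or_based_add(0x13, 0x21, width=8, exact_bits=stage),
              0x13 + 0x21)
```

The OR of two bits equals their sum whenever they are not both one, so errors appear only where the operands overlap in the approximated part; enlarging the exact part shrinks that region until the adder is exact.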
To validate the efficiency of these approaches, a number of adders
with different bit-widths are designed and synthesized, showing
considerable reductions in the critical delay and silicon area and
further savings in energy consumption compared to other existing
approaches. The output error levels, which are extensively analysed
for each proposed design, remain acceptable.
In this study, the proposed configurable adder designs exhibit
energy/quality trade-offs at different numbers of correction stages.
These trade-offs can be effectively exploited to implement adders
in applications where energy can be gracefully minimised within
the envelope of quality requirements. As such, the designs were
implemented in an image processing application, a Gaussian blur
filter, demonstrating the loss in image quality
at each error correction stage. The output images showed promising
results for using the proposed designs in more energy-efficient
applications where output quality requirements can be relaxed.
Mutah Universit