    Hardware-Software Inexactness in Noise-aware Design of Low-Power Body Sensor Nodes

    Wireless Body Sensor Nodes (WBSNs) are miniaturized, ultra-low-power devices able to acquire and wirelessly transmit biosignals such as electrocardiograms (ECG) for extended periods of time and with little discomfort for subjects [1]. Energy efficiency is of paramount importance for WBSNs, because it enables higher wearability (by requiring a smaller battery) and/or a longer mean time between charges. In this paper, we investigate how noise-aware design choices can be made to minimize energy consumption in WBSNs. Noise is unavoidable in biosignal acquisition, whether due to external factors (in the case of ECGs, muscle contractions and respiration of subjects [2]) or to the design of the front-end analog acquisition block. From this observation stems the opportunity to apply inexact strategies such as on-node lossy compression to reduce the bandwidth over the energy-hungry wireless link [3], as long as the quality of the signal reconstructed on the receiver side is not compromised by the compression performed. To maximize gains, ultra-low-power platforms must be employed to perform the above-mentioned Digital Signal Processing (DSP) techniques. To this end, we propose an under-designed (but extremely efficient) architecture that only guarantees the correctness of operations performed on the most significant data (i.e., the data most affecting the final results), while allowing sporadic errors on the less significant data.
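    As a purely illustrative sketch of the significance-driven inexactness described above (the function names, the 12-bit sample width, and the number of retained bits are assumptions, not the authors' actual design), the Python snippet below keeps the most significant bits of each ECG sample exact and discards the noise-dominated low-order bits before transmission, shrinking the payload sent over the radio link.

```python
import numpy as np

def lossy_compress(samples, keep_bits=8, sample_bits=12):
    """Truncate the least significant bits of each ECG sample.

    Keeping only the `keep_bits` most significant bits mimics an
    inexact pipeline in which errors are confined to low-order data,
    while shrinking the payload sent over the energy-hungry radio link.
    """
    shift = sample_bits - keep_bits
    return (np.asarray(samples, dtype=np.int32) >> shift).astype(np.int16)

def reconstruct(compressed, keep_bits=8, sample_bits=12):
    """Receiver-side reconstruction: re-expand to the original scale."""
    shift = sample_bits - keep_bits
    return compressed.astype(np.int32) << shift

# Toy 12-bit ECG-like samples; acquisition noise already masks the
# low-order bits, so truncating them costs little reconstructed quality.
ecg = np.array([2048, 2051, 2070, 2120, 2300, 2090, 2049], dtype=np.int32)
rx = reconstruct(lossy_compress(ecg))
snr_db = 10 * np.log10(np.sum(ecg**2) / np.sum((ecg - rx)**2))
print(f"payload shrinks 12->8 bits/sample, reconstruction SNR = {snr_db:.1f} dB")
```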

    An Inexact Ultra-low Power Bio-signal Processing Architecture With Lightweight Error Recovery

    The energy efficiency of digital architectures is tightly linked to the voltage level (Vdd) at which they operate. Aggressive voltage scaling is therefore mandatory when ultra-low-power processing is required. Nonetheless, the lowest admissible Vdd is often bounded by reliability concerns, especially since static and dynamic non-idealities are exacerbated in the near-threshold region, imposing costly guard-bands to guarantee correctness under worst-case conditions. A striking alternative, explored in this paper, waives the requirement for unconditional correctness in favour of two more relaxed constraints. First, after a run-time failure, processing correctly resumes at a later point in time. Second, failures induce only a limited Quality-of-Service (QoS) degradation. We focus our investigation on the practical scenario of embedded bio-signal analysis, a domain in which energy efficiency is key, while applications are inherently error-tolerant to a certain degree. Targeting a domain-specific multi-core platform, we present a study of the impact of inexactness on application-visible errors. Then, we introduce a novel methodology to manage them, which requires minimal hardware resources and incurs a negligible energy overhead. Experimental evidence shows that, by tolerating 900 errors/hour, the resulting inexact platform can achieve an efficiency increase of up to 24%, with a QoS degradation of less than 3%.
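    A minimal, hypothetical sketch of the resume-at-a-later-point recovery idea (the window granularity, fault probability, and function names are assumptions for illustration, not the platform's actual mechanism): a window hit by a transient fault is simply discarded and processing restarts at the next one, so the QoS loss is bounded by the fraction of dropped windows.

```python
import random

def analyze_window(window):
    """Placeholder bio-signal kernel (e.g., a per-window heartbeat feature)."""
    return sum(window) / len(window)

def run_inexact(windows, fault_prob=0.01):
    """Process a stream window by window. On a (simulated) near-threshold
    fault, drop the affected window and resume cleanly at the next one,
    so a sporadic error never propagates into later computation."""
    results, dropped = [], 0
    for w in windows:
        if random.random() < fault_prob:   # assumed transient-fault model
            dropped += 1                   # lightweight recovery: discard...
            continue                       # ...and resume at the next window
        results.append(analyze_window(w))
    qos_loss_pct = 100.0 * dropped / len(windows)
    return results, qos_loss_pct

# One hour of 1-second windows at 250 Hz (synthetic data).
windows = [[random.gauss(0.0, 1.0) for _ in range(250)] for _ in range(3600)]
_, qos_loss = run_inexact(windows)
print(f"QoS degradation: {qos_loss:.2f}% of windows dropped")
```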

    Energy-efficient embedded machine learning algorithms for smart sensing systems

    Embedded autonomous electronic systems are required in numerous application domains such as the Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU extracts information, often by employing sophisticated methods such as Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging the trade-off between quality requirements and computational complexity and latency can reduce system complexity without affecting performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state-of-the-art solutions for embedded machine learning algorithms, and 2) an energy-efficient embedded electronic system for the "electronic skin" (e-skin) application. As such, this thesis exploits two main approaches:

    1) Approximate Computing: In recent years, the approximate computing paradigm has become a major field of research, since it is able to enhance the energy efficiency and performance of digital systems. "Approximate Computing" (AC) has turned out to be a practical approach to trade accuracy for better power, latency, and area. AC targets error-resilient applications and offers promising benefits by conserving resources. Approximate results are acceptable for many applications, e.g., tactile data processing, image processing, and data mining; thus, it is highly advantageous to exploit the energy reduction with minimal variation in performance. In this work, we developed two approximate multipliers: 1) the first, called the "META" multiplier, is based on the Error Tolerant Adder (ETA); 2) the second, called the "Approximate Baugh-Wooley (BW)" multiplier, implements the approximations in the generation of the partial products. We showed that the proposed approximate arithmetic circuits can achieve relevant reductions in power consumption and time delay of around 80.4% and 24%, respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real-world applications, we explored the approximate multipliers in a case study, the e-skin application. The e-skin application comprises multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. In particular, processing the data originating from the e-skin into low- or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach for processing tactile data when classifying input touch modalities. In this work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in its main stages (i.e., the signal and data processing stages). Based on the proposed methodology, we first implemented the approximate multipliers in the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We observed that the FIR filter based on Approx-BW outperforms state-of-the-art solutions while respecting the trade-off between accuracy and power consumption, with an SNR degradation of 1.39 dB. Second, we implemented approximate adders and multipliers in the Coordinate Rotation Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits, respectively, since CORDIC and SVD account for a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction, at the cost of less than 5% accuracy loss, for the CORDIC and SVD circuits when scaling the number of approximated bits.

    2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In this work, we exploited a novel computing platform embedding a Parallel Ultra-Low-Power processor (PULP), called "Mr. Wolf," for the implementation of Machine Learning (ML) algorithms for touch-modality classification. First, we tested the ML algorithms at the software level; for an RGB-image case study and a tactile dataset, we achieved accuracies of 97% and 83.5%, respectively. After validating the effectiveness of the ML algorithm at the software level, we performed the on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory-management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors per class) in the on-chip memory. We evaluated the execution cycles on Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of parallelization. We presented a comparison with the popular low-power ARM Cortex-M4F microcontroller usually employed for battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than on the ARM Cortex-M4F (STM32F40), consuming only 28 mW. The proposed platform achieves 15.7x better energy efficiency than the classification done on the STM32F40, consuming 81 mJ per classification and 150 pJ per operation.
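    To make the partial-product approximation concrete, the following behavioural sketch (a hypothetical truncated-column multiplier, not the gate-level META or Approx-BW designs from the thesis) zeroes the least significant columns of the partial-product array and reports the resulting mean error over all 8-bit operand pairs.

```python
def approx_multiply(a, b, drop_cols=4, width=8):
    """Unsigned array multiplier with the `drop_cols` least significant
    partial-product columns omitted, in the spirit of approximating the
    partial-product generation of a Baugh-Wooley multiplier."""
    acc = 0
    for i in range(width):                       # one partial product per bit of b
        if (b >> i) & 1:
            pp = a << i
            acc += pp & ~((1 << drop_cols) - 1)  # zero the dropped columns
    return acc

# Exhaustive error characterization over all 8-bit operand pairs.
errs = [abs(a * b - approx_multiply(a, b)) for a in range(256) for b in range(256)]
print(f"mean absolute error over all 8-bit pairs: {sum(errs) / len(errs):.1f}")
```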

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field over the last 15 years has established power as a first-class design concern. As a result, the computing systems community is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has appeared in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology, and principles, and classifies and presents the technical details of state-of-the-art software and hardware approximation techniques.
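    As a small taste of the software-level techniques such a survey classifies, here is a hedged sketch of loop perforation, a canonical software approximation in which a fraction of loop iterations is simply skipped; the data, skip factor, and function name are illustrative assumptions.

```python
def mean_perforated(data, skip=2):
    """Loop perforation: process only every `skip`-th element, trading
    result accuracy for proportionally less work."""
    subset = data[::skip]
    return sum(subset) / len(subset)

data = list(range(1_000_000))
exact = sum(data) / len(data)
approx = mean_perforated(data, skip=4)          # ~25% of the original work
print(f"exact={exact:.1f}  perforated={approx:.1f}  "
      f"error={100 * abs(exact - approx) / exact:.4f}%")
```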

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing appears as an emerging solution, allowing the quality of results to be tuned in the design of a system in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from the system level down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors, accelerators, and systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
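    One application-level knob discussed in this space, precision scaling, can be sketched in a few lines; the snippet below (illustrative names and parameters, not a specific technique from the survey) quantizes the weights of a dot product to 8-bit integers, trading a small numerical error for a narrower datapath and less memory traffic.

```python
import numpy as np

def quantized_dot(x, w, bits=8):
    """Precision scaling: quantize the weights to `bits`-bit integers
    before the dot product, shrinking datapath width and memory traffic."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(w))
    wq = np.round(w * scale)               # integer-valued weights
    return float(x @ wq) / scale           # rescale the accumulated result

rng = np.random.default_rng(0)
x, w = rng.standard_normal(1024), rng.standard_normal(1024)
exact, approx = float(x @ w), quantized_dot(x, w)
print(f"exact={exact:.4f}  8-bit weights={approx:.4f}  "
      f"abs. error={abs(exact - approx):.4f}")
```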

    Approximation Opportunities in Edge Computing Hardware : A Systematic Literature Review

    With the increasing popularity of the Internet of Things and massive Machine-Type Communication technologies, the number of connected devices is rising. However, while these devices bring valuable benefits to our lives, bandwidth and latency constraints make it challenging to process their associated data volumes in the Cloud. A promising answer to these challenges is the combination of Edge and approximate computing techniques, which allows data to be processed nearer to the user. This paper aims to survey the potential benefits of the intersection of these paradigms. We provide a state-of-the-art review of circuit-level and architecture-level hardware techniques and popular applications. We also outline essential future research directions.

    Design and Applications of Approximate Circuits by Gate-Level Pruning

    Energy efficiency is a critical concern for many systems, ranging from Internet of Things objects and mobile devices to high-performance computers. Moreover, after 40 years of prosperity, Moore's law is starting to show its economic and technical limits. Noticing that many circuits are over-engineered and that many applications are error-resilient or require less precision than the existing hardware offers, approximate computing has emerged as a potential way to pursue further improvements of digital circuits. In this regard, a technique to systematically trade off accuracy in exchange for area, power, and delay savings in digital circuits is proposed: gate-level pruning (GLP). A CAD tool is built and integrated into a standard digital flow to offer a wide range of cost-accuracy trade-offs for any conventional design. The methodology is first demonstrated on adders, achieving up to 78% energy-delay-area reduction for a 10% mean relative error. It is then detailed how this methodology can be applied to a more complex system composed of a multitude of arithmetic blocks and memory: the discrete cosine transform (DCT), a key building block for image and video processing applications. Even though arithmetic circuits represent less than 4% of the entire DCT area, it is shown that the GLP technique can lead to 21% energy-delay-area savings over the entire system for a reasonable image quality loss of 24 dB. This significant saving is achieved thanks to the pruned arithmetic circuits, which set some nodes to constant values, enabling the synthesis tool to further simplify the circuit and memory.
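    To illustrate the effect of pruning at a behavioural level (a hypothetical 4-bit adder, not the GLP CAD flow itself), the sketch below ties selected carry nodes to constant 0, the same 'force a node to a constant and let synthesis simplify the downstream logic' effect described above, and measures the resulting mean error over all input pairs.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def rca4(a, b, pruned_carries=0):
    """4-bit ripple-carry adder in which the `pruned_carries` least
    significant carry nodes are tied to constant 0, mimicking the
    gate-level pruning of the gates that drive them."""
    s, c = 0, 0
    for i in range(4):
        bit_s, c = full_adder((a >> i) & 1, (b >> i) & 1, c)
        if i < pruned_carries:
            c = 0                     # node forced to a constant value
        s |= bit_s << i
    return s | (c << 4)

errs = [abs((a + b) - rca4(a, b, pruned_carries=2))
        for a in range(16) for b in range(16)]
print(f"mean error with 2 pruned carry nodes: {sum(errs) / len(errs):.2f}")
```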