491 research outputs found

    Low-power digital processor for wireless sensor networks

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 69-72). In order to make sensor networks cost-effective and practical, the electronic components of a wireless sensor node need to run for months to years on the same battery. This thesis explores the design of a low-power digital processor for these sensor nodes, employing techniques such as hardwired algorithms, lowered supply voltages, clock gating and subsystem shutdown. Prototypes were built on both an FPGA and an ASIC platform in order to verify functionality and characterize power consumption. The resulting 0.18 µm silicon, fabricated in National Semiconductor Corporation's process, was operational for supply voltages ranging from 0.5 V to 1.8 V. At the lowest operating voltage of 0.5 V and a frequency of 100 kHz, the chip performs 8 full-accuracy FFT computations per second and draws 1.2 nJ of total energy per cycle. Although this energy/cycle metric does not surpass existing low-energy processors demonstrated in literature or commercial products, several low-power techniques are suggested that could drastically improve the energy metrics of a future implementation. By Daniel Frederic Finchelstein. S.M.
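
The figures quoted in this abstract can be combined into a few derived metrics. The sketch below is only back-of-the-envelope arithmetic, assuming that every clock cycle costs the quoted 1.2 nJ and that all cycles are spent on FFT work (so the per-FFT figures are upper bounds). The function and variable names are illustrative and do not come from the thesis.

```python
# Figures quoted in the abstract (0.5 V operating point).
ENERGY_PER_CYCLE_J = 1.2e-9   # 1.2 nJ of total energy per clock cycle
CLOCK_HZ = 100e3              # 100 kHz clock
FFTS_PER_SECOND = 8           # full-accuracy FFTs completed per second


def derived_metrics(energy_per_cycle_j, clock_hz, ffts_per_second):
    """Return (average power in W, cycles per FFT, energy per FFT in J).

    Assumption (not from the thesis): all cycles are attributed to FFT work.
    """
    power_w = energy_per_cycle_j * clock_hz          # J/cycle * cycles/s
    cycles_per_fft = clock_hz / ffts_per_second      # cycles available per FFT
    energy_per_fft_j = energy_per_cycle_j * cycles_per_fft
    return power_w, cycles_per_fft, energy_per_fft_j


power_w, cycles_per_fft, energy_per_fft_j = derived_metrics(
    ENERGY_PER_CYCLE_J, CLOCK_HZ, FFTS_PER_SECOND
)
print(f"average power : {power_w * 1e6:.1f} uW")
print(f"cycles per FFT: {cycles_per_fft:.0f}")
print(f"energy per FFT: {energy_per_fft_j * 1e6:.1f} uJ")
```

Under these assumptions the chip draws about 120 µW and spends at most 12,500 cycles, or roughly 15 µJ, per FFT.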

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique used. This complicates the evaluation of optimizations for new accelerator designs, slowing down research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing the design choices for each building block to be assessed separately. Reported optimizations range from up to 10,000x memory savings to 33x energy reductions, giving chip designers an overview of design choices for implementing efficient low-power neural network accelerators.

    Design and implementation of components for renewably-powered base-stations with heterogeneous access channel

    Providing high-speed broadband services in remote areas can be a challenging task, especially because of the lack of network infrastructure. As typical broadband technologies are often expensive to deploy, they require large investment from the local authorities. Previous studies have shown that a viable alternative is to use wireless base stations with high-throughput point-to-point (PTP) backhaul links. With base stations comes the problem of powering their systems; this thesis tackles it by relying on renewable energy harvesting, such as solar panels or wind turbines. This thesis, in the context of the sustainable cellular network harvesting ambient energy (SCAVENGE) project, aims to contribute to a reliable and energy-efficient solution to this problem by adjusting the design of an existing multi-radio energy harvesting base station. In Western Europe, 49 channels of 8 MHz were used for analogue TV transmissions, ranging from 470 MHz (Channel 21) to 862 MHz (Channel 69); this spectrum, now partially unused due to the digital television (DTV) switch-over, has been opened to alternative uses by the regulatory authorities. Using this newly freed ultra high frequency (UHF) range, also known as TV white space (TVWS), can offer reliable low-cost broadband access to homes and businesses in low-density areas. While UHF transmitters allow long-range links, the overcrowding of the TV spectrum limits the achievable throughput; to increase the capacity of such a TVWS rural broadband base station, the UHF radio has previously been combined with a lower-range, higher-throughput GHz radio such as Wireless Fidelity (WiFi). The regulatory constraints of TVWS applications create the need for frequency-agile transceivers that observe strict spectral mask requirements; this guided previous works towards discrete Fourier transform (DFT) modulated filter-bank multicarrier (FBMC) systems. 
These systems are numerically efficient, as they permit the up- and down-conversion of the 40 TV channels at the cost of a single channel transceiver and the modulating transform. Typical implementations rely on power-of-two fast Fourier transforms (FFTs); however, the smallest such transform covering the full 40 channels of the TVWS spectrum is 64 points wide, thus involving 24 unused channels. In order to attain a more numerically efficient design, we introduce the use of a mixed-radix FFT as the modulating transform. Across the various sizes and architectures tested, this approach provides energy savings of up to 6.7% compared to previous designs. Unlike orthogonal frequency-division multiplexing (OFDM), FBMC systems are generally expected to be more robust to synchronisation errors, as oversampled FBMC systems can include a guard band, and even in a doubly-dispersive channel, inter-carrier interference (ICI) can be considered negligible. Even though sub-channels can be treated independently (i.e. without the use of cross-terms), they still require equalisation. We introduce per-band equalisation; amongst different options, a robust and fast blind approach based on a concurrent constant modulus (CM)/decision directed (DD) fractionally-spaced equaliser (FSE) is selected. The selected approach is capable of equalising a frequency-selective channel. Furthermore, the proposed architecture is advantageous in terms of power consumption and implementation cost. After focussing on the design of the radio for TVWS transmission, we address a multi-radio user assignment problem. Using various power consumption and harvesting models for the base station, we formulate two optimisation problems: the first focuses on the base station power consumption, while the second concentrates on load balancing. We employ a dynamic programming approach to optimise the user assignment. 
The use of such algorithms could allow a downsizing of the power supply systems (harvesters and batteries), thus reducing the cost of the base station. Furthermore the algorithms provide a better balance between the number of users assigned to each network, resulting in a higher quality of service (QoS) and energy efficiency.
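
To illustrate the transform-size argument in this abstract, the following is a minimal sketch of a recursive mixed-radix (Cooley-Tukey) FFT that works directly on 40 points instead of padding to 64. It is not the thesis's architecture: the radix ordering (2, 2, 2, 5 here), the function names and the test input are assumptions made for illustration, and the sketch says nothing about energy.

```python
import cmath

TVWS_CHANNELS = 40  # channel count quoted in the abstract


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n itself if n is prime)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def mixed_radix_fft(x):
    """Decimation-in-time FFT for any length, splitting by the smallest prime factor."""
    n = len(x)
    if n == 1:
        return list(x)
    p = smallest_prime_factor(n)
    m = n // p
    # Transform the p interleaved sub-sequences of length m.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for r in range(p):
            twiddle = cmath.exp(-2j * cmath.pi * r * k / n)
            acc += twiddle * subs[r][k % m]
        out[k] = acc
    return out


def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    size = 1
    while size < n:
        size *= 2
    return size


padded = next_power_of_two(TVWS_CHANNELS)   # size a radix-2 FFT would need
unused = padded - TVWS_CHANNELS             # bins carrying no channel
spectrum = mixed_radix_fft([complex(i % 7, 0) for i in range(TVWS_CHANNELS)])
print(padded, unused, len(spectrum))
```

The power-of-two route needs 64 bins and leaves 24 idle, while the mixed-radix transform produces exactly 40 outputs.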

    An autonomous and intelligent system for rotating machinery diagnostics

    Rotating machinery diagnostics (RMD) is the process of evaluating the condition of machine components by acquiring a number of measurements and extracting condition-related information using signal processing algorithms. A reliable RMD system is fundamental for condition-based maintenance programmes to reduce maintenance cost and risk. It must be able to detect abnormalities at an early stage in order to prevent severe performance degradation and avoid economic losses and/or catastrophic failures. A conventional RMD system consists of sensing elements (transducers) and a data acquisition system with a compliant software package. Such a system is bulky and costly to deploy in practice. Recent advances in micro-scale electronics have enabled a wide spectrum of system designs and capabilities at embedded scale. Micro electromechanical system (MEMS) based sensing technologies offer significant savings in terms of system price and size. Microcontroller units with embedded computation and sensing interfaces have enabled a system-on-chip design of an RMD system within a single sensing node. This research aims to exploit this growth of microelectronics to develop a remote and intelligent system to aid maintenance procedures. The system operates independently of any central processing platform or operator analysis. Features include on-board calculation of time-domain statistical parameters, frequency-domain analysis techniques and time-controlled monitoring tasks within the limitations of its energy budget. A working prototype was developed to test the concept. Two experimental testbeds are used to validate the performance of the developed system: a DC motor with rotor unbalance and a 1.1 kW induction motor with phase imbalance. By establishing a classification model with several training samples, the developed system achieved an accuracy of 93% in detecting quantified seeded faults while consuming a minimum power of 16.8 mW. 
The performance of the developed system demonstrates its strong potential for full industry deployment and compliance.
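
The abstract mentions on-board time-domain statistical parameters without listing them. As a hedged illustration, the sketch below computes a few features commonly used in vibration monitoring (RMS, peak, crest factor, kurtosis). The choice of features, the function name and the synthetic test signal are assumptions, not details taken from the work.

```python
import math


def time_domain_features(samples):
    """Compute simple condition indicators from one block of samples.

    The feature set is an assumed, typical selection for vibration data.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    peak = max(abs(s) for s in samples)
    fourth_moment = sum(c ** 4 for c in centered) / n
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        # Crest factor: how "spiky" the signal is relative to its RMS level.
        "crest_factor": peak / rms if rms > 0 else 0.0,
        # Kurtosis (non-excess): 1.5 for a pure sine, 3 for Gaussian noise.
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else 0.0,
    }


# Synthetic check signal: five full periods of a unit-amplitude sine.
sine = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
features = time_domain_features(sine)
print(features)
```

For a clean sine the crest factor is about 1.414 and the kurtosis about 1.5; impulsive content, which some faults introduce, tends to push both values up.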

    Runtime Hardware Reconfiguration in Wireless Sensor Networks for Condition Monitoring

    The integration of miniaturized heterogeneous electronic components has enabled the deployment of tiny sensing platforms empowered by wireless connectivity, known as wireless sensor networks. Thanks to an optimized duty-cycled activity, the energy consumption of these battery-powered devices can be reduced to a level where several years of operation are possible. However, the processing capability of currently available wireless sensor nodes does not scale well with the observation of phenomena requiring a high sampling resolution. The large amount of data generated by the sensors cannot be handled efficiently by low-power wireless communication protocols without a preliminary filtering of the information relevant for the application. For this purpose, energy-efficient, flexible, fast and accurate processing units are required to extract important features from the sensor data and relieve the operating system from computationally demanding tasks. Reconfigurable hardware is identified as a suitable technology to fulfill these requirements, balancing implementation flexibility with performance and energy efficiency. While both the static and dynamic power consumption of field programmable gate arrays have often been pointed out as prohibitive for very-low-power applications, recent programmable logic chips based on non-volatile memory appear as a potential solution overcoming this constraint. This thesis first verifies this assumption with the help of a modular sensor node built around a field programmable gate array based on Flash technology. Short and autonomous duty-cycled operation combined with hardware acceleration effectively reduces the energy consumption of the device in the considered context. However, Flash-based devices suffer from restrictions such as long configuration times and limited resources, which reduce their suitability for complex processing tasks. 
A template of a dynamically reconfigurable architecture built around coarse-grained reconfigurable function units is proposed in the second part of this work to overcome these issues. The module is conceived as an overlay of the sensor node FPGA, increasing the implementation flexibility and introducing a standardized programming model. Mechanisms for virtual reconfiguration tailored for resource-constrained systems are introduced to minimize the overhead induced by this genericity. The definition of this template architecture leaves room for design space exploration and application-specific customization. Nevertheless, this aspect must be supported by appropriate design tools which facilitate and automate the generation of low-level design files. For this purpose, a software tool is introduced to graphically configure the architecture and operation of the hardware accelerator. A middleware service is further integrated into the wireless sensor network operating system to bridge the gap between the hardware and the design tools, enabling remote reprogramming and scheduling of the hardware functionality at runtime. Finally, this hardware and software toolchain is applied to real-world wireless sensor network deployments in the domain of condition monitoring. This category of applications often requires the complex analysis of signals in the considered range of sampling frequencies, such as vibrations or electrical currents, making the proposed system ideally suited for the implementation. The flexibility of the approach is demonstrated by taking examples with heterogeneous algorithmic specifications. Different data processing tasks executed by the sensor node hardware accelerator are modified at runtime according to application requests.

    A Heterogeneous System Architecture for Low-Power Wireless Sensor Nodes in Compute-Intensive Distributed Applications

    Wireless Sensor Networks (WSNs) combine embedded sensing and processing capabilities with a wireless communication infrastructure, thus supporting distributed monitoring applications. WSNs have been investigated for more than three decades, and recent social and industrial developments such as home automation, or the Internet of Things, have increased the commercial relevance of this key technology. The communication bandwidth of the sensor nodes is limited by the transportation media and the restricted energy budget of the nodes. To still keep up with the ever-increasing sensor count and sampling rates, the basic data acquisition and collection capabilities of WSNs have been extended with decentralized smart feature extraction and data aggregation algorithms. Energy-efficient processing elements are thus required to meet the ever-growing compute demands of the WSN motes within the available energy budget. The Hardware-Accelerated Low Power Mote (HaLoMote) is proposed and evaluated in this thesis to address the requirements of compute-intensive WSN applications. It is a heterogeneous system architecture that combines a Field Programmable Gate Array (FPGA) for hardware-accelerated data aggregation with an IEEE 802.15.4 based Radio Frequency System-on-Chip for the network management and the top-level control of the applications. To properly support Dynamic Power Management (DPM) on the HaLoMote, a Microsemi IGLOO FPGA with a non-volatile configuration storage was chosen for a prototype implementation, called Hardware-Accelerated Low Energy Wireless Embedded Sensor Node (HaLOEWEn). As for every multi-processor architecture, the inter-processor communication and coordination strongly influence the efficiency of the HaLoMote. Therefore, a generic communication framework is proposed in this thesis. It is tightly coupled with the DPM strategy of the HaLoMote, which supports fast transitions between active and idle modes. 
Low-power sleep periods can thus be scheduled within every sampling cycle, even for sampling rates of hundreds of hertz. In addition to the development of the heterogeneous system architecture, this thesis focuses on the energy consumption trade-off between wireless data transmission and in-sensor data aggregation. The HaLOEWEn is compared with typical software processors in terms of runtime and energy efficiency in the context of three monitoring applications. The building blocks of these applications comprise hardware-accelerated digital signal processing primitives, lossless data compression, a precise wireless time synchronization protocol, and a transceiver scheduling for contention-free information flooding from multiple sources to all network nodes. Most of these concepts are applicable to similar distributed monitoring applications with in-sensor data aggregation. A Structural Health Monitoring (SHM) application is used for the system-level evaluation of the HaLoMote concept. The Random Decrement Technique (RDT) is a particular SHM data aggregation algorithm, which determines the free-decay response of the monitored structure for subsequent modal identification. The hardware-accelerated RDT executed on a HaLOEWEn mote requires only 43% of the energy that a recent ARM Cortex-M based microcontroller consumes for this algorithm. The functionality of the overall WSN-based SHM system is shown with a laboratory-scale demonstrator. Compared to reference data acquired by a wire-bound laboratory measurement system, the HaLOEWEn network can capture the structural information relevant for the SHM application with less than 1% deviation.
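
The Random Decrement Technique named in this abstract can be sketched in a few lines. The version below is a plain software illustration, assuming a level up-crossing trigger and a fixed segment length; the thesis's hardware-accelerated implementation, its trigger condition and its parameters are not described in the abstract, so all names and values here are hypothetical.

```python
import math


def random_decrement(signal, trigger_level, segment_length):
    """Average all segments that start where the signal crosses trigger_level upwards.

    Returns (signature, number_of_segments). The averaged signature estimates
    the free-decay response. The up-crossing trigger is an assumed choice.
    """
    accumulator = [0.0] * segment_length
    segments = 0
    for i in range(1, len(signal) - segment_length + 1):
        if signal[i - 1] < trigger_level <= signal[i]:
            for j in range(segment_length):
                accumulator[j] += signal[i + j]
            segments += 1
    if segments == 0:
        return [], 0
    return [a / segments for a in accumulator], segments


# Synthetic input: a sine with a period of 50 samples (illustrative only).
response = [math.sin(2 * math.pi * i / 50) for i in range(500)]
signature, count = random_decrement(response, trigger_level=0.5, segment_length=25)
print(count, round(signature[0], 4))
```

Because only a running sum and a segment counter need to be kept, the aggregation reduces a long measurement to a short signature before anything is transmitted.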

    Optimisation of the first principle code Octopus for massive parallel architectures: application to light harvesting complexes

    [EN]: Computer simulation has become a powerful technique for assisting scientists in developing novel insights into the basic phenomena underlying a wide variety of complex physical systems. The work reported in this thesis is concerned with the use of massively parallel computers to simulate the fundamental features at the electronic structure level that control the initial stages of harvesting and transfer of solar energy in green plants, which initiate the photosynthetic process. Currently available supercomputer facilities offer the possibility of using hundreds of thousands of computing cores. However, obtaining a linear speed-up from HPC systems is far from trivial. Thus, great efforts must be devoted to understanding the nature of the scientific code, the methods of parallel execution, data communication requirements in multi-process calculations, the efficient use of available memory, etc. This thesis deals with all of these themes, with a clear objective in mind: the electronic structure simulation of complete macro-molecular complexes, namely the Light Harvesting Complex II, with the aim of understanding its physical behaviour. In order to simulate this complex, we have used (with the assistance of the PRACE consortium) some of the most powerful supercomputers in Europe to run Octopus, a scientific software package for Density Functional Theory and Time-Dependent Density Functional Theory calculations. Results obtained with Octopus have been analysed in depth in order to identify the main obstacles to optimal scaling using thousands of cores. Many problems have emerged, mainly the poor performance of the Poisson solver, high memory requirements, the transfer of high quantities of complex data structures among processes, and so on. Finally, all of these problems have been overcome, and the new version reaches a very high performance in massively parallel systems. 
Tests run efficiently up to 128K processors and thus we have been able to complete the largest TDDFT calculations performed to date. At the conclusion of this work it has been possible to study the Light Harvesting Complex II as originally envisioned.
[EU, translated from Basque]: Computer simulation is today one of the most powerful tools available to scientists for trying to understand the behaviour of complex physical systems. In the work presented in this thesis, supercomputers were used to simulate these basic physical phenomena. Specifically, state-of-the-art computers were used to understand the first steps of photosynthesis by simulating the molecule that controls the absorption of solar energy in green plants. Supercomputing centres offer access to machines with hundreds of thousands of processing cores, but obtaining linear speed-up factors on such computers is far from easy. Great effort is therefore needed, from a computer science perspective, to understand the whole system as deeply as possible: the nature of the scientific code, its options for parallel execution, the data-transfer requirements between processes, the most efficient use of system memory, and so on. This thesis addresses all of these issues with a clear goal: the simulation of complete macromolecular complexes, specifically the electronic structure of the Light Harvesting Complex II, in order to understand its physical behaviour. To simulate this system, the fastest supercomputers in Europe were used (thanks to the PRACE consortium) to run the Octopus software package, which performs electronic simulations based on Density Functional Theory and Time-Dependent Density Functional Theory. The results obtained were analysed in depth to find the problems that prevented thousands of computing cores from being used efficiently. 
Many problems emerged along the way, mainly the poor performance of the Poisson solver, high memory requirements, large transfers of complex data structures, and so on. In the end, all of these problems were solved, and the new version achieves high performance on parallel supercomputers. We were able to demonstrate efficient runs on up to 128K processors and, as a result, to carry out the largest TDDFT simulations to date. Thus, at the end of this work, the original goal was achieved: the study of the Light Harvesting Complex II molecular system.
University of the Basque Country, UPV/EHU, University of Coimbra, Red Española de Supercomputación (RES), Jülich Supercomputing Centre (JSC), Rechenzentrum Garching, Cineca, Barcelona Supercomputing Center (BSC), CeSViMa, European Research Council Advanced Grant DYNamo (ERC-2010-AdG-267374), Spanish Grant (FIS2013-46159-C3-1-P), Grupos Consolidados UPV/EHU del Gobierno Vasco (IT578-13), Grupos Consolidados UPV/EHU del Gobierno Vasco (IT395-10), European Community FP7 project CRONOS (Grant number 280879-2), COST Actions CM1204 (XLIC) and MP1306 (EUSpec). The ALDAPA research group belongs to the Basque Advanced Informatics Laboratory (BAILab), supported by the University of the Basque Country UPV/EHU (grant UFI11/45). Peer reviewed.

    Low-voltage embedded biomedical processor design

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 180-190). Advances in mobile electronics are fueling new possibilities in a variety of applications, one of which is ambulatory medical monitoring with body-worn or implanted sensors. Digital processors on such sensors serve to analyze signals in real-time and extract key features for transmission or storage. To support diverse and evolving applications, the processor should be flexible, and to extend sensor operating lifetime, the processor should be energy-efficient. This thesis focuses on architectures and circuits for low power biomedical signal processing. A general-purpose processor is extended with custom hardware accelerators to reduce the cycle count and energy for common tasks, including FIR and median filtering as well as computing FFTs and mathematical functions. Improvements to classic architectures are proposed to reduce power and improve versatility: an FFT accelerator demonstrates a new control scheme to reduce datapath switching activity, and a modified CORDIC engine features increased input range and decreased quantization error over conventional designs. At the system level, the addition of accelerators increases leakage power and bus loading; strategies to mitigate these costs are analyzed in this thesis. A key strategy for improving energy efficiency is to aggressively scale the power supply voltage according to application performance demands. However, increased sensitivity to variation at low voltages must be mitigated in logic and SRAM design. For logic circuits, a design flow and a hold time verification methodology addressing local variation are proposed and demonstrated in a 65 nm microcontroller functioning at 0.3 V. 
For SRAMs, a model for the weak-cell read current is presented for near-V_t supply voltages, and a self-timed scheme for reducing internal bus glitches is employed with low leakage overhead. The above techniques are demonstrated in a 0.5-1.0 V biomedical signal processing platform in 0.13 µm CMOS. The use of accelerators for key signal processing enabled greater than 10x energy reduction in two complete EEG and EKG analysis applications, as compared to implementations on a conventional processor. By Joyce Y. S. Kwong. Ph.D.
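
For readers unfamiliar with CORDIC, the sketch below shows the conventional rotation-mode algorithm for sine and cosine. This is the baseline textbook form, not the modified engine proposed in the thesis: its input range is limited to roughly ±1.74 rad, and it uses floating point where hardware would use fixed-point shifts and adds. The function name and iteration count are illustrative.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Conventional rotation-mode CORDIC; returns (sin(theta), cos(theta)).

    Valid for |theta| up to about 1.74 rad. Multiplications by 2**-i stand in
    for the right shifts a hardware datapath would use.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compensate the constant gain accumulated by the micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        direction = 1.0 if z >= 0 else -1.0      # rotate towards z == 0
        x, y = (x - direction * y * 2.0 ** -i,
                y + direction * x * 2.0 ** -i)
        z -= direction * angles[i]
    return y, x


sin_approx, cos_approx = cordic_sin_cos(0.6)
print(sin_approx, math.sin(0.6))
print(cos_approx, math.cos(0.6))
```

Each iteration adds roughly one bit of precision, so iteration count trades accuracy against cycles.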

    Ultra-low Power Circuits for Internet of Things (IOT)

    Miniaturized sensor nodes offer an unprecedented opportunity for the semiconductor industry, which has led to rapid development of a new application space: the Internet of Things (IoT). IoT is a global infrastructure that interconnects physical and virtual things and has the potential to dramatically improve people's daily lives. One key aspect that makes IoT special is that the internet is expanding into places that were never reachable before, as device form factors continue to decrease. Extremely small sensors can be placed on plants, animals, humans, and geologic features, and connected to the Internet. Several challenges, however, exist that could slow the development of IoT. In this thesis, several circuit techniques as well as system-level optimizations to meet the challenging power/energy requirements of the IoT design space are described. First, a fully-integrated temperature sensor for battery-operated, ultra-low power microsystems is presented. Sensor operation is based on temperature-independent and temperature-dependent current sources that are used with oscillators and counters to generate a digital temperature code. Second, an ultra-low power oscillator designed for wake-up timers in compact wireless sensors is presented. The proposed topology separates the continuous comparator from the oscillation path and activates it only for a short period when it is required. As a result, both low-power tracking and generation of a precise wake-up signal are made possible. Third, an 8-bit sub-ranging SAR ADC for biomedical applications is discussed that takes advantage of signal characteristics. The ADC uses a moving window and stores the previous MSBs' voltage value on a series capacitor to save energy compared to a conventional approach while maintaining its accuracy. Finally, an ultra-low power acoustic sensing and object recognition microsystem that uses frequency-domain feature extraction and classification is presented. 
By introducing an ultra-low-power 8-bit SAR ADC with 50 fF input capacitance, the power consumption of the front-end amplifier has been reduced to the single-digit nW level. Also, serialized discrete Fourier transform (DFT) feature extraction is proposed in the digital back-end, replacing a conventional FFT that consumes high power and area. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137157/1/seojeong_1.pd
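
The abstract does not say how its serialized DFT is organised. One plausible reading, sketched here purely as an assumption, is that only a few frequency bins are needed as features, so each bin can be accumulated one sample at a time instead of computing a full FFT. The class name, frame length and bin choices below are hypothetical.

```python
import cmath
import math


class SerialDFTBin:
    """Accumulate a single DFT bin sample by sample (assumed interpretation)."""

    def __init__(self, bin_index, frame_length):
        # Per-sample rotation of the reference phasor for this bin.
        self.step = cmath.exp(-2j * cmath.pi * bin_index / frame_length)
        self.phasor = 1 + 0j
        self.accumulator = 0j

    def push(self, sample):
        """Consume one input sample: one multiply-accumulate per bin."""
        self.accumulator += sample * self.phasor
        self.phasor *= self.step

    def magnitude(self):
        return abs(self.accumulator)


FRAME = 64
tone = [math.cos(2 * math.pi * 5 * i / FRAME) for i in range(FRAME)]  # energy in bin 5
bins = {k: SerialDFTBin(k, FRAME) for k in (5, 9)}
for sample in tone:
    for detector in bins.values():
        detector.push(sample)
magnitudes = {k: detector.magnitude() for k, detector in bins.items()}
print(magnitudes)
```

With only a handful of bins of interest, this costs one multiply-accumulate per bin per sample and needs no frame buffer, which is the kind of trade-off that can favour a serial DFT over a full FFT in a small back-end.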

    A full-custom digital-signal-processing unit for real-time cortical blood flow monitoring

    Master's; Master of Engineering