
    Design Techniques for Energy-Quality Scalable Digital Systems

    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also tolerate small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations against energy efficiency, by relaxing the precision, accuracy or reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing. Despite these high expectations, the current state of the art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and spread to commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing "dynamic" systems in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than "static" solutions in which the output quality is fixed at design time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of the benefits and overheads deriving from EQ scalability, using industrial-level models, and to the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption.
More specifically, the contribution of this thesis is divided into three parts. In the first body of work, the design of EQ scalable modules for processing hardware datapaths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and of being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of the balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature enough for integration in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff.
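
The bus-encoding idea admits a compact illustration. The sketch below is a minimal, hypothetical rendition of differential encoding with a runtime-tunable error bound: each transmitted word is the difference from the last value the receiver holds, and differences below a threshold max_err are suppressed so the serial line stays idle (no transitions). It is not the exact Approximate Differential Encoding algorithm from the thesis, only an illustration of how a quality knob can trade bit toggles for bounded error.

```python
def approx_diff_encode(samples, max_err):
    """Differential encoding with a runtime-tunable error bound.

    Small sample-to-sample differences (|diff| <= max_err) are encoded
    as zero, so consecutive identical words cause no line transitions
    on a serial bus; the decoder accumulates the transmitted diffs.
    Illustrative sketch only, not the thesis's exact encoding.
    """
    encoded = []
    last_sent = 0          # value the receiver currently holds
    for s in samples:
        diff = s - last_sent
        if abs(diff) <= max_err:
            encoded.append(0)          # suppress: bounded approximation
        else:
            encoded.append(diff)       # send exact difference
            last_sent = s
    return encoded

def approx_diff_decode(encoded):
    out, acc = [], 0
    for d in encoded:
        acc += d
        out.append(acc)
    return out

# Example: slowly varying sensor data, error bound of 2 LSBs.
data = [100, 101, 101, 102, 110, 111]
enc = approx_diff_encode(data, max_err=2)
print(enc)                      # [100, 0, 0, 0, 10, 0]
print(approx_diff_decode(enc))  # [100, 100, 100, 100, 110, 110]
```

Raising max_err at runtime suppresses more differences (fewer transitions, less bus energy) at the cost of a proportionally larger bounded error, which is exactly the shape of the reconfigurable energy-quality knob described above.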

    Research on High-Resolution, High-Sensitivity Imaging of Scatterer Distributions in Medical Ultrasound

    Ultrasound imaging is an effective method widely used in medical diagnosis and NDT (non-destructive testing). In particular, ultrasound imaging plays an important role in medical diagnosis thanks to its safety, noninvasiveness, low cost and real-time operation compared with other medical imaging techniques. In general, however, ultrasound images contain more speckle and have lower definition than MRI (magnetic resonance imaging) and X-ray CT (computerized tomography) images. It is therefore important to improve ultrasound image quality. This study makes three new proposals. The first is the development of a high-sensitivity transducer that utilizes piezoelectric charge directly for FET (field effect transistor) channel control. The second is a method for estimating the distribution of small scatterers in living tissue using the empirical Bayes method. The third is a super-resolution imaging method for scatterers with strong reflections, such as organ boundaries and blood vessel walls. The contents of each chapter are as follows. Chapter 1: The fundamental characteristics and main applications of ultrasound are discussed, and the advantages and drawbacks of medical ultrasound are highlighted. Based on the drawbacks, the motivations and objectives of this study are stated. Chapter 2: To overcome the disadvantages of medical ultrasound, we advanced our study in two directions: designing a new transducer to improve the acquisition modality itself, and developing new signal processing to improve the acquired echo data. The conventional techniques related to both directions are reviewed. Chapter 3: For high-performance piezoelectric reception, we propose a structure that directly couples a PZT (lead zirconate titanate) element to the gate of a MOSFET (metal-oxide-semiconductor field-effect transistor), yielding a device called the PZT-FET that acts as an ultrasound receiver. The PZT-FET is experimentally analyzed in terms of its reception sensitivity, dynamic range and -6 dB reception bandwidth. The proposed PZT-FET receiver offers high sensitivity and a wide dynamic range compared to a typical ultrasound transducer. Chapter 4: In medical ultrasound imaging, speckle patterns caused by interference between reflections from small scatterers in living tissue are often suppressed by various methodologies. However, accurate imaging of small scatterers is important in diagnosis; therefore, we investigate the influence of speckle patterns on ultrasound imaging using empirical Bayesian learning. Since small scatterers are spatially correlated and thereby constitute a microstructure, we assume that the scatterers are distributed according to an AR (auto-regressive) model with unknown parameters. Under this assumption, the AR parameters are estimated by maximizing the marginal likelihood function, and the scatterer distribution is obtained as a MAP (maximum a posteriori) estimate. The performance of the method is evaluated by simulations and experiments, which confirm that the band-limited echo carries sufficient information about the AR parameters and that the power spectrum of the echoes from the scatterers is properly extrapolated. Chapter 5: Medical ultrasound imaging of strongly reflecting scatterers based on the MUSIC algorithm is the main subject of this chapter.
Previously, we proposed a super-resolution ultrasound imaging method based on multiple TRs (transmissions/receptions) with different carrier frequencies, called SCM (super-resolution FM-chirp correlation method). To reduce the number of TRs required by the SCM, the method has been extended to an SA (synthetic aperture) version called SA-SCM. However, since super-resolution processing is performed on each line of data obtained by RBF (reception beamforming) in the SA-SCM, image discontinuities tend to occur in the lateral direction. Therefore, a new method called SCM-weighted SA is proposed, in which the SCM is performed on each transducer element and the SCM result is then used as the weight for the RBF. The SCM-weighted SA can generate multiple B-mode images, each corresponding to one carrier frequency, and appropriately chosen low-frequency images among them have no grating lobes. As a further improvement, instead of simple averaging, the SCM is applied again to the results of the SCM-weighted SA over all frequencies; this is called the SCM-weighted SA-SCM. We evaluated the effectiveness of all the methods by simulations and experiments. The results confirm that the extension of the SCM framework can help ultrasound imaging reduce grating lobes, achieve super-resolution and improve the SNR (signal-to-noise ratio). Chapter 6: A discussion of the overall content of the thesis, suggestions for further development and the remaining problems are summarized. (Tokyo Metropolitan University, 2019-03-25, Doctor of Engineering)
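
The MAP estimation of Chapter 4 can be sketched in a few lines of linear algebra: with a band-limited pulse as the observation operator and an AR prior whitening the scatterer sequence, the MAP estimate reduces to a single regularized least-squares solve. In the sketch below the AR coefficient and variances are assumed known (the thesis estimates them by maximizing the marginal likelihood); the pulse shape and all parameter values are illustrative assumptions.

```python
import numpy as np

def map_scatterers(y, H, a, noise_var, prior_var):
    """MAP estimate of a scatterer distribution under an AR(1) prior.

    Model (sketch): y = H @ x + n, n ~ N(0, noise_var * I), and the
    scatterer sequence x follows x[k] = a * x[k-1] + e[k] with
    e ~ N(0, prior_var). The thesis estimates the AR parameters via
    marginal likelihood maximization; here they are assumed known.
    """
    n = H.shape[1]
    # Whitening matrix of the AR(1) prior: (D @ x) ~ N(0, prior_var * I)
    D = np.eye(n) - a * np.eye(n, k=-1)
    # MAP estimate = solution of the penalized (ridge-like) least squares
    lhs = H.T @ H / noise_var + D.T @ D / prior_var
    rhs = H.T @ y / noise_var
    return np.linalg.solve(lhs, rhs)

# Toy example: a band-limited pulse (convolution matrix H) blurring a
# spatially correlated scatterer sequence.
rng = np.random.default_rng(0)
n, a = 128, 0.9
x = np.zeros(n)
for k in range(1, n):
    x[k] = a * x[k - 1] + rng.normal(scale=0.3)
pulse = np.hanning(9) * np.cos(2 * np.pi * 0.3 * np.arange(9))
H = np.array([[pulse[i - j] if 0 <= i - j < 9 else 0.0
               for j in range(n)] for i in range(n)])
y = H @ x + rng.normal(scale=0.05, size=n)
x_hat = map_scatterers(y, H, a=0.9, noise_var=0.05**2, prior_var=0.3**2)
print(float(np.corrcoef(x, x_hat)[0, 1]))  # correlation with true x (high)
```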

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications.
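
The closed-form flavor of these methods can be illustrated with the classical 2D analogue on which "plenoptic flow" builds: under brightness constancy, first-order image derivatives constrain camera-induced motion linearly, so a global translation estimate falls out of a single least-squares solve with no iteration. The sketch below is that 2D analogue in numpy, not the thesis's 4D light-field formulation.

```python
import numpy as np

def global_flow(I0, I1):
    """Closed-form global translation estimate from first-order derivatives.

    Brightness constancy gives Ix*u + Iy*v + It ~ 0 at every pixel, so a
    single 2x2 linear solve yields (u, v). This is the non-iterative 2D
    analogue of the thesis's 4D "plenoptic flow": no pyramid, no warping,
    a constant, predictable runtime.
    """
    Ix = np.gradient(I0, axis=1)
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)   # (u, v) in pixels

# Toy check: shift a smooth image by one pixel horizontally.
x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 128),
                   np.linspace(0, 4 * np.pi, 128))
I0 = np.sin(x) * np.cos(y)
I1 = np.roll(I0, 1, axis=1)      # true motion ~ (1, 0)
print(global_flow(I0, I1))       # close to [1, 0]
```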

    Analytical studies of the generalized likelihood ratio technique for failure detection

    Includes bibliographical references. Originally presented as the author's thesis (M.S., Massachusetts Institute of Technology), 1976. By Edward Yik Chow.

    Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

    The increasing demand for high performance and energy efficiency in modern digital systems has led to research into new design approaches able to go beyond the established energy-performance tradeoff. In the scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domains of signal processing, multimedia, computer vision and machine learning are known to be particularly resilient to errors occurring in their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures and algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner, reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in the literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs. The main contributions of this work are:
    - the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory;
    - the development of an emulation environment for platforms with approximate memory units, where faults are injected during simulation based on models that reproduce the effects of circuit- and architecture-level approximate memory techniques on memory cells (see the sketch after this list);
    - the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filters) using approximate memory for input/output buffers and tap registers;
    - the development of a fully reconfigurable and combinatorial floating-point unit, which can work with reduced-precision formats.
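
A minimal sketch of the fault-injection idea behind such an emulation environment: an "approximate" buffer is modeled as an exact array into which random bit flips are injected at a rate taken from an error model of the underlying memory technique (e.g. reduced refresh rate or lowered supply voltage). The bit-error rate, buffer layout and word width below are illustrative assumptions, not the thesis's calibrated models.

```python
import numpy as np

def inject_bit_faults(buffer, bit_error_rate, rng, word_bits=16):
    """Flip random bits in an integer buffer to emulate approximate memory.

    Each bit of each word is flipped independently with probability
    bit_error_rate, mimicking retention failures in, e.g., a DRAM
    refreshed below its nominal rate. Rate and width are illustrative.
    """
    flat = buffer.ravel()
    # Bernoulli draw per (word, bit) position
    flips = rng.random((flat.size, word_bits)) < bit_error_rate
    for bit in range(word_bits):
        mask = flips[:, bit].astype(flat.dtype) << bit
        flat ^= mask
    return buffer

rng = np.random.default_rng(42)
frame = rng.integers(0, 2**16, size=(64, 64), dtype=np.uint16)
noisy = inject_bit_faults(frame.copy(), bit_error_rate=1e-4, rng=rng)
print(int(np.count_nonzero(frame != noisy)), "corrupted words")
```

An application study then amounts to routing selected buffers (e.g. an encoder's reference frames or a filter's tap registers) through such a faulty region and measuring the resulting output quality.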

    Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading

    GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration into the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting, in energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques that allow audio algorithm performance to be automatically tuned as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while critically consuming 3-4x less energy than the CPU. With our optimized GPU, conventional wisdom about how to use the cloud and low-power chips no longer holds: unless the network has a throughput of at least 20 Mbps (and an RTT of 25 ms or less), the optimized GPU audio sensing apps begin to consume less energy than cloud offloading after only about 10 to 20 seconds of buffering audio data for batched execution. Under such conditions, we find the optimized GPU can provide energy benefits comparable to low-power reference DSP implementations with some preliminary level of optimization, while the GPU always wins on latency. This work was supported by Microsoft Research through its PhD Scholarship Program.
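
The break-even reasoning can be made concrete with a back-of-the-envelope model: offloading costs the energy of keeping the radio active while the buffered audio uploads, whereas local execution costs the GPU's energy for the same batch. All numbers below (radio power, GPU power, batch runtime, audio bitrate) are hypothetical placeholders, not measurements from the paper; the point is the shape of the comparison, in which higher throughput shrinks radio-on time and favors the cloud.

```python
def offload_energy_j(buffer_s, audio_kbps, net_mbps, radio_w, rtt_s):
    """Energy to upload buffer_s seconds of audio over the network."""
    bits = buffer_s * audio_kbps * 1e3
    tx_time = bits / (net_mbps * 1e6) + rtt_s
    return radio_w * tx_time

def local_energy_j(buffer_s, gpu_w, gpu_s_per_10s):
    """Energy for the on-chip GPU to process the same buffered batch."""
    return gpu_w * (buffer_s / 10.0) * gpu_s_per_10s

# Hypothetical device parameters (placeholders, not the paper's data).
AUDIO_KBPS, RADIO_W, RTT_S = 256, 1.2, 0.025
GPU_W, GPU_S_PER_10S = 2.0, 0.4      # 0.4 s of GPU time per 10 s batch

for net_mbps in (1, 5, 20, 50):
    cloud = offload_energy_j(15, AUDIO_KBPS, net_mbps, RADIO_W, RTT_S)
    local = local_energy_j(15, GPU_W, GPU_S_PER_10S)
    winner = "cloud" if cloud < local else "GPU"
    print(f"{net_mbps:>3} Mbps: cloud {cloud:.2f} J vs GPU {local:.2f} J"
          f" -> {winner}")
```

With these placeholder values the crossover lands in the same regime the paper reports: slow links make the radio-on time, and hence offload energy, dominate, while fast links let the cloud win.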

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    The power consumption of digital hearing aids is very restricted due to their small physical size, and the hardware resources available for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low power consumption and high processing performance. The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized for state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming, and speech recognition. Specialized, application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % fewer cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 % to 17 % with isolation and bypass techniques. The hardware-induced audio latency is 34 % lower compared to related audio interfaces for a frame size of 64 samples. The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm². Each of the processors and co-processors contains individual customizations and hardware features, with datapath widths varying between 24 and 64 bits. The core area of the 64-bit processor configuration is 0.134 mm². The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for the SoC and 0.6 mW for the 64-bit processor. Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging, instruction scheduling and register allocation. The KAVUAKA processor architecture is compared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements.
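
The benefit of a fused real/complex MAC can be illustrated by counting the work in a radix-2 FFT: each butterfly needs one complex twiddle multiply plus add/subtract, which a scalar datapath serializes into several real operations. The sketch below estimates cycles for a 128-point FFT under both assumptions; all per-butterfly cycle costs are illustrative placeholders, not KAVUAKA's actual timings.

```python
import math

def fft_cycles(n, mac_cycles, overhead_cycles=8):
    """Rough cycle estimate for an n-point radix-2 FFT.

    (n/2)*log2(n) butterflies, each paying its arithmetic (a complex
    twiddle multiply plus add/subtract) plus a fixed overhead for
    loads, stores and address generation. All per-butterfly costs
    here are illustrative assumptions, not KAVUAKA's numbers.
    """
    butterflies = (n // 2) * int(math.log2(n))
    return butterflies * (mac_cycles + overhead_cycles)

# Scalar datapath: complex multiply serialized into 4 real MACs + adds.
scalar = fft_cycles(128, mac_cycles=6)
# Fused complex MAC: the twiddle multiply issues as a single operation.
fused = fft_cycles(128, mac_cycles=2)
print(f"scalar {scalar} vs fused {fused} cycles "
      f"({100 * (scalar - fused) / scalar:.0f}% fewer)")
```

With these placeholder costs the reduction comes out near 30 %; memory traffic and other pipeline effects dilute the raw arithmetic gain, which is consistent with the 16 % measured on the real processor being smaller than the arithmetic-only ratio.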

    Design and Realization of Fully-digital Microwave and Mm-wave Multi-beam Arrays with FPGA/RF-SOC Signal Processing

    There has been a constant increase in data traffic and device connections in mobile wireless communications, which has led fifth-generation (5G) implementations to exploit mm-wave bands at 24/28 GHz. Next-generation wireless access points (6G and beyond) will need to adopt large-scale transceiver arrays that combine multiple-input multiple-output (MIMO) theory with fully digital multi-beam beamforming. The resulting high-gain array factors will overcome the high path losses at mm-wave bands, and the simultaneous multi-beams will exploit the multi-directional channels that arise from multi-path effects, improving the signal-to-noise ratio. Such access points will be based on electronic systems that heavily depend on the integration of RF electronics with digital signal processing performed in field-programmable gate arrays (FPGAs) or RF systems-on-chip (RF-SoCs). This dissertation is directed towards the investigation and realization of fully digital phased arrays that can produce wideband simultaneous multi-beams with FPGA or RF-SoC digital back-ends. The first proposed approach is a spatial bandpass (SBP) IIR filter-based beamformer, built on the concept of space-time network resonance. A 2.4 GHz 16-element array receiver has been built for real-time experimental verification of this approach. The second and third approaches are based on Discrete Fourier Transform (DFT) theory and on a lens plus focal planar array, respectively; the lens-based approach is essentially an analog realization of the DFT. These two approaches are verified in a 28 GHz mm-wave implementation with 800 MHz of bandwidth and an RF-SoC as the digital back-end. It is shown that, for all proposed multibeam beamformer implementations, the measured beams are well aligned with the simulated ones. The proposed approaches differ in terms of their architectures, hardware complexity and costs, which are discussed throughout this dissertation. This dissertation also presents an application of multi-beam approaches to RF directional sensing for exploring white spaces within spatio-temporal spectral regions; a real-time directional sensing system is proposed to capture the white spaces within the 2.4 GHz Wi-Fi band. Further, this dissertation investigates the effect of electromagnetic (EM) mutual coupling in antenna arrays on the real-time performance of fully digital transceivers, and proposes algorithms to compensate for the mutual coupling in the digital domain. The first is based on deriving the mutual coupling transfer function from the measured S-parameters of the antenna array and employing it in a Frost FIR filter in the beamforming back-end. The second method uses fast algorithms to realize the inverse of the mutual coupling matrix via tridiagonal Toeplitz matrices with sparse factors. A 5.8 GHz 32-element array and a 1-7 GHz 7-element tightly coupled dipole array (TCDA) have been employed to demonstrate proofs of concept of these algorithms.
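
The DFT-based approach admits a compact illustration: for an N-element uniform linear array, applying an N-point FFT across the element dimension of each snapshot yields N simultaneous orthogonal beams, one per spatial frequency bin. The sketch below forms such beams for a 16-element array with half-wavelength spacing; the array size, spacing and source angles are illustrative, not the dissertation's hardware parameters.

```python
import numpy as np

N = 16                      # array elements, half-wavelength spacing
angles = (-40.0, 10.0)      # illustrative source directions (degrees)
snapshots = 64

# Simulated received snapshots: one plane wave per source plus noise.
rng = np.random.default_rng(1)
n_idx = np.arange(N)
X = np.zeros((N, snapshots), complex)
for theta in angles:
    steering = np.exp(1j * np.pi * n_idx * np.sin(np.radians(theta)))
    X += np.outer(steering, rng.standard_normal(snapshots))
X += 0.1 * (rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape))

# DFT multibeam beamformer: an FFT across the element dimension turns
# each snapshot into N simultaneous orthogonal beam outputs.
beams = np.fft.fft(X, axis=0) / N
power = np.mean(np.abs(beams) ** 2, axis=1)

# Map each FFT bin k to its beam direction: sin(theta) = 2k/N (aliased).
u = np.fft.fftfreq(N) * 2.0
for k in np.argsort(power)[-2:]:
    print(f"bin {k}: ~{np.degrees(np.arcsin(u[k])):6.1f} deg, "
          f"power {power[k]:.2f}")
```

The two strongest beam outputs land in the bins nearest the simulated source directions, which is the digital analogue of the lens-plus-focal-plane architecture described above.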