15,309 research outputs found

    Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems

    Get PDF
    The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique employs adaptive scalar companding and rounding to input matrix blocks followed by two forms of packing in floating-point that allow for concurrent calculation of multiple results. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive the optimal throughput-distortion control framework for GEMM for the broad class of zero-mean, independent identically distributed, input sources. Our approach converts matrix multiplication in programmable processors into a computation channel: when increasing the processing throughput, the output noise (error) increases due to (i) coarser quantization and (ii) computational errors caused by exceeding the machine-precision limitations. We show that, under certain distortion in the GEMM computation, the proposed framework can significantly surpass 100% of the peak performance of a given processor. The practical benefits of our proposal are shown in a face recognition system and a multi-layer perceptron system trained for metadata learning from a large music feature database.Comment: IEEE Transactions on Signal Processing (vol. 60, 2012

    Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications

    Full text link
    Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors by adjusting the imprecision (distortion) of computation. Our approach is based on scalar quantization, followed by two forms of tight packing in floating-point (one of which is proposed in this paper) that allow for concurrent calculation of multiple results. We illustrate how our approach can operate as an optional pre- and post-processing layer for off-the-shelf optimized convolution routines. This is useful for multimedia applications that are tolerant to processing imprecision and for cases where the input signals are inherently noisy (error tolerant multimedia applications). Indicative experimental results with a digital music matching system and an MPEG-7 audio descriptor system demonstrate that the proposed approach offers up to 175% increase in processing throughput against optimized (full-precision) convolution with virtually no effect in the accuracy of the results. Based on marginal statistics of the input data, it is also shown how the throughput and distortion can be adjusted per input block of samples under constraints on the signal-to-noise ratio against the full-precision convolution.Comment: IEEE Trans. on Multimedia, 201

    Fundamentals of Large Sensor Networks: Connectivity, Capacity, Clocks and Computation

    Full text link
    Sensor networks potentially feature large numbers of nodes that can sense their environment over time, communicate with each other over a wireless network, and process information. They differ from data networks in that the network as a whole may be designed for a specific application. We study the theoretical foundations of such large scale sensor networks, addressing four fundamental issues- connectivity, capacity, clocks and function computation. To begin with, a sensor network must be connected so that information can indeed be exchanged between nodes. The connectivity graph of an ad-hoc network is modeled as a random graph and the critical range for asymptotic connectivity is determined, as well as the critical number of neighbors that a node needs to connect to. Next, given connectivity, we address the issue of how much data can be transported over the sensor network. We present fundamental bounds on capacity under several models, as well as architectural implications for how wireless communication should be organized. Temporal information is important both for the applications of sensor networks as well as their operation.We present fundamental bounds on the synchronizability of clocks in networks, and also present and analyze algorithms for clock synchronization. Finally we turn to the issue of gathering relevant information, that sensor networks are designed to do. One needs to study optimal strategies for in-network aggregation of data, in order to reliably compute a composite function of sensor measurements, as well as the complexity of doing so. We address the issue of how such computation can be performed efficiently in a sensor network and the algorithms for doing so, for some classes of functions.Comment: 10 pages, 3 figures, Submitted to the Proceedings of the IEE

    EC-CENTRIC: An Energy- and Context-Centric Perspective on IoT Systems and Protocol Design

    Get PDF
    The radio transceiver of an IoT device is often where most of the energy is consumed. For this reason, most research so far has focused on low power circuit and energy efficient physical layer designs, with the goal of reducing the average energy per information bit required for communication. While these efforts are valuable per se, their actual effectiveness can be partially neutralized by ill-designed network, processing and resource management solutions, which can become a primary factor of performance degradation, in terms of throughput, responsiveness and energy efficiency. The objective of this paper is to describe an energy-centric and context-aware optimization framework that accounts for the energy impact of the fundamental functionalities of an IoT system and that proceeds along three main technical thrusts: 1) balancing signal-dependent processing techniques (compression and feature extraction) and communication tasks; 2) jointly designing channel access and routing protocols to maximize the network lifetime; 3) providing self-adaptability to different operating conditions through the adoption of suitable learning architectures and of flexible/reconfigurable algorithms and protocols. After discussing this framework, we present some preliminary results that validate the effectiveness of our proposed line of action, and show how the use of adaptive signal processing and channel access techniques allows an IoT network to dynamically tune lifetime for signal distortion, according to the requirements dictated by the application

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Full text link
    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. The complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO in the past. Recent advances on system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption with a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints on operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes discussed. Open challenges and directions for future research are suggested.Comment: submitted to IEEE transactions on signal processin

    A smart environment for biometric capture

    No full text
    The development of large scale biometric systems require experiments to be performed on large amounts of data. Existing capture systems are designed for fixed experiments and are not easily scalable. In this scenario even the addition of extra data is difficult. We developed a prototype biometric tunnel for the capture of non-contact biometrics. It is self contained and autonomous. Such a configuration is ideal for building access or deployment in secure environments. The tunnel captures cropped images of the subject's face and performs a 3D reconstruction of the person's motion which is used to extract gait information. Interaction between the various parts of the system is performed via the use of an agent framework. The design of this system is a trade-off between parallel and serial processing due to various hardware bottlenecks. When tested on a small population the extracted features have been shown to be potent for recognition. We currently achieve a moderate throughput of approximate 15 subjects an hour and hope to improve this in the future as the prototype becomes more complete
    • …
    corecore