Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent identically distributed, input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under certain distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database. Comment: IEEE Transactions on Signal Processing (vol. 60, 2012
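The packing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the inputs have already been companded and rounded to small integers, and the helper names (`pack`, `unpack`, `packed_dot`) and the `shift_bits` parameter are hypothetical. Two integer rows are packed into one floating-point row, so a single run of multiply-adds yields two dot products at once.

```python
def pack(lo, hi, shift_bits):
    """Pack two small integers into one float as lo + hi * 2**shift_bits."""
    return float(lo) + float(hi) * float(2 ** shift_bits)


def unpack(packed, shift_bits):
    """Recover (lo, hi) from a packed value.

    Valid only while |lo| < 2**(shift_bits - 1) and the packed value
    stays below 2**53, so that double precision represents it exactly.
    """
    scale = float(2 ** shift_bits)
    hi = round(packed / scale)      # nearest integer handles negative lo
    lo = packed - hi * scale
    return int(lo), int(hi)


def packed_dot(row_a, row_b, x, shift_bits=20):
    """Compute dot(row_a, x) and dot(row_b, x) with one accumulation loop."""
    acc = 0.0
    for a, b, xi in zip(row_a, row_b, x):
        acc += pack(a, b, shift_bits) * float(xi)
    return unpack(acc, shift_bits)


if __name__ == "__main__":
    print(packed_dot([3, -7, 12], [-5, 4, 9], [2, 6, -1]))
```

Coarser rounding of the inputs leaves more headroom for packing additional results into one value, which is the throughput-versus-distortion trade-off the abstract describes. If the low result overflows its allotted bits, it corrupts the high result, which corresponds to the second error source named in the abstract.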
Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to 175%
increase in processing throughput against optimized (full-precision)
convolution with virtually no effect in the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution. Comment: IEEE Trans. on Multimedia, 201
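As a rough illustration of the distortion side of this trade-off, the sketch below quantizes an input block with a uniform scalar quantizer and measures the signal-to-noise ratio of the resulting convolution against the full-precision result. It covers only the quantization and SNR measurement, not the packing step, and the function names and the `step` parameter are assumptions made for this example rather than anything taken from the paper.

```python
import math


def convolve(signal, kernel):
    """Direct one-dimensional (full) convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def quantize(values, step):
    """Uniform scalar quantization to multiples of `step`."""
    return [round(v / step) * step for v in values]


def snr_db(reference, approx):
    """SNR in dB of `approx` measured against `reference`."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(64)]
    kernel = [0.25, 0.5, 0.25]
    reference = convolve(signal, kernel)
    for step in (0.25, 0.01):
        approx = convolve(quantize(signal, step), kernel)
        print(step, snr_db(reference, approx))
```

A per-block controller in the spirit of the abstract would pick the coarsest `step` whose SNR still meets the application's constraint.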
Fundamentals of Large Sensor Networks: Connectivity, Capacity, Clocks and Computation
Sensor networks potentially feature large numbers of nodes that can sense
their environment over time, communicate with each other over a wireless
network, and process information. They differ from data networks in that the
network as a whole may be designed for a specific application. We study the
theoretical foundations of such large scale sensor networks, addressing four
fundamental issues: connectivity, capacity, clocks, and function computation.
To begin with, a sensor network must be connected so that information can
indeed be exchanged between nodes. The connectivity graph of an ad-hoc network
is modeled as a random graph and the critical range for asymptotic connectivity
is determined, as well as the critical number of neighbors that a node needs to
connect to. Next, given connectivity, we address the issue of how much data can
be transported over the sensor network. We present fundamental bounds on
capacity under several models, as well as architectural implications for how
wireless communication should be organized.
Temporal information is important both for the applications of sensor
networks and for their operation. We present fundamental bounds on the
synchronizability of clocks in networks, and also present and analyze
algorithms for clock synchronization. Finally, we turn to the issue of gathering
relevant information, which is what sensor networks are designed to do. One needs to
study optimal strategies for in-network aggregation of data, in order to
reliably compute a composite function of sensor measurements, as well as the
complexity of doing so. We address the issue of how such computation can be
performed efficiently in a sensor network and the algorithms for doing so, for
some classes of functions. Comment: 10 pages, 3 figures, Submitted to the Proceedings of the IEE
EC-CENTRIC: An Energy- and Context-Centric Perspective on IoT Systems and Protocol Design
The radio transceiver of an IoT device is often where most of the energy is consumed. For this reason, most research so far has focused on low power circuit and energy efficient physical layer designs, with the goal of reducing the average energy per information bit required for communication. While these efforts are valuable per se, their actual effectiveness can be partially neutralized by ill-designed network, processing and resource management solutions, which can become a primary factor of performance degradation, in terms of throughput, responsiveness and energy efficiency. The objective of this paper is to describe an energy-centric and context-aware optimization framework that accounts for the energy impact of the fundamental functionalities of an IoT system and that proceeds along three main technical thrusts: 1) balancing signal-dependent processing techniques (compression and feature extraction) and communication tasks; 2) jointly designing channel access and routing protocols to maximize the network lifetime; 3) providing self-adaptability to different operating conditions through the adoption of suitable learning architectures and of flexible/reconfigurable algorithms and protocols. After discussing this framework, we present some preliminary results that validate the effectiveness of our proposed line of action, and show how the use of adaptive signal processing and channel access techniques allows an IoT network to dynamically tune lifetime for signal distortion, according to the requirements dictated by the application
Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions
Massive MIMO is a compelling wireless access concept that relies on the use
of an excess number of base-station antennas, relative to the number of active
terminals. This technology is a main component of 5G New Radio (NR) and
addresses all important requirements of future wireless standards: a great
capacity increase, the support of many simultaneous users, and improvement in
energy efficiency. Massive MIMO requires the simultaneous processing of signals
from many antenna chains, and computational operations on large matrices. The
complexity of the digital processing has been viewed as a fundamental obstacle
to the feasibility of Massive MIMO in the past. Recent advances in
system-algorithm-hardware co-design have led to extremely energy-efficient
implementations. These exploit opportunities in deeply-scaled silicon
technologies and perform partly distributed processing to cope with the
bottlenecks encountered in the interconnection of many signals. For example,
prototype ASIC implementations have demonstrated zero-forcing precoding in real
time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing
of 8 terminals). Coarse and even error-prone digital processing in the antenna
paths permits a reduction of consumption by a factor of 2 to 5. This article
summarizes the fundamental technical contributions to efficient digital signal
processing for Massive MIMO. The opportunities and constraints on operating on
low-complexity RF and analog hardware chains are clarified. It illustrates how
terminals can benefit from improved energy efficiency. The status of technology
and real-life prototypes is discussed. Open challenges and directions for future
research are suggested. Comment: submitted to IEEE transactions on signal processin
A smart environment for biometric capture
The development of large-scale biometric systems requires experiments to be performed on large amounts of data. Existing capture systems are designed for fixed experiments and are not easily scalable. In this scenario even the addition of extra data is difficult. We developed a prototype biometric tunnel for the capture of non-contact biometrics. It is self-contained and autonomous. Such a configuration is ideal for building access or deployment in secure environments. The tunnel captures cropped images of the subject's face and performs a 3D reconstruction of the person's motion, which is used to extract gait information. Interaction between the various parts of the system is handled by an agent framework. The design of this system is a trade-off between parallel and serial processing due to various hardware bottlenecks. When tested on a small population, the extracted features have been shown to be potent for recognition. We currently achieve a moderate throughput of approximately 15 subjects an hour and hope to improve this in the future as the prototype becomes more complete.
An automatically curated first-principles database of ferroelectrics.
Ferroelectric materials have technological applications in information storage and electronic devices. The ferroelectric polar phase can be controlled with external fields, chemical substitution and size-effects in bulk and ultrathin film form, providing a platform for future technologies and for exploratory research. In this work, we integrate spin-polarized density functional theory (DFT) calculations, crystal structure databases, symmetry tools, workflow software, and a custom analysis toolkit to build a library of known, previously-proposed, and newly-proposed ferroelectric materials. With our automated workflow, we screen over 67,000 candidate materials from the Materials Project database to generate a dataset of 255 ferroelectric candidates, and propose 126 new ferroelectric materials. We benchmark our results against experimental data and previous first-principles results. The data provided include atomic structures, output files, and DFT values of band gaps, energies, and the spontaneous polarization for each ferroelectric candidate. We contribute our workflow and analysis code to the open-source Python packages atomate and pymatgen so others can conduct analogous symmetry-driven searches for ferroelectrics and related phenomena.