Distributed Time-Frequency Division Multiple Access Protocol For Wireless Sensor Networks
It is well known that biology-inspired self-maintaining algorithms in
wireless sensor nodes achieve near optimum time division multiple access (TDMA)
characteristics in a decentralized manner and with very low complexity. We
extend such distributed TDMA approaches to multiple channels (frequencies).
This is achieved by extending the concept of collaborative reactive listening
in order to balance the number of nodes in all available channels. We prove the
stability of the new protocol and estimate the delay until the balanced system
state is reached. Our approach is benchmarked against single-channel
distributed TDMA and channel hopping approaches using TinyOS imote2 wireless
sensors. Comment: 4 pages, IEEE Wireless Communications Letters, to appear in 201
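The single-channel distributed TDMA that this protocol extends rests on a desynchronization primitive: each node fires once per period, listens for the firings of its neighbors, and nudges its own firing phase toward the midpoint between them until the slots are evenly spaced. The sketch below assumes a DESYNC-style midpoint rule with a unit period and a hypothetical step size `alpha`; the function names are illustrative, and the paper's multi-channel balancing is not reproduced.

```python
import random

def desync_round(phases, alpha=0.5):
    """One synchronous round of a DESYNC-style midpoint rule on a unit period.

    Each node looks at the firing phases of its circular predecessor and
    successor and moves a fraction `alpha` of the way toward their midpoint.
    """
    n = len(phases)
    if n < 2:
        return list(phases)
    order = sorted(range(n), key=lambda i: phases[i])
    updated = list(phases)
    for k, i in enumerate(order):
        prev_phase = phases[order[k - 1]]        # k - 1 == -1 wraps to the last node
        next_phase = phases[order[(k + 1) % n]]
        gap_prev = (phases[i] - prev_phase) % 1.0
        gap_next = (next_phase - phases[i]) % 1.0
        # The midpoint lies (gap_next - gap_prev) / 2 ahead of the current phase.
        updated[i] = (phases[i] + alpha * (gap_next - gap_prev) / 2.0) % 1.0
    return updated

def slot_gaps(phases):
    """Circular gaps between consecutive firings; equal gaps mean equal TDMA slots."""
    s = sorted(phases)
    n = len(s)
    return [(s[(k + 1) % n] - s[k]) % 1.0 for k in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    phases = [rng.random() for _ in range(5)]
    for _ in range(200):
        phases = desync_round(phases)
    print(slot_gaps(phases))  # every gap approaches 1/5
```

With `alpha` at most 1, each new gap is a convex combination of the old gap and its two neighbors, so the gaps diffuse toward the uniform value 1/N without nodes overtaking one another.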
Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent and identically distributed (i.i.d.) input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under a given level of distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database. Comment: IEEE Transactions on Signal Processing (vol. 60, 2012)
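To make the packing idea concrete, here is a minimal NumPy sketch in which two quantized operand matrices share each float64 word, so one GEMM call yields two products. It replaces the paper's adaptive companding with plain uniform rounding (an assumption), and `quantize`, `packed_gemm` and the shift `k` are hypothetical names. Integer-valued operands keep every float64 operation exact only while the bounds in the docstring hold; exceeding them corresponds to the machine-precision error source (ii) mentioned in the abstract.

```python
import numpy as np

def quantize(x, step):
    """Uniform rounding to integer levels (a stand-in for adaptive companding)."""
    return np.round(x / step)

def packed_gemm(a1, a2, b, k):
    """Compute a1 @ b and a2 @ b with a single float64 GEMM.

    a2 is shifted into the upper part of each float64 word by 2**k. Requires
    integer-valued inputs, |a1 @ b| < 2**(k - 1), and all packed partial sums
    below 2**53 in magnitude, so every float64 operation is exact.
    """
    scale = float(2 ** k)
    packed = a1 + scale * a2          # two operands per float64 element
    r = packed @ b                    # one GEMM yields both products
    hi = np.round(r / scale)          # |low part / scale| < 1/2, so rounding isolates hi
    lo = r - hi * scale
    return lo, hi

rng = np.random.default_rng(0)
a1 = quantize(rng.standard_normal((4, 6)), 0.05)
a2 = quantize(rng.standard_normal((4, 6)), 0.05)
b = quantize(rng.standard_normal((6, 3)), 0.05)
lo, hi = packed_gemm(a1, a2, b, k=20)   # lo equals a1 @ b, hi equals a2 @ b
```

A coarser quantization step shrinks the integer ranges and leaves room to pack more operands per word, which is the throughput-versus-distortion lever the abstract describes.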
Improved Techniques for Adversarial Discriminative Domain Adaptation
Adversarial discriminative domain adaptation (ADDA) is an efficient framework
for unsupervised domain adaptation in image classification, where the source
and target domains are assumed to have the same classes, but no labels are
available for the target domain. We investigate whether we can improve
performance of ADDA with a new framework and new loss formulations. Following
the framework of semi-supervised GANs, we first extend the discriminator output
over the source classes, in order to model the joint distribution over domain
and task. We thus leverage the distribution over the source encoder
posteriors (which is fixed during adversarial training) and propose maximum
mean discrepancy (MMD) and reconstruction-based loss functions for aligning the
target encoder distribution to the source domain. We compare and provide a
comprehensive analysis of how our framework and loss formulations extend over
simple multi-class extensions of ADDA and other discriminative variants of
semi-supervised GANs. In addition, we introduce various forms of regularization
for stabilizing training, including treating the discriminator as a denoising
autoencoder and regularizing the target encoder with source examples to reduce
overfitting under a contraction mapping (i.e., when the target per-class
distributions are contracting during alignment with the source). Finally, we
validate our framework on standard domain adaptation datasets, such as SVHN and
MNIST. We also examine how our framework benefits recognition problems based on
modalities that lack training data, by introducing and evaluating on a
neuromorphic vision sensing (NVS) sign language recognition dataset, where the
source and target domains constitute emulated and real neuromorphic spike
events, respectively. Our results on all datasets show that our proposal
competes with or outperforms the state of the art in unsupervised domain adaptation. Comment: To appear in IEEE Transactions on Image Processing
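The MMD-based alignment loss mentioned above compares the distribution of target-encoder outputs with that of the fixed source-encoder outputs. A minimal NumPy sketch of the biased empirical squared MMD follows; the Gaussian kernel, its bandwidth `sigma`, and the function names are assumptions, since the abstract does not state the kernel used.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    d2 = np.sum(x ** 2, axis=1)[:, None] + np.sum(y ** 2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))   # clamp tiny negative rounding

def mmd2(source, target, sigma=1.0):
    """Biased empirical estimate of squared MMD between feature sets (n, d) and (m, d)."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

In training, `source` would be a batch of fixed source-encoder outputs and `target` a batch of target-encoder outputs; minimizing the value with respect to the target encoder would require an autodiff framework rather than plain NumPy.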
Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to 175%
increase in processing throughput against optimized (full-precision)
convolution with virtually no effect on the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution. Comment: IEEE Trans. on Multimedia, 201
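The pre- and post-processing layer described above can be sketched as follows: quantize two input blocks, pack them into one float64 signal, run a single off-the-shelf convolution (`np.convolve` here), then unpack and dequantize. This assumes uniform quantization of both the inputs and the kernel with hypothetical steps `step_x` and `step_h`; the paper's second packing form and its per-block adaptation are not shown. `snr_db` measures distortion against full-precision convolution, the quantity the abstract constrains per block.

```python
import numpy as np

def packed_convolve(x1, x2, h, step_x, step_h, k=24):
    """Convolve two quantized input blocks with one float64 np.convolve call.

    Assumes the integer convolution outputs stay below 2**(k - 1) in magnitude
    and the packed sums below 2**53, so the float64 arithmetic is exact.
    """
    q1, q2 = np.round(x1 / step_x), np.round(x2 / step_x)
    qh = np.round(h / step_h)
    scale = float(2 ** k)
    r = np.convolve(q1 + scale * q2, qh)        # both blocks in a single pass
    hi = np.round(r / scale)                    # isolates the upper (x2) result
    lo = r - hi * scale                         # remaining lower (x1) result
    return lo * step_x * step_h, hi * step_x * step_h   # dequantize

def snr_db(reference, approx):
    """Signal-to-noise ratio of an approximate output against full precision."""
    noise = reference - approx
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))
```

Shrinking the quantization steps raises the SNR but widens the integer range each result needs, which limits how many blocks fit in one word.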
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on integer data streams, such as:
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal on an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that the proposed approach incurs between … and …
reduction in processing throughput for a wide variety of LSB operations. This
overhead is 5 to 1000 times smaller than that of the equivalent ABFT method
that uses a checksum stream. Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy. Comment: to appear in IEEE Trans. on Signal Processing, 201
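The entangle, compute, extract cycle can be illustrated with a deliberately simplified layout, which is an assumption and not the paper's exact entanglement: each entangled word holds stream m in its upper bits and stream m+1 (cyclically) in its lower `l` bits, so every result appears in two entangled outputs. Because convolution is linear with integer coefficients, it runs directly on the entangled words, and extraction uses only masking, subtraction and arithmetic shifts. Comparing the two copies of a result flags a corrupted output, and any two surviving outputs cover all three results, so a single fail-stop failure (modeled as `None`) is tolerated. All names and the 16-bit `L_BITS` are hypothetical, and every result must fit in the signed `l`-bit range.

```python
L_BITS = 16  # hypothetical: low bits reserved for the neighboring stream

def entangle(streams, l=L_BITS):
    """e_m[n] = (c_m[n] << l) + c_{(m+1) mod M}[n] for M integer streams."""
    m = len(streams)
    return [[(a << l) + b for a, b in zip(streams[i], streams[(i + 1) % m])]
            for i in range(m)]

def conv(x, h):
    """Plain integer linear convolution, one example of an LSB operation."""
    out = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def split(word, l=L_BITS):
    """Undo the packing of one word: returns (upper, signed lower l-bit part)."""
    low = word & ((1 << l) - 1)
    if low >= 1 << (l - 1):
        low -= 1 << l
    return (word - low) >> l, low

def disentangle(outputs, l=L_BITS):
    """Recover all M result streams; a fail-stopped output is passed as None.

    Each result appears in two entangled outputs, so the copies are compared
    (soft-error check) and any single missing output can be tolerated.
    """
    m = len(outputs)
    results = [None] * m
    for i, d in enumerate(outputs):
        if d is None:
            continue
        upper, lower = zip(*(split(w, l) for w in d))
        for idx, vals in ((i, list(upper)), ((i + 1) % m, list(lower))):
            if results[idx] is None:
                results[idx] = vals
            elif results[idx] != vals:
                raise ValueError(f"copies of stream {idx} disagree: soft error detected")
    return results

streams = [[1, -2, 3], [4, 5, -6], [-7, 8, 9]]
h = [2, -1, 3]
outputs = [conv(e, h) for e in entangle(streams)]   # compute on entangled data only
outputs[1] = None                                  # simulate one fail-stop failure
print(disentangle(outputs) == [conv(s, h) for s in streams])  # True
```

The extraction cost grows with the number of output samples, not with the cost of `conv`, which mirrors the linear-overhead property claimed above.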
Failure Mitigation in Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new roll-forward technique is proposed that recovers from any single
fail-stop failure in integer data streams when undergoing
linear, sesquilinear or bijective (LSB) operations, such as: scaling,
additions/subtractions, inner or outer vector products and permutations. In the
proposed approach, the input integer data streams are linearly superimposed
to form numerically entangled integer data streams that are stored in-place
of the original inputs. A series of LSB operations can then be performed
directly using these entangled data streams. The output results can be
extracted from any entangled output streams by additions and arithmetic
shifts, thereby guaranteeing robustness to a fail-stop failure in any single
stream computation. Importantly, unlike other methods, the number of operations
required for the entanglement, extraction and recovery of the results is
linearly related to the number of the inputs and does not depend on the
complexity of the performed LSB operations. We have validated our proposal on
an Intel processor (Haswell architecture with AVX2 support) via convolution
operations. Our analysis and experiments reveal that the proposed approach
incurs only … to … reduction in processing throughput in comparison
to the failure-intolerant approach. This overhead is 9 to 14 times smaller than
that of the equivalent checksum-based method. Thus, our proposal can be used in
distributed systems and unreliable processor hardware, or safety-critical
applications, where robustness against fail-stop failures becomes a necessity. Comment: Proc. 21st IEEE International On-Line Testing Symposium (IOLTS 2015),
July 2015, Halkidiki, Greece
Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor
We investigate video classification via a two-stream convolutional neural
network (CNN) design that directly ingests information extracted from
compressed video bitstreams. Our approach begins with the observation that all
modern video codecs divide the input frames into macroblocks (MBs). We
demonstrate that selective access to MB motion vector (MV) information within
compressed video bitstreams can also provide for selective, motion-adaptive, MB
pixel decoding (a.k.a., MB texture decoding). This in turn allows for the
derivation of spatio-temporal video activity regions at extremely high speed in
comparison to conventional full-frame decoding followed by optical flow
estimation. In order to evaluate the accuracy of a video classification
framework based on such activity data, we independently train two CNN
architectures on MB texture and MV correspondences and then fuse their scores
to derive the final classification of each test video. Evaluation on two
standard datasets shows that the proposed approach is competitive with the best
two-stream video classification approaches found in the literature. At the same
time: (i) a CPU-based realization of our MV extraction is over 977 times faster
than GPU-based optical flow methods; (ii) selective decoding is up to 12 times
faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs
perform inference at 5 to 49 times lower cloud computing cost than the fastest
methods from the literature. Comment: Accepted in IEEE Transactions on Circuits and Systems for Video
Technology. Extension of ICIP 2017 conference paper
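Two ideas above, MV-driven selection of the macroblocks whose texture is decoded and fusion of the scores of the two CNN streams, can be sketched in NumPy. The motion-magnitude threshold, the `(mb_rows, mb_cols, 2)` MV array layout and the equal-weight average of softmax scores are all assumptions: the abstract specifies neither the activity criterion nor the fusion rule, and no bitstream parsing is shown.

```python
import numpy as np

def active_macroblocks(mv, threshold=1.0):
    """Mark macroblocks whose motion-vector magnitude exceeds `threshold`.

    mv: array of shape (mb_rows, mb_cols, 2) holding (dx, dy) per macroblock.
    Only the marked MBs would be texture-decoded; the rest are skipped.
    """
    return np.hypot(mv[..., 0], mv[..., 1]) > threshold

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse(spatial_logits, temporal_logits, w=0.5):
    """Late fusion of the two streams: weighted average of class probabilities."""
    return w * softmax(spatial_logits) + (1.0 - w) * softmax(temporal_logits)
```

The final label of a test video would then be the `argmax` of the fused probabilities, possibly averaged over several clips.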
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim of increasing robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions. Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Italy
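The two-level aggregation can be sketched as follows: cluster a segment's SIFT descriptors into LFCs, assign each LFC to its nearest CLFC, accumulate the differences per CLFC, and concatenate. The plain k-means with first-k initialization, the final L2 normalization and the toy dimensions are assumptions for illustration (real SIFT vectors are 128-dimensional), and in practice the CLFCs would come from clustering LFCs over a training set.

```python
import numpy as np

def kmeans(x, k, iters=20):
    """Plain Lloyd iterations, initialized with the first k points (an assumption)."""
    centers = x[:k].astype(float).copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def vlac(sift, clfcs, n_lfc):
    """VLAD-style aggregation of local feature centers (LFCs) against CLFCs."""
    lfcs = kmeans(sift, n_lfc)                   # coarse level: cluster the segment's SIFT
    nearest = np.argmin(((lfcs[:, None, :] - clfcs[None, :, :]) ** 2).sum(axis=-1), axis=1)
    v = np.zeros_like(clfcs, dtype=float)
    for center, j in zip(lfcs, nearest):
        v[j] += center - clfcs[j]                # sum of differences per CLFC
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v           # L2 normalization (assumed)
```

The descriptor length is the number of CLFCs times the feature dimension, independent of how many SIFT vectors the segment contains, which is what makes the representation compact.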
PAC-Bayesian Bounds on Rate-Efficient Classifiers
We derive analytic bounds on the noise invariance of majority vote classifiers
operating on compressed inputs. Specifically, starting from recent bounds on the
true risk of majority vote classifiers, we extend the applicability of
PAC-Bayesian theory to quantify the resilience of majority votes to input noise
stemming from compression. The derived bounds are intuitive in binary
classification settings, where they can be measured as expressions of voter
differentials and voter pair agreement. By combining measures of input
distortion with analytic guarantees on noise invariance, we prescribe
rate-efficient machines to compress inputs without affecting subsequent
classification. Our validation shows how bounding noise invariance can inform
the compression stage for any majority vote classifier such that the worst-case
implications of bad input reconstructions are known, and inputs can be
compressed to the minimum amount of information needed prior to inference.
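Assuming the starting point is the C-bound for weighted majority votes (an assumption; the abstract does not name the bound), the guarantee can be computed from the first and second moments of the vote margin. The first moment equals one minus twice the Gibbs risk, and the second equals one minus twice the expected voter-pair disagreement, which ties it to the voter agreement terms mentioned above. Evaluating it on votes obtained from original versus reconstructed inputs is one way to see how compression erodes the guarantee; `c_bound` is a hypothetical helper for binary, +1/-1 voters.

```python
import numpy as np

def c_bound(votes, weights, labels):
    """Empirical C-bound on the risk of a weighted majority vote.

    votes:   (n_voters, n_samples) array of +1/-1 predictions
    weights: posterior Q over voters, non-negative and summing to 1
    labels:  (n_samples,) array of +1/-1 true labels
    """
    margins = labels * (weights @ votes)   # M_Q(x, y) for each sample
    mu1 = margins.mean()                   # equals 1 - 2 * (Gibbs risk)
    mu2 = (margins ** 2).mean()            # equals 1 - 2 * (expected voter-pair disagreement)
    if mu1 <= 0:
        return 1.0                         # only informative when the Gibbs risk is below 1/2
    return 1.0 - mu1 ** 2 / mu2
```

The bound follows from Cantelli's inequality applied to the margin, so it holds for the empirical margin distribution as well: the fraction of samples with non-positive margin never exceeds the returned value.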
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby significantly
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly-reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve 11% to 94% savings in bitrate with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
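The model-based selection can be sketched as choosing, among measured (model, bitrate, accuracy) operating points, the cheapest one whose accuracy drop stays within a tolerance. The `OperatingPoint` dataclass, the reading of "7% loss" as an absolute drop in accuracy, and any numbers used with it are assumptions for illustration, not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class OperatingPoint:
    model: str        # which CNN (hypothetical label)
    kbps: float       # bitrate of the retained MV and texture information
    accuracy: float   # classification accuracy in [0, 1]

def select_operating_point(points: Sequence[OperatingPoint],
                           reference_accuracy: float,
                           max_loss: float) -> Optional[OperatingPoint]:
    """Lowest-bitrate point whose accuracy drop is at most `max_loss`, or None."""
    feasible = [p for p in points if reference_accuracy - p.accuracy <= max_loss]
    return min(feasible, key=lambda p: p.kbps, default=None)
```

Keeping the selection as a table lookup over pre-measured points means the transmitter only needs the tolerance to pick a bitrate, with no CNN evaluation at run time.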