Comments on "Phase-Shifting for Nonseparable 2-D Haar Wavelets"
In their recent paper, Alnasser and Foroosh derive a wavelet-domain (in-band) method for phase-shifting of 2-D "nonseparable" Haar transform coefficients. Their approach is parametrized by the (a priori known) image translation. In this correspondence, we show that the utilized transform is in fact the separable Haar discrete wavelet transform (DWT). As such, wavelet-domain phase shifting can be performed using previously proposed phase-shifting approaches that utilize the overcomplete DWT (ODWT), if the given image translation is mapped to the phase component and in-band position within the ODWT.
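To make the separability point concrete, the following minimal sketch computes one level of the separable 2-D Haar DWT by transforming rows and then columns; the 1/2 normalization and the function names are illustrative assumptions rather than the notation used in the correspondence.

```python
import numpy as np

def haar_1d(x):
    """Single-level 1-D Haar transform along the last axis (even length assumed)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / 2.0          # low-pass (average) coefficients
    detail = (even - odd) / 2.0          # high-pass (difference) coefficients
    return np.concatenate([approx, detail], axis=-1)

def haar_2d_separable(img):
    """One level of the separable 2-D Haar DWT: rows first, then columns."""
    rows = haar_1d(img)                  # transform every row
    return haar_1d(rows.T).T             # transform every column; four subbands in quadrants

if __name__ == "__main__":
    coeffs = haar_2d_separable(np.random.rand(8, 8))
    print(coeffs.shape)                  # (8, 8): four 4x4 subbands
```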
Distributed Time-Frequency Division Multiple Access Protocol For Wireless Sensor Networks
It is well known that biology-inspired self-maintaining algorithms in
wireless sensor nodes achieve near-optimum time division multiple access (TDMA)
characteristics in a decentralized manner and with very low complexity. We
extend such distributed TDMA approaches to multiple channels (frequencies).
This is achieved by extending the concept of collaborative reactive listening
in order to balance the number of nodes across all available channels. We prove the
stability of the new protocol and estimate the delay until the balanced system
state is reached. Our approach is benchmarked against single-channel
distributed TDMA and channel hopping approaches using TinyOS imote2 wireless
sensors. Comment: 4 pages, IEEE Wireless Communications Letters, to appear in 201
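As a rough illustration of the channel-balancing idea only (not the actual protocol, whose listening rules, stability proof and delay analysis are given in the paper), the toy simulation below lets each node occasionally hop to the least-populated channel it overhears; the node count, hop probability and round structure are assumptions.

```python
import random
from collections import Counter

NUM_NODES, NUM_CHANNELS, ROUNDS, HOP_PROB = 60, 4, 200, 0.1

# Every node starts on a random channel.
channel_of = [random.randrange(NUM_CHANNELS) for _ in range(NUM_NODES)]

for _ in range(ROUNDS):
    load = Counter(channel_of)                   # per-channel occupancy a node can overhear
    least = min(range(NUM_CHANNELS), key=lambda c: load[c])
    for n in range(NUM_NODES):
        # Nodes on more crowded channels occasionally hop to the least-loaded one.
        if load[channel_of[n]] > load[least] and random.random() < HOP_PROB:
            channel_of[n] = least

print(sorted(Counter(channel_of).values()))      # occupancies end up close to NUM_NODES / NUM_CHANNELS
```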
Incremental refinement of image salient-point detection
Low-level image analysis systems typically detect "points of interest", i.e., areas of natural images that contain corners or edges. Most of the robust and computationally efficient detectors proposed for this task use the autocorrelation matrix of the localized image derivatives. Although the performance of such detectors and their suitability for particular applications have been studied in the relevant literature, their behavior under limited input source (image) precision or limited computational or energy resources is largely unknown. All existing frameworks assume that the input image is readily available for processing and that sufficient computational and energy resources exist for the completion of the result. Nevertheless, recent advances in incremental image sensors or compressed sensing, as well as the demand for low-complexity scene analysis in sensor networks, now challenge these assumptions. In this paper, we investigate an approach that computes salient points of images incrementally, i.e., the salient point detector can operate with a coarsely quantized input image representation and successively refine the result (the derived salient points) as the image precision is refined by the sensor. This has the advantage that image sensing and salient point detection can be terminated at any input image precision (e.g., a bound set by the sensing equipment, by the computation budget, or by the salient-point accuracy required by the application), with the salient points obtained under this precision readily available. We focus on the popular detector proposed by Harris and Stephens and demonstrate how such an approach can operate when the image samples are refined in a bitwise manner, i.e., when the image bitplanes are received one by one from the image sensor. We estimate the energy required for image sensing as well as the computation required for the salient point detection based on stochastic source modeling. The computation and energy required by the proposed incremental refinement approach are compared against a conventional salient-point detector realization that operates directly on each source precision and cannot refine the result. Our experiments demonstrate the feasibility of incremental approaches for salient point detection in various classes of natural images. In addition, a first comparison of the results obtained by the intermediate detectors is presented, along with a novel application for adaptive low-energy image sensing based on points of saliency.
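The core idea can be pictured with the sketch below: a standard Harris-Stephens response is computed on an image reconstructed from its b most-significant bitplanes, so the detector can be re-run (or, in the paper, incrementally refined) as further bitplanes arrive; the smoothing scale, the constant k = 0.04 and the thresholding rule are illustrative assumptions and do not reproduce the paper's incremental-update equations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def top_bitplanes(img_u8, b):
    """Reconstruct an 8-bit image from its b most-significant bitplanes."""
    mask = 0xFF & ~((1 << (8 - b)) - 1)
    return (img_u8 & mask).astype(np.float64)

def harris_response(img, sigma=1.5, k=0.04):
    """Harris-Stephens response from the smoothed autocorrelation matrix of image derivatives."""
    iy, ix = np.gradient(img)
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    return (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = (rng.random((64, 64)) * 255).astype(np.uint8)
    for b in (2, 4, 8):                              # bitplanes received from the sensor so far
        r = harris_response(top_bitplanes(img, b))
        pts = np.argwhere(r > 0.01 * r.max())
        print(f"{b} bitplanes -> {len(pts)} candidate salient points")
```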
Improved Techniques for Adversarial Discriminative Domain Adaptation
Adversarial discriminative domain adaptation (ADDA) is an efficient framework
for unsupervised domain adaptation in image classification, where the source
and target domains are assumed to have the same classes, but no labels are
available for the target domain. We investigate whether we can improve
performance of ADDA with a new framework and new loss formulations. Following
the framework of semi-supervised GANs, we first extend the discriminator output
over the source classes, in order to model the joint distribution over domain
and task. We thus leverage the distribution of the source encoder
posteriors (which is fixed during adversarial training) and propose maximum
mean discrepancy (MMD) and reconstruction-based loss functions for aligning the
target encoder distribution to the source domain. We compare and provide a
comprehensive analysis of how our framework and loss formulations extend beyond
simple multi-class extensions of ADDA and other discriminative variants of
semi-supervised GANs. In addition, we introduce various forms of regularization
for stabilizing training, including treating the discriminator as a denoising
autoencoder and regularizing the target encoder with source examples to reduce
overfitting under a contraction mapping (i.e., when the target per-class
distributions are contracting during alignment with the source). Finally, we
validate our framework on standard domain adaptation datasets, such as SVHN and
MNIST. We also examine how our framework benefits recognition problems based on
modalities that lack training data, by introducing and evaluating on a
neuromorphic vision sensing (NVS) sign language recognition dataset, where the
source and target domains constitute emulated and real neuromorphic spike
events, respectively. Our results on all datasets show that our proposal
competes with or outperforms the state-of-the-art in unsupervised domain adaptation. Comment: To appear in IEEE Transactions on Image Processing
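As one concrete ingredient of such loss formulations, a kernel MMD term between (fixed) source-encoder features and trainable target-encoder features can be sketched as follows; the RBF bandwidth, batch shapes and function name are assumptions, and this is a generic biased MMD estimator rather than the exact loss of the paper.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two sample sets using an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Toy usage: pull target-encoder features toward fixed source-encoder features.
source_feats = torch.randn(128, 64)                  # fixed during adversarial training
target_feats = torch.randn(128, 64, requires_grad=True)
loss = mmd_rbf(source_feats, target_feats)
loss.backward()                                      # gradient w.r.t. the target features
print(float(loss))
```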
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on integer data streams, such as
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal on an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that, for a wide variety of LSB operations, the proposed
approach incurs a reduction in processing throughput that is 5 to 1000 times
smaller than that of the equivalent ABFT method that uses a checksum stream.
Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy. Comment: to appear in IEEE Trans. on Signal Processing, 201
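The flavor of the entanglement and extraction steps can be conveyed by the deliberately simplified two-stream sketch below, which superimposes two non-negative integer streams into one wider-integer stream, applies a linear operation directly on the packed stream, and recovers both outputs with shifts and subtractions; the actual scheme entangles three or more streams, supports signed data and derives its reliability check from the redundancy across the entangled streams.

```python
import numpy as np

SHIFT = 20                                   # bit offset separating the two packed streams
a = np.random.randint(0, 256, size=1024, dtype=np.int64)     # input stream 1
b = np.random.randint(0, 256, size=1024, dtype=np.int64)     # input stream 2

entangled = (a << SHIFT) + b                 # superimposed stream, stored in place of the inputs

scale = 3                                    # an example LSB operation: integer scaling
entangled_out = scale * entangled            # operate directly on the entangled stream

# Extraction via shift and subtraction (valid here because scale * b < 2**SHIFT).
out_a = entangled_out >> SHIFT
out_b = entangled_out - (out_a << SHIFT)

# Sanity check for this toy example; the paper's reliability check instead exploits
# redundancy across three or more entangled streams.
assert np.array_equal(out_a, scale * a) and np.array_equal(out_b, scale * b)
```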
Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to 175%
increase in processing throughput against optimized (full-precision)
convolution with virtually no effect on the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution. Comment: IEEE Trans. on Multimedia, 201
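The packing idea behind the throughput scaling can be sketched as follows: two quantized signals are packed into a single floating-point stream, one call to an off-the-shelf convolution routine produces both results concurrently, and a post-processing step unpacks them; the quantization ranges and the 2**-20 packing offset are assumptions chosen so that the toy example recovers both results exactly.

```python
import numpy as np

EPS = 2.0 ** -20                             # packing offset that keeps the two results separable

def packed_convolve(x1, x2, h):
    """Compute conv(x1, h) and conv(x2, h) with a single np.convolve call."""
    y = np.convolve(x1 + EPS * x2, h.astype(np.float64))
    y1 = np.round(y)                         # integer-valued conv(x1, h)
    y2 = np.round((y - y1) / EPS)            # conv(x2, h)
    return y1, y2

rng = np.random.default_rng(1)
x1 = rng.integers(-8, 8, 4096)               # coarsely quantized input signals
x2 = rng.integers(-8, 8, 4096)
h = rng.integers(-4, 4, 15)                  # short integer kernel

y1, y2 = packed_convolve(x1, x2, h)
assert np.array_equal(y1, np.convolve(x1, h)) and np.array_equal(y2, np.convolve(x2, h))
```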
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim of increasing robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely-compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions. Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Italy
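A minimal sketch of the VLAC aggregation step is given below, with random vectors standing in for SIFT descriptors and scikit-learn's KMeans used for clustering; the dimensions and cluster counts are assumptions, and for brevity the CLFCs are learned from raw training descriptors rather than from LFCs of training videos as in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

D, NUM_LFC, NUM_CLFC = 128, 32, 16           # descriptor dim, LFCs per segment, codebook size
rng = np.random.default_rng(0)
train_descs = rng.random((5000, D))           # stand-in for SIFT descriptors from a training set
segment_descs = rng.random((800, D))          # stand-in for SIFT descriptors of one video segment

# Offline: centers of local feature centers (CLFCs).
clfcs = KMeans(n_clusters=NUM_CLFC, n_init=4, random_state=0).fit(train_descs).cluster_centers_

# Online: local feature centers (LFCs) of the segment, encoded against the CLFCs.
lfcs = KMeans(n_clusters=NUM_LFC, n_init=4, random_state=0).fit(segment_descs).cluster_centers_
vlac = np.zeros((NUM_CLFC, D))
nearest = np.argmin(((lfcs[:, None, :] - clfcs[None, :, :]) ** 2).sum(-1), axis=1)
for i, c in enumerate(nearest):
    vlac[c] += lfcs[i] - clfcs[c]             # aggregate sums of differences per CLFC cell
vlac = (vlac / (np.linalg.norm(vlac) + 1e-12)).ravel()   # L2-normalized compact video descriptor
print(vlac.shape)                             # (NUM_CLFC * D,)
```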
Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent identically distributed, input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under certain distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database. Comment: IEEE Transactions on Signal Processing (vol. 60, 2012)
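The computation-channel view can be illustrated with the small sketch below: two right-hand-side matrices are quantized and packed into one matrix so that a single matrix multiplication produces both products concurrently, and the resulting distortion relative to full-precision GEMM is reported as an SNR; the quantization step and packing offset are assumptions, and the adaptive companding and optimal throughput-distortion control of the paper are omitted.

```python
import numpy as np

EPS = 2.0 ** -24                              # packing offset for the second product
STEP = 1.0 / 64                               # quantization step (controls the distortion)

rng = np.random.default_rng(2)
A = rng.standard_normal((256, 256))
B1 = rng.standard_normal((256, 128))
B2 = rng.standard_normal((256, 128))

def q(m):
    return np.round(m / STEP)                 # integer quantization levels

# A single GEMM call yields both (quantized) products concurrently.
C = np.matmul(q(A), q(B1) + EPS * q(B2))
C1 = np.round(C)                              # recovers q(A) @ q(B1)
C2 = np.round((C - C1) / EPS)                 # recovers q(A) @ q(B2)

# Distortion of the de-quantized first product versus full-precision GEMM.
ref = A @ B1
approx = C1 * STEP * STEP
snr_db = 10 * np.log10((ref ** 2).sum() / ((ref - approx) ** 2).sum())
print(f"packed/quantized GEMM vs full precision: {snr_db:.1f} dB SNR")
```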
Statistical framework for video decoding complexity modeling and prediction
Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper, we present a novel view of this problem based on a statistical framework perspective. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint execution-time and feature probability density function (PDF). A training set of typical video sequences is used for this purpose in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding.
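A compact sketch of the statistical machinery is given below: a GMM is fitted offline to joint (feature, execution-time) samples, and the decoding time of new content is predicted as the conditional expectation of time given the feature; the single feature dimension, the component count and the synthetic training data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Offline training data: 1-D complexity feature x and measured decoding time t (synthetic here).
x_train = rng.random(2000) * 10
t_train = 2.0 * x_train + rng.normal(0.0, 1.0 + 0.2 * x_train)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(np.column_stack([x_train, t_train]))

def predict_time(x):
    """Conditional-mean prediction E[t | x] under the joint (x, t) GMM."""
    num = den = 0.0
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        resp = w * multivariate_normal.pdf(x, mean=mu[0], cov=cov[0, 0])   # component responsibility
        cond_mean = mu[1] + cov[0, 1] / cov[0, 0] * (x - mu[0])            # Gaussian conditional mean
        num += resp * cond_mean
        den += resp
    return num / den

print(predict_time(4.0), predict_time(8.0))    # predicted decoding times for new feature values
```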
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby significantly
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly-reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve 11%-94% savings in bitrate with a marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
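To make the ingestion pipeline concrete, the minimal PyTorch sketch below defines a classifier that consumes stacked motion-vector fields (rather than decoded frames or optical flow); the tensor layout of T stacked 2-channel MV fields, the layer sizes and the class count are assumptions and do not correspond to the specific CNN architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class MVNet(nn.Module):
    """Tiny action classifier over stacked motion-vector fields, input (N, 2*T, H, W)."""
    def __init__(self, num_frames=8, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2 * num_frames, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, mv):
        return self.classifier(self.features(mv).flatten(1))

# Toy usage: (dx, dy) motion-vector fields for 8 frames, batch of 4 clips.
mv_fields = torch.randn(4, 16, 28, 28)
print(MVNet()(mv_fields).shape)                # torch.Size([4, 10])
```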
