1,349 research outputs found
Coding overcomplete representations of audio using the MCLT
We propose a system for audio coding using the modulated complex
lapped transform (MCLT). In general, it is difficult to encode signals using
overcomplete representations without incurring a penalty in rate-distortion
performance. We show that the penalty can be significantly reduced for
MCLT-based representations, without the need for iterative methods of
sparsity reduction. We achieve that via a magnitude-phase polar quantization
and the use of magnitude and phase prediction. Compared to systems based
on quantization of orthogonal representations such as the modulated lapped
transform (MLT), the new system allows for reduced warbling artifacts and
more precise computation of frequency-domain auditory masking functions.
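As a rough illustration of the magnitude-phase (polar) quantization idea mentioned in this abstract, the sketch below quantizes complex coefficients by magnitude and phase separately. The uniform step sizes, the function names, and the random test coefficients are assumptions made for illustration; the abstract does not specify the quantizer, and the magnitude and phase prediction stages are omitted.

```python
import numpy as np

def polar_quantize(coeffs, mag_step=0.1, phase_levels=16):
    """Quantize complex coefficients in polar form (illustrative, uniform steps)."""
    mag = np.abs(coeffs)
    phase = np.angle(coeffs)                         # radians in (-pi, pi]
    mag_idx = np.round(mag / mag_step).astype(int)   # uniform magnitude index
    phase_step = 2 * np.pi / phase_levels
    # Round to the nearest phase level and wrap the index into 0..phase_levels-1.
    phase_idx = np.round(phase / phase_step).astype(int) % phase_levels
    return mag_idx, phase_idx

def polar_dequantize(mag_idx, phase_idx, mag_step=0.1, phase_levels=16):
    """Rebuild complex coefficients from the magnitude and phase indices."""
    phase_step = 2 * np.pi / phase_levels
    return mag_idx * mag_step * np.exp(1j * phase_idx * phase_step)

# Hypothetical stand-in for complex transform coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=8) + 1j * rng.normal(size=8)
mag_idx, phase_idx = polar_quantize(x)
x_hat = polar_dequantize(mag_idx, phase_idx)
```

With uniform steps, the magnitude error is bounded by half a magnitude step and the phase error by half a phase step; a real coder would likely choose the phase resolution per magnitude level.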
Advanced Telecommunications and Signal Processing Program
Contains an introduction and reports on eleven research projects.
Advanced Telecommunications Research Progra
High-resolution distributed sampling of bandlimited fields with low-precision sensors
The problem of sampling a discrete-time sequence of spatially bandlimited
fields with a bounded dynamic range, in a distributed,
communication-constrained, processing environment is addressed. A central unit,
having access to the data gathered by a dense network of fixed-precision
sensors, operating under stringent inter-node communication constraints, is
required to reconstruct the field snapshots to maximum accuracy. Both
deterministic and stochastic field models are considered. For stochastic
fields, results are established in the almost-sure sense. The feasibility of
having a flexible tradeoff between the oversampling rate (sensor density) and
the analog-to-digital converter (ADC) precision, while achieving an exponential
accuracy in the number of bits per Nyquist-interval per snapshot is
demonstrated. This exposes an underlying "conservation of bits" principle:
the bit-budget per Nyquist-interval per snapshot (the rate) can be distributed
along the amplitude axis (sensor-precision) and space (sensor density) in an
almost arbitrary discrete-valued manner, while retaining the same (exponential)
distortion-rate characteristics. Achievable information scaling laws for field
reconstruction over a bounded region are also derived: With N one-bit sensors
per Nyquist-interval, Nyquist-intervals, and total network
bitrate (per-sensor bitrate ), the maximum pointwise distortion goes to zero as
or . This is shown to be possible
with only nearest-neighbor communication, distributed coding, and appropriate
interpolation algorithms. For a fixed, nonzero target distortion, the number of
fixed-precision sensors and the network rate needed are always finite.
Comment: 17 pages, 6 figures; paper withdrawn from IEEE Transactions on Signal
Processing and re-submitted to the IEEE Transactions on Information Theory
Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Secondly, we replace the
last dense layer in the network by a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.
Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 2019
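The per-channel energy normalization step described in this abstract can be sketched with the commonly published PCEN formula: a first-order smoother M tracks the energy in each subband, and the energy is divided by a power of M before a root compression. The parameter values below (s, alpha, delta, r, eps) and the initialization of the smoother are assumptions for illustration, not the settings used in the paper.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative spectrogram.

    E has shape (n_bands, n_frames). All parameter defaults are assumed values.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]                      # assumed initialization of the smoother
    for t in range(1, E.shape[1]):
        # First-order IIR smoothing along time, independently in every band.
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control followed by dynamic range compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

With alpha close to 1, multiplying a band by a constant gain leaves the output almost unchanged, which is the property that makes the front end robust to slowly varying loudness across sensors.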
ECG Signal Reconstruction on the IoT-Gateway and Efficacy of Compressive Sensing Under Real-time Constraints
Remote health monitoring is becoming indispensable, yet Internet of Things (IoT)-based solutions face many implementation challenges, including energy consumption at the sensing node and the delay and instability introduced by cloud computing. Compressive sensing (CS) has been explored as a method to extend the battery lifetime of medical wearable devices. However, it is usually associated with computational complexity at the decoding end, which increases the latency of the system. Meanwhile, mobile processors are becoming computationally stronger and more efficient. Heterogeneous multicore platforms (HMPs) offer a local processing solution that can alleviate the limitations of remote signal processing. This paper demonstrates the real-time performance of compressed ECG reconstruction on ARM's big.LITTLE HMP and the advantages such platforms provide as the primary processing unit of the IoT architecture. It also investigates the efficacy of CS in minimizing the power consumption of a wearable device under real-time and hardware constraints. Results show that both the orthogonal matching pursuit and subspace pursuit reconstruction algorithms can be executed on the platform in real time and yield optimum performance on a single A15 core at minimum frequency. CS extends the battery life of wearable medical devices by up to 15.4% for ECGs suitable for wellness applications and by up to 6.6% for clinical-grade ECGs. Energy consumption at the gateway is largely due to an active internet connection; hence, processing the signals locally both reduces the system's latency and improves the gateway's battery life. Many remote health solutions can benefit from an architecture centered around the use of HMPs, a step toward better remote health monitoring systems.
Peer reviewed. Final published version.
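For readers unfamiliar with the reconstruction step, here is a minimal sketch of orthogonal matching pursuit, one of the two algorithms named in this abstract. The Gaussian sensing matrix, the problem sizes, and the synthetic sparse vector are assumptions for illustration; the paper's ECG data, sparsifying basis, and implementation are not reproduced, and subspace pursuit is not shown.

```python
import numpy as np

def omp(A, y, k):
    """Estimate a k-sparse x with y ~= A @ x by orthogonal matching pursuit."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:                 # nothing new to add; stop early
            break
        support.append(j)
        # Least-squares fit of y on all columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Hypothetical toy problem: 80 measurements of a length-100, 3-sparse vector.
rng = np.random.default_rng(0)
A = rng.normal(size=(80, 100))
A /= np.linalg.norm(A, axis=0)           # unit-norm columns
x_true = np.zeros(100)
x_true[[5, 37, 81]] = [1.0, -1.0, 1.0]
y = A @ x_true
x_hat = omp(A, y, k=3)
```

Each iteration costs one matrix-vector product plus a small least-squares solve, which is why the decoding cost grows with the sparsity level and matters for real-time execution at a gateway.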
A Fast Mellin and Scale Transform
A fast algorithm for the discrete-scale (and β-Mellin) transform is proposed. It performs a discrete-time discrete-scale approximation of the continuous-time transform, with subquadratic asymptotic complexity. The algorithm is based on a well-known relation between the Mellin and Fourier transforms, and it is practical and accurate. The paper gives some theoretical background on the Mellin, β-Mellin, and scale transforms. Then the algorithm is presented and analyzed in terms of computational complexity and precision. The effects of different interpolation procedures used in the algorithm are discussed.
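The relation between the Mellin and Fourier transforms that this abstract relies on can be written out for the scale transform. Substituting t = e^x turns D(c) = (1/sqrt(2π)) ∫ f(t) t^(-jc - 1/2) dt into the Fourier transform of g(x) = f(e^x) e^(x/2). The sketch below follows that route with linear interpolation and an FFT; it is an illustration under those assumptions, not the paper's algorithm, and the function name and grid size are hypothetical.

```python
import numpy as np

def fast_scale_transform(f, t, n_exp=1024):
    """Approximate the scale transform of samples f taken at increasing times t > 0."""
    # Uniform grid in x = ln(t), i.e. exponential resampling in time.
    x = np.linspace(np.log(t[0]), np.log(t[-1]), n_exp)
    # Linear interpolation onto the exponential grid, then the e^{x/2} weight.
    g = np.interp(np.exp(x), t, f) * np.exp(x / 2)
    dx = x[1] - x[0]
    # Riemann-sum approximation of the Fourier integral; phase is relative to x[0].
    D = np.fft.fft(g) * dx / np.sqrt(2 * np.pi)
    c = 2 * np.pi * np.fft.fftfreq(n_exp, d=dx)   # scale-domain variable
    return c, D
```

The magnitude of the result is the quantity of interest for scale invariance: replacing f(t) with sqrt(a) f(a t) only shifts g along x, so |D(c)| is unchanged up to interpolation and truncation error. The choice of interpolation (linear here) is exactly the kind of design decision the abstract says the paper analyzes.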
An LPC Excitation Model Using Wavelets
This paper presents a new model of linear predictive coding (LPC) excitation using wavelets for speech signals. The LPC excitation becomes a linear combination of a set of self-similar, orthonormal, band-pass signals with time localization and constant bandwidth on a logarithmic scale. Thus, the set of coefficients in the linear combination represents the LPC excitation. The discrete wavelet transform (DWT) obtains the coefficients, which have several asymmetrical and non-uniform distribution properties that are attractive for speech processing and compression. The properties include magnitude-dependent sensitivity, scale-dependent sensitivity, and limited frame length, which can be exploited for low bit-rate speech coding. We show that eliminating the 8.97% highest-magnitude coefficients degrades speech quality down to 1.49 dB SNR, while eliminating the 27.51% lowest-magnitude coefficients maintains speech quality at a level of 27.42 dB SNR. Furthermore, eliminating the 6.25% of coefficients located at a scale associated with the 175-630 Hz band severely degrades speech quality down to 4.20 dB SNR. Finally, our results show that the optimal frame length for telephony applications is one of 32, 64, or 128 samples.
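The magnitude-dependent sensitivity reported in this abstract can be illustrated with a small experiment: transform a signal with an orthonormal DWT, zero a fraction of the coefficients by magnitude, reconstruct, and measure the SNR. The sketch below uses a Haar wavelet and a synthetic sinusoid as stand-ins; the paper's wavelet, its LPC excitation signals, and its reported figures are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar DWT of a signal whose length is a power of two."""
    c = np.asarray(x, dtype=float).copy()
    n = len(c)
    while n > 1:
        a = (c[0:n:2] + c[1:n:2]) / np.sqrt(2)    # approximation coefficients
        d = (c[0:n:2] - c[1:n:2]) / np.sqrt(2)    # detail coefficients
        c[: n // 2], c[n // 2 : n] = a, d
        n //= 2
    return c

def haar_idwt(c):
    """Inverse of haar_dwt."""
    x = np.asarray(c, dtype=float).copy()
    n = 1
    while n < len(x):
        a, d = x[:n].copy(), x[n : 2 * n].copy()
        x[0 : 2 * n : 2] = (a + d) / np.sqrt(2)
        x[1 : 2 * n : 2] = (a - d) / np.sqrt(2)
        n *= 2
    return x

def snr_after_dropping(x, fraction, largest):
    """SNR in dB after zeroing a fraction of coefficients chosen by magnitude."""
    c = haar_dwt(x)
    k = int(round(fraction * len(c)))
    if k == 0:
        return np.inf
    order = np.argsort(np.abs(c))                 # ascending magnitude
    c[order[-k:] if largest else order[:k]] = 0.0
    err = x - haar_idwt(c)
    return 10 * np.log10(np.sum(x ** 2) / np.sum(err ** 2))

# Hypothetical test signal: two cycles of a sinusoid over 256 samples.
signal = np.sin(2 * np.pi * 2 * np.arange(256) / 256)
snr_drop_small = snr_after_dropping(signal, 0.25, largest=False)
snr_drop_large = snr_after_dropping(signal, 0.09, largest=True)
```

Because the transform is orthonormal, the reconstruction error energy equals the energy of the zeroed coefficients, so dropping many small coefficients can cost far less than dropping a few large ones.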