1,349 research outputs found

    Coding overcomplete representations of audio using the MCLT

    Get PDF
    We propose a system for audio coding using the modulated complex lapped transform (MCLT). In general, it is difficult to encode signals using overcomplete representations without avoiding a penalty in rate-distortion performance. We show that the penalty can be significantly reduced for MCLT-based representations, without the need for iterative methods of sparsity reduction. We achieve that via a magnitude-phase polar quantization and the use of magnitude and phase prediction. Compared to systems based on quantization of orthogonal representations such as the modulated lapped transform (MLT), the new system allows for reduced warbling artifacts and more precise computation of frequency-domain auditory masking functions

    Advanced Telecommunications and Signal Processing Program

    Get PDF
    Contains an introduction and reports on eleven research projects.Advanced Telecommunications Research Progra

    High-resolution distributed sampling of bandlimited fields with low-precision sensors

    Full text link
    The problem of sampling a discrete-time sequence of spatially bandlimited fields with a bounded dynamic range, in a distributed, communication-constrained, processing environment is addressed. A central unit, having access to the data gathered by a dense network of fixed-precision sensors, operating under stringent inter-node communication constraints, is required to reconstruct the field snapshots to maximum accuracy. Both deterministic and stochastic field models are considered. For stochastic fields, results are established in the almost-sure sense. The feasibility of having a flexible tradeoff between the oversampling rate (sensor density) and the analog-to-digital converter (ADC) precision, while achieving an exponential accuracy in the number of bits per Nyquist-interval per snapshot is demonstrated. This exposes an underlying ``conservation of bits'' principle: the bit-budget per Nyquist-interval per snapshot (the rate) can be distributed along the amplitude axis (sensor-precision) and space (sensor density) in an almost arbitrary discrete-valued manner, while retaining the same (exponential) distortion-rate characteristics. Achievable information scaling laws for field reconstruction over a bounded region are also derived: With N one-bit sensors per Nyquist-interval, Θ(log⁥N)\Theta(\log N) Nyquist-intervals, and total network bitrate Rnet=Θ((log⁥N)2)R_{net} = \Theta((\log N)^2) (per-sensor bitrate Θ((log⁥N)/N)\Theta((\log N)/N)), the maximum pointwise distortion goes to zero as D=O((log⁥N)2/N)D = O((\log N)^2/N) or D=O(Rnet2−ÎČRnet)D = O(R_{net} 2^{-\beta \sqrt{R_{net}}}). This is shown to be possible with only nearest-neighbor communication, distributed coding, and appropriate interpolation algorithms. For a fixed, nonzero target distortion, the number of fixed-precision sensors and the network rate needed is always finite.Comment: 17 pages, 6 figures; paper withdrawn from IEEE Transactions on Signal Processing and re-submitted to the IEEE Transactions on Information Theor

    Robust sound event detection in bioacoustic sensor networks

    Full text link
    Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 201

    ECG Signal Reconstruction on the IoT-Gateway and Efficacy of Compressive Sensing Under Real-time Constraints

    Get PDF
    Remote health monitoring is becoming indispensable, though, Internet of Things (IoTs)-based solutions have many implementation challenges, including energy consumption at the sensing node, and delay and instability due to cloud computing. Compressive sensing (CS) has been explored as a method to extend the battery lifetime of medical wearable devices. However, it is usually associated with computational complexity at the decoding end, increasing the latency of the system. Meanwhile, mobile processors are becoming computationally stronger and more efficient. Heterogeneous multicore platforms (HMPs) offer a local processing solution that can alleviate the limitations of remote signal processing. This paper demonstrates the real-time performance of compressed ECG reconstruction on ARM's big.LITTLE HMP and the advantages they provide as the primary processing unit of the IoT architecture. It also investigates the efficacy of CS in minimizing power consumption of a wearable device under real-time and hardware constraints. Results show that both the orthogonal matching pursuit and subspace pursuit reconstruction algorithms can be executed on the platform in real time and yield optimum performance on a single A15 core at minimum frequency. The CS extends the battery life of wearable medical devices up to 15.4% considering ECGs suitable for wellness applications and up to 6.6% for clinical grade ECGs. Energy consumption at the gateway is largely due to an active internet connection; hence, processing the signals locally both mitigates system's latency and improves gateway's battery life. Many remote health solutions can benefit from an architecture centered around the use of HMPs, a step toward better remote health monitoring systems.Peer reviewedFinal Published versio

    A Fast Mellin and Scale Transform

    Get PDF
    A fast algorithm for the discrete-scale (and -Mellin) transform is proposed. It performs a discrete-time discrete-scale approximation of the continuous-time transform, with subquadratic asymptotic complexity. The algorithm is based on a well-known relation between the Mellin and Fourier transforms, and it is practical and accurate. The paper gives some theoretical background on the Mellin, -Mellin, and scale transforms. Then the algorithm is presented and analyzed in terms of computational complexity and precision. The effects of different interpolation procedures used in the algorithm are discussed

    An LPC Excitation Model Using Wavelets

    Full text link
    This paper presents a new model of linear predictive coding (LPC) excitation using wavelets for speech signals. The LPC excitation becomes a linear combination of a set of self- similar, orthonormal, band-pass signals with time localization and constant bandwidth in a logarithmic scale. Thus, the set of the coefficients in the linear combination represents the LPC excitation. The discrete wavelet transform (DWT) obtains the coefficients, having several asymmetrical and non-uniform distribution properties that are attractive for speech processing and compression. The properties include magnitude dependent sensitivity, scale dependent sensitivity, and limited frame length, which can be used for having low bit-rate speech. We show that eliminating 8.97% highest magnitude coefficients degrades speech quality down to 1.49dB SNR, while eliminating 27.51% lowest magnitude coefficient maintain speech quality at a level of 27.42 dB SNR. Furthermore eliminating 6.25% coefficients located at a scale associated with 175-630 Hz band severely degrades speech quality down to 4.20 dB SNR. Finally, our results show that optimal frame length for telephony applications is among 32, 64, or 128 samples
    • 

    corecore