1,349 research outputs found

Coding overcomplete representations of audio using the MCLT

Author: Malvar Henrique S.
Yoon Byung‐Jun
Publication venue: IEEE Computer Society
Publication date: 01/01/2008

We propose a system for audio coding using the modulated complex lapped transform (MCLT). In general, it is difficult to encode signals using overcomplete representations without incurring a penalty in rate-distortion performance. We show that the penalty can be significantly reduced for MCLT-based representations, without the need for iterative methods of sparsity reduction. We achieve that via magnitude-phase polar quantization and the use of magnitude and phase prediction. Compared to systems based on quantization of orthogonal representations such as the modulated lapped transform (MLT), the new system allows for reduced warbling artifacts and more precise computation of frequency-domain auditory masking functions.
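As a rough illustration of the magnitude-phase (polar) quantization mentioned in this abstract, the sketch below quantizes complex coefficients on a uniform magnitude grid and a uniform phase grid. The function name `polar_quantize`, the uniform step sizes, and the sample coefficients are assumptions made for illustration; the abstract does not specify the quantizer design, and the magnitude and phase prediction it describes is not shown.

```python
import numpy as np

def polar_quantize(z, mag_step, n_phase_levels):
    """Quantize complex values in polar form (hypothetical uniform quantizer).

    Magnitudes are rounded to multiples of mag_step; phases are rounded to
    multiples of 2*pi / n_phase_levels.
    """
    z = np.asarray(z, dtype=complex)
    mag = np.round(np.abs(z) / mag_step) * mag_step
    phase_step = 2 * np.pi / n_phase_levels
    phase = np.round(np.angle(z) / phase_step) * phase_step
    return mag * np.exp(1j * phase)

# Illustrative complex coefficients (not real MCLT output).
coeffs = np.array([0.9j, 1.0 + 0.0j, -0.3 + 0.4j])
quantized = polar_quantize(coeffs, mag_step=0.5, n_phase_levels=8)
```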

CiteSeerX

Crossref

Caltech Authors

Advanced Telecommunications and Signal Processing Program

Author: Apostolopoulos John G.
Cheung Shiufun
Hajjahmad Ibrahim A.
Iwai Kyle K.
Lim Jae S.
Monta Peter A.
Narula Aradhana
Nicolas Julien J.
Pfajfer Alexsander
Sunshine Lon E.
Yoo Chang Dong
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains an introduction and reports on eleven research projects. Advanced Telecommunications Research Program

DSpace@MIT

High-resolution distributed sampling of bandlimited fields with low-precision sensors

Author: Ishwar Prakash
Kumar Animesh
Ramchandran Kannan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/12/2008

The problem of sampling a discrete-time sequence of spatially bandlimited fields with a bounded dynamic range, in a distributed, communication-constrained, processing environment is addressed. A central unit, having access to the data gathered by a dense network of fixed-precision sensors, operating under stringent inter-node communication constraints, is required to reconstruct the field snapshots to maximum accuracy. Both deterministic and stochastic field models are considered. For stochastic fields, results are established in the almost-sure sense. The feasibility of having a flexible tradeoff between the oversampling rate (sensor density) and the analog-to-digital converter (ADC) precision, while achieving an exponential accuracy in the number of bits per Nyquist-interval per snapshot is demonstrated. This exposes an underlying "conservation of bits" principle: the bit-budget per Nyquist-interval per snapshot (the rate) can be distributed along the amplitude axis (sensor-precision) and space (sensor density) in an almost arbitrary discrete-valued manner, while retaining the same (exponential) distortion-rate characteristics. Achievable information scaling laws for field reconstruction over a bounded region are also derived: with $N$ one-bit sensors per Nyquist-interval, $\Theta(\log N)$ Nyquist-intervals, and total network bitrate $R_{net} = \Theta((\log N)^2)$ (per-sensor bitrate $\Theta((\log N)/N)$), the maximum pointwise distortion goes to zero as $D = O((\log N)^2/N)$ or $D = O(R_{net} 2^{-\beta \sqrt{R_{net}}})$. This is shown to be possible with only nearest-neighbor communication, distributed coding, and appropriate interpolation algorithms. For a fixed, nonzero target distortion, the number of fixed-precision sensors and the network rate needed is always finite. Comment: 17 pages, 6 figures; paper withdrawn from IEEE Transactions on Signal Processing and re-submitted to the IEEE Transactions on Information Theory
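The two distortion bounds quoted in this abstract can be related by a short heuristic calculation. The sketch below assumes the $\Theta(\cdot)$ relation for the network rate holds with equality for some constant $c > 0$ and that logarithms are base 2; both are simplifying assumptions for illustration, not statements from the paper.

```latex
\begin{align*}
R_{net} &= c\,(\log_2 N)^2
  \;\Rightarrow\; \log_2 N = \sqrt{R_{net}/c},
  \qquad N = 2^{\sqrt{R_{net}/c}}, \\
\frac{(\log_2 N)^2}{N}
  &= \frac{R_{net}}{c}\, 2^{-\sqrt{R_{net}/c}}
   = O\!\left(R_{net}\, 2^{-\beta \sqrt{R_{net}}}\right),
  \qquad \beta = 1/\sqrt{c}.
\end{align*}
```

Under these assumptions, a distortion that decays almost linearly in the number of sensors $N$ is the same statement as one that decays exponentially in $\sqrt{R_{net}}$.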

arXiv.org e-Print Archive

CiteSeerX

Crossref

Robust sound event detection in bioacoustic sensor networks

Author: Bello Juan Pablo
Farnsworth Andrew
Kelling Steve
Lostanlen Vincent
Salamon Justin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings. Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 2019
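The per-channel energy normalization (PCEN) step described in this abstract can be sketched as a first-order smoother followed by adaptive gain control and root compression, applied independently to each subband. This is a minimal sketch of the commonly used PCEN formulation; the function name and all parameter values (`s`, `alpha`, `delta`, `r`, `eps`) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative spectrogram.

    E has shape (n_bands, n_frames), for example mel-band energies.
    Parameter values are illustrative assumptions.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    # First-order IIR smoother per subband: tracks the slowly varying level.
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control (division by M**alpha), then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Two stationary inputs that differ by a factor of 100 in energy end up
# with similar normalized values, which is the point of the gain control.
quiet = pcen(np.ones((2, 50)))
loud = pcen(100.0 * np.ones((2, 50)))
```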

arXiv.org e-Print Archive

Directory of Open Access Journals

ECG Signal Reconstruction on the IoT-Gateway and Efficacy of Compressive Sensing Under Real-time Constraints

Author: Alinier Guillaume
Amira Abbes
Bensaali Faycal
Dimitrakopoulos George
Disi Mohammed Al
Djelouat Hamza
Kotronis Christos
Politis Elena
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Remote health monitoring is becoming indispensable, but Internet of Things (IoT)-based solutions face many implementation challenges, including energy consumption at the sensing node, and delay and instability due to cloud computing. Compressive sensing (CS) has been explored as a method to extend the battery lifetime of medical wearable devices. However, it is usually associated with computational complexity at the decoding end, increasing the latency of the system. Meanwhile, mobile processors are becoming computationally stronger and more efficient. Heterogeneous multicore platforms (HMPs) offer a local processing solution that can alleviate the limitations of remote signal processing. This paper demonstrates the real-time performance of compressed ECG reconstruction on ARM's big.LITTLE HMP and the advantages such platforms provide as the primary processing unit of the IoT architecture. It also investigates the efficacy of CS in minimizing power consumption of a wearable device under real-time and hardware constraints. Results show that both the orthogonal matching pursuit and subspace pursuit reconstruction algorithms can be executed on the platform in real time and yield optimum performance on a single A15 core at minimum frequency. CS extends the battery life of wearable medical devices by up to 15.4% for ECGs suitable for wellness applications and by up to 6.6% for clinical-grade ECGs. Energy consumption at the gateway is largely due to an active internet connection; hence, processing the signals locally both mitigates the system's latency and improves the gateway's battery life. Many remote health solutions can benefit from an architecture centered around the use of HMPs, a step toward better remote health monitoring systems. Peer reviewed. Final published version.
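Orthogonal matching pursuit, one of the two reconstruction algorithms named in this abstract, can be sketched in a few lines. This is a generic textbook version, assuming a known sparsity level `k` and a signal that is sparse in the columns of a measurement matrix `A`; the names `omp`, `A`, `y`, and `k` are hypothetical, and nothing here reflects the paper's implementation on the big.LITTLE platform.

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x such that y is approximately A @ x."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x

# Usage with random measurements (illustrative sizes only).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 0.7]
x_hat = omp(A, A @ x_true, k=3)
```

The least-squares re-fit at every iteration is what distinguishes orthogonal matching pursuit from plain matching pursuit, and it is also the main source of decoding cost.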

Qatar University Institutional Repository

University of Hertfordshire Research Archive

A Fast Mellin and Scale Transform

Author: De Sena Antonio
Rocchesso Davide
Publication date: 01/01/2007

A fast algorithm for the discrete-scale (and β-Mellin) transform is proposed. It performs a discrete-time discrete-scale approximation of the continuous-time transform, with subquadratic asymptotic complexity. The algorithm is based on a well-known relation between the Mellin and Fourier transforms, and it is practical and accurate. The paper gives some theoretical background on the Mellin, β-Mellin, and scale transforms. Then the algorithm is presented and analyzed in terms of computational complexity and precision. The effects of different interpolation procedures used in the algorithm are discussed.
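The relation between the Mellin and Fourier transforms that this abstract refers to can be written out for the scale transform. The definition below uses one common sign and normalization convention, which is an assumption here; the paper's own convention may differ.

```latex
\begin{align*}
D_f(c) &= \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} f(t)\, t^{-jc - 1/2}\, dt \\
       &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
          f(e^{\tau})\, e^{\tau/2}\, e^{-jc\tau}\, d\tau
          \qquad (t = e^{\tau},\; dt = e^{\tau}\, d\tau).
\end{align*}
```

The second line is the Fourier transform of the exponentially warped and weighted signal $f(e^{\tau})\, e^{\tau/2}$, which suggests why exponential resampling (hence interpolation) followed by an FFT can yield subquadratic complexity.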

Archivio istituzionale della ricerca - Università IUAV di Venezia

Springer - Publisher Connector

Directory of Open Access Journals

Catalogo dei prodotti della ricerca

Open Access Repository

An LPC Excitation Model Using Wavelets

Author: Langi A. Z. (Armein)
Publication venue: Bandung Institute of Technology
Publication date: 01/01/2008

This paper presents a new model of linear predictive coding (LPC) excitation using wavelets for speech signals. The LPC excitation becomes a linear combination of a set of self-similar, orthonormal, band-pass signals with time localization and constant bandwidth on a logarithmic scale. Thus, the set of coefficients in the linear combination represents the LPC excitation. The discrete wavelet transform (DWT) obtains the coefficients, which have several asymmetrical and non-uniform distribution properties that are attractive for speech processing and compression. The properties include magnitude-dependent sensitivity, scale-dependent sensitivity, and limited frame length, which can be used to obtain low bit-rate speech. We show that eliminating the 8.97% highest-magnitude coefficients degrades speech quality down to 1.49 dB SNR, while eliminating the 27.51% lowest-magnitude coefficients maintains speech quality at a level of 27.42 dB SNR. Furthermore, eliminating the 6.25% of coefficients located at a scale associated with the 175-630 Hz band severely degrades speech quality, down to 4.20 dB SNR. Finally, our results show that the optimal frame length for telephony applications is among 32, 64, or 128 samples.
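The coefficient-elimination experiment described in this abstract can be mimicked on a toy signal: transform a frame, zero a fraction of the lowest-magnitude coefficients, invert, and measure the SNR. The sketch below assumes an orthonormal Haar wavelet and a random 64-sample frame; the wavelet choice, the function names, and the test signal are all assumptions, since the abstract does not state which wavelet was used, and the numbers it produces are unrelated to the paper's results.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x):
    """Full orthonormal Haar DWT; len(x) must be a power of two."""
    c = np.asarray(x, dtype=float).copy()
    n = c.size
    while n > 1:
        even, odd = c[0:n:2].copy(), c[1:n:2].copy()
        c[: n // 2] = (even + odd) / SQRT2   # approximation coefficients
        c[n // 2 : n] = (even - odd) / SQRT2  # detail coefficients
        n //= 2
    return c

def haar_idwt(c):
    """Inverse of haar_dwt."""
    x = np.asarray(c, dtype=float).copy()
    n = 1
    while n < x.size:
        a, d = x[:n].copy(), x[n : 2 * n].copy()
        x[0 : 2 * n : 2] = (a + d) / SQRT2
        x[1 : 2 * n : 2] = (a - d) / SQRT2
        n *= 2
    return x

def drop_smallest(c, fraction):
    """Zero the given fraction of coefficients with the lowest magnitude."""
    out = np.asarray(c, dtype=float).copy()
    k = int(round(fraction * out.size))
    out[np.argsort(np.abs(out))[:k]] = 0.0
    return out

def snr_db(x, y):
    """SNR of approximation y relative to reference x, in dB."""
    err = np.sum((np.asarray(x) - np.asarray(y)) ** 2)
    return float("inf") if err == 0 else 10 * np.log10(np.sum(np.asarray(x) ** 2) / err)

# Toy 64-sample frame standing in for an LPC excitation frame.
frame = np.random.default_rng(0).standard_normal(64)
approx = haar_idwt(drop_smallest(haar_dwt(frame), 0.2751))
quality = snr_db(frame, approx)
```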

Neliti