581 research outputs found
Low-delay nonuniform pseudo-QMF banks with application to speech enhancement
Journal Article. Abstract: This paper presents a method for designing low-delay nonuniform pseudo quadrature mirror filter (QMF) banks. This method is motivated by the work of Li, Nguyen, and Tantaratana, in which the nonuniform filter bank is realized by combining an appropriate number of adjacent sub-bands of a uniform pseudo-QMF bank. In prior work, the prototype filter of the uniform pseudo-QMF bank was constrained to have linear phase and the overall delay associated with the filter bank was often unacceptably large for filter banks with a large number of sub-bands. This paper proposes a pseudo-QMF filter bank design technique that significantly reduces the delay by relaxing the linear phase constraints. An example in which an oversampled critical-band nonuniform filter bank is designed and applied to a two-state modeling speech enhancement system is presented in this paper. Comparison of the performance of this system to competing methods employing tree-structured, linear phase multiresolution analysis indicates that the approach described in this paper strikes a good balance between system performance and low delay.
Frame Theory for Signal Processing in Psychoacoustics
This review chapter aims to strengthen the link between frame theory and
signal processing tasks in psychoacoustics. On the one hand, the basic concepts
of frame theory are presented, with some proofs provided to explain those
concepts in detail. The goal is to show hearing scientists how this
mathematical theory could be relevant to their research. In particular, we
focus on frame theory in a filter bank approach, which is probably the most
relevant viewpoint for audio signal processing. On the other hand, basic
psychoacoustic concepts are presented to encourage mathematicians to apply
their knowledge in this field.
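As a small illustration of the basic frame concepts mentioned above, the following sketch builds a redundant tight frame for the plane and reconstructs a vector from its frame coefficients. The choice of frame (three unit vectors 120 degrees apart) and all function names are assumptions made for this example; they are not taken from the chapter, which treats frames in a filter bank setting.

```python
import math

# Assumed example frame: three unit vectors spaced 120 degrees apart in R^2.
FRAME = [
    (math.cos(math.pi / 2 + 2 * math.pi * k / 3),
     math.sin(math.pi / 2 + 2 * math.pi * k / 3))
    for k in range(3)
]


def analyze(x):
    """Analysis step: inner products <x, f_k> with every frame vector."""
    return [x[0] * fx + x[1] * fy for fx, fy in FRAME]


def frame_operator():
    """Frame operator S = sum_k f_k f_k^T as a 2x2 nested list."""
    sxx = sum(fx * fx for fx, _ in FRAME)
    sxy = sum(fx * fy for fx, fy in FRAME)
    syy = sum(fy * fy for _, fy in FRAME)
    return [[sxx, sxy], [sxy, syy]]


def synthesize(coeffs):
    """Synthesis step for a tight frame: x = (1/A) * sum_k c_k f_k."""
    bound = frame_operator()[0][0]  # for this frame S = A * I, so A = S[0][0]
    x0 = sum(c * fx for c, (fx, _) in zip(coeffs, FRAME)) / bound
    x1 = sum(c * fy for c, (_, fy) in zip(coeffs, FRAME)) / bound
    return (x0, x1)


if __name__ == "__main__":
    x = (0.3, -1.2)
    coeffs = analyze(x)
    print("coefficients:", coeffs)
    print("reconstruction:", synthesize(coeffs))
```

Three coefficients describe a two-dimensional vector, so the representation is redundant, yet reconstruction stays a simple weighted sum because the frame operator is a multiple of the identity.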
Audio Coding Based on Integer Transforms
In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, perceptual and lossless audio coding have used separate
technical approaches. In perceptual audio coding, filter banks such as the
lapped orthogonal transform "Modified Discrete Cosine Transform" (MDCT) have
been the approach of choice in many state-of-the-art coding schemes. Lossless
audio coding schemes, on the other hand, mostly employ predictive coding of
waveforms to remove redundancy. Only a few attempts have been made so far
to use transform coding for lossless audio coding.
This work presents a new approach that applies the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced to further improve the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally, an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described.
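The idea of an invertible integer-to-integer approximation can be sketched with a single plane rotation written as three lifting steps, each followed by rounding. This is a generic illustration of the lifting principle, not the IntMDCT described in the thesis; the function names, the rounding rule, and the test angle are assumptions chosen for the example, and the angle must not be a multiple of pi (sin(angle) would be zero).

```python
import math


def _round(v):
    """Round half up to an int (an assumed rounding rule; any fixed rule works)."""
    return math.floor(v + 0.5)


def int_rotate(x1, x2, angle):
    """Integer approximation of rotating (x1, x2) by `angle` via three lifting steps.

    Uses the factorization
        [[c, -s], [s, c]] = [[1, p], [0, 1]] @ [[1, 0], [s, 1]] @ [[1, p], [0, 1]]
    with c = cos(angle), s = sin(angle), p = (c - 1) / s.
    """
    s = math.sin(angle)
    p = (math.cos(angle) - 1.0) / s
    x1 = x1 + _round(p * x2)  # first lifting step
    x2 = x2 + _round(s * x1)  # second lifting step
    x1 = x1 + _round(p * x2)  # third lifting step
    return x1, x2


def int_rotate_inverse(y1, y2, angle):
    """Exact inverse: undo the lifting steps in reverse order with subtraction."""
    s = math.sin(angle)
    p = (math.cos(angle) - 1.0) / s
    y1 = y1 - _round(p * y2)
    y2 = y2 - _round(s * y1)
    y1 = y1 - _round(p * y2)
    return y1, y2


if __name__ == "__main__":
    angle = math.pi / 5
    y = int_rotate(100, -37, angle)
    print("rotated:", y, "restored:", int_rotate_inverse(*y, angle))
```

Each lifting step changes only one of the two values by a rounded function of the other, so subtracting the same rounded quantity undoes it exactly. That is why the mapping stays invertible on integers even though every step contains a rounding error.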
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is novel to this thesis, and it gives insight into the perceptual distortion introduced by any coder analyzed in this manner.
Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
We propose a time-domain audio source separation method using down-sampling
(DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT).
The proposed method is based on one of the state-of-the-art deep neural
networks, Wave-U-Net, which successively down-samples and up-samples feature
maps. We find that this architecture resembles that of multiresolution
analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may
discard information useful for the separation. Although the effects of these
problems may be reduced by training, to achieve a more reliable source
separation method, we should design DS layers capable of overcoming the
problems. With this belief, focusing on the fact that the DWT has an
anti-aliasing filter and the perfect reconstruction property, we design the
proposed layers. Experiments on music source separation show the efficacy of
the proposed method and the importance of simultaneously considering the
anti-aliasing filters and the perfect reconstruction property.
Comment: 5 pages, to appear in IEEE International Conference on Acoustics,
Speech, and Signal Processing 2020 (ICASSP 2020)
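The contrast between plain down-sampling and DWT-based down-sampling can be sketched with the Haar wavelet, the simplest DWT. This is only an assumed, minimal stand-in for the layers proposed in the paper, whose exact design is not given in the abstract; the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def decimate(x):
    """Plain down-sampling: keep every second sample, discard the rest."""
    return x[::2]


def haar_down(x):
    """One Haar DWT level: low-pass (approx) and high-pass (detail) half-rate bands."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_up(approx, detail):
    """Inverse Haar DWT level: interleave the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


if __name__ == "__main__":
    # Highest representable frequency: samples alternate in sign.
    x = [1.0, -1.0] * 4
    print("plain decimation:", decimate(x))  # looks like a constant signal
    approx, detail = haar_down(x)
    print("Haar approx:", approx, "Haar detail:", detail)
    print("reconstruction:", haar_up(approx, detail))
```

For the alternating input, plain decimation returns a constant sequence, so the high-frequency content is aliased to DC. The Haar low-pass band is zero and the detail band carries the signal, and keeping both bands allows perfect reconstruction.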
Digital Signal Processing Research Program
Contains table of contents for Section 2, an introduction, reports on sixteen research projects and a list of publications. Sponsors: Bose Corporation; MIT-Woods Hole Oceanographic Institution Joint Graduate Program in Oceanographic Engineering; Advanced Research Projects Agency/U.S. Navy - Office of Naval Research Grant N00014-93-1-0686; Lockheed Sanders, Inc./U.S. Navy - Office of Naval Research Contract N00014-91-C-0125; U.S. Air Force - Office of Scientific Research Grant AFOSR-91-0034; AT&T Laboratories Doctoral Support Program; Advanced Research Projects Agency/U.S. Navy - Office of Naval Research Grant N00014-89-J-1489; U.S. Navy - Office of Naval Research Grant N00014-93-1-0686; National Science Foundation Fellowship; Maryland Procurement Office Contract MDA904-93-C-4180; U.S. Navy - Office of Naval Research Grant N00014-91-J-162
- …