1,011 research outputs found
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Audio Coding Based on Integer Transforms
Die Audiocodierung hat sich in den letzten Jahren zu einem sehr
populären Forschungs- und Anwendungsgebiet entwickelt. Insbesondere
gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3
(MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden häufig zur
effizienten Speicherung und Übertragung von Audiosignalen verwendet. Für
professionelle Anwendungen, wie etwa die Archivierung und Übertragung im
Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht.
Die bisherigen Ansätze für gehörangepasste und verlustlose
Audiocodierung sind technisch völlig verschieden. Moderne
gehörangepasste Audiocoder basieren meist auf Filterbänken, wie etwa der
überlappenden orthogonalen Transformation "Modifizierte Diskrete
Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen
verwenden meist prädiktive Codierung zur Redundanzreduktion. Nur wenige
Ansätze zur transformationsbasierten verlustlosen Audiocodierung wurden
bisher versucht.
Diese Arbeit präsentiert einen neuen Ansatz hierzu, der das
Lifting-Schema auf die in der gehörangepassten Audiocodierung
verwendeten überlappenden Transformationen anwendet. Dies ermöglicht
eine invertierbare Integer-Approximation der ursprünglichen
Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die
selbe Technik kann auch für Filterbänke mit niedriger Systemverzögerung
angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler
Lifting-Ansatz und eine Technik zur Spektralformung von
Quantisierungsfehlern eine Verbesserung der Approximation der
ursprünglichen Transformation.
Basierend auf diesen neuen Integer-Transformationen werden in dieser
Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren
umfassen verlustlose Audiocodierung, eine skalierbare verlustlose
Erweiterung eines gehörangepassten Audiocoders und einen integrierten
Ansatz zur fein skalierbaren gehörangepassten und verlustlosen
Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen
ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen
Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for
research and applications. Especially perceptual audio coding schemes,
such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are
widely used for efficient storage and transmission of music
signals. Nevertheless, for professional applications, such as archiving
and transmission in studio environments, lossless audio coding schemes
are considered more appropriate.
Traditionally, the technical approaches used in perceptual and lossless
audio coding have been separate worlds. In perceptual audio coding, the
use of filter banks, such as the lapped orthogonal transform "Modified
Discrete Cosine Transform" (MDCT), has been the approach of choice being
used by many state of the art coding schemes. On the other hand,
lossless audio coding schemes mostly employ predictive coding of
waveforms to remove redundancy. Only few attempts have been made so far
to use transform coding for the purpose of lossless audio coding.
This work presents a new approach of applying the lifting scheme to
lapped transforms used in perceptual audio coding. This allows for an
invertible integer-to-integer approximation of the original transform,
e.g. the IntMDCT as an integer approximation of the MDCT. The same
technique can also be applied to low-delay filter banks. A generalized,
multi-dimensional lifting approach and a noise-shaping technique are
introduced, allowing to further optimize the accuracy of the
approximation to the original transform.
Based on these new integer transforms, this work presents new audio
coding schemes and applications. The audio coding applications cover
lossless audio coding, scalable lossless enhancement of a perceptual
audio coder and fine-grain scalable perceptual and lossless audio
coding. Finally an approach to data hiding with high data rates in
uncompressed audio signals based on integer transforms is described
Considering Bluetooth's Subband Codec (SBC) for Wideband Speech and Audio on the Internet
The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithm delay. To make SBC suitable for the Internet, we extend it by using a time and packet loss concealment (PLC) algorithm that is based on ITU's G.711 Appendix I. The design is novel in the aspect of the interface between codec and speech receiver. We developed a new approach on how to distribute the functionality of a speech receiver between codec and application. Our approach leads to easier implementations of high quality VoIP applications.
We conducted subjective and objective listening tests of the audio quality of SBC and PLC in order to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These tests results are then compared with the results of multiple objective assessment algorithms (ITU P.862 PESQ, ITU BS.1387-1 PEAQ, Creusere's algorithm). We found out that a combination of the PEAQ basic and advanced values best matches---after third order linear regression---the subjective MUSHRA results . The linear regression has coefficient of determination of R²=0.907². By comparison, our individual human ratings show a correlation of about R=0.9 compared to our averaged human rating results.
Using the combination of both PEAQ algorithms, we calculate hundred thousands of objective audio quality ratings varying audio content and algorithmic parameters of SBC and PLC. The results show which set of parameters value are best suitable for a bandwidth and delay constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters as compared to the default parameter sets given in the standard.
Finally, we present preliminary objective tests results on the comparison of the audio codecs SBC, CELT, APT-X and ULD coding speech and audio transmission. They all allow a mono and stereo transmission of music at ultra-low coding delays (<10ms), which is especially useful for distributed ensemble performances over the Internet
Database of audio records
Diplomka a prakticky castDiplome with partical part
Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing
Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share an hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.Comment: 65 pages, 33 figures, 303 reference
PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development
in recent years due to the increased complexity of current audio processing
applications. People want to collaborate through communication with the
feeling of being together and sharing the same environment, what is considered
as Immersive Audio Schemes. In this phenomenon, several acoustic
e ects are involved: 3D spatial sound, room compensation, crosstalk cancelation,
sound source localization, among others. However, high computing
capacity is required to achieve any of these e ects in a real large-scale system,
what represents a considerable limitation for real-time applications.
The increase of the computational capacity has been historically linked
to the number of transistors in a chip. However, nowadays the improvements
in the computational capacity are mainly given by increasing the
number of processing units, i.e expanding parallelism in computing. This
is the case of the Graphics Processing Units (GPUs), that own now thousands
of computing cores. GPUs were traditionally related to graphic or image
applications, but new releases in the GPU programming environments,
CUDA or OpenCL, allowed that most applications were computationally
accelerated in elds beyond graphics. This thesis aims to demonstrate
that GPUs are totally valid tools to carry out audio applications that require
high computational resources. To this end, di erent applications in
the eld of audio processing are studied and performed using GPUs. This
manuscript also analyzes and solves possible limitations in each GPU-based
implementation both from the acoustic point of view as from the computational
point of view. In this document, we have addressed the following
problems:
Most of audio applications are based on massive ltering. Thus, the
rst implementation to undertake is a fundamental operation in the audio
processing: the convolution. It has been rst developed as a computational
kernel and afterwards used for an application that combines multiples convolutions
concurrently: generalized crosstalk cancellation and equalization.
The proposed implementation can successfully manage two di erent and
common situations: size of bu ers that are much larger than the size of the
lters and size of bu ers that are much smaller than the size of the lters.
Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel ltering. First application
deals with binaural audio. Its main feature is that this application is able
to synthesize sound sources in spatial positions that are not included in the
database of HRTF and to generate smoothly movements of sound sources.
Both features were designed after di erent tests (objective and subjective).
The performance regarding number of sound source that could be rendered
in real time was assessed on GPUs with di erent GPU architectures. A
similar performance is measured in a Wave Field Synthesis system (second
spatial audio application) that is composed of 96 loudspeakers. The proposed
GPU-based implementation is able to reduce the room e ects during
the sound source rendering.
A well-known approach for sound source localization in noisy and reverberant
environments is also addressed on a multi-GPU system. This
is the case of the Steered Response Power with Phase Transform (SRPPHAT)
algorithm. Since localization accuracy can be improved by using
high-resolution spatial grids and a high number of microphones, accurate
acoustic localization systems require high computational power. The solutions
implemented in this thesis are evaluated both from localization and
from computational performance points of view, taking into account different
acoustic environments, and always from a real-time implementation
perspective.
Finally, This manuscript addresses also massive multichannel ltering
when the lters present an In nite Impulse Response (IIR). Two cases are
analyzed in this manuscript: 1) IIR lters composed of multiple secondorder
sections, and 2) IIR lters that presents an allpass response. Both
cases are used to develop and accelerate two di erent applications: 1) to
execute multiple Equalizations in a WFS system, and 2) to reduce the
dynamic range in an audio signal.Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651TESISPremios Extraordinarios de tesis doctorale
- …