183 research outputs found

    Wavelet Filter Banks in Perceptual Audio Coding

    Get PDF
    This thesis studies the application of the wavelet filter bank (WFB) in perceptual audio coding by providing brief overviews of perceptual coding, psychoacoustics, wavelet theory, and existing wavelet coding algorithms. Furthermore, it describes the poor frequency localization property of the WFB and explores one filter design method, in particular, for improving channel separation between the wavelet bands. A wavelet audio coder has also been developed by the author to test the new filters. Preliminary tests indicate that the new filters provide some improvement over other wavelet filters when coding audio signals that are stationary-like and contain only a few harmonic components, and similar results for other types of audio signals that contain many spectral and temporal components. It has been found that the WFB provides a flexible decomposition scheme through the choice of the tree structure and basis filter, but at the cost of poor localization properties. This flexibility can be a benefit in the context of audio coding but the poor localization properties represent a drawback. Determining ways to fully utilize this flexibility, while minimizing the effects of poor time-frequency localization, is an area that is still very much open for research

    Scalable and perceptual audio compression

    Get PDF
    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    Biorthogonality in lapped transforms : a study in high-quality audio compression

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.Includes bibliographical references (leaves 76-82).by Shiufun Cheung.Ph.D

    Compressive Sensing for Speech Signals in Mobile Systems

    Get PDF
    Compressive sensing is an emerging and revolutionary technology that strongly relies on the sparsity of the signal. In compressive sensing the signal is sparsely compressively sampled by taking a small number of random projections of the signal, which contain most of the salient information. Compressive sensing has been previously applied in areas like: image processing, radar systems and sonar systems. This research work will discuss the potential implementation of compressive sensing in mobile communication systems and how it will influence their data rates. In a typical mobile communication system, the signal of interest is sampled at least at the Nyquist rate. The Nyquist sampling theorem states that the frequency used to sample a signal should be at least twice the maximum frequency contained within the signal. However, this is not the most efficient way to compress the signal, as it places a lot of burden in sampling the entire signal while only a small percentage of the transform coefficients are needed to represent it. The recent results in compressive sampling (also known as compressive sensing) provide a new way to reconstruct the original signal with a minimal number of observations. In compressive sensing the significant information about the signal/image is directly acquired, rather than acquiring the significant information that will be eventually thrown away. The goal of this research is to propose a new mobile communication system which employs compressive sampling to compress the speech signal at the transmitter and decompress it at the receiver. The expected results from the proposed system will be an increment in the data rates of these systems. In order to simulate how compressive sensing could be applied, a small speech signal was recorded in MATLAB. The signal at the transmitter is then multiplied by the measurement matrix which in this case is composed of randomly generated numbers. The measurement matrix is chosen in such a way that the sparse signal can be exactly recovered at the receiver using one of the different optimization techniques available. Once the signal has gone through the process of compressive sampling, it is ready to be transmitted through the mobile system. The transmitted signal is then reconstructed by the receiver from a significantly small number of samples by using any of the multiple optimization techniques available. The algorithm is simulated in MATLAB. The results show that if a threshold window is applied to the transmitted speech signal and the length of the signal is kept constant, the compression rate of the speech signal is increased

    New Trends in Biologically-Inspired Audio Coding

    Get PDF
    This book chapter deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To do so, we first generate sparse audio representations we call spikegrams, using projections on gammatone or gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are powerful at identifying auditory events, such as onsets, offsets, transients and harmonic structures. We show that the introduction of adaptiveness in the selection of gammachirp kernels enhances the compression rate compared to the case where the kernels are non-adaptive. We also integrate a masking model that helps reduce bitrate without loss of perceptible audio quality. We then quantize coding values using the genetic algorithm that is more optimal than uniform quantization for this framework. We finally propose a method to extract frequent auditory objects (patterns) in the aforementioned sparse representations. The extracted frequency-domain patterns (auditory objects) help us address spikes (auditory events) collectively rather than individually. When audio compression is needed, the different patterns are stored in a small codebook that can be used to efficiently encode audio materials in a lossless way. The approach is applied to different audio signals and results are discussed and compared. This work is a first step towards the design of a high-quality auditory-inspired \"object-based\" audio coder

    Methods of covert communication of speech signals based on a bio-inspired principle

    Get PDF
    This work presents two speech hiding methods based on a bio-inspired concept known as the ability of adaptation of speech signals. A cryptographic model uses the adaptation to transform a secret message to a non-sensitive target speech signal, and then, the scrambled speech signal is an intelligible signal. The residual intelligibility is extremely low and it is appropriate to transmit secure speech signals. On the other hand, in a steganographic model, the adapted speech signal is hidden into a host signal by using indirect substitution or direct substitution. In the first case, the scheme is known as Efficient Wavelet Masking (EWM), and in the second case, it is known as improved-EWM (iEWM). While EWM demonstrated to be highly statistical transparent, the second one, iEWM, demonstrated to be highly robust against signal manipulations. Finally, with the purpose to transmit secure speech signals in real-time operation, a hardware-based scheme is proposedEsta tesis presenta dos métodos de comunicación encubierta de señales de voz utilizando un concepto bio-inspirado, conocido como la “habilidad de adaptación de señales de voz”. El modelo de criptografía utiliza la adaptación para transformar un mensaje secreto a una señal de voz no confidencial, obteniendo una señal de voz encriptada legible. Este método es apropiado para transmitir señales de voz seguras porque en la señal encriptada no quedan rastros del mensaje secreto original. En el caso de esteganografía, la señal de voz adaptada se oculta en una señal de voz huésped, utilizando sustitución directa o indirecta. En el primer caso el esquema se denomina EWM y en el segundo caso iEWM. EWM demostró ser altamente transparente, mientras que iEWM demostró ser altamente robusto contra manipulaciones de señal. Finalmente, con el propósito de transmitir señales de voz seguras en tiempo real, se propone un esquema para dispositivos hardware

    1994 Science Information Management and Data Compression Workshop

    Get PDF
    This document is the proceedings from the 'Science Information Management and Data Compression Workshop,' which was held on September 26-27, 1994, at the NASA Goddard Space Flight Center, Greenbelt, Maryland. The Workshop explored promising computational approaches for handling the collection, ingestion, archival and retrieval of large quantities of data in future Earth and space science missions. It consisted of eleven presentations covering a range of information management and data compression approaches that are being or have been integrated into actual or prototypical Earth or space science data information systems, or that hold promise for such an application. The workshop was organized by James C. Tilton and Robert F. Cromp of the NASA Goddard Space Flight Center

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    Get PDF
    The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 has gathered about 70 international participants and has featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.info:eu-repo/semantics/publishedVersio
    corecore