19 research outputs found

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner and in a psychoacoustic manner. To measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    Visual Data Compression for Multimedia Applications

    The compression of visual information in the framework of multimedia applications is discussed. To this end, major approaches to compress still as well as moving pictures are reviewed. The most important objective in any compression algorithm is that of compression efficiency. High-compression coding of still pictures can be split into three categories: waveform, second-generation, and fractal coding techniques. Each coding approach introduces a different artifact at the target bit rates. The primary objective of most ongoing research in this field is to mask these artifacts as much as possible to the human visual system. Video-compression techniques have to deal with data enriched by one more component, namely, the temporal coordinate. Either compression techniques developed for still images can be generalized for three-dimensional signals (space and time) or a hybrid approach can be defined based on motion compensation. The video compression techniques can then be classified into the following four classes: waveform, object-based, model-based, and fractal coding techniques. This paper provides the reader with a tutorial on major visual data-compression techniques and a list of references for further information as the details of each metho

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue of transporting real-time voice packets over IP networks is the lack of guarantee for reasonable speech quality due to packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. The CELP technique utilizes long-term prediction across frame boundaries and therefore causes error propagation in the case of packet loss, so redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and therefore inherently possesses high robustness to packet loss. However, the original iLBC lacks some of the key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using the frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ) and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs the bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition

    Efficient compression of motion compensated residuals

    EThOS - Electronic Theses Online Service (GB, United Kingdom)

    Discrete Wavelet Transforms

    The discrete wavelet transform (DWT) algorithms have a firm position in processing of signals in several areas of research and industry. As DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT-based and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low bit rate image compression, low complexity implementation of CQF wavelets and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications

    Wavelet Filter Banks in Perceptual Audio Coding

    This thesis studies the application of the wavelet filter bank (WFB) in perceptual audio coding by providing brief overviews of perceptual coding, psychoacoustics, wavelet theory, and existing wavelet coding algorithms. Furthermore, it describes the poor frequency localization property of the WFB and explores one filter design method, in particular, for improving channel separation between the wavelet bands. A wavelet audio coder has also been developed by the author to test the new filters. Preliminary tests indicate that the new filters provide some improvement over other wavelet filters when coding audio signals that are stationary-like and contain only a few harmonic components, and similar results for other types of audio signals that contain many spectral and temporal components. It has been found that the WFB provides a flexible decomposition scheme through the choice of the tree structure and basis filter, but at the cost of poor localization properties. This flexibility can be a benefit in the context of audio coding but the poor localization properties represent a drawback. Determining ways to fully utilize this flexibility, while minimizing the effects of poor time-frequency localization, is an area that is still very much open for research

    Framework for privacy-aware content distribution in peer-to-peer networks with copyright protection

    The use of peer-to-peer (P2P) networks for multimedia distribution has spread globally in recent years. This mass popularity is primarily driven by the efficient distribution of content, which also gives rise to piracy and copyright infringement as well as privacy concerns. An end user (buyer) of a P2P content distribution system does not want to reveal his/her identity during a transaction with a content owner (merchant), whereas the merchant does not want the buyer to further redistribute the content illegally. Therefore, there is a strong need for content distribution mechanisms over P2P networks that do not pose security and privacy threats to copyright holders and end users, respectively. However, the current systems being developed to provide copyright and privacy protection to merchants and end users employ cryptographic mechanisms, which incur high computational and communication costs, making these systems impractical for the distribution of big files, such as music albums or movies

    Towards Predictive Rendering in Virtual Reality

    Generating predictive images, i.e., images representing radiometrically correct renditions of reality, has been a longstanding problem in computer graphics. The exactness of such images is extremely important for Virtual Reality applications like Virtual Prototyping, where users need to make decisions impacting large investments based on the simulated images. Unfortunately, generation of predictive imagery is still an unsolved problem for several reasons, especially if real-time restrictions apply. First, existing scenes used for rendering are not modeled accurately enough to create predictive images. Second, even with huge computational effort, existing rendering algorithms are not able to produce radiometrically correct images. Third, current display devices need to convert rendered images into some low-dimensional color space, which prohibits display of radiometrically correct images. Overcoming these limitations is the focus of current state-of-the-art research. This thesis also contributes to this task. First, it briefly introduces the necessary background and identifies the steps required for real-time predictive image generation. Then, existing techniques targeting these steps are presented and their limitations are pointed out. To solve some of the remaining problems, novel techniques are proposed. They cover various steps in the predictive image generation process, ranging from accurate scene modeling through efficient data representation to high-quality, real-time rendering. A special focus of this thesis lies on real-time generation of predictive images using bidirectional texture functions (BTFs), i.e., very accurate representations for spatially varying surface materials. The techniques proposed by this thesis enable efficient handling of BTFs by compressing the huge amount of data contained in this material representation, applying them to geometric surfaces using texture and BTF synthesis techniques, and rendering BTF-covered objects in real time. Further approaches proposed in this thesis target inclusion of real-time global illumination effects or more efficient rendering using novel level-of-detail representations for geometric objects. Finally, this thesis assesses the rendering quality achievable with BTF materials, indicating a significant increase in realism but also confirming the problems that remain to be solved to achieve truly predictive image generation

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010