428 research outputs found

    Audio Source Separation Using Sparse Representations

    This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010.
    The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used, based on orthogonal basis functions that are learned from the observed data instead of being selected from a predetermined library of bases. This transform is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
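    The transform, apportion, and inverse-transform steps described in this abstract can be illustrated with a toy sketch. It is not the chapter's method: it assumes a fixed DCT instead of an adapted lapped or learned transform, two instantaneous mixtures of three sources, mixing directions that are already known, and binary masking (each coefficient is given entirely to one source). All function and variable names are hypothetical.

```python
import math

def dct_matrix(n):
    """Rows are the orthonormal DCT-II basis functions of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def forward(basis, signal):
    """Analysis: inner product of the signal with each basis function."""
    return [sum(b * s for b, s in zip(row, signal)) for row in basis]

def inverse(basis, coeffs):
    """Synthesis: weighted sum of basis functions (transpose of the analysis)."""
    n = len(coeffs)
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

def separate(mixtures, directions):
    """Binary masking with known unit-norm mixing directions (an assumption)."""
    n = len(mixtures[0])
    basis = dct_matrix(n)
    left, right = (forward(basis, m) for m in mixtures)
    estimates = [[0.0] * n for _ in directions]
    for k in range(n):
        # Project the 2-D coefficient vector onto every mixing direction.
        projections = [left[k] * a[0] + right[k] * a[1] for a in directions]
        # Give the whole coefficient to the best-aligned source.
        winner = max(range(len(directions)), key=lambda j: abs(projections[j]))
        estimates[winner][k] = projections[winner]
    return [inverse(basis, e) for e in estimates]

# Toy data: three sources, each a single DCT basis function (disjoint supports).
n = 8
basis = dct_matrix(n)
directions = [(math.cos(math.radians(t)), math.sin(math.radians(t)))
              for t in (15, 45, 75)]
sources = [[amp * v for v in basis[k]]
           for amp, k in ((3.0, 1), (-2.0, 4), (1.5, 6))]
mixtures = [[sum(a[ch] * s[i] for a, s in zip(directions, sources))
             for i in range(n)] for ch in range(2)]

estimates = separate(mixtures, directions)
max_error = max(abs(e - s) for est, src in zip(estimates, sources)
                for e, s in zip(est, src))
print(max_error)
```

    Recovery is essentially exact here only because the toy sources never share a transform coefficient; real signals overlap, which is why the choice of a sparse, signal-adapted transform matters.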

    Coding gain in paraunitary analysis/synthesis systems

    A formal proof that bit allocation results hold for the entire class of paraunitary subband coders is presented. The problem of finding an optimal paraunitary subband coder, so as to maximize the coding gain of the system, is discussed. The bit allocation problem is analyzed for the case of paraunitary tree-structured filter banks, such as those used for generating orthonormal wavelets. The even more general case of nonuniform filter banks is also considered. In all cases it is shown that under optimal bit allocation, the variances of the errors introduced by each of the quantizers have to be equal. Expressions for coding gains for these systems are derived.
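    The equal-error-variance result can be checked numerically with a small sketch. It assumes the standard high-resolution quantizer model (error variance proportional to the subband variance times 2^(-2b)), a uniform filter bank with equal-width subbands, and fractional bit counts; none of these details are taken from the paper, and the function names are hypothetical.

```python
import math

def optimal_bits(variances, avg_bits):
    """Bits per subband under the high-resolution model (may be fractional)."""
    mean_log = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - mean_log) for v in variances]

def error_variances(variances, bits, c=1.0):
    """Quantization error variance c * sigma^2 * 2^(-2b) for each subband."""
    return [c * v * 2.0 ** (-2.0 * b) for v, b in zip(variances, bits)]

def coding_gain(variances):
    """Arithmetic mean over geometric mean of the subband variances."""
    arithmetic = sum(variances) / len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / len(variances))
    return arithmetic / geometric

variances = [16.0, 4.0, 1.0, 0.25]   # illustrative subband variances
bits = optimal_bits(variances, avg_bits=3.0)
errors = error_variances(variances, bits)
gain = coding_gain(variances)

print(bits)    # more bits go to high-variance subbands
print(errors)  # all entries equal under the optimal allocation
print(gain)
```

    With these example variances, the allocation averages 3 bits per subband and every quantizer ends up with the same error variance, which is the condition the abstract states.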

    Advanced Telecommunications and Signal Processing Program

    Contains an introduction and reports on twelve research projects. Sponsors: AT&T Fellowship; Advanced Telecommunications Research Program; INTEL Fellowship; U.S. Navy - Office of Naval Research NDSEG Graduate Fellowship; Maryland Procurement Office Contract MDA904-93-C-418

    Image Compression using Discrete Cosine Transform & Discrete Wavelet Transform

    Image compression addresses the problem of reducing the amount of data required to represent a digital image. Compression is achieved by the removal of one or more of three basic data redundancies: (1) coding redundancy, which is present when less than optimal (i.e. not the smallest length) code words are used; (2) interpixel redundancy, which results from correlations between the pixels of an image; and (3) psychovisual redundancy, which is due to data that is ignored by the human visual system (i.e. visually nonessential information). Huffman codes contain the smallest possible number of code symbols (e.g., bits) per source symbol (e.g., grey level value), subject to the constraint that the source symbols are coded one at a time. Huffman coding, when combined with the technique of reducing image redundancies using the Discrete Cosine Transform (DCT), therefore helps in compressing the image data to a very good extent. The DCT is an example of transform coding. The current JPEG standard uses the DCT as its basis. The DCT relocates the highest energies to the upper left corner of the image. The lesser energy or information is relocated into other areas. The DCT is fast: it can be quickly calculated and is best for images with smooth edges, like photos with human subjects. The DCT coefficients are all real numbers, unlike those of the Fourier transform. The Inverse Discrete Cosine Transform (IDCT) can be used to retrieve the image from its transform representation. The Discrete Wavelet Transform (DWT) has gained widespread acceptance in signal processing and image compression. Because of their inherent multi-resolution nature, wavelet-coding schemes are especially suitable for applications where scalability and tolerable degradation are important. Recently the JPEG committee has released its new image coding standard, JPEG-2000, which is based upon the DWT.
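    The energy-compaction and invertibility claims about the DCT can be illustrated on a single 8x8 block. This is a minimal sketch assuming an orthonormal DCT-II and a synthetic smooth block; it is not code from the paper, and the helper names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """2-D DCT: transform the columns, then the rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """2-D inverse DCT, using the transpose of the orthonormal matrix."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Synthetic smooth block: a gentle brightness ramp (real values, not pixels).
block = [[100.0 + 2.0 * row + 3.0 * col for col in range(N)] for row in range(N)]

coeffs = dct2(block)
restored = idct2(coeffs)

total_energy = sum(v * v for line in coeffs for v in line)
corner_energy = sum(coeffs[row][col] ** 2 for row in range(2) for col in range(2))
corner_share = corner_energy / total_energy
max_error = max(abs(a - b) for ra, rb in zip(block, restored)
                for a, b in zip(ra, rb))

print(corner_share)  # fraction of energy in the 2x2 upper-left corner
print(max_error)     # reconstruction error of the inverse transform
```

    For a smooth block like this one, nearly all of the energy lands in the few upper-left coefficients, which is what lets a coder quantize the remaining coefficients coarsely.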

    Digital acoustics: processing wave fields in space and time using DSP tools

    Systems with hundreds of microphones for acoustic field acquisition, or hundreds of loudspeakers for rendering, have been proposed and built. Analyzing, designing, and applying such systems requires a framework that allows us to leverage the vast set of tools available in digital signal processing in order to achieve intuitive and efficient algorithms. We thus propose a discrete space-time framework, grounded in classical acoustics, which addresses the discrete nature of the spatial and temporal sampling. In particular, a short-space/time Fourier transform is introduced, which is the natural extension of the localized or short-time Fourier transform. Processing in this intuitive domain allows us to easily devise algorithms for beam-forming, source separation, and multi-channel compression, among other useful tasks. The essential space band-limitedness of the Fourier spectrum is also used to solve the spatial equalization task required for sound field rendering in a region of interest. Examples of applications are shown.

    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The image and video coding community has often been working on new advances that go beyond traditional image and video architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely, scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic. Only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved and the encoding time is significantly shorter (by a factor of around 160) when compared to the H.264/AVC standard. The second contribution is the optimization of a recent approach proposed for multiview video coding in videoconferencing applications or other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint video at good compression rates. To achieve such an objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source and an optimum mapping is derived for a typical video source. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders do not require changes in either the other coding and decoding stages or in the bitstream definition. The fourth and last contribution proposes a hierarchical signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform and the obtained results are discussed in this work.
    Doctorate. Telecommunications and Telematics. Doctor of Electrical Engineering.

    Pipelined implementation of JPEG image compression using HDL

    This thesis presents the architecture and design of a JPEG compressor for color images using VHDL. The system consists of major parts such as a color space converter, down sampler, 2-D DCT module, quantization, zigzag scanning and entropy coding. The color space conversion transforms the RGB colors to YCbCr color coding. The down sampling operation reduces the sampling rate of the color information (Cb and Cr). The 2-D DCT transforms the pixel data from the spatial domain to the frequency domain. The quantization operation eliminates the high frequency components and the small amplitude coefficients of the cosine expansion. Finally, the entropy coding uses run-length encoding (RLE), Huffman, variable length coding (VLC) and differential coding to decrease the number of bits used to represent the image. JPEG compression is lossy, since the downsampling and quantization operations are irreversible, but the losses can be controlled in order to keep the necessary image quality. Architectures for these parts were designed and described in VHDL. The results were observed using the Active-HDL simulator and the code was synthesized using Xilinx ISE for a Virtex-4 FPGA. This pipelined architecture has a minimum latency of 187 clock cycles.
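    Three of the stages named above (color space conversion, zigzag scanning, and run-length encoding of zeros) can be sketched in software. This is an illustrative Python sketch, not the thesis's VHDL design: it assumes the full-range JFIF conversion constants, and the run-length step is simplified (no 16-zero run marker, no differential DC coding, no Huffman stage). Function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB to YCbCr using the JFIF conversion constants."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked upward, odd ones downward.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def run_length_zeros(sequence):
    """Simplified (zero_run, value) pairs; trailing zeros become one 'EOB'."""
    pairs, run = [], 0
    for value in sequence:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs

# A quantized block with only a few non-zero low-frequency coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][1] = 12, -3, 2

scan = [block[r][c] for r, c in zigzag_indices()]
encoded = run_length_zeros(scan)
white = rgb_to_ycbcr(255, 255, 255)

print(encoded)
print(white)
```

    The zigzag order places low-frequency coefficients first, so the zeros produced by quantization cluster at the end of the scan, where the run-length step can collapse them.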

    Attractor image coding with low blocking effects.

    by Ho, Hau Lai.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
    Includes bibliographical references (leaves 97-103).
    Chapter 1 --- Introduction
    Chapter 1.1 --- Overview of Attractor Image Coding
    Chapter 1.2 --- Scope of Thesis
    Chapter 2 --- Fundamentals of Attractor Coding
    Chapter 2.1 --- Notations
    Chapter 2.2 --- Mathematical Preliminaries
    Chapter 2.3 --- Partitioned Iterated Function Systems
    Chapter 2.3.1 --- Mathematical Formulation of the PIFS
    Chapter 2.4 --- Attractor Coding using the PIFS
    Chapter 2.4.1 --- Quadtree Partitioning
    Chapter 2.4.2 --- Inclusion of an Orthogonalization Operator
    Chapter 2.5 --- Coding Examples
    Chapter 2.5.1 --- Evaluation Criterion
    Chapter 2.5.2 --- Experimental Settings
    Chapter 2.5.3 --- Results and Discussions
    Chapter 2.6 --- Summary
    Chapter 3 --- Attractor Coding with Adjacent Block Parameter Estimations
    Chapter 3.1 --- δ-Minimum Edge Difference
    Chapter 3.1.1 --- Definition
    Chapter 3.1.2 --- Theoretical Analysis
    Chapter 3.2 --- Adjacent Block Parameter Estimation Scheme
    Chapter 3.2.1 --- Joint Optimization
    Chapter 3.2.2 --- Predictive Coding
    Chapter 3.3 --- Algorithmic Descriptions of the Proposed Scheme
    Chapter 3.4 --- Experimental Results
    Chapter 3.5 --- Summary
    Chapter 4 --- Attractor Coding using Lapped Partitioned Iterated Function Systems
    Chapter 4.1 --- Lapped Partitioned Iterated Function Systems
    Chapter 4.1.1 --- Weighting Operator
    Chapter 4.1.2 --- Mathematical Formulation of the LPIFS
    Chapter 4.2 --- Attractor Coding using the LPIFS
    Chapter 4.2.1 --- Choice of Weighting Operator
    Chapter 4.2.2 --- Range Block Preprocessing
    Chapter 4.2.3 --- Decoder Convergence Analysis
    Chapter 4.3 --- Local Domain Block Searching
    Chapter 4.3.1 --- Theoretical Foundation
    Chapter 4.3.2 --- Local Block Searching Algorithm
    Chapter 4.4 --- Experimental Results
    Chapter 4.5 --- Summary
    Chapter 5 --- Conclusion
    Chapter 5.1 --- Original Contributions
    Chapter 5.2 --- Subjects for Future Research
    Chapter A --- Fundamental Definitions
    Chapter B --- Appendix B
    Bibliography