212 research outputs found

    Solutions to non-stationary problems in wavelet space.


    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The image and video coding community has also been working on innovations that go beyond traditional image and video coding techniques. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec, named the Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. The whole implementation uses 16-bit integer arithmetic. Only additions and bit shifts are necessary, which lowers computational complexity. Even with these constraints, reasonable rate versus distortion performance is achieved, and encoding times are significantly shorter (by a factor of around 160) than those of the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach to multiview video coding for videoconferencing and similar unicast-like applications. The target scenario of this approach is to provide realistic 3-D video with free viewpoint at good compression rates. To achieve this, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source, and an optimum mapping is derived for a typical video source. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes to the other coding and decoding stages or to the bitstream definition. The fourth and last contribution proposes a hierarchical signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that also exploits these structural similarities across resolution levels. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed to measure the performance of the new transform, and the results obtained are discussed in this work.
    Degree level: Doctorate. Area: Telecommunications and Telematics. Title: Doctor in Electrical Engineering.
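The two ingredients named in the FEVC description, a transform built only from additions and bit shifts and bit-plane coding of the coefficients, can be sketched as follows. This is a minimal illustration and not the thesis's codec: the abstract does not say which transforms FEVC uses, so a two-point integer Haar step (often called the S-transform) stands in for them, the entropy coder is omitted, and all function names are hypothetical. Python integers are unbounded, so the 16-bit constraint is not modelled.

```python
def s_transform(a, b):
    """Forward integer Haar step using only subtraction, addition and a shift."""
    d = a - b            # detail (difference)
    s = b + (d >> 1)     # approximation: floor of the pair's average
    return s, d


def inverse_s_transform(s, d):
    """Exact inverse of s_transform, again with only adds and shifts."""
    b = s - (d >> 1)
    a = b + d
    return a, b


def bit_planes(magnitudes, num_bits):
    """Split non-negative integers into bit-planes, most significant first."""
    return [[(m >> p) & 1 for m in magnitudes]
            for p in range(num_bits - 1, -1, -1)]


def reconstruct(planes, num_bits):
    """Rebuild magnitudes from the first len(planes) bit-planes.

    Passing fewer planes than num_bits gives a coarser approximation,
    which is what makes a bit-plane-ordered stream embedded.
    """
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


# Illustrative data (not from the thesis).
planes = bit_planes([5, 3, 0, 6], num_bits=3)
coarse = reconstruct(planes[:2], num_bits=3)   # decode only two planes
exact = reconstruct(planes, num_bits=3)        # decode all three planes
```

Truncating the list of planes at any point still yields a decodable, lower-precision result, which is the property the abstract refers to as a fully embedded bitstream.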

    Efficient Image Segmentation and Segment-Based Analysis in Computer Vision Applications

    This dissertation focuses on efficient image segmentation and segment-based object recognition in computer vision applications. Special attention is devoted to analyzing shape, which is of particular importance for our two applications: plant species identification from leaf photos, and object classification in remote sensing images. Both problems are also bound by efficiency, which constrains the choice of applicable methods: leaf recognition results are to be used within an interactive system, while remote sensing image analysis must scale well over very large image sets. Leafsnap was the first mobile app to provide automatic recognition of tree species, and currently has over 1.7 million downloads. We present an overview of the mobile app and the corresponding back-end recognition system, as well as a preliminary analysis of user-submitted data. More than 1.7 million valid leaf photos have been uploaded by users, 1.3 million of which are GPS-tagged. We then focus on the problem of segmenting photos of leaves taken against plain light-colored backgrounds. These types of photos are used in practice within Leafsnap for tree species recognition. A good segmentation is essential in order to make use of the distinctive shape of leaves for recognition. We present a comparative experimental evaluation of several segmentation methods, including quantitative and qualitative results. We then introduce a custom-tailored leaf segmentation method that shows superior performance while maintaining computational efficiency. The other contribution of this work is a set of attributes for the analysis of image segments. The set is designed for use in knowledge-based systems, so the attributes are selected to be intuitive and easily describable. The attributes can also be computed efficiently, which allows them to be applied across different problems. We experiment with several descriptive measures from the literature and encounter certain limitations, leading us to introduce new attribute formulations and more efficient computational methods. Finally, we experiment with the attribute set on our two applications: plant species identification from leaf photos and object recognition in remote sensing images.

    Journal of Asian Finance, Economics and Business, v. 4, no. 4


    Harmony Analysis in A’Capella Singing

    Speech is produced by the larynx and then modified by the articulators; this speech contains large amounts of useful information. Singing is produced in the same way, albeit with specific acoustic differences: singing contains rhythm and is usually of a higher intensity. Singing is almost always accompanied by musical instruments, which generally makes detecting and separating the voice difficult (Kim Hm 2012). A’ Capella singing is singing without musical accompaniment, which makes it somewhat easier to retrieve vocal information. The methods developed to detect information from speech are not new concepts and are commonly applied in almost every item in the average household. Singing processing adapts a large portion of these techniques to detect vocal information of singers, including melody, language, emotion, harmony and pitch. The techniques used in speech and singing processing are categorised into one of three categories: (1) time domain, (2) frequency domain and (3) other algorithms. This project utilises one algorithm from each category, namely the Average Magnitude Difference Function (AMDF), Cepstral Analysis and Linear Predictive Coding (LPC). AMDF averages the absolute difference between a sample taken at time (k) and a delayed version of itself at (k-n). It is known to provide relatively good accuracy at low computational cost, but it is sensitive to variation in background noise (Hui, L et al 2006). Cepstral Analysis is known for separating a convolved signal into its source and vocal tract components, and it is computationally fast because it relies on the Fourier Transform and its inverse. LPC estimates each sample as a linear combination of past values of the signal; the resulting predictor coefficients and prediction error are used to develop the spectral envelope for pitch detection. The project tested the algorithms against 11 tracks containing different harmonic content. The methods were compared on speed, accuracy and, where applicable, the number of notes correctly identified. All three algorithms gave relatively good results on single-note tracks, with the LPC algorithm providing the most accurate results. When tested on multi-note tracks and pre-recorded singing tracks, the AMDF and Cepstral Analysis methods performed poorly in terms of accuracy and the number of correctly identified notes. The LPC method performed considerably better, correctly identifying an average of 66.8% of notes.
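The AMDF pitch estimate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the search range given by `f_min` and `f_max`, and the synthetic 200 Hz test tone are all hypothetical choices made for the example.

```python
import math


def amdf(signal, max_lag):
    """Average Magnitude Difference Function for lags 1..max_lag.

    Entry i of the result corresponds to lag i + 1. For a periodic
    signal the function dips towards zero at multiples of the period.
    """
    n = len(signal)
    values = []
    for lag in range(1, max_lag + 1):
        total = sum(abs(signal[k] - signal[k - lag]) for k in range(lag, n))
        values.append(total / (n - lag))
    return values


def estimate_pitch(signal, sample_rate, f_min, f_max):
    """Return the pitch in Hz whose lag minimises the AMDF.

    Only lags between sample_rate / f_max and sample_rate / f_min are
    searched, which avoids the trivial minimum at very small lags.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    values = amdf(signal, max_lag)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: values[lag - 1])
    return sample_rate / best_lag


# Synthetic single-note input: a pure 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * k / 8000) for k in range(400)]
pitch = estimate_pitch(tone, sample_rate=8000, f_min=120, f_max=400)
```

The search range matters: if it spans two or more periods of the note, the dips at multiples of the period compete with the true one, which is one way a simple AMDF detector can report the wrong octave.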

    Sperm and Cilia Dynamics

    Sperm swim through fluids by means of an active snake-like motion of their tail, the flagellum. Experiments have shown that sperm consistently accumulate at surfaces. At a surface they swim in circles, with a sense of rotation that depends on the species. Cilia are hair-like cell protrusions that move fluid, or the cell itself, with a whip-like motion. Cilia are found in a wide variety of organisms. For example, the paramecium uses cilia for locomotion, while in the human lung cilia transport mucus and foreign particles outwards. Perhaps the most striking phenomenon observed in cilia is the metachronal wave. When many cilia beat together, a wave pattern forms spontaneously, much like that of a wheat field in the wind. Cilia and flagella share a common structure, the axoneme. We simulate a model axoneme made of three semiflexible polymer rods assembled into a crane-like structure. Hydrodynamic interactions are taken into account by a mesoscopic simulation method called Multi-Particle Collision Dynamics (MPCD). In the course of this work, MPCD is successfully applied to active biological systems for the first time. In the sperm simulations, a head is attached chirally to the axoneme structure. The swimming trajectory of the sperm turns out to depend strongly on the degree of chirality. In bulk fluid we find a dynamic transition of the trajectory between a pronounced helix and almost straight motion. Near a wall we reproduce both the adhesion to the interface and the oriented circular motion. Interestingly, the cause of the adhesion to the wall lies in the repulsion of the flagellum from the wall. The circular motion and its direction, by contrast, are determined by the chirality of the sperm. To study cilia dynamics, we consider an array of typically 20 by 20 cilia, in which axoneme structures are anchored perpendicular to a wall. The beat pattern of the cilia is modelled on the biological situation. Crucially, the beat pattern can be modified by external influences, which allows a metachronal wave to emerge through the synchronization of different cilia. For the first time, we are able to observe the metachronal wave in simulations of an extended array of independently beating cilia. The metachronal wave turns out to have a strong effect on transport velocity and efficiency. The average fluid velocity increases by up to a factor of 3.2 due to the metachronal wave, compared with an equivalent system beating synchronously. Because the power consumption decreases at the same time, the efficiency increases by up to an order of magnitude. We further characterize transport and wave properties as functions of the beat direction, the cilia spacing and the viscosity of the fluid. We are convinced that both the efficiency and, in particular, the transport velocity are decisive for the fitness of the cell. The metachronal wave is therefore of great functional importance for ciliated cells.

    Spatial regression in large datasets: problem set solution

    In this dissertation we investigate a way to combine data mining methods with traditional spatial autoregressive models in the context of large spatial datasets. We start by considering the numerical difficulties of handling massive datasets with the usual approaches based on Maximum Likelihood estimation for spatial models and Spatial Two-Stage Least Squares. We then conduct a Monte Carlo simulation experiment to compare the accuracy and computational complexity of decomposition and approximation techniques for computing the Jacobian in spatial models, for various regular lattice structures. In particular, we consider one of the most common spatial econometric models: the spatial lag model (or SAR, spatial autoregressive model). We also provide new evidence by examining the double effect on the computational complexity of these methods: the influence of the "size effect" and the "sparsity effect". To overcome this computational problem, we propose a data mining methodology based on CART (Classification and Regression Tree) that explicitly considers the phenomenon of spatial autocorrelation in the pseudo-residuals, in order to remove this effect and improve accuracy, with significant savings in computational complexity across a wide range of spatial datasets, both real and simulated.
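The Jacobian mentioned above is the term ln|I - ρW| in the log-likelihood of the spatial lag model, where W is the spatial weights matrix and ρ the autoregressive parameter. The sketch below contrasts one decomposition approach (Gaussian elimination) with one approximation approach (a truncated power series of traces). It is an illustration only: the abstract does not name the specific techniques compared in the dissertation, and the 4-node lattice, the function names and the value ρ = 0.4 are assumptions made for the example.

```python
import math


def matmul(a, b):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def log_det_exact(w, rho):
    """ln|det(I - rho*W)| by Gaussian elimination with partial pivoting."""
    n = len(w)
    m = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
         for i in range(n)]
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return log_det


def log_det_series(w, rho, terms):
    """Truncated series ln det(I - rho*W) = -sum_{k>=1} rho^k tr(W^k) / k.

    Converges when |rho| times the spectral radius of W is below 1,
    which holds for a row-normalised W and |rho| < 1.
    """
    n = len(w)
    power = [row[:] for row in w]       # holds W^k
    total = 0.0
    for k in range(1, terms + 1):
        trace = sum(power[i][i] for i in range(n))
        total -= (rho ** k) * trace / k
        power = matmul(power, w)
    return total


# Row-normalised neighbour matrix for a 1-D lattice of four sites.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
exact = log_det_exact(W, 0.4)
approx = log_det_series(W, 0.4, terms=40)
```

Dense elimination costs on the order of n³ operations, which is what makes the exact term expensive for massive datasets; the dense `matmul` used here is equally costly and is kept only for brevity, since the series pays off when W is sparse or the traces are estimated rather than computed.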

    Wavelet methods in speech recognition

    In this thesis, novel wavelet techniques are developed to improve the parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, the non-stationarity inherent in speech can be represented efficiently with a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics. Wavelet theory offers an alternative insight into the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced, which increase the flexibility of the transform. This approach is observed to be still more suitable for the highly variable nature of speech signals, in that it results in a time-frequency sampling grid that is well adapted to irregularities and transients. It yields a corresponding reduction in the misclassification rate of the recognition system, although necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. The classifier then performs dimensionality reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted with an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application.
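The last step described above, transforming the signal and keeping the few coefficients with the most discriminant power, can be sketched as follows. This is a simplified stand-in and not the thesis's method: a fixed Haar wavelet replaces the adaptive wavelet and cosine packet libraries, a per-coefficient Fisher ratio replaces the classifier-driven selection, and the function names and toy signals are hypothetical.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition (length must be a power of two).

    Returns [approximation, coarsest detail, ..., finest details].
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        level = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = level + details       # coarser levels go in front
    return approx + details


def fisher_scores(class_a, class_b):
    """Per-feature ratio of squared mean difference to summed variances."""
    scores = []
    for j in range(len(class_a[0])):
        xa = [row[j] for row in class_a]
        xb = [row[j] for row in class_b]
        mean_a, mean_b = sum(xa) / len(xa), sum(xb) / len(xb)
        var_a = sum((x - mean_a) ** 2 for x in xa) / len(xa)
        var_b = sum((x - mean_b) ** 2 for x in xb) / len(xb)
        # Small constant guards against division by zero.
        scores.append((mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12))
    return scores


def top_features(class_a, class_b, k):
    """Indices of the k coefficients with the highest Fisher score."""
    scores = fisher_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: class A has a sharp step between its first two samples.
signals_a = [[2.0, 0.0, 1.0, 1.0], [2.2, 0.0, 1.0, 1.1]]
signals_b = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.1, 1.0, 1.1]]
features_a = [haar_dwt(s) for s in signals_a]
features_b = [haar_dwt(s) for s in signals_b]
selected = top_features(features_a, features_b, k=1)
```

In this toy case the selected index is the fine-scale detail coefficient covering the first two samples, which is where the localised step sits; a handful of such coefficients can then be passed to the classifier instead of the full signal.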

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.