118 research outputs found

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing and, in particular, to predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping, techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction is a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
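As a concrete illustration of the warping idea, each unit delay of a conventional predictor can be replaced by a first-order allpass section. The sketch below is a toy implementation under that substitution only; the function names and the warping parameter `lam` are illustrative assumptions, not the thesis's own code (in practice `lam` would be tuned to approximate an auditory scale such as Bark):

```python
import numpy as np
from scipy.signal import lfilter

def warped_autocorrelation(x, lam, order):
    """Autocorrelation of x measured on a warped frequency axis.

    Each unit delay is replaced by the first-order allpass
    D(z) = (z^-1 - lam) / (1 - lam z^-1); lam = 0 recovers the
    ordinary (unwarped) autocorrelation.
    """
    b = [-lam, 1.0]   # allpass numerator:  z^-1 - lam
    a = [1.0, -lam]   # allpass denominator: 1 - lam z^-1
    r = np.zeros(order + 1)
    y = x.copy()
    r[0] = np.dot(x, y)
    for k in range(1, order + 1):
        y = lfilter(b, a, y)   # one more allpass section = one warped delay
        r[k] = np.dot(x, y)
    return r

def wlp_coefficients(x, lam, order):
    """Solve the warped normal equations for the predictor coefficients."""
    r = warped_autocorrelation(x, lam, order)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# Example: 2nd-order warped predictor for a decaying sinusoid
x = np.cos(0.3 * np.arange(512)) * 0.99 ** np.arange(512)
a_warped = wlp_coefficients(x, lam=0.5, order=2)
```

Setting `lam = 0` makes every allpass a pure delay, so the routine degenerates to classical linear prediction; this is a convenient sanity check on any warped implementation.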

    Generalized linear-in-parameter models: theory and audio signal processing applications

    This thesis presents a mathematically oriented perspective on some basic concepts of digital signal processing. A general framework for the development of alternative signal and system representations is attained by defining a generalized linear-in-parameter model (GLM) configuration. The GLM provides a direct view into the origins of many familiar methods in signal processing, implying a variety of generalizations, and it serves as a natural introduction to rational orthonormal model structures. In particular, the conventional division between finite impulse response (FIR) and infinite impulse response (IIR) filtering methods is reconsidered. The latter part of the thesis consists of audio-oriented case studies, including loudspeaker equalization, musical instrument body modeling, and room response modeling. The proposed collection of IIR filter design techniques is submitted to challenging modeling tasks. The most important practical contribution of this thesis is the introduction of a procedure for the optimization of rational orthonormal filter structures, called the BU-method. More generally, the BU-method and its variants, including the (complex) warped extension, the (C)WBU-method, can be considered entirely new IIR filter design strategies.
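One standard rational orthonormal structure of the kind the abstract refers to is the Laguerre model, in which a linear-in-parameter fit reduces to ordinary least squares over a fixed basis. The sketch below is a minimal illustration under that assumption (the function names and pole value are hypothetical; this is not the thesis's BU-method):

```python
import numpy as np
from scipy.signal import lfilter

def laguerre_basis(a, order, length):
    """Impulse responses of a Laguerre filter bank with pole a (|a| < 1).

    The FIR case is recovered at a = 0, where each branch is a pure delay.
    """
    delta = np.zeros(length)
    delta[0] = 1.0
    norm = np.sqrt(1.0 - a ** 2)
    basis = []
    g = lfilter([norm], [1.0, -a], delta)       # first-order lowpass section
    for _ in range(order):
        basis.append(g)
        g = lfilter([-a, 1.0], [1.0, -a], g)    # cascade one allpass section
    return np.column_stack(basis)

def fit_glm(target, a, order):
    """Least-squares weights of the linear-in-parameter model."""
    G = laguerre_basis(a, order, len(target))
    w, *_ = np.linalg.lstsq(G, target, rcond=None)
    return w, G @ w

# Approximate a slowly decaying exponential with a few Laguerre terms
h = 0.95 ** np.arange(200)
w, h_hat = fit_glm(h, a=0.9, order=4)
```

Because the basis columns are (near-)orthonormal, the normal equations are well conditioned, and a well-chosen pole lets a low-order IIR-type model outperform an FIR fit of the same order on slowly decaying responses.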

    Deep Burst Denoising

    Noise is an inherent issue of low-light image capture, one which is exacerbated on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in a low-light situation is to increase the shutter time of the camera, thus allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will result in blurred images. Another way of gathering more light is to capture multiple short (thus noisy) frames in a "burst" and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel, multiframe architecture to be a simple addition to any single-frame denoising model, and design it to handle an arbitrary number of noisy input frames. We show that it achieves state-of-the-art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of image enhancement by integrating content from multiple frames and demonstrate that our DNN architecture generalizes well to image super-resolution.
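The variance argument behind burst capture can be checked in a few lines. The sketch below is the naive frame-averaging baseline on a synthetic, perfectly aligned static scene, not the paper's recurrent CNN; averaging N independent noisy frames reduces the noise variance by a factor of N:

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(64, 64))   # hypothetical clean image

def noisy_burst(scene, n_frames, sigma, rng):
    """Simulate a burst of short, noisy exposures of a static scene."""
    noise = rng.normal(0.0, sigma, size=(n_frames,) + scene.shape)
    return scene[None] + noise

burst = noisy_burst(scene, n_frames=8, sigma=0.1, rng=rng)
single_mse = np.mean((burst[0] - scene) ** 2)           # one noisy frame
fused_mse = np.mean((burst.mean(axis=0) - scene) ** 2)  # naive fusion
# fused_mse is close to single_mse / 8 for a static, well-aligned burst
```

Real bursts violate the static-scene assumption (camera and subject motion), which is why learned, alignment-aware integration is needed on top of this statistical headroom.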

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit renders the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows categorizing the newest approaches seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables.

    Alternativas à modelagem de funçÔes de transferĂȘncia de ambientes (Alternatives to the modeling of room transfer functions)

    Master's dissertation - Universidade Federal de Santa Catarina, Centro TecnolĂłgico, Programa de PĂłs-Graduação em Engenharia ElĂ©trica. This work deals with the modeling of room transfer functions (RTFs) using autoregressive moving average (ARMA) models. The representation of such functions by digital systems is critical in applications of both room equalization and sound field simulation. The least-squares estimation of the ARMA model coefficients is attained through the Brandenstein and Unbehauen (LSBU) algorithm. As an alternative implementation of the estimated model, the Kautz filter is introduced. Due to the large length of room impulse responses, such functions are decomposed using either wavelet or polyphase filters, making their approximation by an ARMA model feasible. The work culminates in a common acoustic pole/zero (CAPZ) modeling of multiple RTFs. A novel approach to obtaining the common acoustic poles of a room, which assures the stability of the estimated CAPZ model, termed the quantized singularity matrix (MSQ, from the Portuguese matriz de singularidades quantizadas), is proposed. Besides the MSQ procedure, the estimation of the model parameters by the LSBU method is investigated. The results obtained with the proposed approaches are comparable to those of both the original CAPZ model from the literature and a more recent variant based on Shanks' least-squares method combined with the c-means clustering algorithm, corroborating the applicability of the proposed strategies.
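The abstract cites Shanks' least-squares method as one pole-zero modeling baseline. A minimal sketch of that classical two-step fit (Prony for the poles, Shanks for the zeros) is given below; it is an illustration of the baseline only, not the thesis's LSBU or MSQ procedures, and all names are illustrative:

```python
import numpy as np
from scipy.signal import lfilter

def shanks(h, p, q):
    """Least-squares pole-zero (ARMA) fit of an impulse response h.

    Step 1 (Prony): estimate the p denominator coefficients from the
    linear recursion that h obeys for n > q.
    Step 2 (Shanks): pick the q+1 numerator coefficients minimizing the
    output error, using shifted impulse responses of 1/A(z) as a basis.
    Assumes q + 1 >= p so all sample indices below are valid.
    """
    N = len(h)
    # Step 1: for n > q, h[n] + a[1] h[n-1] + ... + a[p] h[n-p] ~ 0
    M = np.array([[h[n - k] for k in range(1, p + 1)]
                  for n in range(q + 1, N)])
    a_tail, *_ = np.linalg.lstsq(M, -h[q + 1:N], rcond=None)
    a = np.concatenate(([1.0], a_tail))
    # Step 2: basis columns are 1/A(z)'s impulse response shifted by 0..q
    delta = np.zeros(N)
    delta[0] = 1.0
    g = lfilter([1.0], a, delta)
    G = np.column_stack([np.concatenate((np.zeros(k), g[:N - k]))
                         for k in range(q + 1)])
    b, *_ = np.linalg.lstsq(G, h, rcond=None)
    return b, a

# Recover a known 2-pole, 1-zero system from its impulse response
delta = np.zeros(100)
delta[0] = 1.0
h = lfilter([1.0, 0.4], [1.0, -0.9, 0.2], delta)
b_est, a_est = shanks(h, p=2, q=1)
```

When h is exactly ARMA of the assumed orders, both least-squares problems have zero residual and the true coefficients are recovered; measured room responses only satisfy the recursion approximately, which is where the more elaborate estimation schemes discussed in the dissertation come in.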

    Advanced methods and deep learning for video and satellite data compression

    The abstract is in the attachment.

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models on a subset of video segmentation tasks or transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. In addition, previous reviews of interpretability methods focused on transformers for classification, while analysis of the video temporal dynamics modelling capabilities of video models has received less attention. In this survey, we address the above with a thorough discussion of various categories of video segmentation, a component-wise discussion of the state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models and interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions.

    Photo-realistic face synthesis and reenactment with deep generative models

    The advent of Deep Learning has led to numerous breakthroughs in the field of Computer Vision. Over the last decade, a significant amount of research has been undertaken towards designing neural networks for visual data analysis. At the same time, rapid advancements have been made towards the direction of deep generative modeling, especially after the introduction of Generative Adversarial Networks (GANs), which have shown particularly promising results when it comes to synthesising visual data. Since then, considerable attention has been devoted to the problem of photo-realistic human face animation due to its wide range of applications, including image and video editing, virtual assistance, social media, teleconferencing, and augmented reality. The objective of this thesis is to make progress towards generating photo-realistic videos of human faces. To that end, we propose novel generative algorithms that provide explicit control over the facial expression and head pose of synthesised subjects. Despite the major advances in face reenactment and motion transfer, current methods struggle to generate video portraits that are indistinguishable from real data. In this work, we aim to overcome the limitations of existing approaches, by combining concepts from deep generative networks and video-to-video translation with 3D face modelling, and more specifically by capitalising on prior knowledge of faces that is enclosed within statistical models such as 3D Morphable Models (3DMMs). In the first part of this thesis, we introduce a person-specific system that performs full head reenactment using ideas from video-to-video translation. Subsequently, we propose a novel approach to controllable video portrait synthesis, inspired by Implicit Neural Representations (INR).
    In the second part of the thesis, we focus on person-agnostic methods and present a GAN-based framework that performs video portrait reconstruction, full head reenactment, expression editing, novel pose synthesis and face frontalisation. Open Access
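The prior knowledge a 3DMM encodes is a linear statistical model of shape: a mean mesh plus a weighted sum of principal components. The sketch below is a toy stand-in using a random orthonormal basis; the dimensions and names are illustrative assumptions, not a real learned model such as the Basel Face Model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vertices, n_components = 500, 10               # toy sizes, not a real model
mean_shape = rng.normal(size=3 * n_vertices)     # flattened x,y,z per vertex
# Orthonormal columns as a stand-in for learned PCA shape components
basis = np.linalg.qr(rng.normal(size=(3 * n_vertices, n_components)))[0]
stddevs = np.linspace(1.0, 0.1, n_components)    # per-component scales

def sample_face(alpha):
    """Generate a face shape from low-dimensional coefficients alpha."""
    return mean_shape + basis @ (stddevs * alpha)

neutral = sample_face(np.zeros(n_components))    # alpha = 0 gives the mean face
varied = sample_face(rng.normal(size=n_components))
```

Controllable synthesis pipelines exploit exactly this structure: a handful of coefficients parameterize expression and identity, and those coefficients can be estimated from, or transferred between, videos.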

    Uses of uncalibrated images to enrich 3D models information

    The decrease in the cost of semi-professional digital cameras has made it possible for everyone to acquire a very detailed description of a scene in a very short time. Unfortunately, interpreting the images is usually quite hard, due to the amount of data and the lack of robust, generic image analysis methods. Nevertheless, if a geometric description of the depicted scene is available, it becomes much easier to extract information from 2D data. This information can be used to enrich the quality of the 3D data in several ways. In this thesis, several uses of sets of unregistered images for the enrichment of 3D models are shown. In particular, two possible fields of application are presented: color acquisition, projection and visualization, and geometry modification. Regarding color management, several practical and inexpensive solutions to overcome the main issues in this field are presented. Moreover, some real applications, mainly related to Cultural Heritage, show that the proposed methods are robust and effective. In the context of geometry modification, two approaches are presented to modify already existing 3D models. In the first one, information extracted from images is used to deform a dummy model to obtain accurate 3D head models, used for simulation in the context of three-dimensional audio rendering. The second approach presents a method to fill holes in 3D models, using registered images depicting a pattern projected on the real object. Finally, some useful indications about possible future work in all the presented fields are given, in order to delineate the developments of this promising research direction.
    • 
