118 research outputs found
Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping, or simply warping, techniques are based on modifying a conventional signal processing system so that the system's inherent frequency representation is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction is a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
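The frequency mapping behind such warping techniques can be sketched numerically. The snippet below is a minimal illustration, not code from the thesis: the phase response of the first-order allpass z^-1 -> (z^-1 - lam) / (1 - lam * z^-1) defines the warped frequency axis, and the value lam = 0.7 is only an assumed example magnitude for Bark-like warping at CD-quality sample rates.

```python
import numpy as np

def warped_frequency(omega, lam):
    """Map a normalized frequency (rad/sample) through a first-order
    allpass warping z^-1 -> (z^-1 - lam) / (1 - lam * z^-1).
    The allpass phase response gives the warped frequency axis."""
    return omega + 2.0 * np.arctan(lam * np.sin(omega) / (1.0 - lam * np.cos(omega)))

# A positive lam stretches the low-frequency region (more resolution there),
# roughly mimicking the auditory frequency scale; lam = 0 is the identity map.
omega = np.linspace(0.0, np.pi, 5)
lam = 0.7  # assumed example value for Bark-like warping
print(warped_frequency(omega, lam))
```

The endpoints 0 and pi stay fixed, while intermediate frequencies are pushed upward for positive lam, which is exactly the resolution reallocation WLP exploits.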
Generalized linear-in-parameter models: theory and audio signal processing applications
This thesis presents a mathematically oriented perspective on some basic concepts of digital signal processing. A general framework for the development of alternative signal and system representations is attained by defining a generalized linear-in-parameter model (GLM) configuration. The GLM provides a direct view into the origins of many familiar methods in signal processing, implying a variety of generalizations, and it serves as a natural introduction to rational orthonormal model structures. In particular, the conventional division between finite impulse response (FIR) and infinite impulse response (IIR) filtering methods is reconsidered. The latter part of the thesis consists of audio-oriented case studies, including loudspeaker equalization, musical instrument body modeling, and room response modeling. The proposed collection of IIR filter design techniques is applied to challenging modeling tasks. The most important practical contribution of this thesis is the introduction of a procedure for the optimization of rational orthonormal filter structures, called the BU-method. More generally, the BU-method and its variants, including the (complex) warped extension, the (C)WBU-method, can be considered entirely new IIR filter design strategies.
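A linear-in-parameter model in the GLM sense can be illustrated with a small least-squares fit. The sketch below is illustrative only (the function names and the choice of basis are assumptions, not from the thesis): using plain delays as the basis recovers ordinary FIR identification, the special case the GLM framework generalizes.

```python
import numpy as np

def fit_glm(x, y, basis_fns):
    """Fit a linear-in-parameter model y ~ sum_k theta_k * g_k(x) by least
    squares. Each g_k maps the input signal to one regressor column; an FIR
    filter is the special case where g_k is a delay by k samples."""
    Phi = np.column_stack([g(x) for g in basis_fns])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def delay(k):
    """Delay-by-k regressor (the FIR basis)."""
    return lambda x: np.concatenate([np.zeros(k), x[:len(x) - k]])

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 0.5 * x + 0.25 * delay(1)(x)          # target: a known 2-tap FIR system
theta = fit_glm(x, y, [delay(0), delay(1)])
print(theta)   # recovers the taps [0.5, 0.25]
```

Swapping the delay basis for the impulse responses of fixed rational (IIR) sections, while keeping the same least-squares step, is the kind of generalization that leads toward rational orthonormal structures.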
Deep Burst Denoising
Noise is an inherent issue of low-light image capture, one which is
exacerbated on mobile devices due to their narrow apertures and small sensors.
One strategy for mitigating noise in a low-light situation is to increase the
shutter time of the camera, thus allowing each photosite to integrate more
light and decrease noise variance. However, there are two downsides of long
exposures: (a) bright regions can exceed the sensor range, and (b) camera and
scene motion will result in blurred images. Another way of gathering more light
is to capture multiple short (thus noisy) frames in a "burst" and intelligently
integrate the content, thus avoiding the above downsides. In this paper, we use
the burst-capture strategy and implement the intelligent integration via a
recurrent fully convolutional deep neural net (CNN). We build our novel,
multiframe architecture to be a simple addition to any single frame denoising
model, and design it to handle an arbitrary number of noisy input frames. We show
that it achieves state of the art denoising results on our burst dataset,
improving on the best published multi-frame techniques, such as VBM4D and
FlexISP. Finally, we explore other applications of image enhancement by
integrating content from multiple frames and demonstrate that our DNN
architecture generalizes well to image super-resolution.
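The variance-reduction argument behind burst capture can be illustrated independently of the paper's network: plain averaging of N independent noisy frames already reduces noise variance roughly by a factor of N, and this naive baseline is what a learned recurrent integration improves upon. A minimal synthetic sketch (not the authors' pipeline; the image, noise level, and burst size are assumed values):

```python
import numpy as np

rng = np.random.default_rng(1)
clean = np.zeros((32, 32))           # stand-in for the true (noise-free) image
sigma = 0.2                          # assumed per-frame noise standard deviation
burst = [clean + rng.normal(0.0, sigma, clean.shape) for _ in range(8)]

single_mse = np.mean((burst[0] - clean) ** 2)   # error of one short exposure
avg = np.mean(burst, axis=0)                    # naive burst integration
burst_mse = np.mean((avg - clean) ** 2)
print(single_mse, burst_mse)   # averaging 8 frames cuts the MSE roughly 8x
```

Unlike this plain mean, a recurrent network can integrate frames while compensating for motion and saturation, which is why it beats naive averaging on real bursts.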
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and constraints
explicit renders the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing the newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Alternatives for modeling room transfer functions
Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica. This work deals with the modeling of room transfer functions (RTFs) by using autoregressive moving average (ARMA) models. The modeling of such functions is critical in applications of both room equalization and sound field simulation.
The least-squares estimation of the ARMA model coefficients is performed with the Brandenstein and Unbehauen (LSBU) algorithm. As an alternative implementation of the required modeling, the Kautz filter is introduced. Due to the large length of room impulse responses, such functions are decomposed by using either wavelet or polyphase filters, aiming to approximate them by an ARMA model. The work culminates in a common acoustic pole/zero (CAPZ) modeling of multiple RTFs. A novel approach to obtain the common acoustic poles of a room, which assures the stability of the estimated CAPZ model, termed the quantized singularity matrix (MSQ), is proposed. Besides the MSQ procedure, the estimation of the model parameters by the LSBU method is investigated. The results obtained with the proposed approaches are similar to those obtained with both the original CAPZ model and a more recent one, the latter based on both the Shanks method and the c-means clustering algorithm, corroborating the applicability of the proposed strategies.
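The all-pole side of such room-response modeling can be illustrated with a small covariance-style least-squares fit. The sketch below is a generic AR estimate on a synthetic decaying resonance, not the LSBU or Kautz machinery of the dissertation; the model order, pole radius, and response length are assumed values.

```python
import numpy as np

def fit_ar(h, p):
    """Least-squares (covariance-method) AR(p) fit to an impulse response h:
    solve h[n] ~ -a1*h[n-1] - ... - ap*h[n-p] over n = p .. len(h)-1."""
    N = len(h)
    H = np.column_stack([h[p - k - 1:N - k - 1] for k in range(p)])  # lagged h
    a, *_ = np.linalg.lstsq(H, -h[p:], rcond=None)
    return a

# Synthetic "room mode": a decaying resonance generated by a known AR(2)
# system with poles at radius 0.9 (slow decay, like a lightly damped mode).
a_true = np.array([-1.6, 0.81])
h = np.zeros(256)
h[0] = 1.0
for n in range(1, len(h)):
    h[n] = -a_true[0] * h[n - 1] - (a_true[1] * h[n - 2] if n >= 2 else 0.0)

a_est = fit_ar(h, 2)
print(a_est)   # recovers a_true up to numerical precision
```

Real RTFs need far higher orders and a numerator (the MA part), which is where decomposition strategies and common-pole (CAPZ) modeling become essential.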
Advanced methods and deep learning for video and satellite data compression
The abstract is in the attachment.
Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability
Video segmentation encompasses a wide range of categories of problem
formulation, e.g., object, scene, actor-action and multimodal video
segmentation, for delineating task-specific scene components with pixel-level
masks. Recently, approaches in this research area shifted from concentrating on
ConvNet-based to transformer-based models. In addition, various
interpretability approaches have appeared for transformer models and video
temporal dynamics, motivated by the growing interest in basic scientific
understanding, model diagnostics and societal implications of real-world
deployment. Previous surveys mainly focused on ConvNet models on a subset of
video segmentation tasks or transformers for classification tasks. Moreover,
component-wise discussion of transformer-based video segmentation models has
not yet received due focus. In addition, previous reviews of interpretability
methods focused on transformers for classification, while analysis of video
temporal dynamics modelling capabilities of video models received less
attention. In this survey, we address the above with a thorough discussion of
various categories of video segmentation, a component-wise discussion of the
state-of-the-art transformer-based models, and a review of related
interpretability methods. We first present an introduction to the different
video segmentation task categories, their objectives, specific challenges and
benchmark datasets. Next, we provide a component-wise review of recent
transformer-based models and document the state of the art on different video
segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc
interpretability methods for transformer models and interpretability methods
for understanding the role of the temporal dimension in video models. Finally,
we conclude our discussion with future research directions.
Photo-realistic face synthesis and reenactment with deep generative models
The advent of Deep Learning has led to numerous breakthroughs in the field of Computer Vision. Over the last decade, a significant amount of research has been undertaken towards designing neural networks for visual data analysis. At the same time, rapid advancements have been made towards the direction of deep generative modeling, especially after the introduction of Generative Adversarial Networks (GANs), which have shown particularly promising results when it comes to synthesising visual data. Since then, considerable attention has been devoted to the problem of photo-realistic human face animation due to its wide range of applications, including image and video editing, virtual assistance, social media, teleconferencing, and augmented reality. The objective of this thesis is to make progress towards generating photo-realistic videos of human faces. To that end, we propose novel generative algorithms that provide explicit control over the facial expression and head pose of synthesised subjects. Despite the major advances in face reenactment and motion transfer, current methods struggle to generate video portraits that are indistinguishable from real data. In this work, we aim to overcome the limitations of existing approaches, by combining concepts from deep generative networks and video-to-video translation with 3D face modelling, and more specifically by capitalising on prior knowledge of faces that is encoded in statistical models such as 3D Morphable Models (3DMMs). In the first part of this thesis, we introduce a person-specific system that performs full head reenactment using ideas from video-to-video translation. Subsequently, we propose a novel approach to controllable video portrait synthesis, inspired by Implicit Neural Representations (INR).
In the second part of the thesis, we focus on person-agnostic methods and present a GAN-based framework that performs video portrait reconstruction, full head reenactment, expression editing, novel pose synthesis and face frontalisation.
Uses of uncalibrated images to enrich 3D models information
The decrease in costs of semi-professional digital cameras has led to the possibility
for everyone to acquire a very detailed description of a scene in a very short time.
Unfortunately, the interpretation of the images is usually quite hard, due to the amount
of data and the lack of robust and generic image analysis methods. Nevertheless, if a
geometric description of the depicted scene is available, it gets much easier to extract
information from 2D data.
This information can be used to enrich the quality of the 3D data in several ways.
In this thesis, several uses of sets of unregistered images for the enrichment of 3D
models are shown.
In particular, two possible fields of application are presented: the color acquisition,
projection and visualization and the geometry modification.
Regarding color management, several practical and cheap solutions to overcome the
main issues in this field are presented. Moreover, some real applications, mainly related
to Cultural Heritage, show that the provided methods are robust and effective.
In the context of geometry modification, two approaches are presented to modify already
existing 3D models. In the first one, information extracted from images is used
to deform a dummy model to obtain accurate 3D head models, used for simulation
in the context of three-dimensional audio rendering. The second approach presents
a method to fill holes in 3D models, with the use of registered images depicting a
pattern projected on the real object.
Finally, some useful indications about the possible future work in all the presented
fields are given, in order to delineate the developments of this promising direction of
research.