118 research outputs found

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing and, in particular, to predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping, techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction is a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
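As a concrete illustration of the warping idea, each unit delay of a conventional predictor can be replaced by a first-order allpass section. The sketch below is a toy implementation under that substitution only; the function names and the warping parameter `lam` are illustrative assumptions, not the thesis's own code (in practice `lam` would be tuned to approximate an auditory scale such as Bark):

```python
import numpy as np
from scipy.signal import lfilter

def warped_autocorrelation(x, lam, order):
    """Autocorrelation of x measured on a warped frequency axis.

    Each unit delay is replaced by the first-order allpass
    D(z) = (z^-1 - lam) / (1 - lam z^-1); lam = 0 recovers the
    ordinary (unwarped) autocorrelation.
    """
    b = [-lam, 1.0]   # allpass numerator:  z^-1 - lam
    a = [1.0, -lam]   # allpass denominator: 1 - lam z^-1
    r = np.zeros(order + 1)
    y = x.copy()
    r[0] = np.dot(x, y)
    for k in range(1, order + 1):
        y = lfilter(b, a, y)   # one more allpass section = one warped delay
        r[k] = np.dot(x, y)
    return r

def wlp_coefficients(x, lam, order):
    """Solve the warped normal equations for the predictor coefficients."""
    r = warped_autocorrelation(x, lam, order)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# Example: 2nd-order warped predictor for a decaying sinusoid
x = np.cos(0.3 * np.arange(512)) * 0.99 ** np.arange(512)
a_warped = wlp_coefficients(x, lam=0.5, order=2)
```

Setting `lam = 0` makes every allpass a pure delay, so the routine degenerates to classical linear prediction; this is a convenient sanity check on any warped implementation.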

    Generalized linear-in-parameter models: theory and audio signal processing applications

    This thesis presents a mathematically oriented perspective on some basic concepts of digital signal processing. A general framework for the development of alternative signal and system representations is attained by defining a generalized linear-in-parameter model (GLM) configuration. The GLM provides a direct view into the origins of many familiar methods in signal processing, implying a variety of generalizations, and it serves as a natural introduction to rational orthonormal model structures. In particular, the conventional division between finite impulse response (FIR) and infinite impulse response (IIR) filtering methods is reconsidered. The latter part of the thesis consists of audio-oriented case studies, including loudspeaker equalization, musical instrument body modeling, and room response modeling. The proposed collection of IIR filter design techniques is submitted to challenging modeling tasks. The most important practical contribution of this thesis is the introduction of a procedure for the optimization of rational orthonormal filter structures, called the BU-method. More generally, the BU-method and its variants, including the (complex) warped extension, the (C)WBU-method, can be considered entirely new IIR filter design strategies.
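One standard rational orthonormal structure of the kind the abstract refers to is the Laguerre model, in which a linear-in-parameter fit reduces to ordinary least squares over a fixed basis. The sketch below is a minimal illustration under that assumption (the function names and pole value are hypothetical; this is not the thesis's BU-method):

```python
import numpy as np
from scipy.signal import lfilter

def laguerre_basis(a, order, length):
    """Impulse responses of a Laguerre filter bank with pole a (|a| < 1).

    The FIR case is recovered at a = 0, where each branch is a pure delay.
    """
    delta = np.zeros(length)
    delta[0] = 1.0
    norm = np.sqrt(1.0 - a ** 2)
    basis = []
    g = lfilter([norm], [1.0, -a], delta)       # first-order lowpass section
    for _ in range(order):
        basis.append(g)
        g = lfilter([-a, 1.0], [1.0, -a], g)    # cascade one allpass section
    return np.column_stack(basis)

def fit_glm(target, a, order):
    """Least-squares weights of the linear-in-parameter model."""
    G = laguerre_basis(a, order, len(target))
    w, *_ = np.linalg.lstsq(G, target, rcond=None)
    return w, G @ w

# Approximate a slowly decaying exponential with a few Laguerre terms
h = 0.95 ** np.arange(200)
w, h_hat = fit_glm(h, a=0.9, order=4)
```

Because the basis columns are (near-)orthonormal, the normal equations are well conditioned, and a well-chosen pole lets a low-order IIR-type model outperform an FIR fit of the same order on slowly decaying responses.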

    Deep Burst Denoising

    Noise is an inherent issue of low-light image capture, one which is exacerbated on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in a low-light situation is to increase the shutter time of the camera, thus allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will result in blurred images. Another way of gathering more light is to capture multiple short (thus noisy) frames in a "burst" and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel, multiframe architecture to be a simple addition to any single-frame denoising model, and design it to handle an arbitrary number of noisy input frames. We show that it achieves state-of-the-art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of image enhancement by integrating content from multiple frames and demonstrate that our DNN architecture generalizes well to image super-resolution.
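The variance argument behind burst capture can be checked in a few lines. The sketch below is the naive frame-averaging baseline on a synthetic, perfectly aligned static scene, not the paper's recurrent CNN; averaging N independent noisy frames reduces the noise variance by a factor of N:

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(64, 64))   # hypothetical clean image

def noisy_burst(scene, n_frames, sigma, rng):
    """Simulate a burst of short, noisy exposures of a static scene."""
    noise = rng.normal(0.0, sigma, size=(n_frames,) + scene.shape)
    return scene[None] + noise

burst = noisy_burst(scene, n_frames=8, sigma=0.1, rng=rng)
single_mse = np.mean((burst[0] - scene) ** 2)           # one noisy frame
fused_mse = np.mean((burst.mean(axis=0) - scene) ** 2)  # naive fusion
# fused_mse is close to single_mse / 8 for a static, well-aligned burst
```

Real bursts violate the static-scene assumption (camera and subject motion), which is why learned, alignment-aware integration is needed on top of this statistical headroom.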

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit renders the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows categorizing the newest approaches seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables.

    Alternativas à modelagem de funçÔes de transferĂȘncia de ambientes (Alternatives to the modeling of room transfer functions)

    Master's dissertation - Universidade Federal de Santa Catarina, Centro TecnolĂłgico, Programa de PĂłs-Graduação em Engenharia ElĂ©trica. This work deals with the modeling of room transfer functions (RTFs) using autoregressive moving average (ARMA) models. The representation of such functions by digital systems is critical in applications of both room equalization and sound field simulation. The least-squares estimation of the ARMA model coefficients is attained through the Brandenstein and Unbehauen (LSBU) algorithm. As an alternative implementation of the estimated model, the Kautz filter is introduced. Due to the large length of room impulse responses, such functions are decomposed using either wavelet or polyphase filters, making their approximation by an ARMA model feasible. The work culminates in a common acoustic pole/zero (CAPZ) modeling of multiple RTFs. A novel approach to obtaining the common acoustic poles of a room, which assures the stability of the estimated CAPZ model, termed the quantized singularity matrix (MSQ, from the Portuguese matriz de singularidades quantizadas), is proposed. Besides the MSQ procedure, the estimation of the model parameters by the LSBU method is investigated. The results obtained with the proposed approaches are comparable to those of both the original CAPZ model from the literature and a more recent variant based on Shanks' least-squares method combined with the c-means clustering algorithm, corroborating the applicability of the proposed strategies.
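The abstract cites Shanks' least-squares method as one pole-zero modeling baseline. A minimal sketch of that classical two-step fit (Prony for the poles, Shanks for the zeros) is given below; it is an illustration of the baseline only, not the thesis's LSBU or MSQ procedures, and all names are illustrative:

```python
import numpy as np
from scipy.signal import lfilter

def shanks(h, p, q):
    """Least-squares pole-zero (ARMA) fit of an impulse response h.

    Step 1 (Prony): estimate the p denominator coefficients from the
    linear recursion that h obeys for n > q.
    Step 2 (Shanks): pick the q+1 numerator coefficients minimizing the
    output error, using shifted impulse responses of 1/A(z) as a basis.
    Assumes q + 1 >= p so all sample indices below are valid.
    """
    N = len(h)
    # Step 1: for n > q, h[n] + a[1] h[n-1] + ... + a[p] h[n-p] ~ 0
    M = np.array([[h[n - k] for k in range(1, p + 1)]
                  for n in range(q + 1, N)])
    a_tail, *_ = np.linalg.lstsq(M, -h[q + 1:N], rcond=None)
    a = np.concatenate(([1.0], a_tail))
    # Step 2: basis columns are 1/A(z)'s impulse response shifted by 0..q
    delta = np.zeros(N)
    delta[0] = 1.0
    g = lfilter([1.0], a, delta)
    G = np.column_stack([np.concatenate((np.zeros(k), g[:N - k]))
                         for k in range(q + 1)])
    b, *_ = np.linalg.lstsq(G, h, rcond=None)
    return b, a

# Recover a known 2-pole, 1-zero system from its impulse response
delta = np.zeros(100)
delta[0] = 1.0
h = lfilter([1.0, 0.4], [1.0, -0.9, 0.2], delta)
b_est, a_est = shanks(h, p=2, q=1)
```

When h is exactly ARMA of the assumed orders, both least-squares problems have zero residual and the true coefficients are recovered; measured room responses only satisfy the recursion approximately, which is where the more elaborate estimation schemes discussed in the dissertation come in.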

    Advanced methods and deep learning for video and satellite data compression

    The abstract is in the attachment.

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models on a subset of video segmentation tasks or transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. In addition, previous reviews of interpretability methods focused on transformers for classification, while analysis of the video temporal dynamics modelling capabilities of video models has received less attention. In this survey, we address the above with a thorough discussion of various categories of video segmentation, a component-wise discussion of the state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models and interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions.

    Photo-realistic face synthesis and reenactment with deep generative models

    The advent of Deep Learning has led to numerous breakthroughs in the field of Computer Vision. Over the last decade, a significant amount of research has been undertaken towards designing neural networks for visual data analysis. At the same time, rapid advancements have been made towards the direction of deep generative modeling, especially after the introduction of Generative Adversarial Networks (GANs), which have shown particularly promising results when it comes to synthesising visual data. Since then, considerable attention has been devoted to the problem of photo-realistic human face animation due to its wide range of applications, including image and video editing, virtual assistance, social media, teleconferencing, and augmented reality. The objective of this thesis is to make progress towards generating photo-realistic videos of human faces. To that end, we propose novel generative algorithms that provide explicit control over the facial expression and head pose of synthesised subjects. Despite the major advances in face reenactment and motion transfer, current methods struggle to generate video portraits that are indistinguishable from real data. In this work, we aim to overcome the limitations of existing approaches, by combining concepts from deep generative networks and video-to-video translation with 3D face modelling, and more specifically by capitalising on prior knowledge of faces that is enclosed within statistical models such as 3D Morphable Models (3DMMs). In the first part of this thesis, we introduce a person-specific system that performs full head reenactment using ideas from video-to-video translation. Subsequently, we propose a novel approach to controllable video portrait synthesis, inspired by Implicit Neural Representations (INR).
    In the second part of the thesis, we focus on person-agnostic methods and present a GAN-based framework that performs video portrait reconstruction, full head reenactment, expression editing, novel pose synthesis and face frontalisation. Open Access
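The prior knowledge a 3DMM encodes is a linear statistical model of shape: a mean mesh plus a weighted sum of principal components. The sketch below is a toy stand-in using a random orthonormal basis; the dimensions and names are illustrative assumptions, not a real learned model such as the Basel Face Model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vertices, n_components = 500, 10               # toy sizes, not a real model
mean_shape = rng.normal(size=3 * n_vertices)     # flattened x,y,z per vertex
# Orthonormal columns as a stand-in for learned PCA shape components
basis = np.linalg.qr(rng.normal(size=(3 * n_vertices, n_components)))[0]
stddevs = np.linspace(1.0, 0.1, n_components)    # per-component scales

def sample_face(alpha):
    """Generate a face shape from low-dimensional coefficients alpha."""
    return mean_shape + basis @ (stddevs * alpha)

neutral = sample_face(np.zeros(n_components))    # alpha = 0 gives the mean face
varied = sample_face(rng.normal(size=n_components))
```

Controllable synthesis pipelines exploit exactly this structure: a handful of coefficients parameterize expression and identity, and those coefficients can be estimated from, or transferred between, videos.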

    Uses of uncalibrated images to enrich 3D models information

    The decrease in the cost of semi-professional digital cameras has made it possible for everyone to acquire a very detailed description of a scene in a very short time. Unfortunately, interpreting the images is usually quite hard, due to the amount of data and the lack of robust, generic image analysis methods. Nevertheless, if a geometric description of the depicted scene is available, it becomes much easier to extract information from 2D data. This information can be used to enrich the quality of the 3D data in several ways. In this thesis, several uses of sets of unregistered images for the enrichment of 3D models are shown. In particular, two possible fields of application are presented: color acquisition, projection and visualization, and geometry modification. Regarding color management, several practical and inexpensive solutions to overcome the main issues in this field are presented. Moreover, some real applications, mainly related to Cultural Heritage, show that the proposed methods are robust and effective. In the context of geometry modification, two approaches are presented to modify already existing 3D models. In the first one, information extracted from images is used to deform a dummy model to obtain accurate 3D head models, used for simulation in the context of three-dimensional audio rendering. The second approach presents a method to fill holes in 3D models, using registered images depicting a pattern projected on the real object. Finally, some useful indications about possible future work in all the presented fields are given, in order to delineate the developments of this promising research direction.
    • 
