Inpainting of Missing Audio Signal Samples
Recently, sparse representations of signals have become very popular in the field of signal processing. A sparse representation means that the signal is represented exactly, or approximated very well, by a linear combination of only a few vectors from a chosen representation system.
This thesis deals with the use of sparse signal representations for audio restoration of both historical and recent recordings. Old audio recordings in particular suffer from defects such as crackles and noise. Until now, short gaps in audio signals have been repaired by interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm termed Audio Inpainting was introduced. It fills in missing audio samples using sparse representations obtained with greedy algorithms for sparse approximation. This thesis aims to compare the state-of-the-art interpolation methods with Audio Inpainting. In addition, l1-relaxation methods are used for sparse approximation, incorporating both the analysis and the synthesis model. The algorithms used for this purpose are proximal algorithms. They treat the coefficients either separately or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part of the thesis, the parameters of each algorithm are evaluated in terms of the trade-off between restoration quality and processing time. All of the algorithms described in the thesis are compared on practical examples using the objective measures signal-to-noise ratio (SNR) and PEMO-Q. Finally, the restoration results are evaluated and discussed.
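The core inpainting idea described above is to estimate sparse coefficients from the reliable samples only and then synthesize the missing samples from those coefficients. This can be sketched for a single frame. The sketch makes several assumptions. It uses an orthonormal DCT dictionary, a known sparsity level, and plain orthogonal matching pursuit (OMP), which is one common greedy algorithm. It is not the thesis's or the original Audio Inpainting implementation, and the helper names `dct_dictionary`, `omp` and `inpaint_frame` are hypothetical.

```python
import numpy as np


def dct_dictionary(n):
    """Orthonormal DCT-II atoms stored as the columns of an n x n matrix."""
    t = np.arange(n)[:, None]          # sample index (rows)
    k = np.arange(n)[None, :]          # frequency index (columns)
    D = np.cos(np.pi * (t + 0.5) * k / n)
    D[:, 0] *= np.sqrt(1.0 / n)
    D[:, 1:] *= np.sqrt(2.0 / n)
    return D


def omp(A, y, n_nonzero):
    """Orthogonal matching pursuit: greedily add one atom per iteration,
    then refit all selected coefficients by least squares."""
    coef = np.zeros(A.shape[1])
    if n_nonzero < 1:
        return coef
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0            # guard against all-zero restricted atoms
    residual = y.copy()
    support = []
    for _ in range(n_nonzero):
        # Restricted atoms are no longer unit-norm, so normalise the correlations.
        scores = np.abs(A.T @ residual) / norms
        scores[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
    coef[support] = sol
    return coef


def inpaint_frame(x, reliable, n_nonzero):
    """Estimate the samples where `reliable` is False from the reliable ones."""
    D = dct_dictionary(len(x))
    coef = omp(D[reliable, :], x[reliable], n_nonzero)  # fit on reliable rows only
    estimate = D @ coef                                  # synthesise the full frame
    out = x.copy()
    out[~reliable] = estimate[~reliable]                 # reliable samples kept as-is
    return out


# Toy example: a frame that is exactly 2-sparse in the DCT, with a 20-sample gap.
n = 256
D_full = dct_dictionary(n)
clean = 1.0 * D_full[:, 10] + 0.5 * D_full[:, 37]
reliable = np.ones(n, dtype=bool)
reliable[100:120] = False
damaged = np.where(reliable, clean, 0.0)
restored = inpaint_frame(damaged, reliable, n_nonzero=2)
print("max reconstruction error:", np.max(np.abs(restored - clean)))
```

Real recordings are not exactly sparse and would be processed frame by frame with overlapping windows. This toy frame is chosen so that the greedy selection can recover the support exactly.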
Audio Declipping with Social Sparsity
We consider the audio declipping problem using iterative thresholding algorithms and the principle of social sparsity. This recently introduced approach features thresholding/shrinkage operators that make it possible to model dependencies between neighboring coefficients in expansions with time-frequency dictionaries. A new unconstrained convex formulation of the audio declipping problem is introduced. The chosen structured thresholding operators are the so-called windowed group-Lasso and the persistent empirical Wiener. Using these operators significantly improves the quality of the reconstruction compared to simple soft thresholding. The resulting algorithm is fast and simple to implement, and it outperforms the state of the art in terms of signal-to-noise ratio.
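The three operators mentioned above differ only in how they compute the shrinkage gain for each time-frequency coefficient. The following is a minimal sketch under several assumptions:

- coefficients are stored in an array indexed [frequency, time];
- the neighbourhood is a rectangular window with odd side lengths;
- the operators take the forms commonly given in the social-sparsity literature. Soft thresholding scales x by max(0, 1 - lam/|x|). Windowed group-Lasso replaces |x| with the l2 norm of the neighbourhood. Persistent empirical Wiener uses lam^2 divided by the neighbourhood energy.

The window sizes and lam are illustrative, not the paper's settings. The declipping algorithm itself, with its clipping constraints, is not shown.

```python
import numpy as np


def neighborhood_energy(X, win_freq=1, win_time=3):
    """Sum of |X|^2 over a centred win_freq x win_time window around each
    coefficient (X indexed [frequency, time], zero padding at the borders).
    Odd window sizes keep the window centred."""
    E = np.abs(X) ** 2
    pf, pt = win_freq // 2, win_time // 2
    P = np.pad(E, ((pf, pf), (pt, pt)))
    out = np.zeros_like(E)
    for i in range(win_freq):
        for j in range(win_time):
            out += P[i:i + E.shape[0], j:j + E.shape[1]]
    return out


def soft_threshold(X, lam):
    """Classical soft thresholding: each coefficient is shrunk on its own."""
    mag = np.maximum(np.abs(X), 1e-12)
    return X * np.maximum(0.0, 1.0 - lam / mag)


def windowed_group_lasso(X, lam, win_freq=1, win_time=3):
    """Shrink each coefficient using the l2 norm of its neighbourhood."""
    norm = np.sqrt(neighborhood_energy(X, win_freq, win_time))
    return X * np.maximum(0.0, 1.0 - lam / np.maximum(norm, 1e-12))


def persistent_empirical_wiener(X, lam, win_freq=1, win_time=3):
    """Wiener-style gain 1 - lam^2 / (neighbourhood energy), clipped at zero."""
    energy = neighborhood_energy(X, win_freq, win_time)
    return X * np.maximum(0.0, 1.0 - lam ** 2 / np.maximum(energy, 1e-12))


# Toy time-frequency map: a persistent line along time and an isolated
# coefficient, both with magnitude 0.6, thresholded with lam = 0.7.
X = np.zeros((5, 9))
X[2, :] = 0.6
X[0, 4] = 0.6
lam = 0.7
results = {
    "soft": soft_threshold(X, lam),
    "WGL": windowed_group_lasso(X, lam),
    "PEW": persistent_empirical_wiener(X, lam),
}
for name, Y in results.items():
    print(name, "| line kept:", np.count_nonzero(Y[2]), "of 9",
          "| isolated kept:", bool(Y[0, 4] != 0))
```

With a 1 x 1 window, the windowed group-Lasso reduces to soft thresholding. With a wider window, it keeps coefficients that belong to a persistent structure even when each coefficient alone falls below the threshold.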
The Affine Uncertainty Principle, Associated Frames and Applications in Signal Processing
Uncertainty relations play a prominent role in signal processing, stating that a signal cannot be simultaneously concentrated in the two related domains of the corresponding phase space. In particular, a new uncertainty principle for the affine group, which is directly related to the wavelet transform, has led to a new minimizing waveform. In this thesis, a frame construction based on this minimizing waveform is proposed, which yields approximately tight frames. Frame properties such as the diagonality of the frame operator and the lower and upper frame bounds are analyzed. Additionally, three applications of such frame constructions are introduced: inpainting of missing audio data, detection of neuronal spikes in extracellularly recorded data, and peak detection in MALDI imaging data.
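For a finite-dimensional frame, the frame bounds and the diagonality of the frame operator mentioned above can be read directly off the frame operator. The sketch below relies on the standard result that the optimal lower and upper bounds A and B are the smallest and largest eigenvalues of S = sum_k phi_k phi_k^*. A frame is tight when A = B, and the ratio B/A measures how far it is from tight. The two frames are textbook toy examples, not the minimizing-waveform frames from the thesis, and the helper names are hypothetical.

```python
import numpy as np


def frame_operator(Phi):
    """Frame operator S = sum_k phi_k phi_k^* for frame vectors stored as columns."""
    return Phi @ Phi.conj().T


def frame_bounds(Phi):
    """Optimal lower/upper frame bounds (A, B) of a finite frame: the extreme
    eigenvalues of the Hermitian positive semidefinite frame operator."""
    eig = np.linalg.eigvalsh(frame_operator(Phi))   # ascending order
    return eig[0], eig[-1]


# Tight example: three unit vectors at 120 degrees in R^2 ("Mercedes-Benz" frame).
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
mb = np.vstack([np.cos(angles), np.sin(angles)])
A_mb, B_mb = frame_bounds(mb)

# Non-tight example: the standard basis of R^2 plus (1, 1)/sqrt(2).
nt = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
nt[:, 2] /= np.sqrt(2)
A_nt, B_nt = frame_bounds(nt)

print("Mercedes-Benz: A=%.3f B=%.3f" % (A_mb, B_mb))
print("non-tight:     A=%.3f B=%.3f  B/A=%.3f" % (A_nt, B_nt, B_nt / A_nt))

# Check the frame inequality A||x||^2 <= sum_k |<x, phi_k>|^2 <= B||x||^2.
x = np.random.default_rng(0).standard_normal(2)
energy = np.sum(np.abs(nt.conj().T @ x) ** 2)
assert A_nt * (x @ x) - 1e-12 <= energy <= B_nt * (x @ x) + 1e-12
```

For a tight frame, the frame operator is a multiple of the identity, so it is exactly diagonal. For approximately tight frames, it is only close to diagonal.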
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model from a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
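Sparse coding with a fixed dictionary, as described above, is often posed as an l1-penalized least-squares (Lasso) problem: minimize 0.5 * ||x - D a||^2 + lam * ||a||_1. The sketch below solves it with the iterative shrinkage-thresholding algorithm (ISTA), which is one standard solver among several. It assumes a random unit-norm dictionary and an illustrative lam. The monograph's focus, learning the dictionary itself from data, is not shown, and the helper names are hypothetical.

```python
import numpy as np


def soft(v, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, x, lam, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)         # gradient of the quadratic data term
        a = soft(a - grad / L, lam / L)  # gradient step followed by shrinkage
    return a


rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
a_true = np.zeros(256)
a_true[[5, 77, 190]] = [1.5, -2.0, 1.0]  # signal built from only three atoms
x = D @ a_true
a_hat = ista(D, x, lam=0.05, n_iter=2000)
print("largest coefficients at:", np.flatnonzero(np.abs(a_hat) > 0.1))
```

The l1 penalty shrinks the recovered coefficients slightly toward zero, so a_hat matches a_true in support but not exactly in magnitude.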
Audio-Visual Speech Inpainting with Deep Learning
In this paper, we present a deep-learning-based framework for audio-visual
speech inpainting, i.e., the task of restoring the missing parts of an acoustic
speech signal from reliable audio context and uncorrupted visual information.
Recent work focuses solely on audio-only methods and generally aims at
inpainting music signals, which have a very different structure from speech.
Instead, we inpaint speech signals with gaps ranging from 100 ms to 1600 ms to
investigate the contribution that vision can provide for gaps of different
duration. We also experiment with a multi-task learning approach where a phone
recognition task is learned together with speech inpainting. Results show that
the performance of audio-only speech inpainting approaches degrades rapidly
when gaps get large, while the proposed audio-visual approach is able to
plausibly restore missing information. In addition, we show that multi-task
learning is effective, although the largest contribution to performance comes
from vision.
State of the Art in Face Recognition
Notwithstanding the tremendous effort to solve the face recognition problem, it is not yet possible to design a face recognition system with performance close to that of humans. New computer vision and pattern recognition approaches need to be investigated. New knowledge and perspectives from other fields, such as psychology and neuroscience, must also be incorporated into face recognition research to design a robust face recognition system. Indeed, much more effort is required to achieve a human-like face recognition system. This book aims to narrow the gap between the previous state of face recognition research and its future state.
- …