2,235 research outputs found
Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model
In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed that combines the advantages of three conventional algorithms, maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information through modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to be located in a low-dimensional subspace. The phone coordinate, which is shared among different speakers, implicitly contains the intra-speaker correlation information. For a specific speaker, the phone variation, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping
The proliferation and ubiquity of temporal data across many disciplines has
sparked interest for similarity, classification and clustering methods
specifically designed to handle time series data. A core issue when dealing
with time series is determining their pairwise similarity, i.e., the degree to
which a given time series resembles another. Traditional distance measures such
as the Euclidean are not well-suited due to the time-dependent nature of the
data. Elastic metrics such as dynamic time warping (DTW) offer a promising
approach, but are limited by their computational complexity,
non-differentiability and sensitivity to noise and outliers. This thesis
proposes novel elastic alignment methods that use parametric \& diffeomorphic
warping transformations as a means of overcoming the shortcomings of DTW-based
metrics. The proposed method is differentiable \& invertible, well-suited for
deep learning architectures, robust to noise and outliers, computationally
efficient, and is expressive and flexible enough to capture complex patterns.
Furthermore, a closed-form solution was developed for the gradient of these
diffeomorphic transformations, which allows an efficient search in the
parameter space, leading to better solutions at convergence. Leveraging the
benefits of these closed-form diffeomorphic transformations, this thesis
proposes a suite of advancements that include: (a) an enhanced temporal
transformer network for time series alignment and averaging, (b) a
deep-learning based time series classification model to simultaneously align
and classify signals with high accuracy, (c) an incremental time series
clustering algorithm that is warping-invariant, scalable and can operate under
limited computational and time resources, and finally, (d) a normalizing flow
model that enhances the flexibility of affine transformations in coupling and
autoregressive layers.Comment: PhD Thesis, defended at the University of Navarra on July 17, 2023.
277 pages, 8 chapters, 1 appendi
Recent Progress in Image Deblurring
This paper comprehensively reviews the recent development of image
deblurring, including non-blind/blind, spatially invariant/variant deblurring
techniques. Indeed, these techniques share the same objective of inferring a
latent sharp image from one or several corresponding blurry images, while the
blind deblurring techniques are also required to derive an accurate blur
kernel. Considering the critical role of image restoration in modern imaging
systems to provide high-quality images under complex environments such as
motion, undesirable lighting conditions, and imperfect system components, image
deblurring has attracted growing attention in recent years. From the viewpoint
of how to handle the ill-posedness which is a crucial issue in deblurring
tasks, existing methods can be grouped into five categories: Bayesian inference
framework, variational methods, sparse representation-based methods,
homography-based modeling, and region-based methods. In spite of achieving a
certain level of development, image deblurring, especially the blind case, is
limited in its success by complex application conditions which make the blur
kernel hard to obtain and be spatially variant. We provide a holistic
understanding and deep insight into image deblurring in this review. An
analysis of the empirical evidence for representative methods, practical
issues, as well as a discussion of promising future directions are also
presented.Comment: 53 pages, 17 figure
- …