Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
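To make the phrase "linear combinations of a few dictionary elements" concrete, here is a minimal sketch of sparse coding with a fixed dictionary. It uses plain matching pursuit, a simple greedy method chosen here for brevity; it is not claimed to be the monograph's algorithm, and it does not cover dictionary learning. The function names and the toy dictionary are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(x, atoms, n_nonzero):
    """Greedy sparse coding of x over unit-norm atoms, using n_nonzero greedy steps."""
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, d) for d in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        # Add its contribution to the code and subtract it from the residual.
        coeffs[k] += scores[k]
        residual = [r - scores[k] * d for r, d in zip(residual, atoms[k])]
    return coeffs, residual


# Toy dictionary of unit-norm atoms in R^3 (illustrative values, not from the source).
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
]
# Signal built as 2.0 * atoms[3] + 0.5 * atoms[2], so two atoms suffice.
x = [1.2, 1.6, 0.5]
coeffs, residual = matching_pursuit(x, atoms, n_nonzero=2)
print(coeffs, residual)
```

On this toy input the code uses two of the four atoms and leaves a residual that is zero up to rounding. In the learned-dictionary setting the abstract describes, the atoms themselves would also be fitted to data.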
Disentanglement of Correlated Factors via Hausdorff Factorized Support
A grand goal in deep learning research is to learn representations capable of
generalizing across distribution shifts. Disentanglement is one promising
direction aimed at aligning a model's representations with the underlying
factors generating the data (e.g. color or background). Existing
disentanglement methods, however, rely on an often unrealistic assumption: that
factors are statistically independent. In reality, factors (like object color
and shape) are correlated. To address this limitation, we propose a relaxed
disentanglement criterion, the Hausdorff Factorized Support (HFS) criterion,
that encourages a factorized support, rather than a factorial distribution, by
minimizing a Hausdorff distance. This allows for arbitrary distributions of the
factors over their support, including correlations between them. We show that
the use of HFS consistently facilitates disentanglement and recovery of
ground-truth factors across a variety of correlation settings and benchmarks,
even under severe training correlations and correlation shifts, with relative
improvements of over 60% in some cases over existing disentanglement methods. In
addition, we find that leveraging HFS for representation learning can even
facilitate transfer to downstream tasks such as classification under
distribution shifts. We hope our original approach and positive empirical
results inspire further progress on the open problem of robust generalization
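The idea of a factorized support can be illustrated on a toy point set. The sketch below is an assumption-laden illustration, not the paper's training objective: it builds the Cartesian product of the values seen along each dimension and measures the Hausdorff distance from that product to the observed points. The function names are hypothetical, and the abstract does not specify how the distance is estimated or optimized during training.

```python
from itertools import product
from math import dist


def factorized_support(points):
    """Cartesian product of the values observed along each dimension."""
    n_dims = len(points[0])
    per_dim = [sorted(set(p[d] for p in points)) for d in range(n_dims)]
    return [tuple(c) for c in product(*per_dim)]


def hausdorff_to_factorized_support(points):
    """Largest distance from a factorized-support point to its nearest observed point."""
    # Observed points are a subset of the product, so the reverse direction is always 0.
    grid = factorized_support(points)
    return max(min(dist(q, p) for p in points) for q in grid)


# Toy data: perfectly correlated factors cover only the diagonal of the grid.
correlated = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# Toy data: every combination of factor values is observed.
full_grid = [(a, b) for a in (0.0, 1.0, 2.0) for b in (0.0, 1.0, 2.0)]

h_correlated = hausdorff_to_factorized_support(correlated)
h_full = hausdorff_to_factorized_support(full_grid)
print(h_correlated, h_full)
```

The distance is zero exactly when every combination of observed factor values appears, regardless of how often each combination occurs. This is why a support-based criterion tolerates correlations that an independence-based criterion would penalize.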
Generative Models for Preprocessing of Hospital Brain Scans
In this thesis, I present novel computational methods for processing routine clinical brain scans. Such scans were originally acquired for qualitative assessment by trained radiologists, and present a number of difficulties for computational models, such as those within common neuroimaging analysis software. The overarching objective of this work is to enable efficient and fully automated analysis of large neuroimaging datasets, of the type currently present in many hospitals worldwide. The methods presented are based on probabilistic, generative models of the observed imaging data, and therefore rely on informative priors and realistic forward models. The first part of the thesis presents a model for image quality improvement, whose key component is a novel prior for multimodal datasets. I demonstrate its effectiveness for super-resolving thick-sliced clinical MR scans and for denoising CT images and MR-based, multi-parametric mapping acquisitions. I then show how the same prior can be used for within-subject, intermodal image registration, to register large numbers of clinical scans more robustly. The second part of the thesis focusses on improved, automatic segmentation and spatial normalisation of routine clinical brain scans. I propose two extensions to a widely used segmentation technique. The first is a method that lets this model handle missing data, which allows me to predict entirely missing modalities from one, or a few, MR contrasts. The second is a principled way of combining the strengths of probabilistic, generative models with the unprecedented discriminative capability of deep learning. By introducing a convolutional neural network as a Markov random field prior, I can model nonlinear class interactions and learn these using backpropagation. I show that this model is robust to sequence and scanner variability.
Finally, I show examples of fitting a population-level, generative model to various neuroimaging data, which can model, e.g., CT scans with haemorrhagic lesions
SEGMENTATION, RECOGNITION, AND ALIGNMENT OF COLLABORATIVE GROUP MOTION
Modeling and recognition of human motion in videos has broad applications in behavioral biometrics, content-based visual data analysis, security and surveillance, as well as designing interactive environments. Significant progress has been made in the past two decades by way of new models, methods, and implementations. In this dissertation, we focus our attention on a relatively less investigated sub-area called collaborative group motion analysis. Collaborative group motions are those that typically involve multiple objects, wherein the motion patterns of individual objects may vary significantly in both space and time, but the collective motion pattern of the ensemble allows characterization in terms of geometry and statistics. Therefore, the motions or activities of an individual object constitute local information. A framework to synthesize all local information into a holistic view, and to explicitly characterize interactions among objects, involves large scale global reasoning, and is of significant complexity. In this dissertation, we first review relevant previous contributions on human motion/activity modeling and recognition, and then propose several approaches to answer a sequence of traditional vision questions: 1) which of the motion elements are relevant to a group motion pattern of interest (Segmentation); 2) what the underlying motion pattern is (Recognition); and 3) how similar two motion ensembles are and how we can 'optimally' transform one to match the other (Alignment). Our primary practical scenario is American football play, where the corresponding problems are 1) who the offensive players are; 2) what offensive strategy they are using; and 3) whether two plays use the same strategy and how we can remove the spatio-temporal misalignment between them due to internal or external factors.
The proposed approaches discard the traditional modeling paradigm and instead explore concise descriptors, hierarchies, stochastic mechanisms, or compact generative models to achieve both effectiveness and efficiency.
In particular, the intrinsic geometry of the spaces of the involved features/descriptors/quantities is exploited and statistical tools are established on these nonlinear manifolds. These initial attempts have identified new challenging problems in complex motion analysis, as well as in more general tasks in video dynamics. The insights gained from nonlinear geometric modeling and analysis in this dissertation may prove useful for a broader class of computer vision applications
Can Tabular Generative Models Generate Realistic Synthetic Near Infrared Spectroscopic Data?
In this thesis, we evaluated the performance of two generative models, Conditional Tabular
Generative Adversarial Network (CTGAN) and Tabular Variational Autoencoder (TVAE), from the
open-source library Synthetic Data Vault (SDV), for generating synthetic Near Infrared (NIR)
spectral data. The aim was to assess the viability of these models in synthetic data generation
for predicting Dry Matter Content (DMC) in the field of NIR spectroscopy. The fidelity and
utility of the synthetic data were examined through a series of benchmarks, including statistical
comparisons, dimensionality reduction, and machine learning tasks.
The results showed that while both CTGAN and TVAE could generate synthetic data with
statistical properties similar to real data, TVAE outperformed CTGAN in terms of preserving
the correlation structure of the data and the relationship between the features and the target
variable, DMC. However, the synthetic data fell short in fooling machine learning classifiers,
indicating a persisting challenge in synthetic data generation.
With respect to utility, neither the CTGAN nor the TVAE synthetic dataset could serve as
a satisfactory substitute for real data in training machine learning models for predicting DMC.
Although TVAE-generated synthetic data showed some potential when used with Random
Forest (RF) and K-Nearest Neighbors (KNN) classifiers, the performance was still inadequate for
practical use.
This study offers valuable insights into the use of generative models for synthetic NIR spectral
data generation, highlighting their current limitations and potential areas for future research
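One of the fidelity checks mentioned above, preservation of the correlation structure, can be sketched as follows. This is a hedged illustration only: it does not call the SDV library, the `correlation_gap` metric and the column names are hypothetical, and the tiny tables are made-up toy values rather than data or results from the thesis.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_gap(real, synth):
    """Mean absolute difference between pairwise column correlations of two tables."""
    names = sorted(real)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    diffs = [
        abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
        for a, b in pairs
    ]
    return sum(diffs) / len(diffs)


# Toy tables (column name -> values); purely illustrative numbers.
real = {
    "band_1": [1.0, 2.0, 3.0, 4.0],
    "band_2": [2.0, 4.0, 6.0, 8.0],
    "dmc": [1.0, 3.0, 2.0, 4.0],
}
# A "synthetic" table that keeps the marginals but reverses the target column.
synth = dict(real, dmc=[4.0, 2.0, 3.0, 1.0])

gap_same = correlation_gap(real, real)
gap_broken = correlation_gap(real, synth)
print(gap_same, gap_broken)
```

A gap near zero means the synthetic table reproduces the pairwise correlations of the real one. The second table keeps every column's marginal distribution yet yields a large gap, which shows why marginal statistics alone are not enough to judge fidelity.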
- …