25 research outputs found
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
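The core sparse coding operation described above, representing a signal as a linear combination of a few dictionary atoms, can be illustrated with a minimal Orthogonal Matching Pursuit sketch in plain NumPy (my own illustrative example, with a random rather than a learned dictionary):

```python
import numpy as np

def omp(D, x, k):
    """Greedy Orthogonal Matching Pursuit: approximate x with at most
    k columns (atoms) of the dictionary D (columns assumed unit-norm)."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on the selected atoms
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    code = np.zeros(D.shape[1])
    code[support] = coeffs
    return code

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
true_code = np.zeros(256)
true_code[[3, 40, 200]] = [1.5, -2.0, 0.7]
x = D @ true_code                          # a 3-sparse signal
code = omp(D, x, k=3)
print(np.count_nonzero(code))              # at most 3 nonzeros
```

Dictionary learning, the focus of the monograph, would additionally alternate such sparse coding steps with updates of D itself.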
Sparse and low rank approximations for action recognition
Action recognition is a crucial area of research in computer vision, with a wide range of applications in surveillance, patient-monitoring systems, video indexing, human-computer interaction, and more. These applications require automated action recognition. Robust classification methods remain sought after despite influential research in this field over the past decade. Data resources have grown tremendously with the digital revolution, in stark contrast to the meagre resources of the past. The main limitation on a system dealing with video data is the computational burden caused by large dimensionality and data redundancy. Sparse and low-rank approximation methods, which have evolved recently, aim at a concise and meaningful representation of data. This thesis explores the application of sparse and low-rank approximation methods to video data classification, with the following contributions:
1. An approach for solving the problem of action and gesture classification is proposed within the sparse representation domain, effectively dealing with large feature dimensions.
2. A low-rank matrix completion approach is proposed to jointly classify more than one action.
3. Deep features are proposed for robust classification of multiple actions within a matrix completion framework that can handle data deficiencies.
This thesis starts with the applicability of sparse representation based classification methods to the problem of action and gesture recognition. Random projection is used to reduce the dimensionality of the features; these are referred to as compressed features in this thesis. The dictionary formed with compressed features has proved efficient for the classification task, achieving results comparable to the state of the art.
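The compressed-feature idea, random projection followed by dictionary-based classification, can be sketched as follows. This is an illustrative NumPy version under my own assumptions, not the thesis's actual pipeline; in particular, the sparse solver is replaced by a simple minimum-residual rule over class-wise sub-dictionaries:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic features: 2 classes, high-dimensional, clustered around class means.
d_high, d_low, n_per_class = 1000, 50, 20
means = {0: rng.standard_normal(d_high), 1: rng.standard_normal(d_high)}
train = {c: means[c][:, None] + 0.1 * rng.standard_normal((d_high, n_per_class))
         for c in (0, 1)}

# Random projection: a dense Gaussian matrix compresses features to d_low dims.
P = rng.standard_normal((d_low, d_high)) / np.sqrt(d_low)
dictionary = {c: P @ train[c] for c in (0, 1)}   # class-wise sub-dictionaries

def classify(x):
    """Assign x to the class whose compressed sub-dictionary reconstructs
    it best in the least-squares sense (a stand-in for sparse coding)."""
    y = P @ x
    residuals = {}
    for c, Dc in dictionary.items():
        coef, *_ = np.linalg.lstsq(Dc, y, rcond=None)
        residuals[c] = np.linalg.norm(y - Dc @ coef)
    return min(residuals, key=residuals.get)

test_point = means[1] + 0.1 * rng.standard_normal(d_high)
print(classify(test_point))
```

The key point is that classification happens entirely in the 50-dimensional compressed space, while class separation from the 1000-dimensional features is largely preserved by the random projection.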
Next, this thesis addresses the more challenging problem of simultaneous classification of multiple actions. This is treated as a matrix completion problem in a transductive setting. Matrix completion methods can be considered a generic extension of sparse representation methods from a compressed sensing point of view. The features and corresponding labels of the training and test data are concatenated and placed as columns of a matrix; the unknown test labels are then the missing entries of that matrix. The problem is solved using rank minimization techniques, based on the assumption that the underlying complete matrix is of low rank. This approach has achieved results better than the state of the art on datasets of varying complexity.
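The transductive setup described here, stacking features and labels as columns and recovering the missing label entries by rank minimization, can be sketched with a basic soft-impute / singular-value-thresholding loop. This is my own illustrative NumPy sketch under the low-rank assumption, not the thesis's exact solver:

```python
import numpy as np

rng = np.random.default_rng(2)

# A rank-2 ground-truth matrix: think of rows as [features; label scores]
# and columns as training + test samples.
U = rng.standard_normal((30, 2))
V = rng.standard_normal((2, 40))
M = U @ V

# Mask: the "test label" entries (last 5 rows of last 10 columns) are missing.
observed = np.ones_like(M, dtype=bool)
observed[-5:, -10:] = False

def soft_impute(M, observed, tau=0.5, iters=300):
    """Iteratively fill missing entries with a soft-thresholded SVD,
    a standard convex surrogate for rank minimization."""
    X = np.where(observed, M, 0.0)
    for _ in range(iters):
        Uf, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - tau, 0.0)             # shrink singular values
        low_rank = (Uf * s) @ Vt
        X = np.where(observed, M, low_rank)      # keep observed entries fixed
    return X

X = soft_impute(M, observed)
err = np.linalg.norm((X - M)[~observed]) / np.linalg.norm(M[~observed])
print(err)
```

In the classification setting, the recovered entries in the label rows of X would be rounded or arg-maxed to produce the test labels.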
This thesis then extends the matrix completion framework for joint classification of actions to handle missing features in addition to missing test labels. In this context, deep features from a convolutional neural network are proposed: a network is trained on the training data, and features are extracted for both the training and test data from the trained network. The performance of the deep features has proved promising compared with state-of-the-art hand-crafted features.
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge of the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest where objects may be found, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and a precision loss (0.92).
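The two core ingredients of this approach, warping a previous detection into the current frame via a planar homography and fusing per-frame class scores with a recursive Bayesian update, can be sketched as follows (an illustrative NumPy sketch; the homography and detector scores are synthetic, not taken from the paper):

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 planar homography H to Nx2 points (homogeneous coords)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def bayes_update(belief, likelihood):
    """Recursive Bayesian filtering over object classes: multiply the prior
    belief by the current observation likelihood and renormalize."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# A pure translation expressed as a homography (camera moved 5 px right, 2 px down).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])

box = np.array([[10.0, 20.0], [50.0, 60.0]])   # previous-frame detection corners
print(warp_points(H, box))                      # propagated region of interest

# Fuse three noisy per-frame scores for hypothetical classes (chair, table, door).
belief = np.array([1 / 3, 1 / 3, 1 / 3])
for obs in ([0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]):
    belief = bayes_update(belief, np.array(obs))
print(belief)   # concentrates on one class, lowering categorization entropy
```

Integrating several such observations is exactly what drives down the categorization entropy reported in the evaluation.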
Patch-based methods for variational image processing problems
Image processing problems are notoriously difficult. To name a few of these difficulties: they are usually ill-posed, they involve a huge number of unknowns (from one to several per pixel!), and images cannot be considered as the linear superposition of a few physical sources, as they contain many different scales and non-linearities. However, if one considers, instead of images as a whole, small blocks (or patches) within them, many of these hurdles vanish and the problems become much easier to solve, at the cost of again increasing the dimensionality of the data to process. Following the seminal NL-means algorithm in 2005-2006, methods that consider only the visual correlation between patches and ignore their spatial relationship are called non-local methods. While powerful, it is an arduous task to define non-local methods without resorting to heuristic formulations or complex mathematical frameworks. On the other hand, another powerful property has brought global image processing algorithms one step further: the sparsity of images in well-chosen representation bases. However, this property is difficult to embed naturally in non-local methods, yielding algorithms that are usually inefficient or convoluted. In this thesis, we explore alternative approaches to non-locality, with the goals of i) developing universal approaches that can handle local and non-local constraints and ii) leveraging the qualities of both non-locality and sparsity. For the first point, we will see that embedding the patches of an image into a graph-based framework can yield a simple algorithm that can switch from local to non-local diffusion, which we apply to the problem of large-area image inpainting. For the second point, we first study a fast patch preselection process that is able to group patches according to their visual content.
This preselection operator then serves as input to a social sparsity enforcing operator that creates sparse groups of jointly sparse patches, thus exploiting all the redundancies present in the data within a simple mathematical framework. Finally, we study the problem of reconstructing plausible patches from a few binarized measurements. We show that this task can be achieved in the case of popular binarized image keypoint descriptors, thus demonstrating a potential privacy issue in mobile visual recognition applications, but also opening a promising way toward the design and construction of a new generation of smart cameras.
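The patch preselection step, grouping patches by visual content before any joint sparse coding, can be sketched in plain NumPy: extract all overlapping patches, then cluster them by Euclidean distance to a few prototypes. The tiny k-means below is an illustrative stand-in for the thesis's fast preselection operator, not its actual algorithm:

```python
import numpy as np

def extract_patches(img, size):
    """All overlapping size x size patches of a 2-D image, flattened to rows."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size].ravel()
               for i in range(h - size + 1)
               for j in range(w - size + 1)]
    return np.array(patches)

def kmeans(X, k, iters=20):
    """Tiny k-means: groups rows of X into k visually similar clusters.
    Centers start from evenly spaced rows, for deterministic behaviour."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# A toy image: dark left half, bright right half -> two patch populations.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
patches = extract_patches(img, size=4)
labels = kmeans(patches, k=2)
print(len(patches), np.unique(labels).size)
```

Each resulting group could then be fed to a joint (social) sparsity operator, so that patches with similar content share a common sparse support.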
Foundations, Inference, and Deconvolution in Image Restoration
Image restoration is a critical preprocessing step in computer vision,
producing images with reduced noise, blur, and pixel defects.
This enables precise higher-level reasoning as to the scene content in
later stages of the vision pipeline (e.g., object segmentation,
detection, recognition, and tracking).
Restoration techniques have found extensive usage in a broad range of
applications from industry, medicine, astronomy, biology, and
photography.
The recovery of high-grade results requires models of the image
degradation process, giving rise to a class of often heavily
underconstrained, inverse problems.
A further challenge specific to the problem of blur removal is noise
amplification, which may cause strong distortion by ringing artifacts.
This dissertation presents new insights and problem solving procedures
for three areas of image restoration, namely (1) model
foundations, (2) Bayesian inference for high-order Markov
random fields (MRFs), and (3) blind image deblurring
(deconvolution).
As basic research on model foundations, we contribute to reconciling
the perceived differences between probabilistic MRFs on the one hand,
and deterministic variational models on the other.
To do so, we restrict the variational functional to locally supported finite
elements (FE) and integrate over the domain.
This yields a sum of terms depending locally on FE basis coefficients,
and by identifying the latter with pixels, the terms resolve to MRF
potential functions.
In contrast with previous literature, we place special emphasis on robust
regularizers used commonly in contemporary computer vision.
Moreover, we draw samples from the derived models to further
demonstrate the probabilistic connection.
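The reconciliation step can be illustrated in its simplest instance (my own example, not one taken from the dissertation): a quadratic variational regularizer discretized with piecewise-linear finite elements reduces to pairwise MRF potentials between neighbouring pixels.

```latex
% Restrict u to a linear FE basis: u(x) = \sum_i u_i \phi_i(x). Then
\int_\Omega |\nabla u(x)|^2 \, dx
  = \sum_{i,j} u_i u_j \int_\Omega \nabla\phi_i(x) \cdot \nabla\phi_j(x) \, dx .
% For 1-D hat functions on a unit grid, only neighbouring basis functions
% overlap (\int \phi_i' \phi_i' = 2, \ \int \phi_i' \phi_{i+1}' = -1),
% so up to boundary terms the sum resolves to
  \sum_i (u_{i+1} - u_i)^2 ,
% i.e., a sum of pairwise MRF potentials on the pixel values u_i.
```

The dissertation's contribution is to carry this identification through for the robust (non-quadratic) regularizers common in computer vision, where the resulting potentials are heavy-tailed rather than Gaussian.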
Another focal issue is a class of high-order Field of Experts MRFs
which are learned generatively from natural image data and yield the best quantitative results under Bayesian estimation.
This involves minimizing an integral expression, which has no closed-form solution in general.
However, the MRF class under study has Gaussian mixture potentials,
permitting expansion by indicator variables as a technical measure.
As approximate inference method, we study Gibbs sampling in the
context of non-blind deblurring and obtain excellent results, yet
at the cost of high computing effort.
In reaction to this, we turn to the mean field algorithm, and show
that it scales quadratically in the clique size for a standard
restoration setting with linear degradation model.
An empirical study of mean field over several restoration scenarios
confirms advantageous properties with regard to both image quality and
computational runtime.
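A toy version of Bayesian inference by Gibbs sampling, here for a pairwise Gaussian MRF denoising prior rather than the high-order Field-of-Experts models studied in the dissertation, can be sketched as follows (my own minimal example; the conditionals are exactly Gaussian, so each pixel can be resampled in closed form):

```python
import numpy as np

def gibbs_denoise(noisy, n_sweeps=40, sigma=0.5, lam=4.0, seed=0):
    """Gibbs sampling for a Gaussian pairwise MRF:
    likelihood  x_i ~ N(y_i, sigma^2),
    prior       sum over 4-neighbours of (lam/2) * (x_i - x_j)^2.
    Returns the posterior-mean (MMSE) estimate from post-burn-in samples."""
    rng = np.random.default_rng(seed)
    h, w = noisy.shape
    x = noisy.copy()
    mean_accum = np.zeros_like(x)
    burn_in = n_sweeps // 2
    for sweep in range(n_sweeps):
        for i in range(h):
            for j in range(w):
                nbrs = [x[a, b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < h and 0 <= b < w]
                # conjugate Gaussian conditional: precision-weighted mean
                prec = 1.0 / sigma**2 + lam * len(nbrs)
                mean = (noisy[i, j] / sigma**2 + lam * sum(nbrs)) / prec
                x[i, j] = rng.normal(mean, np.sqrt(1.0 / prec))
        if sweep >= burn_in:
            mean_accum += x
    return mean_accum / (n_sweeps - burn_in)

rng = np.random.default_rng(1)
clean = np.outer(np.ones(12), np.linspace(0, 1, 12))   # smooth ramp image
noisy = clean + 0.5 * rng.standard_normal(clean.shape)
denoised = gibbs_denoise(noisy)
```

The per-pixel conditional resampling above is exactly the "high computing effort" referred to in the text; the mean field alternative replaces each random draw by a deterministic update of the conditional mean.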
This dissertation further examines the problem of blind deconvolution,
beginning with localized blur from fast moving objects in the
scene, or from camera defocus.
Forgoing dedicated hardware or user labels, we rely only on the image
as input and introduce a latent variable model to explain the
non-uniform blur.
The inference procedure estimates freely varying kernels and we
demonstrate its generality by extensive experiments.
We further present a discriminative method for blind removal of camera
shake.
In particular, we interleave discriminative non-blind deconvolution
steps with kernel estimation and leverage the error cancellation
effects of the Regression Tree Field model to attain a deblurring
process with tightly linked sequential stages.
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging have contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field.