Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
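As a rough illustration of the sparse coding idea described in this abstract (representing a signal as a linear combination of a few dictionary elements), the sketch below solves an l1-regularized least-squares problem with iterative soft-thresholding (ISTA). The random dictionary, the function names, and the parameter values are assumptions made for illustration; the monograph's own algorithms and learned dictionaries are not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a."""
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the quadratic term
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Hypothetical toy data: a random overcomplete dictionary with unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

# Signal built from three atoms only.
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = D @ a_true

a_hat = ista(D, x, lam=0.05, n_iter=500)
```

In a dictionary learning setting, a step like this would typically alternate with an update of `D` itself; that part is omitted here.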
Detection and classification of non-stationary signals using sparse representations in adaptive dictionaries
Automatic classification of non-stationary radio frequency (RF) signals is of particular interest in persistent surveillance and remote sensing applications. Such signals are often acquired in noisy, cluttered environments, and may be characterized by complex or unknown analytical models, making feature extraction and classification difficult. This thesis proposes an adaptive classification approach for poorly characterized targets and backgrounds based on sparse representations in non-analytical dictionaries learned from data. Conventional analytical orthogonal dictionaries, e.g., Short Time Fourier and Wavelet Transforms, can be suboptimal for classification of non-stationary signals, as they provide a rigid tiling of the time-frequency space, and are not specifically designed for a particular signal class. They generally do not lead to sparse decompositions (i.e., with very few non-zero coefficients), and use in classification requires separate feature selection algorithms. Pursuit-type decompositions in analytical overcomplete (non-orthogonal) dictionaries yield sparse representations, by design, and work well for signals that are similar to the dictionary elements. The pursuit search, however, has a high computational cost, and the method can perform poorly in the presence of realistic noise and clutter. One such overcomplete analytical dictionary method is also analyzed in this thesis for comparative purposes. The main thrust of the thesis is learning discriminative RF dictionaries directly from data, without relying on analytical constraints or additional knowledge about the signal characteristics. A pursuit search is used over the learned dictionaries to generate sparse classification features in order to identify time windows that contain a target pulse. Two state-of-the-art dictionary learning methods are compared, the K-SVD algorithm and Hebbian learning, in terms of their classification performance as a function of dictionary training parameters. 
Additionally, a novel hybrid dictionary algorithm is introduced, demonstrating better performance and higher robustness to noise. The issue of dictionary dimensionality is explored, and this thesis demonstrates that undercomplete learned dictionaries are suitable for non-stationary RF classification. Results on simulated data sets with varying background clutter and noise levels are presented. Lastly, unsupervised classification with undercomplete learned dictionaries is also demonstrated in satellite imagery analysis.
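The pursuit search mentioned in this abstract can be sketched with orthogonal matching pursuit (OMP), one common pursuit-type method; whether the thesis uses this exact variant is not stated here. The dictionary below is an undercomplete orthonormal stand-in obtained from a QR factorization, not a dictionary learned with K-SVD or Hebbian learning, and all names and sizes are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse approximation of x using at most k atoms (columns) of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

# Hypothetical undercomplete dictionary: 16 orthonormal atoms in 64 dimensions.
rng = np.random.default_rng(1)
D, _ = np.linalg.qr(rng.standard_normal((64, 16)))

# A time window that is a combination of two atoms.
window = 2.0 * D[:, 5] - 1.0 * D[:, 12]
features = omp(D, window, k=2)   # sparse coefficient vector used as features
```

The resulting coefficient vector could then be passed to any classifier to decide whether a window contains a target pulse.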
FACE RECOGNITION AND VERIFICATION IN UNCONSTRAINED ENVIRONMENTS
Face recognition has been a long-standing problem in computer vision. General
face recognition is challenging because of large appearance variability due to
factors including pose, ambient lighting, expression, size of the face, age, and distance
from the camera. There are very accurate techniques to perform face
recognition in controlled environments, especially when large numbers of samples
are available for each face (individual). However, face identification under uncontrolled
(unconstrained) environments or with limited training data is still an unsolved
problem. There are two face recognition tasks: face identification (who is who in
a probe face set, given a gallery face set) and face verification (same or not, given
two faces). In this work, we study both face identification and verification in unconstrained
environments.
Firstly, we propose a face verification framework that combines Partial Least
Squares (PLS) and the One-Shot similarity model [1]. The idea is to describe a
face with a large feature set combining shape, texture and color information. PLS
regression is applied to perform multi-channel feature weighting on this large feature
set. Finally, the PLS regression is used to compute the similarity score of an image
pair by One-Shot learning (using a fixed negative set).
Secondly, we study face identification with image sets, where the gallery and
probe are sets of face images of an individual. We model a face set by its covariance
matrix (COV), which is a natural 2nd-order statistic of a sample set. By exploring an
efficient metric for symmetric positive definite (SPD) matrices, i.e., Log-Euclidean Distance (LED), we derive
a kernel function that explicitly maps the covariance matrix from the Riemannian
manifold to Euclidean space. Then, discriminative learning is performed on the
COV manifold: the learning aims to maximize the between-class COV distance and
minimize the within-class COV distance.
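A minimal sketch of the Log-Euclidean Distance between two set covariance matrices follows. It relies on the standard definition (Frobenius norm of the difference of matrix logarithms); the small ridge term `eps`, the helper names, and the random data are assumptions added for illustration and are not taken from the thesis.

```python
import numpy as np

def set_covariance(X, eps=1e-6):
    """Covariance of an image set X (n_samples x n_features), regularized to be SPD."""
    C = np.cov(X, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T      # V diag(log w) V^T

def log_euclidean_distance(A, B):
    """Frobenius norm between the matrix logarithms of A and B."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

# Hypothetical feature sets for two face image sets (10 images, 4 features each).
rng = np.random.default_rng(2)
C1 = set_covariance(rng.standard_normal((10, 4)))
C2 = set_covariance(2.0 * rng.standard_normal((10, 4)))

d12 = log_euclidean_distance(C1, C2)
d21 = log_euclidean_distance(C2, C1)
```

Because `spd_log` sends each SPD matrix to an ordinary symmetric matrix, Euclidean tools such as inner products and linear discriminative learning can be applied to the mapped matrices.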
Sparse representation and dictionary learning have been widely used in face
recognition, especially when large numbers of samples are available for each face
(individual). Sparse coding is promising since it provides a more stable and discriminative
face representation. In the last part of our work, we explore sparse
coding and dictionary learning for face verification. More specifically,
in one approach, we apply sparse representations to face verification in two ways
via a fixed reference set as the dictionary. In the other approach, we propose a dictionary
learning framework with explicit pairwise constraints, which unifies the discriminative
dictionary learning for pair matching (face verification) and classification (face
recognition) problems.
Taming Wild Faces: Web-Scale, Open-Universe Face Identification in Still and Video Imagery
With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g., social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities, while accurately rejecting those of no interest. Recent advancements in face recognition research have seen Sparse Representation-based Classification (SRC) advance to the forefront of competing methods. However, its drawbacks, slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are threefold: 1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least-squares and the robustness of SRC. On our large dataset collected from Facebook, LASRC performs on par with standard SRC with a speedup of 100-250x. 2. For video, applying the popular l1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive computationally, so we propose a new algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and employing the knowledge that the face track frames belong to the same individual. Employing MSSRC results in a speedup of 5x on average over SRC on a frame-by-frame basis. 3. Finally, we make the observation that MSSRC sometimes assigns inconsistent identities to the same individual in a scene that could be corrected based on their visual similarity.
Therefore, we construct a probabilistic affinity graph combining appearance and co-occurrence similarities to model the relationship between face tracks in a video. Using this relationship graph, we employ random walk analysis to propagate strong class predictions among similar face tracks, while dampening weak predictions. Our method results in a performance gain of 15.8% in average precision over using MSSRC alone.
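The propagation step in the third contribution can be illustrated with a generic random walk with restart over an affinity graph. This is only a sketch of the general technique: the affinity values, the restart weight `alpha`, and the function name are hypothetical, and the dissertation's actual graph construction and update rule may differ.

```python
import numpy as np

def propagate(W, Y, alpha=0.5, n_iter=50):
    """Random walk with restart: blend neighbors' scores with the initial predictions."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1.0 - alpha) * Y
    return F

# Hypothetical affinities between four face tracks:
# tracks 0 and 1 look alike, tracks 2 and 3 look alike.
W = np.array([
    [0.00, 0.90, 0.05, 0.05],
    [0.90, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.00, 0.90],
    [0.05, 0.05, 0.90, 0.00],
])

# Initial per-track class scores (columns: identity A, identity B).
# Track 1 carries a weak, probably wrong, prediction for identity B.
Y = np.array([
    [0.95, 0.05],
    [0.45, 0.55],
    [0.05, 0.95],
    [0.10, 0.90],
])

F = propagate(W, Y)
labels = F.argmax(axis=1)   # track 1 now follows its strongly predicted neighbor
```

In this toy setup the confident prediction on track 0 pulls track 1 toward identity A, while the two tracks of the other person keep their labels.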
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share a hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.
Comment: 65 pages, 33 figures, 303 references