4,446 research outputs found
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval or its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks.Comment: ECCV 201
“I Look in Your Eyes, Honey”: Internal Face Features Induce Spatial Frequency Preference for Human Face Processing
Numerous psychophysical experiments found that humans preferably rely on a narrow
band of spatial frequencies for recognition of face identity. A recently
conducted theoretical study by the author suggests that this frequency
preference reflects an adaptation of the brain's face processing
machinery to this specific stimulus class (i.e., faces). The purpose of the
present study is to examine this property in greater detail and to specifically
elucidate the implication of internal face features (i.e., eyes, mouth, and
nose). To this end, I parameterized Gabor filters to match the spatial receptive
field of contrast sensitive neurons in the primary visual cortex (simple and
complex cells). Filter responses to a large number of face images were computed,
aligned for internal face features, and response-equalized
(“whitened”). The results demonstrate that the frequency
preference is caused by internal face features. Thus, the psychophysically
observed human frequency bias for face processing seems to be specifically
caused by the intrinsic spatial frequency content of internal face features
Sparse, hierarchical and shared-factors priors for representation learning
La représentation en caractéristiques est une préoccupation centrale des systèmes d’apprentissage automatique d’aujourd’hui. Une représentation adéquate peut faciliter une tâche d’apprentissage complexe. C’est le cas lorsque par exemple cette représentation est de faible dimensionnalité et est constituée de caractéristiques de haut niveau. Mais comment déterminer si une représentation est adéquate pour une tâche d’apprentissage ? Les récents travaux suggèrent qu’il est préférable de voir le choix de la représentation comme un problème d’apprentissage en soi. C’est ce que l’on nomme l’apprentissage de représentation. Cette thèse présente une série de contributions visant à améliorer la qualité des représentations apprises. La première contribution élabore une étude comparative des approches par dictionnaire parcimonieux sur le problème de la localisation de points de prises (pour la saisie robotisée) et fournit une analyse empirique de leurs avantages et leurs inconvénients. La deuxième contribution propose une architecture réseau de neurones à convolution (CNN) pour la détection de points de prise et la compare aux approches d’apprentissage par dictionnaire. Ensuite, la troisième contribution élabore une nouvelle fonction d’activation paramétrique et la valide expérimentalement. Finalement, la quatrième contribution détaille un nouveau mécanisme de partage souple de paramètres dans un cadre d’apprentissage multitâche.Feature representation is a central concern of today’s machine learning systems. A proper representation can facilitate a complex learning task. This is the case when for instance the representation has low dimensionality and consists of high-level characteristics. But how can we determine if a representation is adequate for a learning task? Recent work suggests that it is better to see the choice of representation as a learning problem in itself. This is called Representation Learning. This thesis presents a series of contributions aimed at improving the quality of the learned representations. The first contribution elaborates a comparative study of Sparse Dictionary Learning (SDL) approaches on the problem of grasp detection (for robotic grasping) and provides an empirical analysis of their advantages and disadvantages. The second contribution proposes a Convolutional Neural Network (CNN) architecture for grasp detection and compares it to SDL. Then, the third contribution elaborates a new parametric activation function and validates it experimentally. Finally, the fourth contribution details a new soft parameter sharing mechanism for multitasking learning
Project SEMACODE : a scale-invariant object recognition system for content-based queries in image databases
For the efficient management of large image databases, the automated characterization of images and the usage of that characterization for searching and ordering tasks is highly desirable. The purpose of the project SEMACODE is to combine the still unsolved problem of content-oriented characterization of images with scale-invariant object recognition and modelbased compression methods. To achieve this goal, existing techniques as well as new concepts related to pattern matching, image encoding, and image compression are examined. The resulting methods are integrated in a common framework with the aid of a content-oriented conception. For the application, an image database at the library of the university of Frankfurt/Main (StUB; about 60000 images), the required operations are developed. The search and query interfaces are defined in close cooperation with the StUB project “Digitized Colonial Picture Library”. This report describes the fundamentals and first results of the image encoding and object recognition algorithms developed within the scope of the project
Instance-Aware Domain Generalization for Face Anti-Spoofing
Face anti-spoofing (FAS) based on domain generalization (DG) has been
recently studied to improve the generalization on unseen scenarios. Previous
methods typically rely on domain labels to align the distribution of each
domain for learning domain-invariant representations. However, artificial
domain labels are coarse-grained and subjective, which cannot reflect real
domain distributions accurately. Besides, such domain-aware methods focus on
domain-level alignment, which is not fine-grained enough to ensure that learned
representations are insensitive to domain styles. To address these issues, we
propose a novel perspective for DG FAS that aligns features on the instance
level without the need for domain labels. Specifically, Instance-Aware Domain
Generalization framework is proposed to learn the generalizable feature by
weakening the features' sensitivity to instance-specific styles. Concretely, we
propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the
style-sensitive feature correlation, boosting the generalization. Moreover,
Dynamic Kernel Generator and Categorical Style Assembly are proposed to first
extract the instance-specific features and then generate the style-diversified
features with large style shifts, respectively, further facilitating the
learning of style-insensitive features. Extensive experiments and analysis
demonstrate the superiority of our method over state-of-the-art competitors.
Code will be publicly available at https://github.com/qianyuzqy/IADG.Comment: Accepted to IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 202
Biologically inspired feature extraction for rotation and scale tolerant pattern analysis
Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is utilization of lateral inhibition and more generally, linear networks with recurrent connectivity along with complex-log conformal mapping in machine based implementations of information encoding, feature extraction and pattern recognition. The reasoning behind and method for spatially uniform implementation of inhibitory/excitatory network model in the framework of non-uniform log-polar transform is presented. For the space invariant connectivity model characterized by Topelitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations providing the connection matrix generating function is bound by unity. It was shown that for the network with the inter-neuron connection function expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used in order to develop space invariant pre-whitening kernels specialized for specific category of input signals. These filters have extremely small memory footprint and are successfully utilized in order to improve performance of adaptive neural whitening algorithms. Finally, the method for feature extraction based on localized Independent Component Analysis (ICA) transform in log-polar domain and aided by previously developed pre-whitening filters is implemented. Since output codes produced by ICA are very sparse, a small number of non-zero coefficients was sufficient to encode input data and obtain reliable pattern recognition performance
- …