4,446 research outputs found

    Sparse Modeling for Image and Vision Processing

    Get PDF
    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio

    Deep Shape Matching

    Full text link
    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results in multiple benchmarks.Comment: ECCV 201

    “I Look in Your Eyes, Honey”: Internal Face Features Induce Spatial Frequency Preference for Human Face Processing

    Get PDF
    Numerous psychophysical experiments found that humans preferably rely on a narrow band of spatial frequencies for recognition of face identity. A recently conducted theoretical study by the author suggests that this frequency preference reflects an adaptation of the brain's face processing machinery to this specific stimulus class (i.e., faces). The purpose of the present study is to examine this property in greater detail and to specifically elucidate the implication of internal face features (i.e., eyes, mouth, and nose). To this end, I parameterized Gabor filters to match the spatial receptive field of contrast sensitive neurons in the primary visual cortex (simple and complex cells). Filter responses to a large number of face images were computed, aligned for internal face features, and response-equalized (“whitened”). The results demonstrate that the frequency preference is caused by internal face features. Thus, the psychophysically observed human frequency bias for face processing seems to be specifically caused by the intrinsic spatial frequency content of internal face features

    Sparse, hierarchical and shared-factors priors for representation learning

    Get PDF
    La représentation en caractéristiques est une préoccupation centrale des systèmes d’apprentissage automatique d’aujourd’hui. Une représentation adéquate peut faciliter une tâche d’apprentissage complexe. C’est le cas lorsque par exemple cette représentation est de faible dimensionnalité et est constituée de caractéristiques de haut niveau. Mais comment déterminer si une représentation est adéquate pour une tâche d’apprentissage ? Les récents travaux suggèrent qu’il est préférable de voir le choix de la représentation comme un problème d’apprentissage en soi. C’est ce que l’on nomme l’apprentissage de représentation. Cette thèse présente une série de contributions visant à améliorer la qualité des représentations apprises. La première contribution élabore une étude comparative des approches par dictionnaire parcimonieux sur le problème de la localisation de points de prises (pour la saisie robotisée) et fournit une analyse empirique de leurs avantages et leurs inconvénients. La deuxième contribution propose une architecture réseau de neurones à convolution (CNN) pour la détection de points de prise et la compare aux approches d’apprentissage par dictionnaire. Ensuite, la troisième contribution élabore une nouvelle fonction d’activation paramétrique et la valide expérimentalement. Finalement, la quatrième contribution détaille un nouveau mécanisme de partage souple de paramètres dans un cadre d’apprentissage multitâche.Feature representation is a central concern of today’s machine learning systems. A proper representation can facilitate a complex learning task. This is the case when for instance the representation has low dimensionality and consists of high-level characteristics. But how can we determine if a representation is adequate for a learning task? Recent work suggests that it is better to see the choice of representation as a learning problem in itself. This is called Representation Learning. This thesis presents a series of contributions aimed at improving the quality of the learned representations. The first contribution elaborates a comparative study of Sparse Dictionary Learning (SDL) approaches on the problem of grasp detection (for robotic grasping) and provides an empirical analysis of their advantages and disadvantages. The second contribution proposes a Convolutional Neural Network (CNN) architecture for grasp detection and compares it to SDL. Then, the third contribution elaborates a new parametric activation function and validates it experimentally. Finally, the fourth contribution details a new soft parameter sharing mechanism for multitasking learning

    Project SEMACODE : a scale-invariant object recognition system for content-based queries in image databases

    Get PDF
    For the efficient management of large image databases, the automated characterization of images and the usage of that characterization for searching and ordering tasks is highly desirable. The purpose of the project SEMACODE is to combine the still unsolved problem of content-oriented characterization of images with scale-invariant object recognition and modelbased compression methods. To achieve this goal, existing techniques as well as new concepts related to pattern matching, image encoding, and image compression are examined. The resulting methods are integrated in a common framework with the aid of a content-oriented conception. For the application, an image database at the library of the university of Frankfurt/Main (StUB; about 60000 images), the required operations are developed. The search and query interfaces are defined in close cooperation with the StUB project “Digitized Colonial Picture Library”. This report describes the fundamentals and first results of the image encoding and object recognition algorithms developed within the scope of the project

    Instance-Aware Domain Generalization for Face Anti-Spoofing

    Full text link
    Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG.Comment: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 202

    Biologically inspired feature extraction for rotation and scale tolerant pattern analysis

    Get PDF
    Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is utilization of lateral inhibition and more generally, linear networks with recurrent connectivity along with complex-log conformal mapping in machine based implementations of information encoding, feature extraction and pattern recognition. The reasoning behind and method for spatially uniform implementation of inhibitory/excitatory network model in the framework of non-uniform log-polar transform is presented. For the space invariant connectivity model characterized by Topelitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations providing the connection matrix generating function is bound by unity. It was shown that for the network with the inter-neuron connection function expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used in order to develop space invariant pre-whitening kernels specialized for specific category of input signals. These filters have extremely small memory footprint and are successfully utilized in order to improve performance of adaptive neural whitening algorithms. Finally, the method for feature extraction based on localized Independent Component Analysis (ICA) transform in log-polar domain and aided by previously developed pre-whitening filters is implemented. Since output codes produced by ICA are very sparse, a small number of non-zero coefficients was sufficient to encode input data and obtain reliable pattern recognition performance
    corecore