Beyond 2D-grids: a dependence maximization view on image browsing
Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged in some default order on the screen, typically by relevance to a query only. While this certainly has its advantages, a more flexible and intuitive way would arguably be to sort images into arbitrary structures such as grids, hierarchies, or spheres so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because an arbitrary layout structure makes it difficult, if not impossible, to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layout approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures and hence elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected, for instance to form the tip of the hierarchy, allowing users to subsequently navigate through the search results in the lower levels in an intuitive way.
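The dependence-maximization idea behind kernelized sorting can be illustrated with a toy sketch. It assumes two precomputed kernel matrices, one over the objects and one over the layout positions, and searches for the permutation that maximizes the sum of elementwise products of the two centered kernels. The exhaustive search and the function names (center, dependence, brute_force_kernelized_sort) are illustrative assumptions, not the paper's method, which would need an iterative optimizer to scale beyond a handful of items.

```python
from itertools import permutations

def center(K):
    """Double-center a kernel matrix (subtract row and column means, add the grand mean)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def dependence(Kc, Lc, perm):
    """Dependence score when object i is placed at position perm[i]."""
    n = len(Kc)
    return sum(Kc[i][j] * Lc[perm[i]][perm[j]] for i in range(n) for j in range(n))

def brute_force_kernelized_sort(K, L):
    """Try every permutation (toy sizes only) and return the best one."""
    Kc, Lc = center(K), center(L)
    return max(permutations(range(len(K))), key=lambda p: dependence(Kc, Lc, p))

# Toy data (assumed): four 1-D "images" and four slots on a line,
# both compared with a linear kernel k(a, b) = a * b.
objects = [2.0, 0.0, 3.0, 1.0]
slots = [0.0, 1.0, 2.0, 3.0]
K = [[a * b for b in objects] for a in objects]
L = [[a * b for b in slots] for a in slots]

perm = brute_force_kernelized_sort(K, L)
layout = [None] * len(slots)
for obj_index, slot_index in enumerate(perm):
    layout[slot_index] = objects[obj_index]
```

No similarity between an object and a slot is ever computed; only within-domain kernels are used. In this toy case the resulting layout places the values in monotone order along the line (ascending or descending, since both score equally).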
Novel Views of Objects from a Single Image
Taking an image of an object is at its core a lossy process. The rich information about the three-dimensional structure of the world is flattened to an image plane, and decisions such as viewpoint and camera parameters are final and not easily reversible. As a consequence, possibilities of changing viewpoint are limited. Given a single image depicting an object, novel-view synthesis is the task of generating new images that render the object from a different viewpoint than the one given. The main difficulty is to synthesize the parts that are disoccluded; disocclusion occurs when parts of an object are hidden by the object itself under a specific viewpoint. In this work, we show how to improve novel-view synthesis by making use of the correlations observed in 3D models and applying them to new image instances. We propose a technique to use the structural information extracted from a 3D model that matches the image object in terms of viewpoint and shape. For the latter part, we propose an efficient 2D-to-3D alignment method that precisely associates the image appearance with the 3D model geometry with minimal user interaction. Our technique is able to simulate plausible viewpoint changes for a variety of object classes within seconds. Additionally, we show that our synthesized images can be used as additional training data that improves the performance of standard object detectors.
Affine Subspace Representation for Feature Description
This paper proposes a novel Affine Subspace Representation (ASR) descriptor to deal with affine distortions induced by viewpoint changes. Unlike traditional local descriptors such as SIFT, ASR inherently encodes local information of multi-view patches, making it robust to affine distortions while maintaining a high discriminative ability. To this end, PCA is used to represent affine-warped patches as PCA-patch vectors for its compactness and efficiency. Then, according to the subspace assumption, which implies that the PCA-patch vectors of various affine-warped patches of the same keypoint can be represented by a low-dimensional linear subspace, the ASR descriptor is obtained by using a simple subspace-to-point mapping. Such a linear subspace representation can accurately capture the underlying information of a keypoint (local structure) under multiple views without sacrificing its distinctiveness. To accelerate the computation of the ASR descriptor, a fast approximate algorithm is proposed by moving the most computationally expensive part (i.e., warping patches under various affine transformations) to an offline training stage. Experimental results show that ASR is not only better than the state-of-the-art descriptors under various image transformations, but also performs well without a dedicated affine invariant detector when dealing with viewpoint changes. Comment: To appear in the 2014 European Conference on Computer Vision
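The abstract does not spell out the subspace-to-point mapping. One common choice, assumed here purely for illustration, represents a subspace by its projection matrix P = Q Q^T and keeps the upper triangle as a vector, with diagonal entries scaled by 1/sqrt(2). The function name subspace_to_point and the toy bases are hypothetical. In a real pipeline the orthonormal basis would come from PCA over the warped-patch vectors.

```python
import math

def subspace_to_point(basis):
    """Map a subspace to a single vector (illustrative assumption).

    basis: list of orthonormal vectors, each a list of length D.
    Returns the upper triangle of P = Q Q^T as a list of length D*(D+1)/2,
    with diagonal entries scaled by 1/sqrt(2) so that the squared Euclidean
    distance between two points equals half the squared Frobenius distance
    between the two projection matrices.
    """
    dim = len(basis[0])
    point = []
    for i in range(dim):
        for j in range(i, dim):
            p_ij = sum(q[i] * q[j] for q in basis)
            point.append(p_ij / math.sqrt(2) if i == j else p_ij)
    return point

s = 1.0 / math.sqrt(2)
basis_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # the x-y plane
basis_b = [[s, s, 0.0], [s, -s, 0.0]]          # same plane, rotated basis
basis_c = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # a different plane (x-z)

point_a = subspace_to_point(basis_a)
point_b = subspace_to_point(basis_b)
point_c = subspace_to_point(basis_c)
dist2_ac = sum((a - c) ** 2 for a, c in zip(point_a, point_c))
```

The mapping depends only on the subspace, not on the basis chosen for it, so point_a and point_b coincide up to rounding while point_c differs. This lets subspaces be compared with ordinary Euclidean distance.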
Natural Illumination from Multiple Materials Using Deep Learning
Recovering natural illumination from a single Low-Dynamic Range (LDR) image is a challenging task. To remedy this situation we exploit two properties often found in everyday images. First, images rarely show a single material, but rather multiple ones that all reflect the same illumination. However, the appearance of each material is observed only for some surface orientations, not all. Second, parts of the illumination are often directly observed in the background, without being affected by reflection. Typically, this directly observed part of the illumination is even smaller. We propose a deep Convolutional Neural Network (CNN) that combines prior knowledge about the statistics of illumination and reflectance with an input that makes explicit use of these two observations. Our approach maps multiple partial LDR material observations represented as reflectance maps and a background image to a spherical High-Dynamic Range (HDR) illumination map. For training and testing we propose a new data set comprising synthetic and real images with multiple materials observed under the same illumination. Qualitative and quantitative evidence shows that both using multiple materials and using a background are essential to improving illumination estimates.
Numerical inversion of SRNFs for efficient elastic shape analysis of star-shaped objects
The elastic shape analysis of surfaces has proven useful in several application areas, including medical image analysis, vision, and graphics. This approach is based on defining new mathematical representations of parameterized surfaces, including the square root normal field (SRNF), and then using the L2 norm to compare their shapes. Past work is based on using the pullback of the L2 metric to the space of surfaces, performing statistical analysis under this induced Riemannian metric. However, if one can estimate the inverse of the SRNF mapping, even approximately, a very efficient framework results: the surfaces, represented by their SRNFs, can be efficiently analyzed using standard Euclidean tools, and only the final results need be mapped back to the surface space. Here we describe a procedure for inverting SRNF maps of star-shaped surfaces, a special case for which analytic results can be obtained. We test our method via the classification of 34 cases of ADHD (Attention Deficit Hyperactivity Disorder), plus controls, in the Detroit Fetal Alcohol and Drug Exposure Cohort study. We obtain state-of-the-art results.
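For orientation, the forward SRNF map (not the inversion procedure the abstract describes) can be sketched at a single surface point. The sketch uses the definition usually given in the elastic shape analysis literature, q = n / sqrt(|n|) with n = f_u x f_v; this formula is not stated in the abstract and is an assumption here. The helper names cross and srnf are hypothetical, and the partial derivatives f_u and f_v are assumed to be supplied by the caller.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def srnf(f_u, f_v):
    """Square root normal field at one point of a parameterized surface.

    f_u, f_v: partial derivatives of the surface map at that point.
    Returns q = n / sqrt(|n|), where n = f_u x f_v is the unnormalized normal,
    so |q|^2 equals the local area element |n|.
    """
    n = cross(f_u, f_v)
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate point: no well-defined normal
    scale = 1.0 / math.sqrt(norm)
    return tuple(c * scale for c in n)

# Toy point (assumed values): normal (4, 0, 0), area element 4.
q = srnf((0.0, 0.0, -2.0), (0.0, 2.0, 0.0))
area_element = sum(c * c for c in q)
```

Because |q|^2 recovers the area element, the L2 distance between two SRNFs accounts for both bending of the normal direction and local stretching of the surface.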
3D modeling and registration under wide baseline conditions
During the 1990s, important progress was made in the area of structure-from-motion. From a series of closely spaced images, a 3D model of the observed scene can now be reconstructed without knowledge about the subsequent camera positions or settings. From nothing but a video, the camera trajectory and scene shape are extracted. Progress has also been important in the area of structured light techniques. Rather than having to use slow and/or bulky laser scanners, compact one-shot systems have been developed. Upon projection of a pattern onto the scene, its 3D shape and texture can be extracted from a single image. This paper presents recent extensions to both strands that share a common theme: how to cope with large baseline conditions. In the case of shape-from-video, we discuss ways to find correspondences and, hence, extract 3D shapes even when the images are taken far apart. In the case of structured light, the problem solved is how to combine partial 3D patches into complete models without a good initialisation of their relative poses.
Class Representative Visual Words for Category-Level Object Recognition
Recent works in object recognition often use visual words, i.e. vector quantized local descriptors extracted from the images. In this paper we present a novel method to build such a codebook with class representative vectors. This method, coined Cluster Precision Maximization (CPM), is based on a new measure of the cluster precision and on an optimization procedure that leads any clustering algorithm towards class representative visual words. We compare our procedure with other measures of cluster precision and present the integration of a Reciprocal Nearest Neighbor (RNN) clustering algorithm in the CPM method. In the experiments, on a subset of the Caltech101 database, we analyze several vocabularies obtained with different local descriptors and different clustering algorithms, and we show that the vocabularies obtained with the CPM process perform best in a category-level object recognition system using a Support Vector Machine (SVM). © 2009 Springer Berlin Heidelberg. López Sastre R.J., Tuytelaars T., Maldonado Bascón S., "Class representative visual words for category-level object recognition", Lecture Notes in Computer Science, vol. 5524, 2009 (4th Iberian Conference on Pattern Recognition and Image Analysis - IbPRIA 2009, June 10-12, 2009, Póvoa de Varzim, Portugal). Status: published
What Is Around the Camera?
How much does a single image reveal about the environment it was taken in? In this paper, we investigate how much of that information can be retrieved from a foreground object, combined with the background (i.e. the visible part of the environment). Assuming it is not perfectly diffuse, the foreground object acts as a complexly shaped and far-from-perfect mirror. An additional challenge is that its appearance confounds the light coming from the environment with the unknown materials it is made of. We propose a learning-based approach to predict the environment from multiple reflectance maps that are computed from approximate surface normals. The proposed method allows us to jointly model the statistics of environments and material properties. We train our system from synthesized training data, but demonstrate its applicability to real-world data. Interestingly, our analysis shows that the information obtained from objects made out of multiple materials is often complementary and leads to better performance.