1,156 research outputs found

    VISUAL TRACKING AND ILLUMINATION RECOVERY VIA SPARSE REPRESENTATION

    Compressive sensing, or sparse representation, has played a fundamental role in many fields of science. It shows that signals and images can be reconstructed from far fewer measurements than are usually considered necessary. Sparsity leads to efficient estimation, efficient compression, dimensionality reduction, and efficient modeling. Recently, there has been growing interest in compressive sensing in computer vision, where it has been successfully applied to face recognition, background subtraction, object tracking, and other problems. Sparsity can be achieved by solving the compressive sensing problem using L1 minimization. In this dissertation, we present the results of a study applying sparse representation to illumination recovery, object tracking, and simultaneous tracking and recognition. Illumination recovery, also known as inverse lighting, is the problem of recovering the illumination distribution in a scene from the appearance of objects located in the scene. It is used in Augmented Reality, where virtual objects rendered with the recovered illumination must match the existing image and cast convincing shadows onto the real scene. Shadows in a scene are caused by the occlusion of incoming light and thus contain information about the scene's lighting. Although shadows have been used to determine the 3D shape of the objects that cast them, few studies have focused on the illumination information that shadows provide. In this dissertation, we recover the illumination of a scene from a single image with cast shadows, given the geometry of the scene. Images with cast shadows can be quite complex and therefore cannot be well approximated by low-dimensional linear subspaces. However, we show that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. 
We first model an image with cast shadows as composed of a diffuse part (without cast shadows) and a residual part that captures the cast shadows. We then express the problem as an L1-regularized least-squares formulation with nonnegativity constraints (since light must be nonnegative at every point in space). This sparse representation admits an effective and fast solution, thanks to recent advances in compressive sensing. In experiments on both synthetic and real data, our approach compares favorably with several previously proposed methods.
Visual tracking, which consistently infers the motion of a desired target in a video sequence, has been an active and fruitful research topic in computer vision for decades. It has many practical applications, such as surveillance, human-computer interaction, and medical imaging. Many of the challenges in designing a robust tracking algorithm come from the enormous, unpredictable variations in the target, such as deformations, fast motion, occlusions, background clutter, and lighting changes. To tackle these challenges, we propose a robust visual tracking method that casts tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise, and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an L1-regularized least-squares problem. The candidate with the smallest projection error is then taken as the tracking target. Tracking then continues within a Bayesian state inference framework, in which a particle filter propagates sample distributions over time. 
Three additional components further improve the robustness of our approach: 1) a velocity-incorporated motion model that helps concentrate the samples on the true target location in the next frame, 2) nonnegativity constraints that help filter out clutter resembling the tracked target in reversed intensity patterns, and 3) a dynamic template update scheme that keeps track of the most representative templates throughout tracking. We test the proposed approach on many challenging sequences involving heavy occlusions, drastic illumination changes, large scale changes, non-rigid object movement, out-of-plane rotation, and large pose variations. The proposed approach shows excellent performance in comparison with four previously proposed trackers. We also extend the work to simultaneous tracking and recognition for vehicle classification in IR video sequences. We attempt to resolve the uncertainties in tracking and recognition at the same time by introducing a static template set that stores target images under various conditions, such as different poses and lighting. The recognition results at each frame are propagated to produce the final result for the whole video. The tracking result is evaluated at each frame, and low confidence in tracking performance initiates a new cycle of tracking and classification. We demonstrate the robustness of the proposed method on vehicle tracking and classification using outdoor IR video sequences.
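Both the illumination-recovery formulation and the tracker reduce to an L1-regularized least-squares problem with nonnegative coefficients. The following is a minimal sketch, assuming projected iterative shrinkage-thresholding (ISTA) as the solver; this standard choice is made here for brevity and is not necessarily the solver used in the dissertation. For tracking, it assumes a dictionary [T, I, -I] (target templates plus positive and negative trivial templates), so occlusion and noise can be absorbed while all coefficients stay nonnegative. Each candidate is scored by its reconstruction error using the target-template coefficients only. The names `nonneg_l1_ls` and `select_target`, the regularization weight, and the toy data are all hypothetical.

```python
import numpy as np

def nonneg_l1_ls(A, y, lam, n_iter=500):
    """Projected ISTA for min_c 0.5*||A c - y||^2 + lam*sum(c) subject to c >= 0."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        # Prox of lam*||c||_1 plus the constraint c >= 0: shift by lam/L, clip at zero
        c = np.maximum(0.0, c - (grad + lam) / L)
    return c

def select_target(T, candidates, lam=0.01):
    """Index of the candidate best explained by the target templates T.

    T: (d, n) target templates as columns; candidates: (d, m), one column per particle.
    Positive and negative trivial templates (I and -I) absorb occlusion and noise
    while every coefficient stays nonnegative.
    """
    d, n = T.shape
    A = np.hstack([T, np.eye(d), -np.eye(d)])
    errors = np.array([
        # reconstruction error using the target-template coefficients only
        np.linalg.norm(y - T @ nonneg_l1_ls(A, y, lam)[:n])
        for y in candidates.T
    ])
    return int(np.argmin(errors)), errors

# Illumination recovery (toy case): columns of B are basis images under single
# directional lights; here each "light" brightens one pixel, so B is the identity.
a = nonneg_l1_ls(np.eye(3), np.array([3.0, 0.2, 1.0]), lam=0.5)   # approx. [2.5, 0.0, 0.5]

# Tracking: a candidate matching a dense, unit-norm template beats a clutter spike.
d = 16
ramp = np.linspace(1.0, 2.0, d)
T = np.column_stack([np.ones(d) / 4.0, ramp / np.linalg.norm(ramp)])
bad = np.zeros(d)
bad[3] = 1.0
good = T[:, 0].copy()
best, errs = select_target(T, np.column_stack([bad, good]))
```

Because the spike is explained almost entirely by a trivial template, it leaves a large error once only the target templates are used, so the matching candidate (index 1) is selected.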

    Probeless Illumination Estimation for Outdoor Augmented Reality


    Robust Principal Component Analysis?

    This paper is about a curious phenomenon. Suppose we have a data matrix that is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that, under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit: among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis, since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem and present applications in video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
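Principal Component Pursuit minimizes ||L||_* + lambda*||S||_1 subject to L + S = M. The following is a minimal sketch of one standard way to solve it: an augmented Lagrangian iteration that alternates singular value thresholding (the proximal step for the nuclear norm) with elementwise soft-thresholding (the proximal step for the L1 norm). The weight lambda = 1/sqrt(max(m, n)) and the fixed penalty mu = m*n / (4*||M||_1) are conventional choices assumed here. The function names `pcp`, `svt`, and `shrink`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp(M, max_iter=1000, tol=1e-7):
    """Split M into low-rank L and sparse S with an augmented Lagrangian iteration."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))          # weight on ||S||_1
    mu = m * n / (4.0 * np.abs(M).sum())    # fixed penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S

# Synthetic check: rank-2 matrix plus 5% large, randomly placed corruptions
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.where(rng.random((60, 60)) < 0.05, rng.uniform(-10.0, 10.0, (60, 60)), 0.0)
L_hat, S_hat = pcp(L0 + S0)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```

In a surveillance setting, the columns of M would be vectorized frames. L then captures the static background, and S captures the moving objects.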

    Geometry and Photometry in 3D Visual Recognition

    The report addresses the problem of visual recognition under two sources of variability: geometric and photometric. The geometric source concerns the relation between 3D objects and their views under orthographic and perspective projection. The photometric source concerns the relation between 3D matte objects and their images under changing illumination conditions. Combining the two, the report presents an alignment-based method for recognizing objects viewed from arbitrary viewing positions and illuminated by arbitrary settings of light sources.
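The photometric part rests on a standard property of matte (Lambertian) surfaces. If no point is in shadow, the image under a distant light source s is albedo times (n · s), which is linear in s. Any such image is therefore a linear combination of three images taken under linearly independent lights. The following is a minimal sketch of that property using synthetic normals and albedos (all names are hypothetical). It does not implement the report's full alignment method, and it ignores attached and cast shadows, which break the linearity.

```python
import numpy as np

def render_lambertian(normals, albedo, light):
    """Matte shading albedo * max(0, n . s); cast shadows are not modelled."""
    return albedo * np.maximum(0.0, normals @ light)

def photometric_coefficients(basis_images, novel_image):
    """Least-squares coefficients expressing novel_image in the span of basis_images."""
    B = np.column_stack(basis_images)
    coeffs, *_ = np.linalg.lstsq(B, novel_image, rcond=None)
    return coeffs

# Synthetic surface: normals tilted towards the viewer (+z), random albedo
rng = np.random.default_rng(1)
raw = rng.standard_normal((500, 3))
raw[:, 2] = np.abs(raw[:, 2]) + 2.0
normals = raw / np.linalg.norm(raw, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, 500)

lights = [np.array([0.0, 0.0, 1.0]), np.array([0.3, 0.0, 1.0]), np.array([0.0, 0.3, 1.0])]
basis = [render_lambertian(normals, albedo, s) for s in lights]

s_new = np.array([0.2, 0.1, 1.0])
alpha = photometric_coefficients(basis, render_lambertian(normals, albedo, s_new))
alpha_true = np.linalg.solve(np.column_stack(lights), s_new)   # s_new = sum_k alpha_k * s_k
```

Because every shading value here is positive, `alpha` matches `alpha_true`, which works out to (0, 2/3, 1/3). As soon as n · s turns negative for some pixel, the clamp breaks the linear relation.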

    User-Assisted Image Shadow Removal

    This paper presents a novel user-aided method for texture-preserving shadow removal from single images that requires only simple user input. Compared with the state of the art, our algorithm offers the most flexible user interaction to date and produces more accurate and robust shadow removal under thorough quantitative evaluation. Shadow masks are first detected by analysing user-specified shadow feature strokes. Intensity profiles of variable interval and length are then sampled around the shadow boundary, which avoids artefacts arising from uneven boundaries. Texture noise in the samples is removed by applying local group bilateral filtering, and initial sparse shadow scales are estimated by fitting a piecewise curve to the intensity samples. Remaining errors in the estimated sparse scales are removed by local group smoothing. To relight the image, a dense scale field is produced by in-painting the sparse scales. Finally, a gradual colour correction is applied to remove artefacts due to image post-processing. Using state-of-the-art evaluation data, we demonstrate quantitatively and qualitatively that our method outperforms current leading shadow removal methods.
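The scale-estimation step fits a piecewise curve to intensity samples taken across the shadow boundary. The following is a minimal sketch, assuming a three-piece model (a flat umbra level, a linear penumbra ramp, and a flat lit level) fitted by exhaustive search over the ramp endpoints. It also assumes the profile runs from the shadow side to the lit side. The model, the function name `fit_shadow_profile`, and the synthetic profile are assumptions, not the paper's exact curve. The ratio of the lit level to the umbra level gives the scale that relights the umbra samples.

```python
import numpy as np

def fit_shadow_profile(profile):
    """Fit umbra level, linear penumbra ramp and lit level to a 1-D intensity profile.

    The profile is assumed to run from the shadow side to the lit side.
    Returns (shadow_level, lit_level, ramp_start, ramp_end).
    """
    p = np.asarray(profile, dtype=float)
    n = len(p)
    x = np.arange(n)
    best = None
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            # Weight of the lit level: 0 before the ramp, 1 after it, linear in between
            w = np.clip((x - i) / (j - i), 0.0, 1.0)
            design = np.column_stack([1.0 - w, w])
            (a, b), *_ = np.linalg.lstsq(design, p, rcond=None)
            err = np.sum((design @ np.array([a, b]) - p) ** 2)
            if best is None or err < best[0]:
                best = (err, a, b, i, j)
    _, shadow, lit, start, end = best
    return shadow, lit, start, end

# Synthetic profile: umbra at 50, penumbra from sample 8 to 12, lit surface at 200
x = np.arange(20)
profile = 50.0 + 150.0 * np.clip((x - 8) / 4.0, 0.0, 1.0)
shadow, lit, start, end = fit_shadow_profile(profile)
scale = lit / shadow                      # shadow scale
relit = profile.copy()
relit[:start] *= scale                    # relight the umbra samples only
```

Penumbra samples would each need their own scale, taken from the ramp. The paper's later in-painting step, which builds a dense scale field, is not modelled here.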

    Estimating varying illuminant colours in images

    Colour constancy is the ability to perceive colours independently of the colour of the illumination. A human can tell that a white t-shirt is indeed white, even under blue or red illumination, although such illuminants would make the light reflected from the t-shirt bluish or reddish. Humans can, to a good extent, perceive colours constantly. Getting a computer to achieve the same goal with a high level of accuracy has proven problematic, particularly if we want to use colour as a main cue in object recognition. If we trained a system on object colours under one illuminant and then tried to recognise the objects under another, the system would likely fail. Early colour constancy algorithms assumed that an image contains a single uniform illuminant. They would then attempt to estimate the colour of that illuminant in order to apply a single correction to the entire image. It is not hard to imagine a scenario in which a scene is lit by more than one illuminant. In an outdoor scene on a typical summer's day, for example, some objects are brightly lit by sunlight while others are in shadow. The ambient light in shadows is known to be a different colour from direct sunlight (bluish and yellowish, respectively), so there are at least two illuminant colours to recover in this scene. This thesis focuses on the harder case of recovering the illuminant colours when more than one is present in a scene. Early work on this subject made the empirical observation that illuminant colours are very predictable compared with surface colours: real-world illuminants tend not to be green or purple, but rather blue, yellow, or red. We can think of an illuminant mapping as the function that takes a scene from some unknown illuminant to a known illuminant. We model this mapping as a simple multiplication of the red, green, and blue channels of a pixel. 
It turns out that the set of realistic mappings lies approximately on a line segment in chromaticity space. We propose an algorithm that uses this knowledge and requires only two pixels of the same surface under two illuminants as input. From these, we can recover an estimate of the surface reflectance colour and, subsequently, of the two illuminants. Additionally, we propose a more robust algorithm that can use varying surface reflectance data in a scene. One of the most successful colour constancy algorithms, known as Gamut Mapping, was developed by Forsyth (1990). He argued that the illuminant colour of a scene naturally constrains the surface colours that can be perceived: we could not perceive a very chromatic red under a deep blue illuminant. We introduce our multiple-illuminant constraint into a Gamut Mapping context and are able to further improve its performance. The final piece of work proposes a method for detecting shadow edges, so that we can automatically recover estimates of the illuminant colours in and out of shadow. We also formulate our illuminant estimation algorithm as a voting scheme that probabilistically chooses an illuminant estimate on both sides of the shadow edge. We test the performance of all our algorithms experimentally on well-known datasets, as well as on our newly proposed shadow datasets.
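Under the diagonal model described above, a pixel is the channel-wise product of the illuminant and surface colours. In a green-normalised chromaticity space (R/G, B/G), the surface term therefore cancels when the chromaticities of two pixels of the same surface are divided. The following is a minimal sketch, assuming the line of realistic illuminants is given as u + t*v with hypothetical values for u and v. It solves for the two positions t1 and t2 in closed form, as the intersection of two lines. This illustrates how the line constraint makes two pixels sufficient; it is not the thesis's algorithm. The solve degenerates when the two chromaticity ratios are equal, for example when both pixels see the same illuminant.

```python
import numpy as np

def chroma(rgb):
    """Green-normalised chromaticity (R/G, B/G); independent of overall brightness."""
    r, g, b = rgb
    return np.array([r / g, b / g])

def two_pixel_estimate(p1, p2, u, v):
    """Estimate two illuminant chromaticities on the line u + t*v, plus the surface
    chromaticity, from two pixels of one surface (diagonal model p = E * S)."""
    r = chroma(p2) / chroma(p1)          # = chroma(E2) / chroma(E1); the surface cancels
    # Per channel k: u_k + t2*v_k = r_k*(u_k + t1*v_k), i.e. t2 = r_k*t1 + (r_k - 1)*u_k/v_k
    c = (r - 1.0) * u / v
    t1 = (c[1] - c[0]) / (r[0] - r[1])   # intersection of the two lines in (t1, t2)
    t2 = r[0] * t1 + c[0]
    e1, e2 = u + t1 * v, u + t2 * v
    surface = chroma(p1) / e1
    return t1, t2, e1, e2, surface

# Hypothetical line from a bluish (0.7, 1.4) to a yellowish (1.3, 0.6) illuminant
u = np.array([0.7, 1.4])
v = np.array([0.6, -0.8])
S = np.array([0.15, 0.3, 0.6])           # surface reflectance (RGB)
E1 = np.array([0.85, 1.0, 1.2])          # illuminant at t = 0.25 (bluish, in shadow)
E2 = np.array([1.15, 1.0, 0.8])          # illuminant at t = 0.75 (yellowish, sunlit)
t1, t2, e1, e2, surface = two_pixel_estimate(E1 * S, 2.0 * E2 * S, u, v)
```

The factor 2.0 on the sunlit pixel shows that a brightness difference between the two pixels does not affect the estimate. The recovered surface chromaticity is (0.5, 2.0), that is, S scaled so that G = 1.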

    Synthesis of environment maps for mixed reality

    When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical. Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method that uses an RGBD camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited fields of view and for their displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches in its ability to recover high-frequency environment maps. We demonstrate how it can be used to render virtual objects that shadow, reflect, and refract their environment convincingly.
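Correcting for the displacement between the camera and the virtual object amounts to re-expressing captured scene points as directions seen from the object's position. The following is a minimal sketch, assuming the RGBD camera already provides coloured 3-D points in a common frame and using an equirectangular map with +y up. The function name, map size, and z-buffering by drawing order are illustrative assumptions. The fisheye fill-in and the high-frequency detail of the described method are not modelled.

```python
import numpy as np

def environment_map(points, colors, center, width=64, height=32):
    """Splat coloured 3-D points into an equirectangular map centred at `center`.

    Points are drawn far-to-near so the nearest surface wins each texel;
    texels that receive no point stay NaN.
    """
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    dist = np.linalg.norm(d, axis=1)
    d = d / dist[:, None]
    phi = np.arctan2(d[:, 0], d[:, 2])               # azimuth around +y, in [-pi, pi]
    theta = np.arccos(np.clip(d[:, 1], -1.0, 1.0))   # angle from +y, in [0, pi]
    u = np.minimum(((phi + np.pi) / (2 * np.pi) * width).astype(int), width - 1)
    v = np.minimum((theta / np.pi * height).astype(int), height - 1)
    env = np.full((height, width, 3), np.nan)
    for k in np.argsort(dist)[::-1]:                 # farthest first, nearest overwrites
        env[v[k], u[k]] = colors[k]
    return env

# Two points on the z axis: red at z = 1, blue at z = 2
pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
env_a = environment_map(pts, cols, center=[0.0, 0.0, 0.0])   # object at the origin
env_b = environment_map(pts, cols, center=[0.0, 0.0, 3.0])   # object moved behind both
```

Moving the virtual object changes both the texel each point lands in and which surface is nearest. Seen from the origin, the red point occludes the blue one; seen from z = 3, the blue point is in front, in the opposite direction.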

    Cast shadow segmentation using invariant colour features

    Shadows are integral parts of natural scenes and one of the elements contributing to the naturalness of synthetic scenes. In many image analysis and interpretation applications, shadows interfere with fundamental tasks such as object extraction and description. For this reason, shadow segmentation is an important step in image analysis. In this paper, we propose a new cast shadow segmentation algorithm for both still and moving images. The proposed technique exploits spectral and geometrical properties of shadows in a scene to perform this task. The presence of a shadow is first hypothesized from simple initial evidence, namely that shadows darken the surface on which they are cast. The validity of detected regions as shadows is then verified using more complex hypotheses based on color invariance and the geometric properties of shadows. Finally, an information integration stage confirms or rejects the initial hypothesis for every detected region. Simulation results show that the proposed algorithm is robust and efficient in detecting shadows for a large class of scenes.
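The abstract names two cues: shadows darken the surface, and shadowed pixels keep their colour invariants. The following is a minimal sketch of that two-stage test, assuming a static background reference image and the c1c2c3 photometric invariants. Each invariant measures one channel against the maximum of the other two, so it is unchanged when all three channels are scaled by the same factor. The thresholds `dark_ratio` and `max_change` are illustrative, and the geometric checks and the information integration stage are omitted. Real shadows lit by bluish ambient light will shift the invariants somewhat.

```python
import numpy as np

def c1c2c3(rgb, eps=1e-6):
    """c1c2c3 colour invariants: each channel against the max of the other two."""
    rgb = np.asarray(rgb, dtype=float) + eps   # eps avoids division by zero on black pixels
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([np.arctan(r / np.maximum(g, b)),
                     np.arctan(g / np.maximum(r, b)),
                     np.arctan(b / np.maximum(r, g))], axis=-1)

def shadow_mask(frame, background, dark_ratio=0.9, max_change=0.05):
    """Stage 1: pixels darker than the background are shadow candidates.
    Stage 2: keep candidates whose colour invariants barely change."""
    frame = np.asarray(frame, dtype=float)
    background = np.asarray(background, dtype=float)
    darker = frame.sum(axis=-1) < dark_ratio * background.sum(axis=-1)
    change = np.abs(c1c2c3(frame) - c1c2c3(background)).max(axis=-1)
    return darker & (change < max_change)

background = np.array([[200.0, 150.0, 100.0]] * 3)
frame = np.array([[100.0, 75.0, 50.0],     # same surface at half brightness
                  [50.0, 50.0, 150.0],     # darker but a different colour (an object)
                  [200.0, 150.0, 100.0]])  # unchanged background
mask = shadow_mask(frame, background)
```

Only the first pixel passes both stages. The second pixel is darker, but its invariants change, so it is rejected as a foreground object rather than a shadow.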

    Learning geometric and lighting priors from natural images

    Understanding images is needed for a plethora of tasks, from compositing to image relighting, including 3D object reconstruction. These tasks allow visual artists to create masterpieces or help operators make decisions safely based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, of which generally only one is reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually left to an artist or an expert who uses their experience to carry out the work. This is because humans are able to reason globally about the scene in order to obtain plausible and appealing results. Would it be possible to model this experience from visual data and partly or totally automate these tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each axis includes in-depth performance analyses and, despite the reputation of deep learning algorithms for opacity, we offer studies of the visual cues captured by our methods.