456 research outputs found

    ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition

    In general, intrinsic image decomposition algorithms interpret shading as one unified component that includes all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct (illumination) and indirect (ambient light and shadows) shading subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn reflectance cues separately from specific photometric effects, so that its disentanglement capability can be analyzed. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets.
    Comment: Submitted to the International Journal of Computer Vision (IJCV)
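The fine-grained image formation model described in the abstract can be sketched as follows: the standard intrinsic model I = A * S, with the shading S split into a direct and an indirect term. The function and variable names are assumptions for illustration, not the paper's code:

```python
import numpy as np

def compose_image(albedo, direct_shading, indirect_shading):
    """Recompose an RGB image from fine-grained intrinsic components.

    Follows the standard intrinsic model I = A * S, with shading S
    split into a direct (illumination) and an indirect (ambient light
    and shadows) term, as described in the abstract. The exact
    component definitions here are illustrative assumptions.
    """
    shading = direct_shading + indirect_shading  # unified shading S
    return albedo * shading                      # elementwise I = A * S

# Toy example: constant albedo, a "shadow" lowering the indirect term.
albedo = np.full((4, 4, 3), 0.5)
direct = np.full((4, 4, 1), 0.8)
indirect = np.full((4, 4, 1), 0.2)
indirect[:2, :2] = 0.05  # shadowed region
image = compose_image(albedo, direct, indirect)
```

Decomposition is the inverse (ill-posed) direction; the sketch only shows why splitting S lets a shadow be represented without touching the albedo.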

    A Dataset of Multi-Illumination Images in the Wild

    Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. To fill this gap, we introduce a new multi-illumination dataset of more than 1000 real scenes, each captured under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.
    Comment: ICCV 201

    Free-viewpoint Indoor Neural Relighting from Multi-view Stereo

    We introduce a neural relighting algorithm for captured indoor scenes that allows interactive free-viewpoint navigation. Our method allows illumination to be changed synthetically, while coherently rendering cast shadows and complex glossy materials. We start with multiple images of the scene and a 3D mesh obtained by multi-view stereo (MVS) reconstruction. We assume that lighting is well explained as the sum of a view-independent diffuse component and a view-dependent glossy term concentrated around the mirror reflection direction. We design a convolutional network around input feature maps that facilitate learning of an implicit representation of scene materials and illumination, enabling both relighting and free-viewpoint navigation. We generate these input maps by exploiting the best elements of both image-based and physically-based rendering. We sample the input views to estimate diffuse scene irradiance, and compute the new illumination caused by user-specified light sources using path tracing. To facilitate the network's understanding of materials and synthesize plausible glossy reflections, we reproject the views and compute mirror images. We train the network on a synthetic dataset where each scene is also reconstructed with MVS. We show results of our algorithm relighting real indoor scenes and performing free-viewpoint navigation with complex and realistic glossy reflections, which so far remained out of reach for view-synthesis techniques.
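The two-term lighting model assumed in the abstract (a view-independent diffuse component plus a glossy term concentrated around the mirror reflection direction) can be sketched directly. The Phong-style lobe and all constants below are illustrative assumptions, not the paper's learned representation:

```python
import numpy as np

def mirror_direction(n, l):
    """Reflect the light direction l about the surface normal n (unit vectors)."""
    return 2.0 * np.dot(n, l) * n - l

def shading(n, l, v, diffuse_albedo=0.7, glossy_gain=0.3, shininess=32):
    """View-independent diffuse plus a glossy lobe around the mirror
    direction: the two-term lighting model assumed in the abstract.
    The Phong-style lobe and all constants are illustrative choices."""
    diffuse = diffuse_albedo * max(0.0, np.dot(n, l))
    r = mirror_direction(n, l)
    glossy = glossy_gain * max(0.0, np.dot(r, v)) ** shininess
    return diffuse + glossy

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.0, 1.0])   # light straight overhead
v = np.array([0.0, 0.0, 1.0])   # viewer along the mirror direction
print(shading(n, l, v))          # diffuse term plus full glossy lobe
```

The "mirror images" the network consumes are reprojections along `mirror_direction`; the sharp `shininess` exponent is what makes the glossy term view-dependent.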

    OutCast: Outdoor Single-image Relighting with Cast Shadows

    We propose a relighting method for outdoor images. Our method mainly focuses on predicting cast shadows in arbitrary novel lighting directions from a single image, while also accounting for shading and global effects such as sunlight color and clouds. Previous solutions for this problem rely on reconstructing occluder geometry, e.g. using multi-view stereo, which requires many images of the scene. Instead, in this work we use a noisy off-the-shelf single-image depth map estimate as the source of geometry. Whilst this can be a good guide for some lighting effects, the resulting depth map quality is insufficient for directly ray-tracing the shadows. To address this, we propose a learned image-space ray-marching layer that converts the approximate depth map into a deep 3D representation that is fused into occlusion queries using a learned traversal. Our proposed method achieves, for the first time, state-of-the-art relighting results with only a single image as input. For supplementary material visit our project page at: https://dgriffiths.uk/outcast
    Comment: Eurographics 2022 - Accepted
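The occlusion query that the learned ray-marching layer replaces can be illustrated with a brute-force image-space march over a height map. Everything below (the function name, step counts, the toy scene) is an assumption for illustration; the paper's point is precisely that a learned traversal copes with noisy depth better than this naive test:

```python
import numpy as np

def shadow_mask(height, light_dir, n_steps=64, step=1.0):
    """Naive image-space ray march over a height map.

    For each pixel, march a ray toward the light and flag the pixel
    as shadowed if the height map rises above the ray. This is the
    hand-crafted occlusion query that OutCast replaces with a learned
    ray-marching layer; parameters here are illustrative.
    """
    h, w = height.shape
    d = np.asarray(light_dir, dtype=float)
    d /= np.linalg.norm(d)
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            px, py, pz = float(x), float(y), height[y, x]
            for _ in range(n_steps):
                px += d[0] * step
                py += d[1] * step
                pz += d[2] * step
                ix, iy = int(round(px)), int(round(py))
                if not (0 <= ix < w and 0 <= iy < h):
                    break                # ray left the image: lit
                if height[iy, ix] > pz:
                    mask[y, x] = True    # occluded along the light ray
                    break
    return mask

# A tall block casts a shadow when the light comes in at a low angle.
height = np.zeros((8, 8))
height[:, 4] = 5.0                       # wall at column x = 4
mask = shadow_mask(height, light_dir=(1.0, 0.0, 0.5))
```

With a noisy depth map, the hard `>` comparison produces speckled shadows, which motivates fusing soft occlusion queries with a learned traversal instead.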

    Inverse rendering techniques for physically grounded image editing

    From a single picture of a scene, people can typically grasp the spatial layout immediately and even make good guesses at material properties and where light is coming from to illuminate the scene. For example, we can reliably tell which objects occlude others, what an object is made of and its rough shape, which regions are illuminated or in shadow, and so on. Remarkably little is known about how we make these determinations, and as a result we are still not able to robustly "teach" computers to make the same high-level observations as people. This document presents algorithms for understanding intrinsic scene properties from single images. The goal of these inverse rendering techniques is to estimate the configurations of scene elements (geometry, materials, luminaires, camera parameters, etc.) using only the information visible in an image. Such algorithms have applications in robotics and computer graphics. One such application is physically grounded image editing: photo editing made easier by leveraging knowledge of the physical space. These applications allow sophisticated editing operations to be performed in a matter of seconds, enabling seamless addition, removal, or relocation of objects in images.

    Adaptive Vision Based Scene Registration for Outdoor Augmented Reality

    Augmented Reality (AR) involves adding virtual content into real scenes. Scenes are viewed using a Head-Mounted Display or another display type. In order to place content into the user's view of a scene, the user's position and orientation relative to the scene, commonly referred to as their pose, must be determined accurately. This allows the objects to be placed in the correct positions and to remain there when the user moves or the scene changes. It is achieved by tracking the user in relation to their environment using a variety of technologies. One technology which has proven to provide accurate results is computer vision. Computer vision involves a computer analysing images and achieving an understanding of them. This may be locating objects such as faces in the images, or, in the case of AR, determining the pose of the user. One of the ultimate goals of AR systems is to be capable of operating under any condition. For example, a computer vision system must be robust across a range of different scene types and under unpredictable environmental conditions due to variable illumination and weather. The majority of existing literature tests algorithms under the assumption of ideal or 'normal' imaging conditions. To ensure robustness under as many circumstances as possible, it is also important to evaluate the systems under adverse conditions. This thesis seeks to analyse the effects that variable illumination has on computer vision algorithms. To enable this analysis, test data is required that isolates weather and illumination effects from other factors, such as changes in viewpoint, that would bias the results. A new dataset is presented which also allows controlled viewpoint differences in the presence of weather and illumination changes. This is achieved by capturing video from a camera undergoing a repeatable motion sequence. Ground truth data is stored per frame, allowing images from the same position under differing environmental conditions to be easily extracted from the videos. An in-depth analysis of six detection algorithms and five matching techniques demonstrates the impact that non-uniform illumination changes can have on vision algorithms. Specifically, shadows can degrade performance and reduce confidence in the system, decrease reliability, or even completely prevent successful operation. An investigation into approaches to improve performance yields techniques that can help reduce the impact of shadows. A novel algorithm is presented that merges reference data captured at different times, resulting in reference data with minimal shadow effects. This can significantly improve performance and reliability when operating on images containing shadow effects. These advances improve the robustness of computer vision systems and extend the range of conditions in which they can operate. This can increase the usefulness of the algorithms and the AR systems that employ them.
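The idea of merging reference data captured at different times to minimize shadow effects can be sketched with a per-pixel median over a stack of registered images: shadows move between captures, so pixels darkened in only a minority of the images are recovered. This is a generic illustration of the idea, not the thesis's specific merging algorithm:

```python
import numpy as np

def merge_references(images):
    """Merge registered reference images captured at different times.

    Shadows move with the sun between captures, so a per-pixel median
    over the stack suppresses regions that are darkened in only a
    minority of the images. Generic illustration only; the thesis's
    actual merging algorithm may differ.
    """
    stack = np.stack(images, axis=0).astype(float)
    return np.median(stack, axis=0)

# Three captures of the same view; each has a shadow in a different place.
base = np.full((4, 4), 200.0)
caps = [base.copy() for _ in range(3)]
caps[0][:, 0] = 50.0   # morning shadow
caps[1][:, 1] = 50.0   # noon shadow
caps[2][:, 2] = 50.0   # evening shadow
merged = merge_references(caps)   # shadow-free: every pixel is 200
```

The median needs the shadow to be absent in a majority of captures at each pixel, which is why spreading the captures across the day matters.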

    Applying Augmented Reality to Outdoors Industrial Use

    Augmented Reality (AR) is currently gaining popularity in multiple different fields. However, the technology for AR still requires development in both hardware and software when considering industrial use. In order to create immersive AR applications, more accurate pose estimation techniques for defining the virtual camera location are required. The algorithms for pose estimation often require a lot of processing power, which makes robust pose estimation a difficult task when using mobile devices or dedicated AR tools. The difficulties are even greater in outdoor scenarios, where the environment can vary a lot and is often unprepared for AR. This thesis researches different possibilities for creating AR applications for outdoor environments. Both hardware and software solutions are considered, but the focus is more on software. The majority of the thesis focuses on different visual pose estimation and tracking techniques for natural features. During the thesis, multiple solutions for outdoor AR were tested. One commercial AR SDK was tested, and three custom software solutions were developed for an Android tablet. The custom software solutions were an algorithm for combining data from a magnetometer and a gyroscope, a natural feature tracker, and a tracker based on panorama images. The panorama-based tracker was implemented based on an existing scientific publication, and was further developed by integrating it into Unity 3D and adding the ability to augment content. This thesis concludes that AR is very close to becoming a usable tool for professional use. The commercial solutions currently available are not yet ready for creating professional tools, but especially for visualization tasks, some custom solutions are capable of achieving the required robustness.
The panorama tracker implemented in this thesis seems like a promising tool for robust pose estimation in unprepared outdoor environments.
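The sensor-fusion component mentioned above (combining gyroscope and magnetometer data) is commonly realized as a complementary filter. Since the abstract does not spell out the thesis's algorithm, the following is an illustrative sketch of that standard approach:

```python
def fuse_heading(prev_heading, gyro_rate, mag_heading, dt, alpha=0.98):
    """One step of a complementary filter for compass heading (degrees).

    Integrates the fast-but-drifting gyroscope rate and corrects it
    with the slow-but-absolute magnetometer reading. This is a
    standard way to combine the two sensors; the thesis's exact
    algorithm is not specified, so the names and alpha are assumptions.
    """
    gyro_heading = prev_heading + gyro_rate * dt   # integrate angular rate
    fused = alpha * gyro_heading + (1.0 - alpha) * mag_heading
    return fused % 360.0

heading = 90.0
for _ in range(50):   # stationary device, gyro biased by +1 deg/s
    heading = fuse_heading(heading, gyro_rate=1.0, mag_heading=90.0, dt=0.01)
# The magnetometer term keeps the heading pinned near 90 despite gyro drift.
```

A high `alpha` trusts the gyroscope for short-term motion while the magnetometer term slowly cancels accumulated bias, which is why the fused heading stays bounded even with a constant rate error.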

    Learning geometric and lighting priors from natural images

    Understanding images is needed for a plethora of tasks, from compositing to image relighting to 3D object reconstruction. These tasks allow visual artists to create masterpieces or help operators make decisions safely based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually delegated to an artist or an expert who uses their experience to carry out the work. This is because humans are able to reason globally about the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or totally automate these tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation of opacity of deep machine learning algorithms, we offer studies on the visual cues captured by our methods.

    Computer vision models in surveillance robotics

    2009/2010
    In this thesis, we developed algorithms that use visual information to perform real-time detection, recognition and classification of moving objects, independently of environmental conditions and with the best possible accuracy. To this end, we developed several computer vision components, namely the identification of objects of interest across the whole visual scene (monocular or stereo) and their classification. During the research, several approaches were tried, including the detection of candidate objects through image segmentation with weak classifiers and centroids, image segmentation algorithms reinforced with stereo information and noise reduction, and the combination of popular features such as scale-invariant features (SIFT) with distance information. We developed two broad categories of solutions, depending on the type of system used. With a mobile camera, we favoured detecting known objects by scanning the image; with a fixed camera, we also used algorithms for detecting foreground and moving objects (foreground detection). In the case of foreground detection, the detection and classification rate increases if the quality of the extracted objects is high. We propose methods to reduce the effects of shadows, illumination and repetitive motion produced by moving objects. An important aspect studied is the possibility of using algorithms for detecting moving objects with a mobile camera. Efficient solutions are becoming increasingly complex, but the computing platforms that run the algorithms are also becoming more powerful, and in recent years GPU architectures have offered great potential. We proposed a GPU-based solution for managing background images, in order to increase detection performance. In this thesis we studied the detection and tracking of people for applications such as the prevention of risky situations (street crossing) and counting for traffic analysis. We studied these problems and explored various aspects of the detection of people and groups, and detection in crowded scenarios. However, in a generic environment, it is impossible to predict the configuration of objects that will be captured by the camera. In these cases, it is necessary to "abstract the concept" of objects. With this requirement in mind, we explored the properties of stochastic methods and show that good classification rates can be obtained provided that the training set is large enough. A flexible framework must be able to detect moving regions and recognize objects of interest. We developed a framework for handling detection and classification problems. Compared with other methods, the proposed methods offer a flexible framework for the detection and classification of objects that can be used efficiently in different indoor and outdoor environments.
    XXII Ciclo
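The foreground detection and background-image management discussed above can be illustrated with the simplest member of that family: an exponential running-average background model with thresholded differencing. The names and constants below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def update_background(bg, frame, rate=0.05):
    """Exponential running-average background model (grayscale floats).

    Slowly absorbs gradual scene changes (illumination drift) while
    transient moving objects barely affect the model. `rate` is an
    illustrative constant.
    """
    return (1.0 - rate) * bg + rate * frame

def foreground_mask(bg, frame, threshold=30.0):
    """Pixels far from the background model are flagged as foreground."""
    return np.abs(frame - bg) > threshold

# Static scene, then an object appears in one region.
bg = np.full((6, 6), 100.0)
frame = bg.copy()
frame[2:4, 2:4] = 220.0            # moving object
mask = foreground_mask(bg, frame)  # True only where the object is
bg = update_background(bg, frame)  # background slowly adapts
```

The thesis's shadow- and illumination-suppression methods address exactly the failure mode of this sketch: a cast shadow also shifts pixel values away from the background model and would be wrongly flagged as foreground.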
