ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition
In general, intrinsic image decomposition algorithms interpret shading as one
unified component including all photometric effects. As shading transitions are
generally smoother than reflectance (albedo) changes, these methods may fail to
distinguish strong photometric effects from reflectance variations.
Therefore, in this paper, we propose to decompose the shading component into
direct (illumination) and indirect shading (ambient light and shadows)
subcomponents. The aim is to distinguish strong photometric effects from
reflectance variations. An end-to-end deep convolutional neural network
(ShadingNet) is proposed that operates in a fine-to-coarse manner with a
specialized fusion and refinement unit exploiting the fine-grained shading
model. The network is designed to learn reflectance cues separately from
specific photometric effects, so that its disentanglement capability can be analyzed. A
large-scale dataset of scene-level synthetic images of outdoor natural
environments is provided with fine-grained intrinsic image ground truths.
Large-scale experiments show that our approach using fine-grained shading
decompositions outperforms state-of-the-art algorithms utilizing unified
shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD
datasets. Comment: Submitted to the International Journal of Computer Vision (IJCV).
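The fine-grained model described above can be illustrated with a minimal sketch, assuming shading combines additively inside the classic multiplicative intrinsic model I = A * S (function and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def compose_image(albedo, direct_shading, indirect_shading):
    """Reconstruct an image from fine-grained intrinsic components:
    I = A * (S_direct + S_indirect), assuming the two shading
    subcomponents combine additively."""
    shading = direct_shading + indirect_shading
    return albedo * shading

# Toy 2x2 single-channel example (values are illustrative only)
albedo = np.array([[0.5, 0.8], [0.2, 1.0]])
direct = np.array([[0.6, 0.6], [0.1, 0.9]])   # direct illumination (sunlit vs. shadowed)
ambient = np.full((2, 2), 0.2)                # indirect term: constant ambient light
image = compose_image(albedo, direct, ambient)
```

Decomposition runs this model in reverse: recovering albedo and the two shading terms from the observed image, which is where strong photometric effects and reflectance changes must be disentangled.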
A Dataset of Multi-Illumination Images in the Wild
Collections of images under a single, uncontrolled illumination have enabled
the rapid advancement of core computer vision tasks like classification,
detection, and segmentation. But even with modern learning techniques, many
inverse problems involving lighting and material understanding remain too
severely ill-posed to be solved with single-illumination datasets. To fill this
gap, we introduce a new multi-illumination dataset of more than 1000 real
scenes, each captured under 25 lighting conditions. We demonstrate the richness
of this dataset by training state-of-the-art models for three challenging
applications: single-image illumination estimation, image relighting, and
mixed-illuminant white balance. Comment: ICCV 201
Free-viewpoint Indoor Neural Relighting from Multi-view Stereo
We introduce a neural relighting algorithm for captured indoor scenes that
allows interactive free-viewpoint navigation. Our method allows illumination to
be changed synthetically, while coherently rendering cast shadows and complex
glossy materials. We start with multiple images of the scene and a 3D mesh
obtained by multi-view stereo (MVS) reconstruction. We assume that lighting is
well-explained as the sum of a view-independent diffuse component and a
view-dependent glossy term concentrated around the mirror reflection direction.
We design a convolutional network around input feature maps that facilitate
learning of an implicit representation of scene materials and illumination,
enabling both relighting and free-viewpoint navigation. We generate these input
maps by exploiting the best elements of both image-based and physically-based
rendering. We sample the input views to estimate diffuse scene irradiance, and
compute the new illumination caused by user-specified light sources using path
tracing. To facilitate the network's understanding of materials and synthesize
plausible glossy reflections, we reproject the views and compute mirror images.
We train the network on a synthetic dataset where each scene is also
reconstructed with MVS. We show results of our algorithm relighting real indoor
scenes and performing free-viewpoint navigation with complex and realistic
glossy reflections, which have so far remained out of reach for view-synthesis
techniques.
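The lighting assumption above — a view-independent diffuse component plus a glossy term concentrated around the mirror reflection direction — can be sketched with a simple Phong-style evaluator. This is a stand-in for intuition only, not the paper's learned representation, and all names and parameters are hypothetical:

```python
import numpy as np

def shade(normal, light_dir, view_dir, diffuse, k_glossy, shininess):
    """Evaluate the assumed lighting model: a view-independent diffuse
    term plus a view-dependent glossy lobe concentrated around the
    mirror reflection direction (Phong-style sketch)."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    # Mirror reflection of the light direction about the surface normal
    r = 2.0 * np.dot(n, l) * n - l
    # Glossy lobe: strongest when the view aligns with the mirror direction
    glossy = k_glossy * max(np.dot(r, v), 0.0) ** shininess
    return diffuse + glossy
```

The diffuse term stays fixed as the viewpoint moves, while the glossy term peaks at the mirror configuration — which is why the method reprojects views and computes mirror images to help the network synthesize plausible glossy reflections.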
OutCast: Outdoor Single-image Relighting with Cast Shadows
We propose a relighting method for outdoor images. Our method mainly focuses
on predicting cast shadows in arbitrary novel lighting directions from a single
image, while also accounting for shading and global effects such as the sunlight
color and clouds. Previous solutions for this problem rely on reconstructing
occluder geometry, e.g. using multi-view stereo, which requires many images of
the scene. Instead, in this work we use a noisy depth map from an off-the-shelf
single-image depth estimator as our source of geometry. Whilst this can be a
good guide for some lighting effects, the resulting depth map quality is
insufficient for directly ray-tracing the shadows. Addressing this, we propose
a learned image space ray-marching layer that converts the approximate depth
map into a deep 3D representation that is fused into occlusion queries using a
learned traversal. Our proposed method is the first to achieve
state-of-the-art relighting results with only a single image as input. For
supplementary material visit our project page at:
https://dgriffiths.uk/outcast. Comment: Eurographics 2022 - Accepted.
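The classical baseline this method improves on — directly ray-marching an approximate depth map for shadow queries — can be sketched as a naive screen-space marcher. This illustrates why noisy depth is insufficient on its own; it is not the paper's learned traversal, and all names are hypothetical:

```python
import numpy as np

def ray_march_shadow(depth, px, py, light_step, n_steps=32, bias=0.01):
    """Naive screen-space shadow test over a depth map.

    Marches from pixel (px, py) along a 3D step (dx, dy, dz) in image
    space toward the light; the pixel is flagged as shadowed if the
    stored depth at a marched position is closer to the camera than the
    marched depth (i.e. an occluder blocks the light)."""
    h, w = depth.shape
    x, y, z = float(px), float(py), depth[py, px]
    dx, dy, dz = light_step
    for _ in range(n_steps):
        x, y, z = x + dx, y + dy, z + dz
        ix, iy = int(round(x)), int(round(y))
        if not (0 <= ix < w and 0 <= iy < h):
            break  # ray left the image: assume unoccluded
        if depth[iy, ix] < z - bias:
            return True  # an occluder lies between the pixel and the light
    return False

# Toy example: a near "wall" along image column x = 4 casts a shadow
depth = np.ones((8, 8))
depth[:, 4] = 0.2
in_shadow = ray_march_shadow(depth, px=1, py=4, light_step=(1.0, 0.0, 0.0))
```

With noisy estimated depth, this hard depth comparison produces unstable occlusion decisions, which motivates converting the depth map into a deep representation queried by a learned traversal instead.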
Inverse rendering techniques for physically grounded image editing
From a single picture of a scene, people can typically grasp the spatial layout immediately and even make good guesses at material properties and where light is coming from to illuminate the scene. For example, we can reliably tell which objects occlude others, what an object is made of and its rough shape, which regions are illuminated or in shadow, and so on. It is remarkable how little is known about our ability to make these determinations; as such, we are still not able to robustly "teach" computers to make the same high-level observations as people.
This document presents algorithms for understanding intrinsic scene properties from single images. The goal of these inverse rendering techniques is to estimate the configurations of scene elements (geometry, materials, luminaires, camera parameters, etc.) using only information visible in an image. Such algorithms have applications in robotics and computer graphics. One such application is physically grounded image editing: photo editing made easier by leveraging knowledge of the physical space. These applications allow sophisticated editing operations to be performed in a matter of seconds, enabling seamless addition, removal, or relocation of objects in images.
Adaptive Vision Based Scene Registration for Outdoor Augmented Reality
Augmented Reality (AR) involves adding virtual content into real scenes, which are viewed using a Head-Mounted Display or another display type. In order to place content into the user's view of a scene, the user's position and orientation relative to the scene, commonly referred to as their pose, must be determined accurately. This allows the virtual objects to be placed in the correct positions and to remain there when the user moves or the scene changes. It is achieved by tracking the user in relation to their environment using a variety of technologies. One technology which has proven to provide accurate results is computer vision. Computer vision involves a computer analysing images and achieving an understanding of them. This may mean locating objects such as faces in the images or, in the case of AR, determining the pose of the user.
One of the ultimate goals of AR systems is to be capable of operating under any condition. For example, a computer vision system must be robust under a range of different scene types, and under unpredictable environmental conditions due to variable illumination and weather. The majority of existing literature tests algorithms under the assumption of ideal or 'normal' imaging conditions. To ensure robustness under as many circumstances as possible it is also important to evaluate the systems under adverse conditions.
This thesis seeks to analyse the effects that variable illumination has on computer vision algorithms. To enable this analysis, test data is required that isolates weather and illumination effects, without other factors such as changes in viewpoint that would bias the results. A new dataset is presented which also allows controlled viewpoint differences in the presence of weather and illumination changes. This is achieved by capturing video from a camera undergoing a repeatable motion sequence. Ground-truth data is stored per frame, allowing images from the same position under differing environmental conditions to be easily extracted from the videos.
An in-depth analysis of six detection algorithms and five matching techniques demonstrates the impact that non-uniform illumination changes can have on vision algorithms. Specifically, shadows can degrade performance, reduce confidence in the system, decrease reliability, or even completely prevent successful operation.
An investigation into approaches to improve performance yields techniques that can help reduce the impact of shadows. A novel algorithm is presented that merges reference data captured at different times, resulting in reference data with minimal shadow effects. This can significantly improve performance and reliability when operating on images containing shadow effects. These advances improve the robustness of computer vision systems and extend the range of conditions in which they can operate, increasing the usefulness of the algorithms and the AR systems that employ them.
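As a rough illustration of the shadow-minimizing merge described above, a per-pixel median over reference images of the same view captured at different times suppresses transient shadows, since a shadow present in only a minority of captures is voted out. This is a simple stand-in sketch; the thesis's actual merging algorithm may differ:

```python
import numpy as np

def merge_references(images):
    """Merge same-view reference images captured at different times by
    taking the per-pixel median, suppressing transient shadow effects
    that appear in only a minority of the captures."""
    stack = np.stack(images, axis=0)  # shape: (n_images, H, W)
    return np.median(stack, axis=0)

# Toy example: one of three captures has a shadowed (darkened) pixel
shadowed = np.ones((2, 2))
shadowed[0, 0] = 0.2
merged = merge_references([np.ones((2, 2)), np.ones((2, 2)), shadowed])
```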
Applying Augmented Reality to Outdoors Industrial Use
Augmented Reality (AR) is currently gaining popularity in multiple different fields. However, the technology for AR still requires development in both hardware and software when considering industrial use. In order to create immersive AR applications, more accurate pose estimation techniques are required to determine the virtual camera's location. Pose estimation algorithms often require a lot of processing power, which makes robust pose estimation a difficult task when using mobile devices or dedicated AR tools. The difficulties are even greater in outdoor scenarios, where the environment can vary considerably and is often unprepared for AR.
This thesis aims to research different possibilities for creating AR applications for outdoor environments. Both hardware and software solutions are considered, but the focus is more on software. The majority of the thesis focuses on different visual pose estimation and tracking techniques for natural features.
During the thesis, multiple different solutions were tested for outdoor AR. One commercial AR SDK was tested, and three different custom software solutions were developed for an Android tablet. The custom software solutions were an algorithm for combining data from a magnetometer and a gyroscope, a natural feature tracker, and a tracker based on panorama images. The panorama-based tracker was implemented based on an existing scientific publication and was further developed by integrating it into Unity 3D and adding the ability to augment content.
This thesis concludes that AR is very close to becoming a usable tool for professional use. The commercial solutions currently available are not yet ready for creating professional tools, but for visualization tasks in particular, some custom solutions are capable of achieving the required robustness. The panorama tracker implemented in this thesis appears to be a promising tool for robust pose estimation in unprepared outdoor environments.
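The magnetometer/gyroscope combination mentioned above is commonly implemented as a complementary filter for heading (yaw): the gyroscope integrates smoothly but drifts, while the magnetometer is noisy but drift-free. The sketch below is one plausible formulation under that assumption; the thesis does not specify its exact algorithm:

```python
def fuse_heading(prev_heading, gyro_rate, mag_heading, dt, alpha=0.98):
    """One step of a complementary filter fusing gyroscope and
    magnetometer readings into a heading estimate (radians).

    gyro_rate:   angular rate from the gyroscope (rad/s), integrated
                 over dt for a smooth but drifting estimate.
    mag_heading: absolute heading from the magnetometer, noisy but
                 drift-free.
    alpha:       blend factor; close to 1.0 trusts the gyro short-term
                 while the magnetometer slowly corrects the drift.
    """
    gyro_heading = prev_heading + gyro_rate * dt
    return alpha * gyro_heading + (1.0 - alpha) * mag_heading
```

Called once per sensor sample, the filter tracks fast rotations via the gyroscope while the small magnetometer weight continuously pulls the estimate back toward the true north-referenced heading.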
Learning geometric and lighting priors from natural images
Understanding images is needed for a plethora of tasks, from compositing to image relighting to 3D object reconstruction. These tasks allow artists to realize masterpieces or help operators make safe decisions based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually delegated to an artist or an expert who uses their experience to carry out the work. This is because it is generally necessary to reason globally about the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or fully automate these tasks? That is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation for opacity of deep machine learning algorithms, we offer studies of the visual cues captured by our methods.
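The first research axis, surface reconstruction using photometric cues, builds on classical Lambertian photometric stereo, which can be sketched as a per-pixel least-squares problem. This is the textbook baseline, not the thesis's learned method:

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover a surface normal and albedo at one pixel from several
    intensity measurements under known directional lights, assuming
    the Lambertian model I = albedo * (n . l).

    intensities: (k,) observed intensities under k lights.
    light_dirs:  (k, 3) unit light directions.
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # Least-squares solve for g = albedo * n from L @ g = I
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    normal = g / albedo
    return normal, albedo
```

The system needs at least three non-coplanar lights to be well posed; with fewer cues, or with non-Lambertian surfaces, the problem becomes ill-posed — exactly the regime where the learned priors studied in this thesis are meant to help.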
Computer vision models in surveillance robotics
In this thesis, we developed algorithms that use visual information to perform real-time detection, recognition, and classification of moving objects, independently of environmental conditions and with the best possible accuracy.
To this end, we developed several computer vision components, namely the identification of objects of interest across the whole visual scene (monocular or stereo) and their classification.
During the research, several approaches were tried, including the detection of candidate objects through image segmentation with weak classifiers and centroids, image segmentation algorithms reinforced with stereo information and noise reduction, and the combination of popular features such as scale-invariant features (SIFT) with distance information.
We developed two broad categories of solutions according to the type of system used. With a mobile camera, we favored the detection of known objects by scanning the image; with a fixed camera, we also used algorithms for detecting foreground and moving objects (foreground detection).
In the case of foreground detection, the detection and classification rates increase if the quality of the extracted objects is high. We propose methods to reduce the effects of shadows, illumination, and repetitive motions produced by moving objects.
An important aspect studied is the possibility of using algorithms for detecting moving objects with a mobile camera.
Efficient solutions are becoming increasingly complex, but the computing hardware that runs these algorithms is also becoming more powerful, and in recent years graphics card (GPU) architectures have shown great potential. We proposed a GPU-based solution for background image management in order to increase detection performance.
In this thesis we studied the detection and tracking of people for applications such as the prevention of risky situations (street crossing) and people counting for traffic analysis. We studied these problems and explored various aspects of detecting individuals, groups, and people in crowded scenes.
However, in a generic environment it is impossible to predict the configuration of objects that will be captured by the camera. In these cases, it is necessary to "abstract the concept" of an object. With this requirement in mind, we explored the properties of stochastic methods and show that good classification rates can be obtained provided that the training set is large enough.
A flexible framework must be able to detect moving regions and recognize the objects of interest. We developed a framework for handling the detection and classification problems.
Compared to other methods, the proposed methods offer a flexible framework for object detection and classification that can be used efficiently in a variety of indoor and outdoor environments.
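The background image management and foreground detection discussed above can be illustrated with a per-pixel running-average background model, a common baseline for fixed-camera surveillance (a simplified stand-in for the thesis's GPU implementation; names and thresholds are illustrative):

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Exponential running-average update of the background model:
    a small alpha lets the model adapt slowly to gradual changes
    (e.g. illumination drift) without absorbing moving objects."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=0.1):
    """Flag pixels whose value differs from the background model by
    more than a threshold; these form the candidate moving objects."""
    return np.abs(frame - background) > threshold

# Toy example: a single bright moving pixel against a dark background
bg = np.zeros((2, 2))
frame = np.zeros((2, 2))
frame[0, 0] = 1.0
mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)
```

Each per-pixel operation here is independent, which is what makes this kind of background maintenance a natural fit for the GPU acceleration proposed in the thesis.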