    Segmentation and Scene Content in Moving Images

    Get PDF
    The problem of scene content in moving images was brought to the study group by Aralia. The goal was to consider two problems: the first was image segmentation and the second was the context of the scene. These problems were explored in several areas, namely Bayesian approaches to image segmentation, shadow detection, shape recognition, and background separation.

    Reconstructing Geometry from Its Latent Structures

    Get PDF
    Our world is full of objects with complex shapes and structures. Through extensive experience, humans quickly develop an intuition about how objects are shaped, and what their material properties are, simply by analyzing their appearance. We engage this intuitive understanding of geometry in nearly everything we do. It is not surprising, then, that a careful treatment of geometry stands to give machines a powerful advantage in the many tasks of visual perception. To that end, this thesis focuses on geometry recovery in a wide range of real-world problems. First, we describe a new approach to image registration. We observe that the structure of the imaged subject becomes embedded in the image intensities; by minimizing the change in shape of these intensity structures we ensure a physically realizable deformation. Second, we describe a method for reassembling fragmented, thin-shelled objects from range-images of their fragments using only the geometric and photometric structure embedded in the boundary of each fragment. Third, we describe a method for recovering and representing the shape of a geometric texture (such as bark, or sandpaper) by studying the characteristic properties of texture---self-similarity and scale variability. Finally, we describe two methods for recovering the 3D geometry and reflectance properties of an object from images taken under natural illumination. We note that the structure of the surrounding environment, modulated by the reflectance, becomes embedded in the appearance of the object, giving strong clues about the object's shape. Though these domains are quite diverse, an essential premise---that observations of objects contain within them salient clues about the object's structure---enables new and powerful approaches. For each problem we begin by investigating what these clues are. We then derive models and methods to canonically represent these clues and enable their full exploitation. The wide-ranging success of each method shows the importance of our carefully formulated observations about geometry, and the fundamental role geometry plays in visual perception. Ph.D., Computer Science -- Drexel University, 201

    Scalable, Detailed and Mask-Free Universal Photometric Stereo

    Full text link
    In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed, Mask-free, and Universal Photometric Stereo network. Our approach can recover astonishingly intricate surface normal maps, rivaling the quality of 3D scanners, even when images are captured under unknown, spatially-varying lighting conditions in uncontrolled environments. We have extended previous universal photometric stereo networks to extract spatial-light features, utilizing all available information in high-resolution input images and accounting for non-local interactions among surface points. Moreover, we present a new synthetic training dataset that encompasses a diverse range of shapes, materials, and illumination scenarios found in real-world scenes. Through extensive evaluation, we demonstrate that our method not only surpasses calibrated, lighting-specific techniques on public benchmarks, but also excels with a significantly smaller number of input images, even without object masks. Comment: CVPR 2023 (Highlight). The source code will be available at https://github.com/satoshi-ikehata/SDM-UniPS-CVPR202
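
    As a point of reference for the calibrated setting that SDM-UniPS dispenses with, below is a minimal sketch of classical least-squares photometric stereo, which assumes known light directions and Lambertian reflectance; the function name and synthetic array shapes are illustrative, not taken from the paper. The universal setting drops both assumptions, which is what the learned spatial-light features are meant to absorb.

        import numpy as np

        def photometric_stereo(images, lights):
            """Classical calibrated photometric stereo (Lambertian model).

            images: (k, h, w) array of grayscale images under k known lights.
            lights: (k, 3) array of unit light directions.
            Returns per-pixel unit normals, shape (h, w, 3).
            """
            k, h, w = images.shape
            I = images.reshape(k, -1)                       # (k, h*w) stacked intensities
            # Solve lights @ G = I for G = albedo * normal, in the least-squares sense.
            G, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, h*w)
            albedo = np.linalg.norm(G, axis=0)
            normals = G / np.maximum(albedo, 1e-8)          # normalize to unit normals
            return normals.T.reshape(h, w, 3)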

    Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation

    Get PDF
    The Hierarchical Conditional Random Field (HCRF) model has been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model, since multiple classes can reasonably be expected to appear within large regions. This simplification particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting it is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders the underlying optimization problem tractable. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010, and MSRC-2
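
    To make the idea concrete, here is a toy sketch of a consistency potential in the spirit of the harmony potential: any combination of labels within a region is allowed, and a cost is charged only for local labels not licensed by the label set hypothesized at the global scale. The uniform penalty is a placeholder for the paper's learned potentials.

        def harmony_style_potential(local_labels, global_label_set, penalty=1.0):
            """Toy consistency potential between a region's labels and a
            global (image-level) label set.

            Unlike a Potts-style potential, any *combination* of labels is
            permitted; only labels absent from the global hypothesis add cost.
            """
            cost = 0.0
            for lbl in set(local_labels):
                if lbl not in global_label_set:
                    cost += penalty  # unlikely combination: class not licensed globally
            return cost

        # Example: the global scale hypothesizes {sky, tree}; a stray 'car' label is penalized.
        print(harmony_style_potential(['sky', 'tree', 'car'], {'sky', 'tree'}))  # 1.0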

    Enhancing spatio-chromatic representation with more-than-three color coding for image description

    Get PDF
    The extraction of spatio-chromatic features from color images is usually performed independently on each color channel. Usual 3D color spaces, such as RGB, present a high inter-channel correlation for natural images. This correlation can be reduced using color-opponent representations, but the spatial structure of regions with small color differences is not fully captured in two generic Red-Green and Blue-Yellow channels. To overcome these problems, we propose a new color coding that is adapted to the specific content of each image. Our proposal is based on two steps: (a) setting the number of channels to the number of distinctive colors we find in each image (avoiding the problem of channel correlation), and (b) building a channel representation that maximizes contrast differences within each color channel (avoiding the problem of low local contrast). We call this approach more-than-three color coding (MTT) to emphasize the fact that the number of channels is adapted to the image content: the higher the color complexity of an image, the more channels are used to represent it. Here we select the distinctive colors as the most predominant in the image, which we call color pivots, and we build the new color coding using these color pivots as a basis. To evaluate the proposed approach, we measure its efficiency on an image categorization task and show that a generic descriptor improves performance at the description level when applied to the MTT coding.
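
    A minimal sketch of the two steps as described: pick the image's predominant colors as pivots (here via k-means, one plausible choice) and build one channel per pivot. The Gaussian similarity map is an illustrative stand-in for the paper's contrast-maximizing channel construction, and all names are assumptions.

        import numpy as np
        from sklearn.cluster import KMeans

        def mtt_channels(image, n_pivots=5, sigma=25.0):
            """More-than-three (MTT) color coding, sketched.

            image: (h, w, 3) RGB array. Returns (h, w, n_pivots) channels,
            one per color pivot, plus the pivots themselves.
            """
            h, w, _ = image.shape
            pixels = image.reshape(-1, 3).astype(float)
            # Step (a): choose the predominant colors as pivots.
            pivots = KMeans(n_clusters=n_pivots, n_init=10).fit(pixels).cluster_centers_
            # Step (b): one channel per pivot; Gaussian similarity stands in
            # for the contrast-maximizing construction of the paper.
            d = np.linalg.norm(pixels[:, None, :] - pivots[None, :, :], axis=2)
            channels = np.exp(-(d ** 2) / (2 * sigma ** 2))
            return channels.reshape(h, w, n_pivots), pivots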

    Surface reflectance recognition and real-world illumination statistics

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2003. Includes bibliographical references (p. 141-150). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

    Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illumination. Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination. For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance. By Ron O. Dror. Ph.D.
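
    A rough sketch of the statistical pipeline the thesis describes: compute wavelet-subband statistics of a photograph and feed them to a classifier trained on surfaces of known reflectance. The specific statistics (variance, kurtosis) and library choices below are illustrative assumptions, not the thesis's exact feature set.

        import numpy as np
        import pywt
        from scipy.stats import kurtosis

        def wavelet_reflectance_features(image, wavelet='db4', levels=3):
            """Summary statistics of wavelet subbands as cues to reflectance.

            image: 2D grayscale array. Returns a 1D vector of per-subband
            variance and kurtosis (illustrative statistics; the thesis derives
            its own set from real-world illumination regularities).
            """
            coeffs = pywt.wavedec2(image, wavelet, level=levels)
            feats = []
            for detail in coeffs[1:]:        # skip the approximation band
                for band in detail:          # horizontal, vertical, diagonal
                    c = band.ravel()
                    feats.append(np.var(c))
                    feats.append(kurtosis(c))
            return np.array(feats)

        # These vectors can then train any off-the-shelf classifier
        # (e.g. sklearn's SVC) on examples of known reflectance.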

    Evaluation of remote sensing methods for continuous cover forestry

    Get PDF
    The overall aim of the project was to investigate the potential and challenges of applying high spatial and spectral resolution remote sensing to forest stands in the UK for Continuous Cover Forestry (CCF) purposes. Within the context of CCF, a relatively new forest management strategy that has been implemented in several European countries, the usefulness of digital remote sensing techniques lies in their potential to retrieve parameters at sub-stand level and, in particular, to assess natural regeneration and light regimes. The idea behind CCF is the support of a sustainable forest management system that reduces disturbance of the forest ecosystem and encourages the use of more natural methods, e.g. natural regeneration, for which the light environment beneath the forest canopy plays a fundamental role.

    The study was carried out at a test area in central Scotland, situated within the Queen Elizabeth II Forest Park (lat. 56°10' N, long. 4°23' W). Six plots containing three different species (Norway spruce, European larch and Sessile oak), characterized by their different light regimes, were established within the area for the measurement of forest variables using a forest inventory approach and hemispherical photography. The remote sensing data available for the study consisted of Landsat ETM+ imagery, a small-footprint multi-return lidar dataset over the study area, Airborne Thematic Mapper (ATM) data, and aerial photography with the same acquisition date as the lidar data.

    Landsat ETM+ imagery was used for the spectral characterisation of the species under study and the evaluation of phenological change as a factor to consider for future acquisitions of remotely sensed imagery. Three approaches were used for the discrimination between species: raw data, NDVI, and Principal Component Analysis (PCA). It can be concluded that no single date is ideal for discriminating the species studied (early summer was best) and that a combination of two or three datasets covering their phenological cycles is optimal for the differentiation. Although the approaches used helped to characterize the forest species, especially in discriminating between spruces, larch and the deciduous oak species, further work is needed to define an optimum approach for discriminating between spruce species (e.g. Sitka spruce and Norway spruce), whose spectral responses are very similar. In general, the useful ranges of the indices were small, so careful and accurate preprocessing of the imagery is highly recommended.

    Lidar, ATM, and aerial photographic datasets were analysed for the characterisation of vertical and horizontal forest structure. A slope-based algorithm was developed for the extraction of ground elevation and tree heights from multiple-return lidar data, the production of a Digital Terrain Model (DTM) and Digital Surface Model (DSM) of the area under study, and the comparison of the predicted lidar tree heights with the true tree heights, followed by the building of a Digital Canopy Model (DCM) for the determination of percentage canopy cover and tree crown delineation. Mean height and individual tree heights were estimated for all sample plots. The results showed that lidar underestimated tree heights by an average of 1.49 m; the standard deviation of the lidar estimates was 3.58 m and the mean standard error was 0.38 m.

    This study also assessed the utility of an object-oriented approach for deciduous and coniferous crown delineation, based on small-footprint, multiple-return lidar data, high-resolution ATM imagery, and aerial photography. Special emphasis in the analysis was placed on the fusion of aerial photography and lidar data for tree crown detection and classification, as it was expected that the high vertical accuracy of lidar, combined with high spatial resolution aerial photography, would render the best results and would provide the forestry sector with an affordable and accurate means for forest management and planning. Most of the field-surveyed trees could be automatically and correctly detected, especially in the spruce and larch plots, but the complexity of the deciduous plots hindered the tree recognition approach, leading to poor crown extent and gap estimations. Indicators of light availability were calculated from the lidar data by computing laser hit penetration rates and percentage canopy cover; these results were compared to estimates of canopy openness obtained from hemispherical pictures for the same locations.

    Finally, the synergistic benefits of all datasets were evaluated, and the forest structural variables determined from remote sensing and hemispherical photography were examined as indicators of light availability for regenerating seedlings.
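
    As a simplified illustration of how such canopy metrics fall out of multi-return lidar, here is a gridded sketch: take lowest returns per cell as ground and highest as canopy surface, difference the two models to get a canopy model, and read canopy cover off the first-return heights. This is a crude stand-in for the thesis's slope-based ground filtering, and all parameter values are assumptions.

        import numpy as np

        def canopy_metrics(x, y, z, is_first_return, cell=2.0, cover_thresh=2.0):
            """Gridded DTM/DSM/DCM and percentage canopy cover from lidar returns.

            x, y, z: 1D arrays of return coordinates (m); is_first_return: bool mask.
            cell: grid cell size (m); cover_thresh: height (m) above ground at
            which a first return counts as canopy.
            """
            ix = ((x - x.min()) / cell).astype(int)
            iy = ((y - y.min()) / cell).astype(int)
            nx, ny = ix.max() + 1, iy.max() + 1
            dtm = np.full((nx, ny), np.inf)    # lowest return per cell ~ ground
            dsm = np.full((nx, ny), -np.inf)   # highest return per cell ~ canopy top
            np.minimum.at(dtm, (ix, iy), z)
            np.maximum.at(dsm, (ix, iy), z)
            dcm = dsm - dtm                    # digital canopy model (canopy height)
            # Canopy cover: share of first returns more than cover_thresh above ground.
            above = z[is_first_return] - dtm[ix[is_first_return], iy[is_first_return]]
            cover = 100.0 * np.mean(above > cover_thresh)
            return dtm, dsm, dcm, cover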

    State of the Art on Neural Rendering

    Get PDF
    Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. The report focuses on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
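
    The phrase "integration of differentiable rendering into network training" amounts to making the image-formation step something gradients can flow through. A minimal sketch with a toy Lambertian renderer follows; the 64x64 scene, fixed normals, and single directional light are invented purely for illustration.

        import torch

        # Toy differentiable renderer: Lambertian shading of fixed per-pixel normals.
        # Because every op is differentiable, gradients from an image loss reach
        # the scene parameters (albedo, light), which is the core trick neural
        # rendering borrows from graphics.
        normals = torch.nn.functional.normalize(torch.randn(64, 64, 3), dim=-1)
        target = torch.rand(64, 64)                      # target image to match

        albedo = torch.full((64, 64), 0.5, requires_grad=True)
        light = torch.tensor([0.0, 0.0, 1.0], requires_grad=True)

        opt = torch.optim.Adam([albedo, light], lr=1e-2)
        for _ in range(200):
            shading = (normals @ light).clamp(min=0.0)   # n . l, clamped like a shader
            rendered = albedo * shading
            loss = ((rendered - target) ** 2).mean()     # photometric loss
            opt.zero_grad()
            loss.backward()                              # gradients flow through the renderer
            opt.step()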

    Material Recognition Meets 3D Reconstruction: Novel Tools for Efficient, Automatic Acquisition Systems

    Get PDF
    For decades, the accurate acquisition of geometry and reflectance properties has represented one of the major objectives in computer vision and computer graphics, with many applications in industry, entertainment and cultural heritage. Reproducing even the finest details of surface geometry and surface reflectance has become a ubiquitous prerequisite in visual prototyping, advertisement or digital preservation of objects. However, today's acquisition methods are typically designed for only a rather small range of material types, and there is still a lack of accurate reconstruction methods for objects whose surface reflectance behavior is more complex than diffuse reflectance. In addition to accurate acquisition techniques, the demand for creating large quantities of digital content also pushes the focus towards fully automatic and highly efficient solutions that allow masses of objects to be acquired as fast as possible.

    This thesis is dedicated to the investigation of basic components that allow an efficient, automatic acquisition process. We argue that such an acquisition can be realized when material recognition "meets" 3D reconstruction, and we demonstrate that reliably recognizing the materials of the considered object allows a more efficient geometry acquisition. The main objectives of this thesis are therefore the development of novel, robust geometry acquisition techniques for surface materials beyond diffuse surface reflectance, and the development of novel, robust techniques for material recognition.

    In the context of 3D geometry acquisition, we introduce an improvement of structured light systems, which are capable of robustly acquiring objects ranging from diffuse surface reflectance to even specular surface reflectance with a sufficient diffuse component. We demonstrate that the resolution of the reconstruction can be increased significantly for multi-camera, multi-projector structured light systems by using overlaps of patterns projected under different projector poses. As the reconstructions obtained with such triangulation-based techniques still contain high-frequency noise due to inaccurately localized correspondences established for images acquired under different viewpoints, we furthermore introduce a novel geometry acquisition technique that complements the structured light system with additional photometric normals and results in significantly more accurate reconstructions. In addition, we present a novel method to acquire the 3D shape of mirroring objects with complex surface geometry.

    These investigations on 3D reconstruction are accompanied by the development of novel tools for reliable material recognition, which can be used in an initial step to recognize the present surface materials and, hence, to efficiently select the appropriate acquisition techniques based on the classified materials. Within the scope of this thesis, we focus on material recognition both for scenarios with controlled illumination, as given in lab environments, and for scenarios with the natural illumination found in photographs of typical daily-life scenes. Finally, based on the techniques developed in this thesis, we provide novel concepts towards efficient, automatic acquisition systems.
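
    For reference, structured light systems of the kind improved here recover camera-projector correspondences by projecting coded patterns. A minimal sketch of Gray-code decoding follows, chosen as a standard pattern family to illustrate the principle; the thresholding scheme and names are assumptions, not the thesis's specific pipeline.

        import numpy as np

        def decode_gray_code(captures, threshold=0.5):
            """Decode per-pixel projector columns from Gray-code structured light.

            captures: (n_bits, h, w) array of images, one per projected bit
            pattern, normalized to [0, 1]. Returns an (h, w) integer map of
            projector columns, which triangulation then turns into depth.
            """
            bits = (captures > threshold).astype(np.uint8)   # binarize each pattern
            # Gray -> binary: b[0] = g[0]; b[i] = b[i-1] XOR g[i]
            binary = np.zeros_like(bits)
            binary[0] = bits[0]
            for i in range(1, bits.shape[0]):
                binary[i] = binary[i - 1] ^ bits[i]
            # Pack bits (MSB first) into an integer column index per pixel.
            weights = 1 << np.arange(bits.shape[0] - 1, -1, -1)
            return np.tensordot(weights, binary, axes=([0], [0]))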