1,021 research outputs found

    Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features


    Automatic Photo Adjustment Using Deep Neural Networks


    Physically based geometry and reflectance recovery from images

    An image is a projection of the three-dimensional world taken at an instant in space and time. Its formation involves a complex interplay between the geometry, illumination, and material properties of objects in the scene. Given image data and knowledge of some scene properties, the recovery of the remaining components can be cast as a set of physically based inverse problems. This thesis investigates three such inverse problems and discusses how appropriate physical constraints can be developed and built into effective algorithms.

    Firstly, we study the problem of geometry recovery from a single image with repeated texture. Our technique leverages the PatchMatch algorithm to detect and match repeated patterns undergoing geometric transformations, which allows effective enforcement of the translational symmetry constraint when recovering the texture lattice.

    Secondly, we study computational relighting from RGB-D data, where the depth data is acquired with a Kinect sensor and is often noisy. We show how the inclusion of noisy depth input helps resolve ambiguities in the recovery of shape and reflectance in the inverse rendering problem. Our results show that the complementary nature of RGB and depth is highly beneficial for a practical relighting system.

    Lastly, we exploit geometric constraints relating two views to address a challenging problem in Internet image matching. Our solution is robust to geometric and photometric distortions over wide baselines, and it accommodates the repeated structures that are common in our modern environment. Building on the image correspondences, we also investigate color transfer as an additional global constraint for relating Internet images, with promising results in obtaining more accurate and denser correspondences.
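    The repeated-pattern matching step can be illustrated with a bare-bones PatchMatch-style search of an image against itself. The sketch below is a hypothetical helper, not code from the thesis: it handles translational matches only (the thesis generalizes to full geometric transformations), and the patch size, iteration count, and minimum-offset threshold are illustrative assumptions.

```python
import numpy as np

def patchmatch_self_similarity(img, patch=7, iters=4, min_offset=8, seed=0):
    """Approximate nearest-neighbor field of an image onto itself,
    excluding near-identity matches, so that repeated texture elements
    map to other copies of the same pattern."""
    img = np.asarray(img, dtype=np.float64)
    rng = np.random.default_rng(seed)
    h, w = img.shape
    hh, ww = h - patch, w - patch            # valid top-left patch corners
    # Random initialization of the offset field (dy, dx) per patch.
    nnf = np.stack([rng.integers(-hh + 1, hh, (hh, ww)),
                    rng.integers(-ww + 1, ww, (hh, ww))], axis=-1)

    def cost(y, x, dy, dx):
        ty, tx = y + dy, x + dx
        if not (0 <= ty < hh and 0 <= tx < ww) or abs(dy) + abs(dx) < min_offset:
            return np.inf                    # out of bounds or trivial self-match
        d = img[y:y+patch, x:x+patch] - img[ty:ty+patch, tx:tx+patch]
        return float(np.sum(d * d))

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1      # alternate scan direction
        ys = range(hh) if step == 1 else range(hh - 1, -1, -1)
        for y in ys:
            xs = range(ww) if step == 1 else range(ww - 1, -1, -1)
            for x in xs:
                best = tuple(nnf[y, x])
                best_c = cost(y, x, *best)
                # Propagation: adopt a neighbor's offset if it fits better.
                for ny, nx in ((y - step, x), (y, x - step)):
                    if 0 <= ny < hh and 0 <= nx < ww:
                        cand = tuple(nnf[ny, nx])
                        c = cost(y, x, *cand)
                        if c < best_c:
                            best, best_c = cand, c
                # Random search in a window shrinking around the best offset.
                radius = max(hh, ww)
                while radius >= 1:
                    cand = (best[0] + int(rng.integers(-radius, radius + 1)),
                            best[1] + int(rng.integers(-radius, radius + 1)))
                    c = cost(y, x, *cand)
                    if c < best_c:
                        best, best_c = cand, c
                    radius //= 2
                nnf[y, x] = best
    return nnf   # clustering the dominant offsets reveals the lattice vectors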

    Perceptually inspired image estimation and enhancement

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (p. 137-144). By Yuanzhen Li.

    In this thesis, we present three image estimation and enhancement algorithms inspired by human vision.

    In the first part of the thesis, we propose an algorithm for mapping one image to another based on the statistics of a training set. Many vision problems can be cast as image mapping problems, such as estimating reflectance from luminance, estimating shape from shading, and separating signal from noise. Such problems are typically under-constrained, yet humans are remarkably good at solving them. Classic computational theories attribute this feat to the visual system's use of intuitive regularities of the world, e.g., that surfaces tend to be piecewise constant. In recent years, there has been considerable interest in deriving more sophisticated statistical constraints from natural images, but because of the high-dimensional nature of images, representing and utilizing the learned models remains a challenge. Our techniques produce models that are very easy to store and to query, and we show them to be effective for a number of applications: removing noise from images, estimating a sharp image from a blurry one, decomposing an image into reflectance and illumination, and interpreting lightness illusions.

    In the second part of the thesis, we present an algorithm for compressing the dynamic range of an image while retaining important visual detail. The human visual system confronts a serious challenge with dynamic range: the physical world has an extremely high dynamic range, while neurons have low dynamic ranges. The visual system copes by applying automatic gain control, in both the retina and the visual cortex. Taking inspiration from this, we designed techniques that involve multi-scale subband transforms and smooth gain control on subband coefficients, resembling the contrast gain control mechanism in the visual cortex. We show our techniques to be successful in producing dynamic-range-compressed images without compromising the visibility of detail or introducing artifacts. We also show that the techniques can be adapted to the related problem of "companding", in which a high dynamic range image is converted to a low dynamic range image, saved using fewer bits, and later expanded back to high dynamic range with minimal loss of visual quality.

    In the third part of the thesis, we propose a technique that enables a user to easily localize image and video editing by drawing a small number of rough scribbles. Image segmentation, usually treated as an unsupervised clustering problem, is extremely difficult to solve; with a minimal degree of user supervision, however, we are able to generate selection masks of good quality. Our technique learns a classifier from the user-scribbled pixels, uses it to classify the remaining pixels into distinct classes, and then combines the per-pixel classification results with a smoothness term that respects color discontinuities, generating better results than state-of-the-art algorithms for interactive segmentation.
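    As a rough illustration of the second part's core mechanism, the sketch below decomposes a log-luminance image into band-pass subbands and applies a smooth, signal-dependent gain to each. The pyramid construction, power-law gain curve, and all parameter values here are illustrative assumptions, not the thesis's actual subband transform or gain-control design.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compress_dynamic_range(log_lum, levels=5, gamma=0.6, eps=1e-4):
    """Compress a log-luminance image by applying a smooth, signal-dependent
    gain to each subband of a simple multi-scale decomposition; gamma < 1
    attenuates large coefficients more than small ones, compressing overall
    contrast while preserving fine detail."""
    current = np.asarray(log_lum, dtype=np.float64)
    bands = []
    for i in range(levels):
        low = gaussian_filter(current, sigma=2.0 ** i)
        bands.append(current - low)        # band-pass subband at this scale
        current = low
    out = current                          # low-pass residual kept unchanged
    for band in bands:
        sigma = np.sqrt(np.mean(band ** 2)) + eps   # per-band activity level
        gain = (np.abs(band) / sigma + eps) ** (gamma - 1.0)
        out += gain * band                 # reassemble with gains applied
    return out
```

    For an HDR photograph one would typically run this on the log of luminance, exponentiate the result, and reattach the color ratios.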

    Data-driven approaches for interactive appearance editing

    This thesis proposes several techniques for interactive editing of digital content and fast rendering of virtual 3D scenes. Editing digital content, such as images or 3D scenes, is difficult and requires artistic talent and technical expertise. To alleviate these difficulties, we exploit data-driven approaches that use easily accessible Internet data (e.g., images, videos, materials) to develop new tools for digital content manipulation. Our techniques allow casual users to achieve high-quality edits by interactively exploring manipulations, without needing to understand the underlying physical models of appearance. First, the thesis presents a fast algorithm for realistic image synthesis of virtual 3D scenes. This serves as the core framework for a new method that lets artists fine-tune the appearance of a rendered 3D scene: artists directly paint the final appearance, and the system automatically solves for the material parameters that best match the desired look. Along this line, an example-based material assignment approach is proposed, in which the 3D models of a virtual scene can be "materialized" simply by providing a guidance source (an image or video). Next, the thesis proposes shape and color subspaces of an object class that are learned from a collection of exemplar images. These subspaces can be used to constrain image manipulations to valid shapes and colors, or to provide suggestions for manipulations. Finally, data-driven color manifolds, which contain the colors of a specific context, are proposed; such manifolds can be used to improve color picking, color stylization, compression, and white balancing.
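    The paint-then-solve idea can be illustrated in its simplest, linear special case: if the rendered image is linear in the unknown material parameters (as with diffuse albedo scaling under fixed lighting), the parameters matching a painted target follow from linear least squares. The helper below is a hypothetical sketch under that assumption; the thesis's solver addresses the general, nonlinear setting.

```python
import numpy as np

def solve_material_weights(basis_renders, painted_target):
    """Solve for per-material weights that best reproduce an artist's
    painted target appearance, given one render of the scene per material
    parameter (that parameter set to 1, all others to 0). Valid only when
    the renderer is linear in the parameters."""
    A = np.stack([np.ravel(b) for b in basis_renders], axis=1)  # pixels x params
    t = np.ravel(painted_target)
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return np.clip(w, 0.0, None)   # clamp to a physically plausible range
```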

    Synthesizing and Editing Photo-realistic Visual Objects

    In this thesis we investigate novel methods of synthesizing new images of a deformable visual object from a collection of images of the object. We investigate both parametric and non-parametric methods, as well as a combination of the two, for the problem of image synthesis. Our main focus is complex visual objects, specifically deformable objects and objects with varying numbers of visible parts. We first introduce a sketch-driven image synthesis system, which allows the user to draw ellipses and outlines to sketch the rough shape of an animal as a constraint on the synthesized image. The system interactively provides feedback, in the form of ellipse and contour suggestions, on the user's partial sketch. The sketch guides a non-parametric synthesis algorithm that blends patches from two exemplar images in a coarse-to-fine fashion to create the final image; we evaluate the method and the synthesized images through two user studies. Instead of non-parametric blending of patches, a parametric model of appearance is more desirable, as its appearance representation is shared across all images of the dataset. Hence, we propose Context-Conditioned Component Analysis (C-CCA), a probabilistic generative parametric model that describes images as a linear combination of basis functions, where the basis functions are evaluated at each pixel using a context vector computed from local shape information. We evaluate C-CCA qualitatively and quantitatively on inpainting, appearance transfer, and reconstruction tasks. Drawing samples from C-CCA generates novel, globally coherent images which, however, lack high-frequency details due to dimensionality reduction and misalignment. We therefore develop a non-parametric model that enhances C-CCA samples with locally coherent, high-frequency details by efficiently finding patches from the dataset that match the C-CCA sample and blending them together. We analyze the results of the combined method on datasets of horse and elephant images.
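    To make the context-conditioning idea concrete, the sketch below collapses the model to its simplest form: each pixel's intensity is predicted linearly from its context vector, fitted across a training set by least squares. This is an illustrative simplification only; C-CCA itself is a probabilistic, low-rank component model, which this sketch does not implement.

```python
import numpy as np

def fit_context_model(contexts, intensities):
    """Fit a single linear map from per-pixel context vectors to pixel
    intensities, pooled over all training images.
    contexts:    (n_images, n_pixels, d) context vectors, e.g. derived
                 from local shape information around each pixel
    intensities: (n_images, n_pixels) observed pixel values"""
    X = contexts.reshape(-1, contexts.shape[-1])
    y = intensities.ravel()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_appearance(coef, contexts):
    """Predict the appearance of a new instance from the context vectors
    computed on its (possibly deformed) shape."""
    return contexts @ coef
```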

    Computer Vision on Web Pages: A Study of Man-Made Images

    This thesis is focused on the development of computer vision techniques for parsing web pages using an image of the rendered page as evidence, and on understanding this under-explored class of images from the perspective of computer vision. The project is divided into two complementary tracks, one applied and one theoretical. Our practical motivation is the application of improved web page parsing to assistive technology, such as screen readers for visually impaired users or the ability to declutter the presentation of a page for users with cognitive deficits. From a more theoretical standpoint, images of rendered web pages have interesting properties from a computer vision perspective; in particular, low-level assumptions can be made in this domain, but the most important cues are often subtle and can be highly non-local. The parsing system developed in this thesis is a principled Bayesian segmentation-classification pipeline, using innovative techniques to produce valuable results in this challenging domain, and the thesis covers both implementation and evaluation.

    Segmentation of a web page is the problem of dividing it into semantically significant, visually coherent regions. We use a hierarchical segmentation method based on the detection of semantically significant (possibly broken) lines that divide regions. The Bayesian design allows sophisticated probability models to be applied to the segmentation process, and our method produces segmentation trees that achieve good performance on a variety of measures.

    Classification, for our purposes, is identifying the semantic role of each region in the segmentation tree of a page. We achieve promising results with a Bayesian classification algorithm based on a novel use of a hidden Markov tree model, whose structure is adapted to reflect the structure of the segmentation tree. This allows the algorithm to make effective use of the context in which regions appear, as well as the features of each individual region.

    We evaluate the page parsing system through qualitative and quantitative evaluation of algorithm performance against manually prepared ground truth data, and through a user study of an assistive interface based on our segmentation algorithm. A separate user study investigates users' perceptions of web page organization and generates ground truth segmentations, yielding important insights about consistency. Taken as a whole, this thesis presents innovative work in computer vision which contributes both to addressing the problem of web accessibility and to the understanding of semantic cues in images.
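    The hierarchical splitting step can be sketched as a recursive cut along near-empty rows and columns of an ink/edge map of the rendered page. This captures only the whitespace-separator intuition, with illustrative thresholds; the thesis's method wraps such cuts in a Bayesian probability model, which this sketch omits.

```python
import numpy as np

def segment(page, y0=0, x0=0, axis=0, min_gap=12, min_size=40, thresh=2.0):
    """Recursively split a rendered-page 'ink' image (2D array of edge or
    text strength) into a segmentation tree by cutting along wide runs of
    near-empty rows/columns, i.e. candidate separating lines. Returns a
    nested dict with absolute boxes (y0, y1, x0, x1)."""
    h, w = page.shape
    node = {"box": (y0, y0 + h, x0, x0 + w), "children": []}
    if min(h, w) < min_size:
        return node                           # too small to split further
    profile = page.sum(axis=1 - axis)         # ink per row (axis=0) or column
    empty = list(profile < thresh) + [False]  # sentinel closes a trailing run
    cuts, start = [], None
    for i, e in enumerate(empty):
        if e and start is None:
            start = i                         # a near-empty run begins
        elif not e and start is not None:
            if i - start >= min_gap:          # wide enough to be a separator
                cuts.append((start + i) // 2)
            start = None
    if not cuts:
        return node
    extent = h if axis == 0 else w
    prev = 0
    for c in cuts + [extent]:
        if c - prev >= min_gap:               # skip slivers between cuts
            if axis == 0:
                sub, sy, sx = page[prev:c, :], y0 + prev, x0
            else:
                sub, sy, sx = page[:, prev:c], y0, x0 + prev
            node["children"].append(segment(sub, sy, sx, axis=1 - axis,
                                            min_gap=min_gap,
                                            min_size=min_size, thresh=thresh))
        prev = c
    return node
```

    Alternating the split axis at each level yields the segmentation tree; in the thesis, candidate cuts are scored by probability models rather than a fixed threshold.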

    How Do Users Map Points Between Dissimilar Shapes?

    Finding similar points in globally or locally similar shapes has been studied extensively through the use of various point descriptors and shape-matching methods; little work, however, exists on finding similar points in dissimilar shapes. In this paper, we present the results of a study in which users were given two dissimilar two-dimensional shapes and asked to map a given point in the first shape to the point in the second shape they consider most similar. We find that user mappings in this study correlate strongly with simple geometric relationships between points and shapes. To predict the probability distribution of user mappings between any pair of simple two-dimensional shapes, we define two distinct statistical models using these relationships. We perform a thorough validation of the accuracy of these predictions and compare our models qualitatively and quantitatively to well-known shape-matching methods. Using our predictive models, we propose an approach to map objects or procedural content between different shapes in various design scenarios.
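    One simple geometric relationship of the kind the study measures can be illustrated by mapping a point via its normalized position inside the shape's axis-aligned bounding box. This is just one plausible baseline predictor in the spirit of the paper, not the paper's full statistical model.

```python
import numpy as np

def map_point(p, src_points, dst_points):
    """Map point p from a source shape to a target shape using a single
    simple geometric relationship: the point's normalized position within
    the shape's axis-aligned bounding box. src_points/dst_points are
    (n, 2) arrays sampled on each shape's outline."""
    src, dst = np.asarray(src_points, float), np.asarray(dst_points, float)
    s_min, s_max = src.min(axis=0), src.max(axis=0)
    d_min, d_max = dst.min(axis=0), dst.max(axis=0)
    u = (np.asarray(p, float) - s_min) / np.maximum(s_max - s_min, 1e-9)
    return d_min + u * (d_max - d_min)

# Example: the center of a square maps to the center of a triangle's box.
square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])
triangle = np.array([[0, 0], [4, 0], [2, 3]])
print(map_point((1.0, 1.0), square, triangle))   # -> [2.0, 1.5]
```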

    Physically Based Rendering of Synthetic Objects in Real Environments
