
    Interaktive Raumzeitrekonstruktion in der Computergraphik

    High-quality dense spatial and/or temporal reconstructions and correspondence maps from camera images, be they optical flow, stereo, or scene flow, are an essential prerequisite for a multitude of computer vision and graphics tasks, e.g. scene editing or view interpolation in visual media production. Due to the ill-posed nature of the estimation problem in typical setups (i.e. a limited number of cameras and a limited frame rate), automated estimation approaches are prone to erroneous correspondences, and to the resulting quality degradation, in many non-trivial cases such as occlusions, ambiguous movements, long displacements, or low texture. While improving the estimation algorithms is one obvious direction, this thesis complementarily concerns itself with creating intuitive, high-level user interactions that lead to improved correspondence maps and scene reconstructions. Where visually convincing results are essential, rendering artifacts caused by estimation errors are usually repaired by hand with image-editing tools, which is time-consuming and therefore costly. My new user interactions, which integrate human scene-recognition capabilities to guide a semi-automatic correspondence or scene-reconstruction algorithm, save considerable effort and enable faster and more efficient production of visually convincing rendered images.
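    To ground the terminology, the sketch below runs a standard dense optical-flow estimator (OpenCV's Farneback method, not the thesis's own algorithm) on a hypothetical frame pair; the failure modes named above, such as occlusions and low texture, are exactly where such automatic estimates degrade and where user guidance would intervene. The file names are placeholders.

```python
# Minimal sketch: dense optical flow between two frames with OpenCV's
# Farneback method. Illustrative only -- not the thesis's own algorithm.
import cv2
import numpy as np

prev = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)  # placeholder paths
curr = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

# flow[y, x] = (dx, dy): per-pixel displacement from prev to curr
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Such automatic estimates fail near occlusions and in low-texture regions,
# which is exactly where the thesis proposes user interaction as a remedy.
magnitude = np.linalg.norm(flow, axis=2)
print("mean displacement:", magnitude.mean())
```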

    Generalizations of the Multicut Problem for Computer Vision

    Graph decomposition has long been an important concept in machine learning and computer vision. Many tasks, such as image and mesh segmentation, community detection in social networks, object tracking, and human pose estimation, can be formulated as graph decomposition problems. The multicut problem in particular is a popular model for optimizing over the decompositions of a given graph. Its main advantage is that no prior knowledge about the number of components or their sizes is required. However, it has several limitations, which we address in this thesis. First, the multicut problem only allows specifying a cost or reward for putting two direct neighbours into distinct components, which limits the expressiveness of the cost function. We introduce special edges into the graph that allow defining a cost or reward for putting any two vertices into distinct components while preserving the original set of feasible solutions, and we show that this considerably improves the quality of image and mesh segmentations. Second, the multicut problem is notoriously NP-hard for general graphs, which limits its applications to small superpixel graphs. We define and implement two primal feasible heuristics to solve the problem; they provide no guarantees on runtime or solution quality, but show good convergence behaviour in practice. We perform an extensive comparison on multiple graphs of different sizes and properties. Third, we extend the multicut framework with node labels, so that we can jointly optimize graph decomposition and node classification with exactly the same optimization algorithm, eliminating the need to hand-tune optimizers for a particular task. To demonstrate its generality, we apply it to diverse computer vision tasks, including human pose estimation, multiple object tracking, and instance-aware semantic segmentation, and show that we can improve on the prior art using exactly the same data as the original works. Finally, we employ multicuts in two applications: 1) a client-server tool for interactive video segmentation, in which, after pre-processing of the video, a user draws strokes on several frames and a time-coherent segmentation of the entire video is computed on the fly; and 2) a method for simultaneous segmentation and tracking of living cells in microscopy data, a challenging task because cells split, which our algorithm accounts for by creating parental hierarchies. We also present results on multiple model fitting: we find models in data heavily corrupted by noise by identifying the components that define them using higher-order multicuts, and we introduce an extension that allows our optimization to pick better hyperparameters for each discovered model. In summary, this thesis extends the multicut problem in several directions, proposes optimization algorithms, and applies them to novel data and settings.
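    For readers unfamiliar with the model, the following is a minimal, hedged sketch of a primal feasible multicut heuristic in the spirit of greedy additive edge contraction; it is illustrative only and not the thesis's implementation. Positive edge weights reward keeping the endpoints in the same component, negative weights reward cutting, and the heuristic greedily contracts the most attractive remaining edge.

```python
# Hedged sketch of a greedy primal feasible multicut heuristic in the spirit
# of greedy additive edge contraction; not the thesis's code.
# Convention: weight > 0 rewards joining the endpoints, weight < 0 rewards
# cutting. We greedily contract the most attractive remaining edge, summing
# parallel edge weights, until no attractive edge is left.

def greedy_multicut(n, weighted_edges):
    """n nodes 0..n-1; weighted_edges: list of (u, v, weight)."""
    w = {u: {} for u in range(n)}
    for u, v, wt in weighted_edges:
        w[u][v] = w[u].get(v, 0.0) + wt
        w[v][u] = w[v].get(u, 0.0) + wt
    label = list(range(n))                     # component id per node
    alive = set(range(n))
    while True:
        # pick the most attractive edge between current components
        wt, u, v = max(((w[u][v], u, v) for u in alive for v in w[u] if u < v),
                       default=(0.0, None, None))
        if u is None or wt <= 0:
            break
        # contract component v into component u
        for x, wx in w[v].items():
            if x == u:
                continue
            w[u][x] = w[u].get(x, 0.0) + wx
            w[x][u] = w[x].get(u, 0.0) + wx
            del w[x][v]
        del w[u][v]
        w[v] = {}
        alive.remove(v)
        for i in range(n):
            if label[i] == v:
                label[i] = u
    return label

# Tiny example: a triangle that wants to merge plus one repulsive neighbour.
print(greedy_multicut(4, [(0, 1, 2.0), (1, 2, 1.5), (0, 2, 1.0), (2, 3, -1.0)]))
# -> [0, 0, 0, 3]: nodes 0, 1, 2 merge; node 3 stays its own component.
```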

    Analysis and Synthesis of Interactive Video Sprites

    In this thesis, we explore how video, an extremely compelling medium that is traditionally consumed passively, can be transformed into interactive experiences, and what prevents content creators from using it for this purpose. Film captures extremely rich and dynamic information, but the sheer amount of data and the drastic change in content appearance over time make it problematic to work with. Content creators are willing to invest time and effort to design and capture video, so why not to manipulate and interact with it? We hypothesize that people can help, and be helped by, automatic video processing and synthesis algorithms when they are given the right tools. Computer games are a very popular interactive medium in which players engage with dynamic content in compelling and intuitive ways. The first contribution of this thesis is an in-depth exploration of the modes of interaction that enable game-like video experiences. Through active discussions with game developers, we identify both how to assist content creators and how players can dynamically interact with their creations. We present concepts, explore algorithms, and design tools that together enable interactive video experiences. Our findings on processing videos and interacting with filmed content come together in this thesis's second major contribution: a new medium of expression in which video elements can be looped, merged, and triggered interactively. Static-camera videos are converted into loopable sequences that can be controlled in real time in response to simple end-user requests. We present novel algorithms and interactive tools that enable this new medium of expression. Our human-in-the-loop system gives users progressively more creative control over the video content as they invest more effort, and artists help us evaluate it. Monocular, static-camera videos are a good fit for looping algorithms, but such algorithms have been limited to two-dimensional applications because pixels are reshuffled in space and time on the image plane. The final contribution of this thesis breaks through this barrier by allowing users to interact with filmed objects in a three-dimensional manner. Our novel object-tracking algorithm extends existing 2D bounding-box trackers with 3D information, such as a well-fitting bounding volume, which in turn enables a new breed of interactive video experiences. The filmed content becomes a three-dimensional playground, as users are free to move the virtual camera or the tracked objects and see them from novel viewpoints.
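    As an illustration of the looping idea, the sketch below finds a pair of visually similar, temporally distant frames in a static-camera video, in the spirit of classic video-textures looping rather than the thesis's own algorithms; the input path and the minimum loop length of 30 frames are assumptions.

```python
# Hedged sketch: find a loopable frame pair in a static-camera video by
# frame similarity, in the spirit of video-texture looping. Illustrative
# only; the thesis's interactive looping tools are more involved.
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")            # hypothetical input video
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, (64, 36))       # downsample for cheap comparison
    frames.append(small.astype(np.float32).ravel())
cap.release()

F = np.stack(frames)
# Pairwise mean squared differences via the Gram-matrix identity,
# avoiding a huge (n, n, d) intermediate array.
sq = (F ** 2).sum(axis=1)
D = (sq[:, None] + sq[None, :] - 2.0 * F @ F.T) / F.shape[1]

# Best loop: visually similar frames at least 30 frames apart, so the
# resulting loop is long enough to be interesting.
n = len(frames)
i, j = min(((a, b) for a in range(n) for b in range(a + 30, n)),
           key=lambda ab: D[ab[0], ab[1]], default=(0, n - 1))
print(f"loop from frame {i} to frame {j}, cost {D[i, j]:.1f}")
```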

    Film, Video, and Digitality: An Analysis of Cultural Form in Time-based Media

    This thesis examines the material properties of time-based image media, in particular live video. The project is practice-based with a theoretical underpinning drawn from the debates on form and meaning associated with Walter Benjamin

    Interactive computer vision through the Web

    Computer vision is the computational science that aims to reproduce and improve the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications can have on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms, the most common being the designation of contours, bounding boxes around objects, or interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium: while evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users is a different matter. We therefore describe how the Elm programming language helped us build a reliable image annotation Web application. We provide an overview of its functionality and architecture, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open-source and available online. In the second part of this thesis, we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometric tools and optimization techniques that traditionally require much processing power to run at real-time frame rates. Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target the new WebAssembly standard from the C++ and Rust programming languages. Our library was written from scratch in the Rust programming language, which then allowed us to port it easily to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interaction available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign the cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialize camera positions. Combining these interactions improves the tracking and 3D point reconstruction results.
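    As a toy illustration of the point-pair realignment interaction, the sketch below recovers a rigid transform between two corresponding 3D point sets with the Kabsch algorithm; it is written in Python/NumPy for brevity (the thesis library itself is in Rust), and the point data is made up.

```python
# Hedged sketch: rigid realignment of two corresponding 3D point sets via
# the Kabsch algorithm, illustrating the kind of drift correction that the
# point-pair interaction enables. Not the thesis library's actual code.
import numpy as np

def kabsch(P, Q):
    """Return R (3x3), t (3,) such that Q ~ P @ R.T + t, for corresponding
    point sets P, Q of shape (n, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Usage on made-up data: points picked in two views, back-projected to 3D.
P = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.2], [0.0, 1.0, 0.9], [1.0, 1.0, 1.4]])
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([0.1, -0.2, 0.05])
R, t = kabsch(P, Q)
print(np.allclose(R, R_true), np.round(t, 3))  # True [ 0.1  -0.2   0.05]
```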

    Methodology for extensive evaluation of semiautomatic and interactive segmentation algorithms using simulated interaction models

    The performance of semiautomatic and interactive segmentation (SIS) algorithms is usually evaluated by employing a small number of human operators to segment the images. The human operators typically provide the approximate location of objects of interest and their boundaries in an interactive phase, which is followed by an automatic phase where the segmentation is performed under the constraints of the operator-provided guidance. The segmentation results produced from this small set of interactions do not represent the true capability and potential of the algorithm being evaluated. For example, due to inter-operator variability, human operators may make choices that produce either overestimated or underestimated results. Moreover, their choices may not be realistic compared to how the algorithm is used in the field, since interaction may be influenced by operator fatigue and lapses in judgement. Other drawbacks of using human operators to assess SIS algorithms include human error, the lack of available expert users, and the expense. A methodology for evaluating segmentation performance is proposed here which uses simulated interaction models to programmatically generate large numbers of interactions, ensuring the presence of interactions throughout the object region. These interactions are used to segment the objects of interest, and the resulting segmentations are then analysed using statistical methods. The large number of interactions generated by simulated interaction models captures the variability in the set of possible user interactions by considering every pixel inside the object region as a potential location for an interaction, with equal probability. Because the computation required for the enormous number of possible interactions is practically prohibitive, interactions are uniformly sampled at regular intervals to generate a subset that still represents the diverse pattern of the entire set. Categorizing interactions into groups, based on the position of the interaction inside the object region and the texture properties of the image region where the interaction is located, enables fine-grained analysis of algorithm performance along these two criteria. The application of statistical hypothesis testing makes the analysis more accurate, scientific, and reliable than conventional evaluation of semiautomatic segmentation algorithms. The proposed methodology is demonstrated in two case studies covering seven different algorithms and three types of interaction mode, for a total of nine segmentation applications. Applying the methodology has revealed fine-grained details about the performance of the segmentation algorithms that existing methods could not uncover, owing to the absence of a large, unbiased set of interactions. Its practical application to a number of algorithms and diverse interaction modes demonstrates its feasibility and generality. Developing this methodology into a tool for automatic evaluation of SIS algorithm performance looks very promising for users of image segmentation.
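    A minimal sketch of the core idea, under assumed interfaces: sample simulated point interactions on a regular grid inside the ground-truth object mask, run an interactive segmenter from each seed, and aggregate the scores statistically. The segment_from_seed signature is hypothetical, and the demo "segmenter" is a trivial threshold.

```python
# Hedged sketch of the evaluation methodology: grid-sampled simulated point
# interactions inside the ground-truth mask, scored with the Dice measure.
import numpy as np

def grid_seeds(mask, step=10):
    """Every step-th in-object pixel: a uniform, unbiased set of interactions."""
    ys, xs = np.nonzero(mask)
    return [(y, x) for y, x in zip(ys, xs) if y % step == 0 and x % step == 0]

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-9)

def evaluate(segment_from_seed, image, gt_mask, step=10):
    """segment_from_seed(image, (y, x)) -> binary mask; a hypothetical SIS API."""
    scores = np.array([dice(segment_from_seed(image, s), gt_mask)
                       for s in grid_seeds(gt_mask, step)])
    # Statistics over the whole interaction set, not one operator's choices
    return scores.mean(), scores.std(), scores.min()

# Toy demo with a trivial 'segmenter' that ignores the seed and thresholds.
rng = np.random.default_rng(0)
image = rng.random((100, 100))
gt = np.zeros((100, 100), dtype=bool)
gt[30:70, 30:70] = True
image[gt] += 1.0
seg = lambda img, seed: img > 0.9
print(evaluate(seg, image, gt))
```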

    Inverse rendering techniques for physically grounded image editing

    From a single picture of a scene, people can typically grasp the spatial layout immediately and even make good guesses at material properties and where light is coming from to illuminate the scene. For example, we can reliably tell which objects occlude others, what an object is made of and its rough shape, which regions are illuminated or in shadow, and so on. Remarkably little is known about our ability to make these determinations; as such, we are still not able to robustly "teach" computers to make the same high-level observations as people. This document presents algorithms for understanding intrinsic scene properties from single images. The goal of these inverse rendering techniques is to estimate the configuration of scene elements (geometry, materials, luminaires, camera parameters, etc.) using only the information visible in an image. Such algorithms have applications in robotics and computer graphics. One such application is physically grounded image editing: photo editing made easier by leveraging knowledge of the physical space. These applications allow sophisticated editing operations to be performed in a matter of seconds, enabling the seamless addition, removal, or relocation of objects in images.
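    To make the forward/inverse distinction concrete, the sketch below implements the simple Lambertian image-formation model I = albedo * max(0, n . l) and inverts it for albedo when geometry and light are known; the hard inverse-rendering setting tackled in such work is that all three factors are unknown at once. The scene data is made up.

```python
# Hedged sketch of the image-formation model that inverse rendering inverts:
# Lambertian shading under a single directional light. Illustrative only.
import numpy as np

def shade(albedo, normals, light):
    """albedo: (h, w, 3); normals: (h, w, 3) unit vectors; light: (3,)."""
    l = light / np.linalg.norm(light)
    shading = np.clip(normals @ l, 0.0, None)          # (h, w) cosine term
    return albedo * shading[..., None]

# Toy scene: a flat wall facing the camera (made-up data).
h, w = 4, 4
albedo = np.full((h, w, 3), [0.8, 0.4, 0.2])
normals = np.zeros((h, w, 3))
normals[..., 2] = 1.0
image = shade(albedo, normals, light=np.array([0.3, 0.3, 1.0]))

# The 'inverse' direction: with known geometry and light, albedo follows by
# dividing out the shading. In the wild, geometry, light, and albedo are
# all unknown at once, which is what makes inverse rendering ill-posed.
l = np.array([0.3, 0.3, 1.0])
l /= np.linalg.norm(l)
shading = np.clip(normals @ l, 1e-6, None)
recovered_albedo = image / shading[..., None]
print(np.allclose(recovered_albedo, albedo))            # True
```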

    Labour in a Single Shot

    This collection of essays offers a critical assessment of Labour in a Single Shot, a groundbreaking documentary video workshop. From 2011 to 2014, curator Antje Ehmann and film- and videomaker Harun Farocki produced an art project of truly global proportions. They travelled to fifteen cities around the world to conduct workshops inspired by cinema history’s first film, Workers Leaving the Lumière Factory, shot in 1895 by the Lumière brothers in France. While the workshop videos are in colour and the camera was not required to remain static, Ehmann and Farocki’s students were tasked with honouring the original Lumière film’s basic parameters of theme and style. The fascinating result is a collection of more than 550 short videos that have appeared in international exhibitions and on an open-access website, offering the widest possible audience the opportunity to ponder contemporary labour in multiple contexts around the world.

    The Gospel according to no one and rewriting the South: Eudora Welty and the self-conscious Southern novel

    Both my novel and the critical work explore Southern places, how they are defined and how Southern people imbue them with meaning—sometimes multiple and paradoxical meanings—and, in turn, how those places define them. In my novel, The Gospel According to No One, narratives tied to place are pitted against each other: New South versus Old South, fundamentalism versus liberalism, nihilism versus the mythic worldview, and the Agrarian Proprietary Ideal versus what some scholars see as the homogenizing forces of Late Capitalism. The struggles between these discourses threaten to undo order within the city. Those who survive forge new identities from the fragments of postmodernism, inventing new narratives about both themselves and the places they inhabit. My work on Eudora Welty also examines multiple Southern discourses. I argue that Welty’s self-conscious focus on reproductions of the South in Delta Wedding and The Optimist’s Daughter challenges the idea of a monolithic South, which in turn challenges any definitive categorization of Welty and her relationship to the imagined (the only ‘real’) South. In the bridging section of the work, I explain why I chose Welty as a subject of study, explore connections between postmodernism and Southern literature, suggest a definition of the South that is reflective of Place, and examine my creative work in light of the theoretical issues I have encountered.