11 research outputs found

    Generalized roof duality and bisubmodular functions

    Full text link
    Consider a convex relaxation f^\hat f of a pseudo-boolean function ff. We say that the relaxation is {\em totally half-integral} if f^(x)\hat f(x) is a polyhedral function with half-integral extreme points xx, and this property is preserved after adding an arbitrary combination of constraints of the form xi=xjx_i=x_j, xi=1xjx_i=1-x_j, and xi=γx_i=\gamma where \gamma\in\{0, 1, 1/2} is a constant. A well-known example is the {\em roof duality} relaxation for quadratic pseudo-boolean functions ff. We argue that total half-integrality is a natural requirement for generalizations of roof duality to arbitrary pseudo-boolean functions. Our contributions are as follows. First, we provide a complete characterization of totally half-integral relaxations f^\hat f by establishing a one-to-one correspondence with {\em bisubmodular functions}. Second, we give a new characterization of bisubmodular functions. Finally, we show some relationships between general totally half-integral relaxations and relaxations based on the roof duality.Comment: 14 pages. Shorter version to appear in NIPS 201

    Globally optimal pixel labeling algorithms for tree metrics

    Full text link

    Dense Corresspondence Estimation for Image Interpolation

    Get PDF
    We evaluate the current state-of-the-art in dense correspondence estimation for the use in multi-image interpolation algorithms. The evaluation is carried out on three real-world scenes and one synthetic scene, each featuring varying challenges for dense correspondence estimation. The primary focus of our study is on the perceptual quality of the interpolation sequences created from the estimated flow fields. Perceptual plausibility is assessed by means of a psychophysical userstudy. Our results show that current state-of-the-art in dense correspondence estimation does not produce visually plausible interpolations.In diesem Bericht evaluieren wir den gegenwärtigen Stand der Technik in dichter Korrespondenzschätzung hinsichtlich der Eignung für die Nutzung in Algorithmen zur Zwischenbildsynthese. Die Auswertung erfolgt auf drei realen und einer synthetischen Szene mit variierenden Herausforderungen für Algorithmen zur Korrespondenzschätzung. Mittels einer perzeptuellen Benutzerstudie werten wir die wahrgenommene Qualität der interpolierten Bildsequenzen aus. Unsere Ergebnisse zeigen dass der gegenwärtige Stand der Technik in dichter Korrespondezschätzung nicht für die Zwischenbildsynthese geeignet ist

    Transformation of General Binary MRF Minimization to the First-Order Case

    Full text link

    Algorithmen zur Korrespondenzschätzung und Bildinterpolation für die photorealistische Bildsynthese

    Get PDF
    Free-viewpoint video is a new form of visual medium that has received considerable attention in the last 10 years. Most systems reconstruct the geometry of the scene, thus restricting themselves to synchronized multi-view footage and Lambertian scenes. In this thesis we follow a different approach and describe contributions to a purely image-based end-to-end system operating on sparse, unsynchronized multi-view footage. In particular, we focus on dense correspondence estimation and synthesis of in-between views. In contrast to previous approaches, our correspondence estimation is specifically tailored to the needs of image interpolation; our multi-image interpolation technique advances the state-of-the-art by disposing the conventional blending step. Both algorithms are put to work in an image-based free-viewpoint video system and we demonstrate their applicability to space-time visual effects production as well as to stereoscopic content creation.3D-Video mit Blickpunktnavigation ist eine neues digitales Medium welchem die Forschung in den letzten 10 Jahren viel Aufmerksamkeit gewidmet hat. Die meisten Verfahren rekonstruieren dabei die Szenengeometrie und schränken sich somit auf Lambertsche Szenen und synchron aufgenommene Eingabedaten ein. In dieser Dissertation beschreiben wir Beiträge zu einem rein bild-basierten System welches auf unsynchronisierten Eingabevideos arbeitet. Unser Fokus liegt dabei auf der Schätzung dichter Korrespondenzkarten und auf der Synthese von Zwischenbildern. Im Gegensatz zu bisherigen Verfahren ist unser Ansatz der Korrespondenzschätzung auf die Bedürfnisse der Bilderinterpolation ausgerichtet; unsere Zwischenbildsynthese verzichtet auf das Überblenden der Eingabebilder zu Gunsten der Lösung eines Labelingproblems. Das resultierende System eignet sich sowohl zur Produktion räumlich-zeitlicher Spezialeffekte als auch zur Erzeugung stereoskopischer Videosequenzen

    Efficient multi-level scene understanding in videos

    No full text
    Automatic video parsing is a key step towards human-level dynamic scene understanding, and a fundamental problem in computer vision. A core issue in video understanding is to infer multiple scene properties of a video in an efficient and consistent manner. This thesis addresses the problem of holistic scene understanding from monocular videos, which jointly reason about semantic and geometric scene properties from multiple levels, including pixelwise annotation of video frames, object instance segmentation in spatio-temporal domain, and/or scene-level description in terms of scene categories and layouts. We focus on four main issues in the holistic video understanding: 1) what is the representation for consistent semantic and geometric parsing of videos? 2) how do we integrate high-level reasoning (e.g., objects) with pixel-wise video parsing? 3) how can we do efficient inference for multi-level video understanding? and 4) what is the representation learning strategy for efficient/cost-aware scene parsing? We discuss three multi-level video scene segmentation scenarios based on different aspects of scene properties and efficiency requirements. The first case addresses the problem of consistent geometric and semantic video segmentation for outdoor scenes. We propose a geometric scene layout representation, or a stage scene model, to efficiently capture the dependency between the semantic and geometric labels. We build a unified conditional random field for joint modeling of the semantic class, geometric label and the stage representation, and design an alternating inference algorithm to minimize the resulting energy function. The second case focuses on the problem of simultaneous pixel-level and object-level segmentation in videos. We propose to incorporate foreground object information into pixel labeling by jointly reasoning semantic labels of supervoxels, object instance tracks and geometric relations between objects. In order to model objects, we take an exemplar approach based on a small set of object annotations to generate a set of object proposals. We then design a conditional random field framework that jointly models the supervoxel labels and object instance segments. To scale up our method, we develop an active inference strategy to improve the efficiency of multi-level video parsing, which adaptively selects an informative subset of object proposals and performs inference on the resulting compact model. The last case explores the problem of learning a flexible representation for efficient scene labeling. We propose a dynamic hierarchical model that allows us to achieve flexible trade-offs between efficiency and accuracy. Our approach incorporates the cost of feature computation and model inference, and optimizes the model performance for any given test-time budget. We evaluate all our methods on several publicly available video and image semantic segmentation datasets, and demonstrate superior performance in efficiency and accuracy. Keywords: Semantic video segmentation, Multi-level scene understanding, Efficient inference, Cost-aware scene parsin
    corecore