270 research outputs found

    Selectively De-animating and Stabilizing Videos

    Full text link
    This thesis presents three systems for editing the motion of videos. First, selective video de-animation seeks to remove the large-scale motions of one or more objects so that other motions are easier to see. The user draws strokes to indicate the regions that should be immobilized, and our algorithm warps the video to remove large-scale motion in those regions while leaving finer-scale, relative motions intact. We then use a graph-cut-based optimization to composite the warped video with still frames from the input video to remove unwanted background motion. Our technique enables applications such as clearer motion visualization, simpler creation of artistic cinemagraphs, and new ways to edit appearance and motion paths in video. Second, we design a fully automatic system that creates portrait cinemagraphs by tracking facial features and de-animating the video with respect to the face and torso. We then generate compositing weights automatically to create the final cinemagraph portraits. Third, we present a user-assisted video stabilization algorithm that can stabilize challenging videos for which state-of-the-art automatic algorithms fail to produce a satisfactory result. Our system introduces two new modes of interaction that allow the user to improve an unsatisfactory automatically stabilized video. First, we cluster tracks and visualize them on the warped video; the user ensures that appropriate tracks are selected by clicking on track clusters to include or exclude them, guiding the stabilization. Second, the user can directly specify how regions in the output video should look by drawing quadrilaterals to select and deform parts of the frame. Our algorithm then computes a stabilized video using the user-selected tracks while respecting the user-modified regions.
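
    As a rough illustration of the compositing step, the Python sketch below cross-fades each warped frame with a still frame using a feathered user mask. It is a much-simplified stand-in for the graph-cut optimization described above (which optimizes seam placement instead of feathering), and all names and parameters are illustrative assumptions.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def composite_deanimated(warped_frames, still_frame, animate_mask, sigma=5.0):
            """Blend warped video with a still frame (simplified compositing).

            animate_mask: (H, W) floats in [0, 1]; 1 keeps the warped video,
            0 freezes the pixel to the still frame. A feathered alpha stands
            in for the paper's graph-cut seam search.
            """
            alpha = gaussian_filter(animate_mask.astype(np.float32), sigma)[..., None]
            return [alpha * f + (1.0 - alpha) * still_frame for f in warped_frames]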

    The Video Mesh: A Data Structure for Image-based Three-dimensional Video Editing

    Get PDF
    This paper introduces the video mesh, a data structure for representing video as 2.5D “paper cutouts.” The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time; each point also stores a depth value. The video mesh is a triangulation over this point set, and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours, and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with per-pixel alpha values. Because the video mesh is at its core a set of texture-mapped triangles, we can leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D-to-3D video conversion.
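
    To make the representation concrete, here is a minimal Python sketch of the structure as described: tracked points carrying depth, a triangulation over them, and per-pixel values obtained by barycentric interpolation. Class and field names are invented for illustration, not taken from the paper.

        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class TrackedPoint:
            track_id: int    # identity of this point across frames
            pos: np.ndarray  # (2,) image-space position in this frame
            depth: float     # per-point depth value

        @dataclass
        class VideoMeshFrame:
            points: list            # TrackedPoint records for one frame
            triangles: np.ndarray   # (T, 3) indices into `points`
            alpha: np.ndarray       # (H, W) matte refining object boundaries

        def interpolate_depth(frame, tri, bary):
            """Per-pixel depth via barycentric interpolation over a triangle."""
            i, j, k = frame.triangles[tri]
            pts = (frame.points[i], frame.points[j], frame.points[k])
            return sum(b * p.depth for b, p in zip(bary, pts))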

    Sim2real transfer learning for 3D human pose estimation: motion to the rescue

    Full text link
    Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably optical flow and the motion of 2D keypoints. Our results therefore suggest that motion can be a simple way to bridge the sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, and show full 3D mesh recovery on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset. Comment: Accepted at NeurIPS 2019
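
    As a concrete, hypothetical illustration of such preprocessing, the sketch below extracts dense optical flow with OpenCV's Farnebäck method; a pose network would then consume these flow fields (and, analogously, 2D keypoint tracks) instead of raw RGB. The paper's exact cue-extraction pipeline may differ.

        import cv2

        def motion_cues(frames):
            """Per-frame dense optical flow as a motion cue for pose estimation."""
            grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
            flows = []
            for prev, nxt in zip(grays, grays[1:]):
                flow = cv2.calcOpticalFlowFarneback(
                    prev, nxt, None,
                    pyr_scale=0.5, levels=3, winsize=15,
                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
                flows.append(flow)  # (H, W, 2) displacement field
            return flows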

    New editing techniques for video post-processing

    Get PDF
    This thesis contributes techniques for capturing 3D cloth shape, editing cloth texture, and altering object shape and motion in multi-camera and monocular video recordings. We propose a technique to capture cloth shape from 3D scene flow, computed by determining optical flow in several camera views. Together with a silhouette-matching constraint, this lets us track and reconstruct cloth surfaces in long video sequences. In the area of garment motion capture, we present a system to reconstruct time-coherent triangle meshes from multi-view video recordings. Texture mapping of the acquired triangle meshes is used to replace the recorded texture with new cloth patterns. We extend this work to the more challenging single-camera case: extracting texture deformation and shading effects simultaneously enables us to achieve texture-replacement effects for garments in monocular video recordings. Finally, we propose a system for keyframe editing of video objects. A color-based segmentation algorithm, together with automatic video inpainting to fill in missing background texture, allows us to edit the shape and motion of 2D video objects. We present examples of altering object trajectories, applying non-rigid deformation, and simulating camera motion.
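
    The texture-replacement idea, separating deformation from shading, admits a compact sketch. The helper below is hypothetical: it assumes the recovered deformation has already been used to warp both the original and the replacement texture into the frame, and approximates shading as the per-pixel ratio of the observed garment pixels to the warped original texture.

        import numpy as np

        def replace_texture(observed, old_tex_warped, new_tex_warped, eps=1e-3):
            """Swap a garment texture while preserving recovered shading.

            All inputs are float arrays in [0, 1], already aligned to the
            frame by the estimated texture deformation.
            """
            shading = observed / np.maximum(old_tex_warped, eps)
            return np.clip(shading * new_tex_warped, 0.0, 1.0)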

    Data-driven methods for interactive visual content creation and manipulation

    Get PDF
    Software tools for creating and manipulating visual content --- be they for images, video, or 3D models --- are often difficult to use and involve a lot of manual interaction at several stages of the process. Coupled with long processing and acquisition times, content production is rather costly and poses a potential barrier to many applications. Although cameras now allow anyone to easily capture photos and video, tools for manipulating such media demand both artistic talent and technical expertise. At the same time, vast corpora of existing visual content, such as Flickr, YouTube, or Google 3D Warehouse, are now available and easily accessible. This thesis proposes a data-driven approach to tackle the above-mentioned problems in content generation. To this end, we create statistical models trained on semantic knowledge harvested from existing visual content corpora. Using these models, we then develop tools that are easy to learn and use, even by novice users, yet still produce high-quality content. These tools have intuitive interfaces and give the user precise and flexible control. Specifically, we apply our models to create tools that simplify the tasks of video manipulation, 3D modeling, and material assignment to 3D objects.

    The Virtual Video Camera: A System for Viewpoint Synthesis in Arbitrary Dynamic Scenes

    Get PDF
    The Virtual Video Camera project strives to create free-viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can then re-render the scene from novel perspectives, including viewpoints not covered by the original cameras. This thesis presents the algorithmic core of the Virtual Video Camera: the algorithm for image correspondence estimation as well as the image-based renderer. Furthermore, its application in the context of an actual video production is showcased, and the rendering and image-processing pipeline is extended to incorporate depth information.
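
    A minimal sketch of the image-based rendering idea: given a dense correspondence field between two input views, warp each image partway along the correspondences and cross-blend. The actual renderer interpolates across many cameras in space and time; this pairwise version, and the approximation of sampling the flow at the destination pixel, are simplifications for illustration.

        import numpy as np
        import cv2

        def interpolate_view(img_a, img_b, flow_ab, t):
            """Synthesize an in-between view at blend parameter t in [0, 1].

            flow_ab: (H, W, 2) correspondence field from view A to view B.
            """
            h, w = img_a.shape[:2]
            xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
            # pull view A forward by t and view B backward by (1 - t)
            warp_a = cv2.remap(img_a, xs - t * flow_ab[..., 0],
                               ys - t * flow_ab[..., 1], cv2.INTER_LINEAR)
            warp_b = cv2.remap(img_b, xs + (1 - t) * flow_ab[..., 0],
                               ys + (1 - t) * flow_ab[..., 1], cv2.INTER_LINEAR)
            return ((1 - t) * warp_a + t * warp_b).astype(img_a.dtype)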

    Efficient data structures for piecewise-smooth video processing

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-102). A number of useful image and video processing techniques, ranging from low-level operations such as denoising and detail enhancement to higher-level methods such as object manipulation and special effects, rely on piecewise-smooth functions computed from the input data. In this thesis, we present two computationally efficient data structures for representing piecewise-smooth visual information and demonstrate how they can dramatically simplify and accelerate a variety of video processing algorithms. We start by introducing the bilateral grid, an image representation that explicitly accounts for intensity edges. By interpreting brightness values as Euclidean coordinates, the bilateral grid enables simple expressions for edge-aware filters. Smooth functions defined on the bilateral grid are piecewise-smooth in image space. Within this framework, we derive efficient reinterpretations of a number of edge-aware filters commonly used in computational photography as operations on the bilateral grid, including the bilateral filter, edge-aware scattered data interpolation, and local histogram equalization. We also show how these techniques can be easily parallelized on modern graphics hardware for real-time processing of high-definition video. The second data structure we introduce is the video mesh, designed as a flexible central data structure for general-purpose video editing. It represents objects in a video sequence as 2.5D "paper cutouts" and allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. In our representation, we assume that motion and depth are piecewise-smooth, and encode them sparsely as a set of points tracked over time. The video mesh is a triangulation over this point set, and per-pixel information is obtained by interpolation. To handle occlusions and detailed object boundaries, we rely on the user to rotoscope the scene at a sparse set of frames using spline curves. We introduce an algorithm to robustly and automatically cut the mesh into local layers with proper occlusion topology, and to propagate the splines to the remaining frames. Object boundaries are refined with per-pixel alpha mattes. At its core, the video mesh is a collection of texture-mapped triangles, which we can edit and render interactively using graphics hardware. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D-to-3D video conversion. by Jiawen Chen. Ph.D.
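
    The bilateral grid's splat-blur-slice pipeline fits in a few lines of NumPy/SciPy. The sketch below is a minimal version for a grayscale image normalized to [0, 1]; grid resolutions, kernel width, and function names are illustrative choices, not the thesis implementation.

        import numpy as np
        from scipy.ndimage import gaussian_filter, map_coordinates

        def bilateral_grid_filter(img, sigma_s=16.0, sigma_r=0.1):
            """Bilateral filter via splat / blur / slice on a 3D grid.

            img: (H, W) grayscale floats in [0, 1].
            sigma_s, sigma_r: spatial and range sampling rates.
            """
            h, w = img.shape
            gh = int(h / sigma_s) + 2
            gw = int(w / sigma_s) + 2
            gd = int(1.0 / sigma_r) + 2
            grid = np.zeros((gh, gw, gd))
            weight = np.zeros_like(grid)
            ys, xs = np.mgrid[0:h, 0:w]
            gy = np.round(ys / sigma_s).astype(int)
            gx = np.round(xs / sigma_s).astype(int)
            gz = np.round(img / sigma_r).astype(int)
            np.add.at(grid, (gy, gx, gz), img)     # splat intensities
            np.add.at(weight, (gy, gx, gz), 1.0)   # splat homogeneous weights
            grid = gaussian_filter(grid, 1.0)      # blur in space *and* range
            weight = gaussian_filter(weight, 1.0)
            coords = np.stack([ys / sigma_s, xs / sigma_s, img / sigma_r])
            num = map_coordinates(grid, coords, order=1)   # trilinear slice
            den = map_coordinates(weight, coords, order=1)
            return num / np.maximum(den, 1e-8)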

    Software Takes Command

    Get PDF
    This book is available as open access through the Bloomsbury Open Access programme and is available on www.bloomsburycollections.com. Software has replaced a diverse array of physical, mechanical, and electronic technologies used before the 21st century to create, store, distribute, and interact with cultural artifacts. It has become our interface to the world, to others, to our memory and our imagination - a universal language through which the world speaks, and a universal engine on which the world runs. What electricity and the combustion engine were to the early 20th century, software is to the early 21st century. Offering the first theoretical and historical account of software for media authoring and its effects on the practice and the very concept of 'media,' the author of The Language of New Media (2001) develops his own theory for this rapidly growing, always-changing field. What were the thinking and motivations of the people who in the 1960s and 1970s created the concepts and practical techniques that underlie contemporary media software such as Photoshop, Illustrator, Maya, Final Cut, and After Effects? How do their interfaces and tools shape the visual aesthetics of contemporary media and design? What happens to the idea of a 'medium' after previously media-specific tools have been simulated and extended in software? Is it still meaningful to talk about different mediums at all? Lev Manovich answers these questions and supports his theoretical arguments with detailed analysis of key media applications such as Photoshop and After Effects, popular web services such as Google Earth, and projects in motion graphics, interactive environments, graphic design, and architecture. Software Takes Command is a must for all practicing designers, media artists, and scholars concerned with contemporary media.

    Irish Machine Vision and Image Processing Conference Proceedings 2017

    Get PDF