Calipso: Physics-based Image and Video Editing through CAD Model Proxies

Authors: Cotin Stephane; Courtecuisse Hadrien; Haouchine Nazim; Nießner Matthias; Roy Frederick
Publication date: 12/08/2017

We present Calipso, an interactive method for editing images and videos in a physically coherent manner. Our main idea is to realize physics-based manipulations by running a full physics simulation on proxy geometries given by non-rigidly aligned CAD models. Running these simulations allows us to apply new, unseen forces to move or deform selected objects, change physical parameters such as mass or elasticity, or even add entirely new objects that interact with the rest of the underlying scene. In Calipso, the user makes edits directly in 3D; these edits are processed by the simulation and then transferred to the target 2D content using shape-to-image correspondences in a photo-realistic rendering process. To align the CAD models, we introduce an efficient CAD-to-image alignment procedure that jointly optimizes rigid and non-rigid alignment while preserving the high-level structure of the input shape. Moreover, the user can choose to exploit image flow to estimate scene motion, producing coherent physical behavior with ambient dynamics. We demonstrate Calipso's physics-based editing on a wide range of examples producing myriad physical behaviors while preserving geometric and visual consistency.

Comment: 11 pages

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

Semantic Photo Manipulation with a Generative Image Prior

Authors: Bau David; Peebles William; Strobelt Hendrik; Torralba Antonio; Wulff Jonas; Zhou Bolei; Zhu Jun-Yan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/09/2020

Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.

Comment: SIGGRAPH 201

arXiv.org e-Print Archive

DSpace@MIT

Text-based Editing of Talking-head Video

Authors: Agrawala M.; Finkelstein A.; Fried O.; Genova K.; Goldman D.; Jin Z.; Shechtman E.; Tewari A.; Theobalt C.; Zollhöfer M.
Publication date: 01/01/2019

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e., no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.

MPG.PuRe

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

Authors: Catanzaro Bryan; Kautz Jan; Liu Ming-Yu; Tao Andrew; Wang Ting-Chun; Zhu Jun-Yan
Publication date: 20/08/2018

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low resolution and are still far from realistic. In this work, we generate 2048x1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

Comment: v2: CVPR camera ready, adding more results for edge-to-photo example

arXiv.org e-Print Archive

Crossref

ImageSpirit: Verbal Guided Image Parsing

Authors: Cheng Ming-Ming; Crook Nigel; Lin Wen-Yan; Mitra Niloy; Sturgess Paul; Torr Philip; Vineet Vibhav; Warrell Jonathan; Zheng Shuai
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g., smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.

Comment: http://mmcheng.net/imagespirit

arXiv.org e-Print Archive

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

UCL Discovery

Oxford University Research Archive

Oxford Brookes University: RADAR

Creating virtual models from uncalibrated camera views

Authors: Brisc Felicia; Whelan Paul F.
Publication date: 01/01/2004

The reconstruction of photorealistic 3D models from camera views is becoming a ubiquitous element in many applications that simulate physical interaction with the real world. In this paper, we present a low-cost, interactive pipeline aimed at non-expert users that achieves 3D reconstruction from multiple views acquired with a standard digital camera. 3D models are amenable to access through diverse representation modalities that typically imply trade-offs between level of detail, interaction, and computational costs. Our approach allows users to selectively control the complexity of different surface regions, while requiring only simple 2D image editing operations. An initial reconstruction at coarse resolution is followed by an iterative refinement of the surface areas corresponding to the selected regions.

CiteSeerX

Irish Universities

DCU Online Research Access Service

(MU-CTL-01-12) Towards Model Driven Game Engineering in SimSYS: Requirements for the Agile Software Development Process Game

Authors: Cooper Kendra M. L.; Longstreet C. Shaun
Publication venue: e-Publications@Marquette
Publication date: 01/03/2012

Software Engineering (SE) and Systems Engineering (Sys) are knowledge-intensive, specialized, rapidly changing disciplines; their educational infrastructure faces significant challenges, including the need to rapidly, widely, and cost-effectively introduce new or revised course material; encourage the broad participation of students; address changing student motivations and attitudes; support undergraduate, graduate, and lifelong learning; and incorporate the skills needed by industry. Games have a reputation for being fun and engaging; more importantly, they are immersive, requiring deep thinking and complex problem solving. We believe educational games are essential in the next generation of e-learning tools. An extensible, freely available, engaging, problem-based game platform that provides students with an interactive simulated experience closely resembling the activities performed in a (real) industry development project would transform the SE/Sys education infrastructure. Our goal is to extend the state-of-the-art research in SE/Sys education by investigating a game development platform (GDP) from an interdisciplinary perspective (education, game research, and software/systems engineering). A meta-model has been proposed to provide a rigorous foundation that integrates the three disciplines. The GDP is intended to support the semi-automated development of collections of scripted games and their execution, where each game embodies a specific set of learning objectives. The games are scripted using a template-based approach. The templates integrate three approaches: use cases; storyboards; and state machines (timed, concurrent, hierarchical state machines). The specification templates capture the structure of the game (Game, Acts, Scenes, Screens, Challenges), storyline, characters (player, non-player, external), graphics, music/sound effects, rules, and so on.
The instantiated templates are (manually) transformed into XML game scripts that can be loaded into the SimSYS Game Play Engine. As a game is played, the game play events are logged; they are analyzed to automatically assess a player’s accomplishments and automatically adapt the game play script. Currently, we are manually defining a collection of games. The games are being used to ensure the GDP is flexible and reliable (i.e., the prototype can load and correctly run a variety of game scripts), the ontology is comprehensive, and the templates assist in defining well-organized, modular game scripts. In this report, we present the initial part of an Agile Software Development Process game (Act I, Scenes 1 and 2) that embodies learning objectives related to SE fundamentals (requirements, architecture, testing, process); planning with Gantt charts; working with budgets; and selecting a team for an agile development project. A student player is rewarded in the game by getting hired, scoring points, or getting promoted to lead a project. The game has a variety of settings, including a classroom, a job fair, and a work environment with meeting rooms, cubicles, and a water cooler station. The main non-player characters include a teacher, a boss, and an evil peer. In the future, semi-automated support for creating new game scripts will be explored using a wizard interface. The templates will be formally defined, supporting automated transformation into XML game scripts that can be loaded into the SimSYS Game Engine. We also plan to explore transforming the requirements into a notation that can be imported into a commercial tool that supports Statechart simulation.

epublications@Marquette