Calipso: Physics-based Image and Video Editing through CAD Model Proxies
We present Calipso, an interactive method for editing images and videos in a
physically-coherent manner. Our main idea is to realize physics-based
manipulations by running a full physics simulation on proxy geometries given by
non-rigidly aligned CAD models. Running these simulations allows us to apply
new, unseen forces to move or deform selected objects, change physical
parameters such as mass or elasticity, or even add entire new objects that
interact with the rest of the underlying scene. In Calipso, the user makes
edits directly in 3D; these edits are processed by the simulation and then
transferred to the target 2D content using shape-to-image correspondences in a
photo-realistic rendering process. To align the CAD models, we introduce an
efficient CAD-to-image alignment procedure that jointly minimizes for rigid and
non-rigid alignment while preserving the high-level structure of the input
shape. Moreover, the user can choose to exploit image flow to estimate scene
motion, producing coherent physical behavior with ambient dynamics. We
demonstrate Calipso's physics-based editing on a wide range of examples
producing myriad physical behavior while preserving geometric and visual
consistency.
Comment: 11 page
Semantic Photo Manipulation with a Generative Image Prior
Despite the recent success of GANs in synthesizing images conditioned on
inputs such as a user sketch, text, or semantic labels, manipulating the
high-level attributes of an existing natural photograph with GANs is
challenging for two reasons. First, it is hard for GANs to precisely reproduce
an input image. Second, after manipulation, the newly synthesized pixels often
do not fit the original image. In this paper, we address these issues by
adapting the image prior learned by GANs to image statistics of an individual
image. Our method can accurately reconstruct the input image and synthesize new
content, consistent with the appearance of the input image. We demonstrate our
interactive system on several semantic image editing tasks, including
synthesizing new objects consistent with background, removing unwanted objects,
and changing the appearance of an object. Quantitative and qualitative
comparisons against several existing methods demonstrate the effectiveness of
our method.
Comment: SIGGRAPH 201
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e., no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low resolution and are still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
pixels. In this paper we propose treating nouns as object labels and adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interest enables a novel and natural interaction modality that can
possibly be used to interact with new generation devices (e.g. smart phones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the tradeoffs
compared to traditional mouse-based interactions, results are reported for both
a large-scale quantitative evaluation and a user study.
Comment: http://mmcheng.net/imagespirit
Creating virtual models from uncalibrated camera views
The reconstruction of photorealistic 3D models from camera views is becoming a ubiquitous element in many applications that simulate physical interaction with the real world. In this paper, we present a low-cost, interactive pipeline aimed at non-expert users that achieves 3D reconstruction from multiple views acquired with a standard digital camera. 3D models are amenable to access through diverse representation modalities that typically imply trade-offs between level of detail, interaction, and computational costs. Our approach allows users to selectively control the complexity of different surface regions, while requiring only simple 2D image editing operations. An initial reconstruction at coarse resolution is followed by an iterative refining of the surface areas corresponding to the selected regions.
(MU-CTL-01-12) Towards Model Driven Game Engineering in SimSYS: Requirements for the Agile Software Development Process Game
Software Engineering (SE) and Systems Engineering (Sys) are knowledge-intensive, specialized, rapidly changing disciplines; their educational infrastructure faces significant challenges including the need to rapidly, widely, and cost-effectively introduce new or revised course material; encourage the broad participation of students; address changing student motivations and attitudes; support undergraduate, graduate and lifelong learning; and incorporate the skills needed by industry. Games have a reputation for being fun and engaging; more importantly, they are immersive, requiring deep thinking and complex problem solving. We believe educational games are essential in the next generation of e-learning tools. An extensible, freely available, engaging, problem-based game platform that provides students with an interactive simulated experience closely resembling the activities performed in a (real) industry development project would transform the SE/Sys education infrastructure.
Our goal is to extend the state-of-the-art research in SE/Sys education by investigating a game development platform (GDP) from an interdisciplinary perspective (education, game research, and software/systems engineering). A meta-model has been proposed to provide a rigorous foundation that integrates the three disciplines. The GDP is intended to support the semi-automated development of collections of scripted games and their execution, where each game embodies a specific set of learning objectives. The games are scripted using a template-based approach. The templates integrate three approaches: use cases; storyboards; and state machines (timed, concurrent, hierarchical state machines). The specification templates capture the structure of the game (Game, Acts, Scenes, Screens, Challenges), storyline, characters (player, non-player, external), graphics, music/sound effects, rules, and so on. The instantiated templates are (manually) transformed into XML game scripts that can be loaded into the SimSYS Game Play Engine. As a game is played, the game play events are logged; they are analyzed to automatically assess a player’s accomplishments and automatically adapt the game play script.
Currently, we are manually defining a collection of games. The games are being used to ensure the GDP is flexible and reliable (i.e., the prototype can load and correctly run a variety of game scripts), the ontology is comprehensive, and the templates assist in defining well-organized, modular game scripts. In this report, we present the initial part of an Agile Software Development Process game (Act I, Scenes 1 and 2) that embodies learning objectives related to SE fundamentals (requirements, architecture, testing, process); planning with Gantt charts; working with budgets; and selecting a team for an agile development project. A student player is rewarded in the game by getting hired, scoring points, or getting promoted to lead a project. The game has a variety of settings including a classroom, job fair, and a work environment with meeting rooms, cubicles, and a water cooler station. The main non-player characters include a teacher, boss, and an evil peer.
In the future, semi-automated support for creating new game scripts will be explored using a wizard interface. The templates will be formally defined, supporting automated transformation into XML game scripts that can be loaded into the SimSYS Game Engine. We also plan to explore transforming the requirements into a notation that can be imported into a commercial tool that supports Statechart simulation.
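As a rough illustration of the game structure named in this abstract (Game, Acts, Scenes, Screens, Challenges), the following Python sketch builds a nested XML document with the standard library. The element names, the attributes, the helper `build_game_script`, and the challenge texts are all assumptions made for illustration; the abstract does not give the actual SimSYS script schema, so this should not be read as that format.

```python
import xml.etree.ElementTree as ET


def build_game_script(title, acts):
    """Build a Game > Act > Scene > Screen > Challenge XML tree.

    `acts` is a list of acts; each act is a list of scenes; each scene is a
    list of screens; each screen is a list of challenge descriptions.
    Element and attribute names are hypothetical, not the SimSYS schema.
    """
    game = ET.Element("Game", {"title": title})
    for act_no, scenes in enumerate(acts, start=1):
        act = ET.SubElement(game, "Act", {"number": str(act_no)})
        for scene_no, screens in enumerate(scenes, start=1):
            scene = ET.SubElement(act, "Scene", {"number": str(scene_no)})
            for screen_no, challenges in enumerate(screens, start=1):
                screen = ET.SubElement(scene, "Screen", {"number": str(screen_no)})
                for text in challenges:
                    ET.SubElement(screen, "Challenge").text = text
    return game


# One act with two scenes, loosely mirroring "Act I, Scenes 1 and 2".
# The challenge texts are invented placeholders.
script = build_game_script(
    "Agile Software Development Process",
    [
        [
            [["Select a team member"]],        # Scene 1: one screen, one challenge
            [["Read a Gantt chart"], []],      # Scene 2: two screens, one challenge
        ]
    ],
)
xml_text = ET.tostring(script, encoding="unicode")
print(xml_text)
```

Keeping the hierarchy as nested lists makes the mapping from a filled-in template to XML a plain traversal, which is the kind of transformation the abstract says is currently done by hand.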