3,523 research outputs found
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image
We study the problem of synthesizing a long-term dynamic video from only a
single image. This is challenging since it requires consistent visual content
movements given large camera motions. Existing methods either hallucinate
inconsistent perpetual views or struggle with long camera trajectories. To
address these issues, it is essential to estimate the underlying 4D
representation (3D geometry plus scene motion) and fill in the occluded
regions. To this end, we
present Make-It-4D, a novel method that can generate a consistent long-term
dynamic video from a single image. On the one hand, we utilize layered depth
images (LDIs) to represent the scene, which are then unprojected to form a
feature point cloud. To animate the visual content, the feature point cloud is
displaced based on the scene flow derived from motion estimation and the
corresponding camera pose. Such 4D representation enables our method to
maintain the global consistency of the generated dynamic video. On the other
hand, we fill in the occluded regions by using a pretrained diffusion model to
inpaint and outpaint the input image. This enables our method to work under
large camera motions. Benefiting from this design, our method can be
training-free, which saves a significant amount of training time. Experimental
results demonstrate the effectiveness of our approach, showcasing
compelling rendering results.
Comment: accepted by ACM MM'23
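To make the pipeline concrete, here is a minimal NumPy sketch of the animation step the abstract describes: unproject a depth layer to a point cloud, displace it along an estimated scene flow, and reproject it under a camera pose. All function names are hypothetical and the renderer is reduced to a bare perspective projection; this is one reading of the abstract, not the authors' implementation.

```python
import numpy as np

def unproject(depth, K):
    """Lift every pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    return (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)

def animate_and_project(points, scene_flow, t, R, trans, K):
    """Displace points along the scene flow for time t, then project them
    into the camera with rotation R and translation trans."""
    moved = points + t * scene_flow          # the "4D" part: geometry + motion
    cam = (R @ moved.T).T + trans            # world -> camera coordinates
    proj = (K @ cam.T).T
    return proj[:, :2] / proj[:, 2:3]        # perspective divide

# Toy usage: a flat 4x4 depth map, identity intrinsics and pose, and points
# drifting along +z as a stand-in for estimated scene motion.
K = np.eye(3)
pts = unproject(np.full((4, 4), 2.0), K)
flow = np.tile([0.0, 0.0, 0.1], (pts.shape[0], 1))
uv = animate_and_project(pts, flow, t=1.0, R=np.eye(3), trans=np.zeros(3), K=K)
```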
Interpretation at the controller's edge: designing graphical user interfaces for the digital publication of the excavations at Gabii (Italy)
This paper discusses the authors’ approach to designing an interface for the Gabii Project’s digital volumes that attempts to fuse elements of traditional synthetic publications and site reports with rich digital datasets. Archaeology, and classical archaeology in particular, has long engaged with questions of the formation and lived experience of towns and cities. Such studies might draw on evidence of local topography, the arrangement of the built environment, and the placement of architectural details, monuments and inscriptions (e.g. Johnson and Millett 2012). Fundamental to the continued development of these studies is the growing body of evidence emerging from new excavations. Digital techniques for recording evidence “on the ground,” notably SfM (structure from motion, also known as close-range photogrammetry) for the creation of detailed 3D models, and scene-level modeling in 3D, have advanced rapidly in recent years. These parallel developments have opened the door for approaches to the study of the creation and experience of urban space driven by scene-level reconstruction models (van Roode et al. 2012; Paliou et al. 2011; Paliou 2013) explicitly combined with detailed SfM- or scanning-based 3D models representing stratigraphic evidence. It is essential to understand the subtle but crucial impact of the design of the user interface on the interpretation of these models. In this paper we focus on the impact of design choices for the user interface, and make connections between design choices and the broader discourse in archaeological theory surrounding the practice of the creation and consumption of archaeological knowledge. As a case in point, we take the prototype interface being developed within the Gabii Project for the publication of the Tincu House. In discussing our own evolving practices in engagement with the archaeological record created at Gabii, we highlight some of the challenges of undertaking theoretically-situated user interface design, and their implications for the publication and study of archaeological materials.
Transport-Based Neural Style Transfer for Smoke Simulations
Artistically controlling fluids has always been a challenging task.
Optimization techniques rely on approximating simulation states towards target
velocity or density field configurations, which are often handcrafted by
artists to indirectly control smoke dynamics. Patch synthesis techniques
transfer image textures or simulation features to a target flow field. However,
these are either limited to adding structural patterns or augmenting coarse
flows with turbulent structures, and hence cannot capture the full spectrum of
different styles and semantically complex structures. In this paper, we propose
the first Transport-based Neural Style Transfer (TNST) algorithm for volumetric
smoke data. Our method is able to transfer features from natural images to
smoke simulations, enabling general content-aware manipulations ranging from
simple patterns to intricate motifs. The proposed algorithm is physically
inspired, since it computes the density transport from a source input smoke to
a desired target configuration. Our transport-based approach allows direct
control over the divergence of the stylization velocity field by optimizing
incompressible and irrotational potentials that transport smoke towards
stylization. Temporal consistency is ensured by transporting and aligning
subsequent stylized velocities, and 3D reconstructions are computed by
seamlessly merging stylizations from different camera viewpoints.
Comment: ACM Transactions on Graphics (SIGGRAPH Asia 2019), additional
materials: http://www.byungsoo.me/project/neural-flow-styl
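The divergence control described above follows from a Helmholtz-style split of the stylization velocity. The following 2D NumPy sketch illustrates that parameterization under my own simplifications (nearest-neighbor semi-Lagrangian advection, fixed rather than optimized potentials; not the TNST reference code): the curl of a stream function is divergence-free by construction, so any divergence in the total field enters only through the scalar potential.

```python
import numpy as np

def stylization_velocity(psi, phi, dx=1.0):
    """Return v = curl(psi) + grad(phi) on a 2D grid."""
    dpsi_dy, dpsi_dx = np.gradient(psi, dx)
    dphi_dy, dphi_dx = np.gradient(phi, dx)
    vx = dpsi_dy + dphi_dx   # curl of psi contributes (d psi/dy, -d psi/dx)
    vy = -dpsi_dx + dphi_dy
    return vx, vy

def advect(density, vx, vy, dt=0.1):
    """Semi-Lagrangian transport of smoke density along the velocity."""
    h, w = density.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    src_x = np.clip(xx - dt * vx, 0, w - 1).astype(int)
    src_y = np.clip(yy - dt * vy, 0, h - 1).astype(int)
    return density[src_y, src_x]

# In the paper's setting, psi and phi would be optimized so the transported
# density matches a style target; here they are fixed toy fields.
psi = np.random.randn(32, 32) * 0.1
phi = np.zeros((32, 32))         # zero phi => divergence-free transport
vx, vy = stylization_velocity(psi, phi)
smoke = np.exp(-((np.mgrid[0:32, 0:32][0] - 16) ** 2) / 32.0)
smoke = advect(smoke, vx, vy)
```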
Shape Animation with Combined Captured and Simulated Dynamics
We present a novel volumetric animation generation framework to create new
types of animations from raw 3D surface or point cloud sequences of captured
real performances. The framework takes as input time-incoherent 3D
observations of a moving shape, and is thus particularly suitable for the
output of performance capture platforms. In our system, a suitable virtual
representation of the actor is built from real captures that allows seamless
combination and simulation with virtual external forces and objects, in which
the original captured actor can be reshaped, disassembled or reassembled from
user-specified virtual physics. Instead of using the dominant surface-based
geometric representation of the capture, which is less suitable for volumetric
effects, our pipeline exploits Centroidal Voronoi tessellation decompositions
as unified volumetric representation of the real captured actor, which we show
can be used seamlessly as a building block for all processing stages, from
capture and tracking to virtual physics simulation. The representation makes
no human-specific assumptions and can be used to capture and re-simulate the actor
with props or other moving scenery elements. We demonstrate the potential of
this pipeline for virtual reanimation of a real captured event with various
unprecedented volumetric visual effects, such as volumetric distortion,
erosion, morphing, gravity pull, or collisions.
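Since the pipeline is built on Centroidal Voronoi tessellations, a brief sketch of how such a decomposition can be computed may help. Lloyd's algorithm is the classic approach, shown below in NumPy on toy volume samples; the paper's actual solver and constraints may differ, and all names here are illustrative.

```python
import numpy as np

def lloyd_cvt(samples, n_cells=8, iters=20, seed=0):
    """Relax n_cells sites toward a Centroidal Voronoi tessellation of the samples."""
    rng = np.random.default_rng(seed)
    sites = samples[rng.choice(len(samples), n_cells, replace=False)]
    for _ in range(iters):
        # Assign every volume sample to its nearest site (its Voronoi cell).
        d = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Move each site to its cell's centroid (empty cells keep their site).
        for c in range(n_cells):
            cell = samples[labels == c]
            if len(cell):
                sites[c] = cell.mean(axis=0)
    return sites, labels

# Toy volume: points filling a unit cube, standing in for the captured actor.
samples = np.random.default_rng(1).uniform(size=(2000, 3))
sites, labels = lloyd_cvt(samples)
```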
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
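The cross-convolution idea, image content as feature maps and sampled motion as convolution kernels applied to them, can be sketched in a few lines of PyTorch. This is my interpretation of the mechanism, not the released model; layer sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

image_feats = torch.randn(1, 16, 32, 32)       # from the image encoder
z = torch.randn(1, 8)                          # sampled motion latent
kernel_decoder = torch.nn.Linear(8, 16 * 5 * 5)

# Decode the latent into one 5x5 kernel per feature channel.
kernels = kernel_decoder(z).view(16, 1, 5, 5)

# Depthwise convolution: each channel is filtered by its own motion kernel,
# so motion acts as a transformation of the appearance features.
motion_feats = F.conv2d(image_feats, kernels, padding=2, groups=16)
print(motion_feats.shape)  # torch.Size([1, 16, 32, 32])
```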
3D Cinemagraphy from a Single Image
We present 3D Cinemagraphy, a new technique that marries 2D image animation
with 3D photography. Given a single still image as input, our goal is to
generate a video that contains both visual content animation and camera motion.
We empirically find that naively combining existing 2D image animation and 3D
photography methods leads to obvious artifacts or inconsistent animation. Our
key insight is that representing and animating the scene in 3D space offers a
natural solution to this task. To this end, we first convert the input image
into feature-based layered depth images using predicted depth values, followed
by unprojecting them to a feature point cloud. To animate the scene, we perform
motion estimation and lift the 2D motion into the 3D scene flow. Finally, to
resolve the problem of hole emergence as points move forward, we propose to
bidirectionally displace the point cloud as per the scene flow and synthesize
novel views by separately projecting them into target image planes and blending
the results. Extensive experiments demonstrate the effectiveness of our method.
A user study is also conducted to validate the compelling rendering results of
our method.
Comment: Accepted by CVPR 2023. Project page:
https://xingyi-li.github.io/3d-cinemagraphy
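A rough sketch of the bidirectional displacement-and-blend step may clarify the hole-filling argument. The helpers below are hypothetical stand-ins (a nearest-pixel splat rather than the paper's feature-space renderer): points displaced forward from the start frame and backward from the loop's end are rendered separately and blended with time-dependent weights, so holes opened by one direction tend to be covered by the other.

```python
import numpy as np

def splat(points, colors, K, hw):
    """Nearest-pixel point splat: a crude stand-in for a neural renderer."""
    img = np.zeros((*hw, 3))
    proj = (K @ points.T).T
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)
    ok = ((uv[:, 0] >= 0) & (uv[:, 0] < hw[1]) &
          (uv[:, 1] >= 0) & (uv[:, 1] < hw[0]) & (points[:, 2] > 0))
    img[uv[ok, 1], uv[ok, 0]] = colors[ok]
    return img

def bidirectional_frame(points, colors, flow, t, T, K, hw=(64, 64)):
    fwd = splat(points + t * flow, colors, K, hw)        # displaced from t = 0
    bwd = splat(points - (T - t) * flow, colors, K, hw)  # displaced from t = T
    a = t / T                                            # time-dependent blend
    return (1 - a) * fwd + a * bwd

# Toy scene: random colored points in front of a simple pinhole camera.
K = np.diag([64, 64, 1.0])
pts = np.random.rand(500, 3) + [0, 0, 2.0]
cols = np.random.rand(500, 3)
flow = np.tile([0.0, 0.0, 0.01], (500, 1))
frame = bidirectional_frame(pts, cols, flow, t=3, T=10, K=K)
```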
Neural Radiance Fields: Past, Present, and Future
The various aspects of modeling and interpreting 3D environments and
surroundings have enticed researchers to advance 3D Computer Vision,
Computer Graphics, and Machine Learning. The seminal paper on NeRFs
(Neural Radiance Fields) by Mildenhall et al. sparked a boom in Computer
Graphics, Robotics, and Computer Vision, and the prospect of high-resolution,
low-storage Augmented Reality and Virtual Reality 3D models has gained
traction among researchers, with more than 1,000 NeRF-related preprints
published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citations
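For readers entering the field through this survey, the single equation at the heart of NeRF bears stating: the color of a camera ray is a discretized volume rendering sum over sampled densities and view-dependent colors, composited by accumulated transmittance (Mildenhall et al.):

```latex
% Color along ray r from N samples with densities sigma_i, colors c_i,
% and inter-sample distances delta_i:
C(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```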