In-Network View Synthesis for Interactive Multiview Video Systems
To enable interactive multiview video systems with minimal view-switching
delay, multiple camera views are sent to the users, which are used as reference
images to synthesize additional virtual views via depth-image-based rendering.
In practice, however, bandwidth constraints may restrict the number of
reference views sent to clients per time unit, which may in turn limit the
quality of the synthesized viewpoints. We argue that reference view selection should
ideally be performed close to the users, and we study the problem of in-network
reference view synthesis such that the navigation quality is maximized at the
clients. We consider a distributed cloud network architecture where data stored
in a main cloud is delivered to end users with the help of cloudlets, i.e.,
resource-rich proxies close to the users. In order to satisfy last-hop
bandwidth constraints from the cloudlet to the users, a cloudlet re-samples
viewpoints of the 3D scene into a discrete set of views (a combination of
received camera views and synthesized virtual views) to be used as references
for the synthesis of additional virtual views at the client. This in-network
synthesis leads to better viewpoint sampling given a bandwidth constraint
compared to simple selection of camera views, but it may carry a
distortion penalty in the cloudlet-synthesized reference views. We therefore
cast a new reference view selection problem where the best subset of views is
defined as the one minimizing the distortion over a view navigation window
defined by the user under some transmission bandwidth constraints. We show that
the view selection problem is NP-hard, and propose an effective polynomial time
algorithm using dynamic programming to solve the optimization problem.
Simulation results confirm the performance gain offered by virtual view
synthesis in the network.
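The abstract mentions that view selection under a bandwidth constraint is NP-hard but admits an effective dynamic program. As an illustrative analogue only (the paper's actual DP exploits the navigation-window structure, which the abstract does not detail), a knapsack-style dynamic program over a bit budget can sketch the idea; the rate and distortion-gain values below are invented.

```python
# Hypothetical sketch: choose a subset of candidate reference views that
# maximizes total distortion reduction subject to a last-hop bandwidth budget.
# This is a knapsack-style DP, not the paper's actual algorithm.

def select_views(rates, distortion_gains, budget):
    """rates[i]: bits needed to send view i.
    distortion_gains[i]: distortion reduction if view i is sent.
    budget: last-hop bandwidth budget in bits.
    Returns the maximum achievable distortion reduction."""
    n = len(rates)
    # best[b] = max distortion reduction achievable within budget b
    best = [0] * (budget + 1)
    for i in range(n):
        # iterate budget downwards so each view is used at most once
        for b in range(budget, rates[i] - 1, -1):
            best[b] = max(best[b], best[b - rates[i]] + distortion_gains[i])
    return best[budget]

print(select_views([3, 4, 2], [10, 14, 6], 6))  # -> 20 (views 1 and 2 fit)
```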
Wavelet based stereo images reconstruction using depth images
It is believed by many that three-dimensional (3D) television will be the next logical development toward a more natural and vivid home entertainment experience. While the classical 3D approach requires the transmission of two video streams, one for each view, 3D TV systems based on depth-image-based rendering (DIBR) require a single stream of monoscopic images and a second stream of associated images, usually termed depth images or depth maps, that contain per-pixel depth information. A depth map is a two-dimensional function that gives the distance from the camera to a point on the object as a function of the image coordinates. Using this depth information together with the original image, it is possible to reconstruct a virtual image from a nearby viewpoint by projecting the pixels of the available image to their locations in 3D space and finding their positions in the desired view plane. One of the most significant advantages of DIBR is that depth maps can be coded more efficiently than two streams corresponding to the left and right views of the scene, thereby reducing the bandwidth required for transmission, which makes it possible to reuse existing transmission channels for 3D TV. This technique can also be applied to other 3D technologies such as multimedia systems.
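The projection step described above can be sketched in a few lines for the special case of a purely horizontal camera shift, where each pixel is displaced by a disparity inversely proportional to its depth. This is a minimal illustration only; real DIBR also handles occlusions and hole filling, and the `baseline_focal` constant is an assumed stand-in for the baseline-times-focal-length product.

```python
import numpy as np

# Illustrative depth-image-based rendering (DIBR) warp for a horizontal
# camera shift: disparity = baseline_focal / depth. Occlusion handling and
# hole filling are deliberately omitted.

def dibr_horizontal(image, depth, baseline_focal=10.0):
    """image: (H, W) array; depth: (H, W) positive depths (same units as
    baseline_focal, which is an assumed calibration constant)."""
    h, w = image.shape
    virtual = np.zeros_like(image)
    disparity = np.round(baseline_focal / depth).astype(int)
    for y in range(h):
        for x in range(w):
            xv = x + disparity[y, x]   # column of this pixel in the virtual view
            if 0 <= xv < w:
                virtual[y, xv] = image[y, x]
    return virtual

img = np.arange(16.0).reshape(4, 4)
depth = np.full((4, 4), 5.0)   # constant depth -> uniform shift of 2 columns
print(dibr_horizontal(img, depth))
```

With a constant depth map the whole image shifts uniformly; with a real depth map, nearer pixels shift more than farther ones, producing the parallax of the virtual viewpoint.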
In this paper we propose an advanced wavelet-domain scheme for the reconstruction of stereoscopic images, which solves some of the shortcomings of the existing methods discussed above. We perform the wavelet transform of both the luminance and depth images in order to obtain significant geometric features, which enable a more sensible reconstruction of the virtual view. The motion estimation employed in our approach uses a Markov random field smoothness prior for regularization of the estimated motion field.
The evaluation of the proposed reconstruction method is done on two video sequences which are typically used for comparison of stereo reconstruction algorithms. The results demonstrate the advantages of the proposed approach with respect to state-of-the-art methods, in terms of both objective and subjective performance measures.
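The wavelet decomposition of the luminance and depth images can be sketched with a one-level 2-D Haar transform, the simplest wavelet: it splits an image into an approximation band plus horizontal, vertical, and diagonal detail bands, the latter carrying the edge-like geometric features mentioned above. The paper's actual choice of wavelet and its regularization details are not specified in this excerpt, so the code below is a generic illustration.

```python
import numpy as np

# One-level 2-D Haar transform: average/difference along rows, then along
# columns, yielding one approximation band and three detail bands.

def haar2d(img):
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal details
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical details
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal details
    return ll, lh, hl, hh

luma = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar2d(luma)
print(ll)  # smooth 2x2 approximation of the 4x4 input
```

Applying the same transform to the depth map and comparing detail bands is one way geometric features of the two images can be aligned before view reconstruction.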
3D Face Synthesis Driven by Personality Impression
Synthesizing 3D faces that give certain personality impressions is commonly
needed in computer games, animations, and virtual world applications for
producing realistic virtual characters. In this paper, we propose a novel
approach to synthesize 3D faces based on personality impression for creating
virtual characters. Our approach consists of two major steps. In the first
step, we train classifiers using deep convolutional neural networks on a
dataset of images annotated with personality impressions; these classifiers
predict the personality impression of a face. In the second step, given a 3D
face and a desired personality impression type as user inputs, our approach
optimizes the facial details against the trained classifiers, so as to
synthesize a face which gives the desired personality impression. We
demonstrate our approach for synthesizing 3D faces giving desired personality
impressions on a variety of 3D face models. Perceptual studies show that the
perceived personality impressions of the synthesized faces agree with the
target personality impressions specified for synthesizing the faces. Please
refer to the supplementary materials for all results. Comment: 8 pages; 6 figures
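The second step, optimizing facial details against a trained classifier, can be sketched as gradient ascent on a shape-parameter vector with respect to a differentiable score. The quadratic "classifier" below is invented purely for illustration (the paper uses trained CNNs), as is the target parameter vector.

```python
import numpy as np

# Toy sketch of classifier-guided face optimization: adjust parameters
# until a stand-in impression score is maximized. The score function and
# target profile are hypothetical, not the paper's trained classifiers.

def impression_score(params, target):
    return -np.sum((params - target) ** 2)   # peaks when params == target

def optimize_face(params, target, lr=0.1, steps=200):
    for _ in range(steps):
        grad = -2.0 * (params - target)      # analytic gradient of the score
        params = params + lr * grad          # gradient ascent step
    return params

start = np.zeros(4)
target = np.array([0.5, -0.2, 0.1, 0.9])    # hypothetical impression profile
result = optimize_face(start, target)
print(np.allclose(result, target, atol=1e-3))  # -> True
```

With a real CNN classifier the gradient would come from backpropagation rather than a closed form, but the optimization loop has the same shape.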
Federated Multi-View Synthesizing for Metaverse
The metaverse is expected to provide immersive entertainment, education, and
business applications. However, virtual reality (VR) transmission over wireless
networks is data- and computation-intensive, making it critical to introduce
novel solutions that meet stringent quality-of-service requirements. With
recent advances in edge intelligence and deep learning, we have developed a
novel multi-view synthesizing framework that can efficiently provide
computation, storage, and communication resources for wireless content delivery
in the metaverse. We propose a three-dimensional (3D)-aware generative model
that uses collections of single-view images. These single-view images are
transmitted to a group of users with overlapping fields of view, which avoids
massive content transmission compared to transmitting tiles or whole 3D models.
We then present a federated learning approach to guarantee an efficient
learning process. The training performance can be improved by characterizing
the vertical and horizontal data samples with a large latent feature space,
while low-latency communication can be achieved with a reduced number of
transmitted parameters during federated learning. We also propose a federated
transfer learning framework to enable fast domain adaptation to different
target domains. Simulation results have demonstrated the effectiveness of our
proposed federated multi-view synthesizing framework for VR content delivery.
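The federated learning step described above can be sketched with the standard FedAvg aggregation rule: each user trains locally, and only model parameters (not view content) are exchanged and averaged, weighted by local dataset size. The client weights and sizes below are illustrative, not the paper's actual training setup.

```python
import numpy as np

# Minimal FedAvg-style aggregation: a weighted average of client parameter
# vectors, with weights proportional to each client's local dataset size.

def federated_average(client_weights, client_sizes):
    """client_weights: list of same-shape parameter arrays.
    client_sizes: local dataset size per client."""
    total = sum(client_sizes)
    agg = np.zeros_like(client_weights[0])
    for w, n in zip(client_weights, client_sizes):
        agg += (n / total) * w
    return agg

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [1, 3]   # the second client has 3x the local data
print(federated_average(clients, sizes))  # -> [2.5 3.5]
```

Reducing the number of parameters transmitted per round, as the abstract notes, directly shrinks the payload of each such aggregation step.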
Improving Surgical Training Phantoms by Hyperrealism: Deep Unpaired Image-to-Image Translation from Real Surgeries
Current `dry lab' surgical phantom simulators are a valuable tool for
surgeons, allowing them to improve their dexterity and skill with surgical
instruments. These phantoms mimic the haptics and shape of the organs of interest,
but lack a realistic visual appearance. In this work, we present an innovative
application in which representations learned from real intraoperative
endoscopic sequences are transferred to a surgical phantom scenario. The term
hyperrealism is introduced in this field, which we regard as a novel subform of
surgical augmented reality for approaches that involve real-time object
transfigurations. For related tasks in the computer vision community, unpaired
cycle-consistent Generative Adversarial Networks (GANs) have shown excellent
results on still RGB images. However, applying this approach to continuous
video frames can result in flickering, which turned out to be especially
prominent in this application. We therefore propose an extension of
cycle-consistent GANs, named tempCycleGAN, to improve temporal consistency. The
novel method is evaluated on captures of a silicone phantom for training
endoscopic reconstructive mitral valve procedures. Synthesized videos show
highly realistic results with regard to 1) replacement of the silicone
appearance of the phantom valve by intraoperative tissue texture, while 2)
explicitly keeping crucial features in the scene, such as instruments, sutures
and prostheses. Compared to the original CycleGAN approach, tempCycleGAN
efficiently removes flickering between frames. The overall approach is expected
to change the future design of surgical training simulators since the generated
sequences clearly demonstrate the feasibility of enabling a considerably more
realistic training experience for minimally invasive procedures. Comment: 8 pages, accepted at MICCAI 2018, supplemental material at
https://youtu.be/qugAYpK-Z4
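The two loss ingredients this abstract combines can be sketched generically: the standard CycleGAN cycle-consistency term, and a temporal term penalizing frame-to-frame flicker in the translated sequence. The exact tempCycleGAN formulation is not given in the abstract, so the functions below are a generic illustration, not the paper's losses.

```python
import numpy as np

# Generic sketches of a cycle-consistency loss and a temporal-consistency
# loss over translated video frames.

def cycle_loss(x, x_reconstructed):
    """L1 cycle-consistency: F(G(x)) should reproduce x."""
    return float(np.mean(np.abs(x - x_reconstructed)))

def temporal_loss(translated_frames):
    """Mean L1 difference between consecutive translated frames;
    large values indicate flicker."""
    diffs = [np.mean(np.abs(translated_frames[t + 1] - translated_frames[t]))
             for t in range(len(translated_frames) - 1)]
    return float(np.mean(diffs))

frames = [np.full((2, 2), v) for v in (0.0, 0.1, 0.1)]
print(round(temporal_loss(frames), 3))  # -> 0.05
```

A training objective that sums terms of both kinds trades off per-frame translation fidelity against smoothness across frames, which is the intuition behind suppressing flicker.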