84 research outputs found
3D Car Shape Reconstruction from a Single Sketch Image
Efficient car shape design is a challenging problem in both the automotive industry and the computer animation/games industry. In this paper, we present a system to reconstruct the 3D car shape from a single 2D sketch image. To learn the correlation between 2D sketches and 3D cars, we propose a Variational Autoencoder deep neural network that takes a 2D sketch and generates a set of multi-view depth and mask images, which are a more effective representation compared to a 3D mesh and can be combined to form the 3D car shape. To ensure the volume and diversity of the training data, we propose a feature-preserving car mesh augmentation pipeline for data augmentation. Since deep learning has limited capacity to reconstruct fine-detail features, we propose a lazy learning approach that constructs a small subspace based on a few relevant car samples in the database. Due to the small size of such a subspace, fine details can be represented effectively with a small number of parameters. With a low-cost optimization process, a high-quality car with detailed features is created. Experimental results show that the system performs consistently to create highly realistic cars of substantially different shape and topology, with a very low computational cost.
Single Sketch Image based 3D Car Shape Reconstruction with Deep Learning and Lazy Learning
Efficient car shape design is a challenging problem in both the automotive industry and the computer animation/games industry. In this paper, we present a system to reconstruct the 3D car shape from a single 2D sketch image. To learn the correlation between 2D sketches and 3D cars, we propose a Variational Autoencoder deep neural network that takes a 2D sketch and generates a set of multi-view depth and mask images, which form a more effective representation compared to 3D meshes, and can be effectively fused to generate a 3D car shape. Since global models like deep learning have limited capacity to reconstruct fine-detail features, we propose a local lazy learning approach that constructs a small subspace based on a few relevant car samples in the database. Due to the small size of such a subspace, fine details can be represented effectively with a small number of parameters. With a low-cost optimization process, a high-quality car shape with detailed features is created. Experimental results show that the system performs consistently to create highly realistic cars of substantially different shape and topology.
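The local subspace idea behind the lazy learning step can be sketched as follows: gather a few database samples near a coarse query shape, span a small linear subspace with them, and re-express the query with only a handful of coefficients. This is an illustrative NumPy sketch under assumed names (`lazy_learning_refine`, flat shape vectors), not the authors' implementation.

```python
import numpy as np

def lazy_learning_refine(query, database, k=5):
    """Refine a coarse shape vector by projecting it onto a small
    subspace spanned by its k nearest neighbours in the database.
    Illustrative only; names and representation are assumptions."""
    # Find the k database samples closest to the coarse query shape.
    dists = np.linalg.norm(database - query, axis=1)
    neighbours = database[np.argsort(dists)[:k]]          # (k, d)

    # Build a local linear subspace (mean + orthonormal directions).
    mean = neighbours.mean(axis=0)
    centred = neighbours - mean
    # SVD of the centred neighbours gives an orthonormal basis.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)

    # Only k coefficients are fitted, so detail shared by the
    # neighbours is kept while off-subspace noise is discarded.
    coeffs = vt @ (query - mean)
    return mean + vt.T @ coeffs

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 32))          # toy database of shape vectors
refined = lazy_learning_refine(db[0] + 0.01, db)
```

Because the subspace has only `k` degrees of freedom, the optimization over its coefficients is cheap, which matches the low-cost refinement the abstract describes.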
Event-based Camera Simulation using Monte Carlo Path Tracing with Adaptive Denoising
This paper presents an algorithm to obtain an event-based video from noisy
frames given by physics-based Monte Carlo path tracing over a synthetic 3D
scene. Given the nature of dynamic vision sensor (DVS), rendering event-based
video can be viewed as a process of detecting the changes from noisy brightness
values. We extend a denoising method based on a weighted local regression (WLR)
to detect the brightness changes rather than applying denoising to every pixel.
Specifically, we derive a threshold to determine the likelihood of event
occurrence and reduce the number of times to perform the regression. Our method
is robust to noisy video frames obtained from a few path-traced samples.
Despite its efficiency, our method performs comparably to or even better than
an approach that exhaustively denoises every frame.
Comment: 8 pages, 6 figures, 3 tables
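The event-generation principle the abstract builds on can be shown in a toy form: a DVS pixel fires when its log-brightness moves past a contrast threshold relative to the last event. The sketch below renders events from clean frames; the paper's actual contribution is deciding this robustly from noisy path-traced frames via weighted local regression, which is not modelled here. All names are illustrative.

```python
import numpy as np

def events_from_frames(frames, contrast_threshold=0.15, eps=1e-6):
    """Toy DVS model: emit an event at a pixel whenever the
    log-brightness change since that pixel's last event exceeds
    the contrast threshold. Illustrative sketch, not the paper's
    noise-aware method."""
    log_ref = np.log(frames[0] + eps)        # per-pixel reference level
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        log_i = np.log(frame + eps)
        diff = log_i - log_ref
        fired = np.abs(diff) >= contrast_threshold
        for y, x in zip(*np.nonzero(fired)):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, x, y, polarity))
        # Reset the reference only where an event fired.
        log_ref[fired] = log_i[fired]
    return events

# A uniform brightening from 1.0 to 1.3 crosses the threshold everywhere.
frames = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 1.3)])
evs = events_from_frames(frames)
```

With noisy Monte Carlo frames, thresholding raw log-differences like this would fire spuriously, which is why the paper detects changes through the regression instead of per-pixel comparison.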
Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation
Pre-captured immersive environments using omnidirectional cameras provide a
wide range of virtual reality applications. Previous research has shown that
manipulating the eye height in egocentric virtual environments can
significantly affect distance perception and immersion. However, the influence
of eye height in pre-captured real environments has received less attention due
to the difficulty of altering the perspective after finishing the capture
process. To explore this influence, we first propose a pilot study that
captures real environments with multiple eye heights and asks participants to
judge the egocentric distances and immersion. If a significant influence is
confirmed, an effective image-based approach to adapt pre-captured real-world
environments to the user's eye height would be desirable. Motivated by the
study, we propose a learning-based approach for synthesizing novel views for
omnidirectional images with altered eye heights. This approach employs a
multitask architecture that learns depth and semantic segmentation in two
formats, and generates high-quality depth and semantic segmentation to
facilitate the inpainting stage. With the improved omnidirectional-aware
layered depth image, our approach synthesizes natural and realistic visuals for
eye height adaptation. Quantitative and qualitative evaluation shows favorable
results against state-of-the-art methods, and an extensive user study verifies
improved perception and immersion for pre-captured real-world environments.
Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
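The geometric core of eye height adaptation, before any learning or inpainting, is a depth-based reprojection of the equirectangular panorama: lift each pixel to a 3-D point using its depth, shift the viewpoint vertically, and recompute the spherical coordinates. The sketch below shows only this geometry under an assumed y-up convention; the paper's layered depth image and learned inpainting are not modelled.

```python
import numpy as np

def shift_eye_height(depth, delta_h):
    """Spherical coordinates of each equirectangular pixel after
    raising the viewpoint by delta_h metres, given per-pixel depth.
    A real system would then forward-warp and inpaint the panorama;
    this is purely illustrative geometry, not the authors' code."""
    h, w = depth.shape
    # Pixel-centre latitude (+pi/2 at top) and longitude.
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2 * np.pi
    lat, lon = np.meshgrid(lat, lon, indexing="ij")

    # 3-D point of each pixel in the original camera frame (y is up).
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)

    # Raising the eye by delta_h lowers every point relative to it.
    y_new = y - delta_h
    r_new = np.sqrt(x**2 + y_new**2 + z**2)
    return np.arcsin(y_new / r_new), np.arctan2(x, z)

d = np.full((4, 8), 2.0)                 # toy constant-depth panorama
nlat, nlon = shift_eye_height(d, 0.5)
```

The disocclusions this warp creates (regions visible from the new height but not the old) are exactly what the paper's depth- and semantics-guided inpainting stage has to fill.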
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning
This paper presents a novel metric learning approach to address the
performance gap between normal and silent speech in visual speech recognition
(VSR). The difference in lip movements between the two poses a challenge for
existing VSR models, which exhibit degraded accuracy when applied to silent
speech. To solve this issue and tackle the scarcity of training data for silent
speech, we propose to leverage the shared literal content between normal and
silent speech and present a metric learning approach based on visemes.
Specifically, we aim to map the input of two speech types close to each other
in a latent space if they have similar viseme representations. By minimizing
the Kullback-Leibler divergence of the predicted viseme probability
distributions between and within the two speech types, our model effectively
learns and predicts viseme identities. Our evaluation demonstrates that our
method improves the accuracy of silent VSR, even when limited training data is
available.
Comment: Accepted by INTERSPEECH 202
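The KL-based alignment objective the abstract describes can be illustrated with a minimal sketch: a symmetrised KL divergence between the viseme probability distributions predicted for normal and silent utterances of the same content. Function names and the exact symmetrisation are assumptions, not the paper's precise loss.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """Row-wise KL(p || q) for arrays of probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def viseme_alignment_loss(p_normal, p_silent):
    """Symmetrised KL between viseme distributions predicted for the
    two speech types. Minimising it pulls normal and silent inputs
    with the same viseme content together in the latent space.
    A toy rendering of the idea, not the paper's exact objective."""
    return np.mean(kl_div(p_normal, p_silent) + kl_div(p_silent, p_normal))

# Toy 3-viseme predictions for the same spoken content.
p = np.array([[0.7, 0.2, 0.1]])   # normal speech
q = np.array([[0.1, 0.2, 0.7]])   # silent speech, initially mismatched
```

Because the loss is zero only when the two distributions agree, gradient descent on it drives the silent-speech branch toward the viseme predictions of the better-resourced normal-speech branch.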
DanceReProducer: An Automatic Mashup Music Video Generation System by Reusing Dance Video Clips on the Web
(Abstract to follow)
Automatic Dance Generation System Considering Sign Language Information
In recent years, thanks to the development of 3DCG animation editing tools (e.g., MikuMikuDance), many 3D character dance animation movies are created by amateur users. However, it is very difficult to create choreography from scratch without any technical knowledge. Shiratori et al. [2006] developed an automatic dance generation system that considers the rhythm and intensity of dance motions. However, each segment is selected randomly from the database, so the generated dance motion has no linguistic or emotional meaning. Takano et al. [2010] developed a human motion generation system that considers motion labels. However, they use simple motion labels such as “running” or “jump”, so they cannot generate motions that express emotions. In reality, professional dancers create choreography based on musical features or the lyrics of a song, and express emotion or how they feel about the music. In our work, we aim to generate more emotional dance motion easily. Therefore, we use the linguistic information in lyrics to generate dance motion.
In this paper, we propose a system to generate sign dance motion from continuous sign language motion based on the lyrics of music. This system could help deaf people experience music as a visualized music application.