374 research outputs found
Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling
This dissertation addresses the problem of employing 3D depth information on solving a number of traditional challenging computer vision/graphics problems. Humans have the abilities of perceiving the depth information in 3D world, which enable humans to reconstruct layouts, recognize objects and understand the geometric space and semantic meanings of the visual world. Therefore it is significant to explore how the 3D depth information can be utilized by computer vision systems to mimic such abilities of humans. This dissertation aims at employing 3D depth information to solve vision/graphics problems in the following aspects: scene understanding, image enhancements and 3D reconstruction and modeling.
In addressing scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequence only using dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from dense depth maps and used for segmenting and recognizing different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information to outperform using sparse 3D or 2D appearance features.
In addressing image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms.
In addressing 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. The complex structure, severe occlusions and wide variations make the reconstruction of their 3D models a challenging task. We overcome these challenges by combining data driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process for realistically reconstructing occluded regions and maintaining correct 3D spatial relations.
The main contribution of the dissertation is in the intelligent usage of 3D depth information on solving traditional challenging vision/graphics problems. By developing some advanced algorithms either automatically or with minimum user interaction, the goal of this dissertation is to demonstrate that computed 3D depth behind the multiple images contains rich information of the visual world and therefore can be intelligently utilized to recognize/ understand semantic meanings of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models
Dance Your Latents: Consistent Dance Generation through Spatial-temporal Subspace Attention Guided by Motion Flow
The advancement of generative AI has extended to the realm of Human Dance
Generation, demonstrating superior generative capacities. However, current
methods still exhibit deficiencies in achieving spatiotemporal consistency,
resulting in artifacts like ghosting, flickering, and incoherent motions. In
this paper, we present Dance-Your-Latents, a framework that makes latents dance
coherently following motion flow to generate consistent dance videos. Firstly,
considering that each constituent element moves within a confined space, we
introduce spatial-temporal subspace-attention blocks that decompose the global
space into a combination of regular subspaces and efficiently model the
spatiotemporal consistency within these subspaces. This module enables each
patch pay attention to adjacent areas, mitigating the excessive dispersion of
long-range attention. Furthermore, observing that body part's movement is
guided by pose control, we design motion flow guided subspace align & restore.
This method enables the attention to be computed on the irregular subspace
along the motion flow. Experimental results in TikTok dataset demonstrate that
our approach significantly enhances spatiotemporal consistency of the generated
videos.Comment: 10 pages, 5 figure
Facial Expression Analysis under Partial Occlusion: A Survey
Automatic machine-based Facial Expression Analysis (FEA) has made substantial
progress in the past few decades driven by its importance for applications in
psychology, security, health, entertainment and human computer interaction. The
vast majority of completed FEA studies are based on non-occluded faces
collected in a controlled laboratory environment. Automatic expression
recognition tolerant to partial occlusion remains less understood, particularly
in real-world scenarios. In recent years, efforts investigating techniques to
handle partial occlusion for FEA have seen an increase. The context is right
for a comprehensive perspective of these developments and the state of the art
from this perspective. This survey provides such a comprehensive review of
recent advances in dataset creation, algorithm development, and investigations
of the effects of occlusion critical for robust performance in FEA systems. It
outlines existing challenges in overcoming partial occlusion and discusses
possible opportunities in advancing the technology. To the best of our
knowledge, it is the first FEA survey dedicated to occlusion and aimed at
promoting better informed and benchmarked future work.Comment: Authors pre-print of the article accepted for publication in ACM
Computing Surveys (accepted on 02-Nov-2017
Novel Video Completion Approaches and Their Applications
Video completion refers to automatically restoring damaged or removed objects in a video sequence, with applications ranging from sophisticated video removal of undesired static or dynamic objects to correction of missing or corrupted video frames in old movies and synthesis of new video frames to add, modify, or generate a new visual story. The video completion problem can be solved using texture synthesis and/or data interpolation to fill-in the holes of the sequence inward. This thesis makes a distinction between still image completion and video completion. The latter requires visually pleasing consistency by taking into account the temporal information. Based on their applied concepts, video completion techniques are categorized as inpainting and texture synthesis. We present a bandlet transform-based technique for each of these categories of video completion techniques. The proposed inpainting-based technique is a 3D volume regularization scheme that takes advantage of bandlet bases for exploiting the anisotropic regularities to reconstruct a damaged video. The proposed exemplar-based approach, on the other hand, performs video completion using a precise patch fusion in the bandlet domain instead of patch replacement. The video completion task is extended to two important applications in video restoration. First, we develop an automatic video text detection and removal that benefits from the proposed inpainting scheme and a novel video text detector. Second, we propose a novel video super-resolution technique that employs the inpainting algorithm spatially in conjunction with an effective structure tensor, generated using bandlet geometry. The experimental results show a good performance of the proposed video inpainting method and demonstrate the effectiveness of bandlets in video completion tasks. The proposed video text detector and the video super resolution scheme also show a high performance in comparison with existing methods
- …