Efficient Feature-based Image Registration by Mapping Sparsified Surfaces
With advances in digital camera technology, high-resolution images and videos
have become widespread in modern society. In
particular, image and video frame registration is frequently applied in
computer graphics and film production. However, conventional registration
approaches usually require long computation times for high-resolution images
and video frames, which hinders their application in industry. In this work,
we first propose a new image
representation method to accelerate the registration process by triangulating
the images effectively. For each high resolution image or video frame, we
compute an optimal coarse triangulation which captures the important features
of the image. Then, we apply a surface registration algorithm to obtain a
registration map which is used to compute the registration of the high
resolution image. Experimental results suggest that our overall algorithm is
efficient and capable of achieving a high compression rate while the accuracy
of the registration is well retained when compared with the conventional
grid-based approach. Also, the computational time of the registration is
significantly reduced using our triangulation-based approach.
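As a rough illustration of how a map computed on a coarse triangulation could be carried back to the full-resolution pixels, the sketch below interpolates a per-vertex map inside one triangle using barycentric coordinates. This is an assumption about one plausible interpolation step, not the authors' implementation; the function names `barycentric` and `map_point` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return the barycentric weights of p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def map_point(p: Point, src_tri: Sequence[Point], dst_tri: Sequence[Point]) -> Point:
    """Map p from a source triangle to the matching registered triangle.

    The point keeps its barycentric weights, so the map is piecewise affine:
    it only has to be stored at the (few) triangle vertices.
    """
    w1, w2, w3 = barycentric(p, src_tri[0], src_tri[1], src_tri[2])
    x = w1 * dst_tri[0][0] + w2 * dst_tri[1][0] + w3 * dst_tri[2][0]
    y = w1 * dst_tri[0][1] + w2 * dst_tri[1][1] + w3 * dst_tri[2][1]
    return (x, y)


if __name__ == "__main__":
    src = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    dst = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0)]  # same triangle shifted by (1, 1)
    print(map_point((1.0, 1.0), src, dst))
```

Storing the map only at triangle vertices is what would make a coarse triangulation cheaper than a dense pixel grid; every pixel inside a triangle is recovered by this interpolation.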
Video Face Editing Using Temporal-Spatial-Smooth Warping
Editing faces in videos is a popular yet challenging aspect of computer
vision and graphics, which encompasses several applications including facial
attractiveness enhancement, makeup transfer, face replacement, and expression
manipulation. Simply applying image-based warping algorithms to video-based
face editing produces temporal incoherence in the synthesized videos because it
is impossible to consistently localize facial features in two frames
representing two different faces in two different videos (or even two
consecutive frames representing the same face in one video). Therefore, high
performance face editing usually requires significant manual manipulation. In
this paper we propose a novel temporal-spatial-smooth warping (TSSW) algorithm
to effectively exploit the temporal information in two consecutive frames, as
well as the spatial smoothness within each frame. TSSW precisely estimates two
control lattices in the horizontal and vertical directions respectively from
the corresponding control lattices in the previous frame, by minimizing a novel
energy function that unifies a data-driven term, a smoothness term, and feature
point constraints. Corresponding warping surfaces then precisely map source
frames to the target frames. Experimental testing on facial attractiveness
enhancement, makeup transfer, face replacement, and expression manipulation
demonstrates that the proposed approaches effectively preserve spatial
smoothness and temporal coherence when editing facial geometry, skin detail,
identity, and expression, and outperform existing face editing methods.
In particular, TSSW is robust to slightly inaccurate localization of feature
points and is a vast improvement over image-based warping methods.
3D Trajectory Reconstruction of Dynamic Objects Using Planarity Constraints
We present a method to reconstruct the three-dimensional trajectory of a
moving instance of a known object category in monocular video data. We track
the two-dimensional shape of objects at the pixel level, exploiting instance-aware
semantic segmentation techniques and optical flow cues. We apply Structure from
Motion techniques to object and background images to determine for each frame
camera poses relative to object instances and background structures. By
combining object and background camera pose information, we restrict the object
trajectory to a one-parameter family of possible solutions. We compute a ground
representation by fusing background structures and corresponding semantic
segmentations. This allows us to determine an object trajectory consistent with
the image observations and the reconstructed environment model. Our method is
robust to occlusion and handles temporarily stationary objects. We show
qualitative results using drone imagery. Due to the lack of suitable benchmark
datasets, we
present a new dataset to evaluate the quality of reconstructed
three-dimensional object trajectories. The video sequences contain vehicles in
urban areas and are rendered using the path-tracing render engine Cycles to
achieve realistic results. We perform a quantitative evaluation of the
presented approach using this dataset. Our algorithm achieves an average
reconstruction-to-ground-truth distance of 0.31 meter.
Comment: 9 Pages, under revie
High-quality Instance-aware Semantic 3D Map Using RGB-D Camera
We present a mapping system capable of constructing detailed instance-level
semantic models of room-sized indoor environments by means of an RGB-D camera.
In this work, we integrate deep-learning-based instance segmentation and
classification into a state-of-the-art RGB-D SLAM system. We leverage the
pipeline of ElasticFusion [1] as a backbone and propose modifications of the
registration cost function. The proposed objective function features a tunable
weight for the appearance channel, which can be learned from data. The
resulting system is capable of producing accurate semantic maps of room-sized
environments, as well as reconstructing highly detailed object-level models.
The developed method has been verified through experimental validation on the
TUM RGB-D SLAM benchmark and the YCB video dataset. Our results confirmed that
the proposed system performs favorably in terms of trajectory estimation,
surface reconstruction, and segmentation quality in comparison to other
state-of-the-art systems.
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a general
discussion of trends, important questions, and future lines of research.
EgoSampling: Fast-Forward and Stereo for Egocentric Videos
While egocentric cameras like GoPro are gaining popularity, the videos they
capture are long, boring, and difficult to watch from start to end. Fast
forwarding (i.e., frame sampling) is a natural choice for faster video browsing.
However, this accentuates the shake caused by natural head motion, making the
fast forwarded video useless.
We propose EgoSampling, an adaptive frame sampling that gives more stable
fast forwarded videos. Adaptive frame sampling is formulated as energy
minimization, whose optimal solution can be found in polynomial time.
In addition, egocentric video taken while walking suffers from the left-right
movement of the head as the body weight shifts from one leg to another. We turn
this drawback into a feature: Stereo video can be created by sampling the
frames from the leftmost and rightmost head positions of each step, forming
approximate stereo pairs.
Comment: in IEEE CVPR 2015, Boston, MA, June 201
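The EgoSampling abstract above formulates adaptive frame sampling as an energy minimization that is solvable in polynomial time. A minimal sketch of that idea, assuming a chain-structured energy with a pairwise transition cost, is a dynamic program over frames. The function `sample_frames`, the fixed first and last frames, the `max_skip` bound, and the toy cost are all assumptions made for illustration, not the authors' formulation.

```python
from typing import Callable, List


def sample_frames(n_frames: int, max_skip: int,
                  cost: Callable[[int, int], float]) -> List[int]:
    """Choose a subsequence of frames that minimizes the summed transition cost.

    The path starts at frame 0, ends at frame n_frames - 1, and never jumps
    more than max_skip frames ahead. cost(i, j) is the energy of showing
    frame j directly after frame i (hypothetical; supplied by the caller).
    Runs in O(n_frames * max_skip) time.
    """
    if max_skip < 1:
        raise ValueError("max_skip must be at least 1")
    if n_frames <= 0:
        return []
    inf = float("inf")
    best = [inf] * n_frames   # best[j]: minimal energy of a path ending at j
    prev = [-1] * n_frames    # prev[j]: predecessor of j on that path
    best[0] = 0.0
    for j in range(1, n_frames):
        for i in range(max(0, j - max_skip), j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # Walk the predecessor links back from the last frame.
    path = []
    k = n_frames - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    return path[::-1]


if __name__ == "__main__":
    # Toy data: horizontal head offset per frame, alternating with each step.
    offsets = [0, 5, 0, 5, 0, 5, 0]
    target_skip = 2

    def toy_cost(i: int, j: int) -> float:
        shake = abs(offsets[j] - offsets[i])      # penalize visible jitter
        speed = (j - i - target_skip) ** 2        # penalize uneven speed-up
        return shake + speed

    print(sample_frames(len(offsets), 3, toy_cost))
```

With the toy offsets, the selected frames all share the same head position, which is the intuition behind sampling that reduces shake compared with uniform frame skipping.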
IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction
The majority of the existing methods for non-rigid 3D surface regression from
monocular 2D images require an object template or point tracks over multiple
frames as an input, and are still far from real-time processing rates. In this
work, we present the Isometry-Aware Monocular Generative Adversarial Network
(IsMo-GAN), an approach for direct 3D reconstruction from a single image,
trained for the deformation model in an adversarial manner on a light-weight
synthetic dataset. IsMo-GAN reconstructs surfaces from real images under
varying illumination, camera poses, textures and shading at over 250 Hz. In
multiple experiments, it consistently outperforms several approaches in the
reconstruction accuracy, runtime, generalisation to unknown surfaces and
robustness to occlusions. Compared to the state of the art, we reduce the
reconstruction error by 10-30%, including in the textureless case, and our
surfaces show qualitatively fewer artefacts.
Comment: 13 pages, 11 figures, 4 tables, 6 sections, 73 reference
Image retargeting via Beltrami representation
Image retargeting aims to resize an image to one with a prescribed aspect
ratio. Simple scaling inevitably introduces unnatural geometric distortions on
the important content of the image. In this paper, we propose a simple and yet
effective method to resize an image, which preserves the geometry of the
important content, using the Beltrami representation. Our algorithm allows
users to interactively label content regions as well as line structures. Image
resizing can then be achieved by warping the image by an orientation-preserving
bijective warping map with controlled distortion. The warping map is
represented by its Beltrami representation, which captures the local geometric
distortion of the map. By carefully prescribing the values of the Beltrami
representation, images with different complexity can be effectively resized.
Our method requires neither solving optimization problems nor tuning
parameters throughout the process. This results in a simple and efficient
algorithm to solve the image retargeting problem. Extensive experiments have
been carried out, which demonstrate the efficacy of our proposed method.
Comment: 13 pages, 13 figure
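For readers unfamiliar with the term, the Beltrami representation mentioned in the abstract above is usually defined as follows. This is the standard textbook definition for a map f of the complex plane, written in notation chosen here; the abstract itself gives no formula, and the paper's own notation may differ.

```latex
\[
\mu_f(z) = \frac{\partial f / \partial \bar{z}}{\partial f / \partial z},
\qquad
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}\right),
\quad
\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}\right)
\]
\[
J_f = \left|\frac{\partial f}{\partial z}\right|^2 - \left|\frac{\partial f}{\partial \bar{z}}\right|^2
    = \left|\frac{\partial f}{\partial z}\right|^2 \left(1 - |\mu_f|^2\right)
\]
```

Since the Jacobian J_f is positive exactly where |mu_f| < 1 (and the z-derivative is nonzero), keeping |mu_f| below 1 keeps the map locally orientation-preserving, and mu_f = 0 corresponds to a conformal, distortion-free map. This is why prescribing the values of mu_f gives direct control over local geometric distortion.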
Accurate 3D Reconstruction of Dynamic Scenes from Monocular Image Sequences with Severe Occlusions
The paper introduces an accurate solution to dense orthographic Non-Rigid
Structure from Motion (NRSfM) in scenarios with severe occlusions or, likewise,
inaccurate correspondences. We integrate a shape prior term into a variational
optimisation framework. This allows irregularities of the time-varying
structure to be penalized at the per-pixel level if a correspondence quality
indicator, such as an occlusion tensor, is available. We make a realistic
assumption that several non-occluded views of the scene are sufficient to
estimate an initial shape prior, though the entire observed scene may exhibit
non-rigid deformations. Experiments on synthetic and real image data show that
the proposed framework significantly outperforms state-of-the-art methods for
correspondence establishment in combination with state-of-the-art NRSfM
methods. Together with profound insights into optimisation methods,
implementation details for heterogeneous platforms are provided.
Stereo 3D Object Trajectory Reconstruction
We present a method to reconstruct the three-dimensional trajectory of a
moving instance of a known object category using stereo video data. We track
the two-dimensional shape of objects at the pixel level, exploiting instance-aware
semantic segmentation techniques and optical flow cues. We apply Structure from
Motion (SfM) techniques to object and background images to determine for each
frame initial camera poses relative to object instances and background
structures. We refine the initial SfM results by integrating stereo camera
constraints exploiting factor graphs. We compute the object trajectory by
combining object and background camera pose information. In contrast to stereo
matching methods, our approach leverages temporally adjacent views for object
point triangulation. As opposed to monocular trajectory reconstruction
approaches, our method has no degenerate cases. We evaluate our approach
using publicly available video data of vehicles in urban scenes.
Comment: Under Review. arXiv admin note: text overlap with arXiv:1711.0613