Quality Assessment of Free-viewpoint Videos by Quantifying the Elastic Changes of Multi-Scale Motion Trajectories
Virtual viewpoint synthesis is an essential process for many immersive
applications, including Free-viewpoint TV (FTV). A widely used technique for
viewpoint synthesis is Depth-Image-Based Rendering (DIBR). However, such
techniques may introduce challenging non-uniform spatio-temporal
structure-related distortions. Most existing state-of-the-art quality metrics
fail to handle these distortions, especially the temporal structure
inconsistencies observed when switching between viewpoints. To tackle this
problem, an elastic metric and multi-scale trajectory based video quality
metric (EM-VQM) is proposed in this paper. Dense motion trajectories are first
used as a proxy for selecting temporally sensitive regions, where local
geometric distortions might significantly diminish the perceived quality.
Afterwards, the amount of temporal structure inconsistency and unsmooth
viewpoint transition is quantified by calculating 1) the amount of motion
trajectory deformation under an elastic metric and 2) the spatio-temporal
structural dissimilarity. According to comprehensive experimental results on
two FTV video datasets, the proposed metric significantly outperforms the
state-of-the-art metrics designed for free-viewpoint videos, achieving gains of
12.86% and 16.75% in median Pearson linear correlation coefficient on the two
datasets over the best competing metric, respectively.
Comment: 13 pages
Real-time collision detection method for deformable bodies
This paper presents a real-time solution for collision detection between
objects based on their physical properties. Traditional approaches to
collision detection often rely on geometric relationships, computing the
intersections between polygons. Such techniques are very computationally
expensive when applied to deformable objects. As an alternative, we implicitly
approximate the 3D mesh with a spherical surface. This allows us to perform
coarse-level collision detection at extremely fast speed. A dynamic
programming based procedure is then applied to identify the collision in fine
detail. Our method demonstrates better prevention of collision tunnelling and
works more efficiently than the state of the art.
Comment: Computer-Aided Design, 201
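The coarse phase described above, approximating each mesh by a sphere, reduces to a centre-distance test. A minimal sketch follows; the centroid-based sphere fit is an illustrative stand-in, not the authors' implicit fitting, and the fine-level dynamic programming phase is omitted:

```python
import math

def bounding_sphere(vertices):
    """Crude bounding sphere: centroid plus max distance to any vertex.

    (An illustrative heuristic; the paper fits the sphere implicitly.)
    """
    n = len(vertices)
    cx = sum(v[0] for v in vertices) / n
    cy = sum(v[1] for v in vertices) / n
    cz = sum(v[2] for v in vertices) / n
    r = max(math.dist((cx, cy, cz), v) for v in vertices)
    return (cx, cy, cz), r

def spheres_collide(mesh_a, mesh_b):
    """Coarse test: two meshes can only collide if their spheres overlap."""
    (ca, ra), (cb, rb) = bounding_sphere(mesh_a), bounding_sphere(mesh_b)
    return math.dist(ca, cb) <= ra + rb

cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
far_cube = [(x + 10, y, z) for (x, y, z) in cube]
print(spheres_collide(cube, cube))      # True
print(spheres_collide(cube, far_cube))  # False
```

Because the sphere test is a single distance comparison, it stays cheap even as the underlying mesh deforms; only pairs that pass it need the expensive fine-level check.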
A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting
There is an increasing need for efficient image retargeting techniques to
adapt content to various forms of digital media. With the rapid growth of
mobile communications and dynamic web page layouts, one often needs to resize
media content to the desired display sizes. For the varied layouts of web
pages and the typically small screens of handheld portable devices, the
important content of the original image becomes obscured when it is resized by
uniform scaling. Thus, images need to be resized in a content-aware manner
that automatically discards irrelevant information and presents the salient
features more prominently. Some image retargeting techniques have been
proposed with the content awareness of the input image in mind. However, these
techniques fail to be effective across different kinds of images and desired
sizes. The major problem is the inability of these algorithms to process
images with minimal visual distortion while also retaining the meaning
conveyed by the image. In this dissertation, we present a novel perspective
for content-aware image retargeting that is implementable in real time. We
introduce a novel method of analysing semantic information within the input
image while also maintaining its important and visually significant features.
We present the various nuances of our algorithm mathematically and logically,
and show that the results improve on state-of-the-art techniques.
Comment: 74 pages, 46 figures, Master's thesis
HDFD --- A High Deformation Facial Dynamics Benchmark for Evaluation of Non-Rigid Surface Registration and Classification
Objects that undergo non-rigid deformation are common in the real world. A
typical and challenging example is the human face. While various techniques
have been developed for deformable shape registration and classification,
benchmarks with detailed labels and landmarks suitable for evaluating such
techniques are still limited. In this paper, we present HDFD, a novel facial
dynamics dataset which addresses this gap in existing datasets, comprising 4D
funny faces with substantial non-isometric deformation, and 4D visual-audio
faces of spoken phrases in a minority language (Welsh). Both parts are
captured from 21 participants. The sequences are manually landmarked, and the
spoken phrases are further rated by a Welsh expert for level of fluency. These
annotations support quantitative evaluation of both registration and
classification tasks. We further develop a methodology to evaluate several
recent non-rigid surface registration techniques, using our dynamic sequences
as test cases. The study demonstrates the significance and usefulness of our
new dataset --- a challenging benchmark for future techniques.
Diffusion framework for geometric and photometric data fusion in non-rigid shape analysis
In this paper, we explore the use of the diffusion geometry framework for the
fusion of geometric and photometric information in local and global shape
descriptors. Our construction is based on the definition of a diffusion process
on the shape manifold embedded into a high-dimensional space where the
embedding coordinates represent the photometric information. Experimental
results show that such data fusion is useful in coping with different
challenges of shape analysis where pure geometric and pure photometric methods
fail.
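The fusion idea, embedding the shape so that its coordinates carry both geometry and photometry, can be sketched with a diffusion (random-walk) operator built on the fused embedding. The balance parameter `mu` and the toy signals are assumptions for illustration, not the paper's notation:

```python
import math

def diffusion_affinity(points, colors, sigma=1.0, mu=1.0):
    """Affinity matrix on a shape whose embedding concatenates geometric
    coordinates with photometric values (scaled by mu), so diffusion
    flows slowly across both geometric and colour discontinuities."""
    fused = [p + tuple(mu * c for c in col) for p, col in zip(points, colors)]
    n = len(fused)
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(fused[i], fused[j]))
                   / (2 * sigma ** 2)) for j in range(n)] for i in range(n)]
    # Row-normalise to obtain a random-walk diffusion operator.
    return [[w[i][j] / sum(w[i]) for j in range(n)] for i in range(n)]

def diffuse(P, f, steps):
    """Apply the diffusion operator `steps` times to a per-point signal f."""
    for _ in range(steps):
        f = [sum(P[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]
    return f

# Two geometrically close points: the one with matching colour couples
# more strongly to the first point than the one with a different colour.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.0, 0.0)]
colors = [(0.0,), (0.0,), (1.0,)]
P = diffusion_affinity(points, colors)
print(P[0][1] > P[0][2])  # True
```

The point of the construction is visible in the example: a purely geometric affinity could not distinguish points 1 and 2 at all, while the fused embedding separates them by colour.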
Cycle-IR: Deep Cyclic Image Retargeting
Supervised deep learning techniques have achieved great success in various
fields by removing the limitations of handcrafted representations. However,
most previous image retargeting algorithms still employ fixed design
principles, such as using gradient maps or handcrafted features to compute a
saliency map, which inevitably restricts their generality. Deep learning
techniques may help to address this issue, but the challenge is that training
deep retargeting models would require a large-scale image retargeting dataset,
and building such a dataset requires huge human effort.
In this paper, we propose a novel deep cyclic image retargeting approach,
called Cycle-IR, which for the first time implements image retargeting with a
single deep model, without relying on any explicit user annotations. Our idea
is built on the reverse mapping from the retargeted images to the given input
images. If the retargeted image has serious distortion or excessive loss of
important visual information, the reverse mapping is unlikely to restore the
input image well. We enforce this forward-reverse consistency by introducing a
cyclic perception coherence loss. In addition, we propose a simple yet
effective image retargeting network (IRNet) to implement the retargeting
process. IRNet contains a spatial and channel attention layer, which is able
to discriminate visually important regions of input images effectively,
especially in cluttered images. Given input images of arbitrary size and
desired aspect ratios, Cycle-IR can produce visually pleasing target images
directly. Extensive experiments on the standard RetargetMe dataset show the
superiority of Cycle-IR. In addition, Cycle-IR outperforms the Multiop method
and obtains the best result in the user study. Code is available at
https://github.com/mintanwei/Cycle-IR.
Comment: 12 pages
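The forward-reverse consistency idea can be sketched as a plain L1 reconstruction term between the input and its reverse-mapped reconstruction; IRNet itself, and whatever perceptual weighting the actual cyclic perception coherence loss uses, are beyond this sketch:

```python
def cycle_consistency_loss(original, reconstructed):
    """Mean absolute error between the input image and the image obtained
    by reverse-mapping the retargeted result back to the input size.

    A large value signals that retargeting destroyed information the
    reverse mapping cannot recover, which is exactly what the cyclic
    training objective penalises.
    """
    flat_o = [p for row in original for p in row]
    flat_r = [p for row in reconstructed for p in row]
    return sum(abs(a - b) for a, b in zip(flat_o, flat_r)) / len(flat_o)

img = [[0.0, 0.5], [1.0, 0.25]]
perfect = [[0.0, 0.5], [1.0, 0.25]]   # lossless round trip
lossy = [[0.0, 0.0], [1.0, 0.0]]      # round trip lost detail
print(cycle_consistency_loss(img, perfect))  # 0.0
print(cycle_consistency_loss(img, lossy))    # 0.1875
```

Because the target of the loss is the input image itself, no retargeting ground truth is ever needed, which is how the approach sidesteps the dataset-construction problem described above.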
HDM-Net: Monocular Non-Rigid 3D Reconstruction with Learned Deformation Model
Monocular dense 3D reconstruction of deformable objects is a hard, ill-posed
problem in computer vision. Current techniques either require dense
correspondences and rely on motion and deformation cues, or assume a highly
accurate reconstruction (referred to as a template) of at least a single frame
given in advance and operate in the manner of non-rigid tracking. Accurate
computation of dense point tracks often requires multiple frames and can be
computationally expensive. Availability of a template is a very strong prior
which restricts system operation to a pre-defined environment and scenarios.
In this work, we propose a new hybrid approach for monocular non-rigid
reconstruction which we call the Hybrid Deformation Model Network (HDM-Net).
In our approach, the deformation model is learned by a deep neural network
with a combination of domain-specific loss functions. We train the network
with multiple states of a non-rigidly deforming structure with a known shape
at rest. HDM-Net learns different reconstruction cues, including
texture-dependent surface deformations, shading and contours. We show the
generalisability of HDM-Net to states not present in the training dataset,
with unseen textures and under new illumination conditions. Experiments with
noisy data and a comparison with other methods demonstrate the robustness and
accuracy of the proposed approach and suggest possible application scenarios
of the new technique in interventional diagnostics and augmented reality.
Comment: 9 pages, 9 figures
Fast detection of multiple objects in traffic scenes with a common detection framework
Traffic scene perception (TSP) aims to extract accurate on-road environment
information in real time, which involves three phases: detection of objects of
interest, recognition of detected objects, and tracking of objects in motion.
Since recognition and tracking often rely on the results of detection, the
ability to detect objects of interest effectively plays a crucial role in TSP.
In this paper, we focus on three important classes of objects: traffic signs,
cars, and cyclists. We propose to detect all three in a single learning-based
detection framework. The proposed framework consists of a dense feature
extractor and detectors for the three classes. Once the dense features have
been extracted, they are shared by all detectors. The advantage of one common
framework is much faster detection, since the dense features need to be
evaluated only once in the testing phase. In contrast, most previous works
have designed specific detectors using different features for each of these
objects. To enhance feature robustness to noise and image deformations, we
introduce spatially pooled features as part of the aggregated channel
features. To further improve generalization performance, we propose an object
subcategorization method as a means of capturing intra-class variation. We
experimentally demonstrate the effectiveness and efficiency of the proposed
framework in three detection applications: traffic sign detection, car
detection, and cyclist detection. The proposed framework achieves competitive
performance with state-of-the-art approaches on several benchmark datasets.
Comment: Appearing in IEEE Transactions on Intelligent Transportation Systems
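The speed argument above, one shared feature extraction feeding all class detectors, can be sketched in a few lines. The extractor and detector functions here are toy stand-ins for the paper's aggregated channel features and learned class detectors:

```python
def detect_all(image, extract_features, detectors):
    """Run several class-specific detectors over one shared dense feature
    map. The features are evaluated a single time, which is where the
    common framework gains its speed over per-class pipelines."""
    features = extract_features(image)  # computed once, shared by all
    return {name: detector(features) for name, detector in detectors.items()}

# Toy demonstration: a counter confirms the extractor runs only once
# even though three detectors consume its output.
calls = []
def toy_features(img):
    calls.append(1)
    return [v * 2 for v in img]

detectors = {
    "traffic_sign": lambda f: max(f) > 5,
    "car": lambda f: sum(f) > 10,
    "cyclist": lambda f: len(f) > 2,
}
result = detect_all([1, 2, 3], toy_features, detectors)
print(result, "extractor calls:", len(calls))  # extractor calls: 1
```

In a per-class design, `toy_features` would run once per detector; sharing it amortises the dominant cost across all object classes.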
Rigid and Non-rigid Shape Evolutions for Shape Alignment and Recovery in Images
The same type of object in different images may vary in shape because of rigid
and non-rigid deformations, occluding foreground, and cluttered background.
The problem addressed in this work is shape extraction in such challenging
situations. We approach shape extraction through shape alignment and recovery.
This paper presents a novel and general method for shape alignment and
recovery using a single example shape, based on deterministic energy
minimization. Our idea is to use a general model of shape deformation in
minimizing active contour energies. Given an \emph{a priori} form of the shape
deformation, we show how the corresponding curve evolution equation can be
derived. This curve evolution is called the prior variation shape evolution
(PVSE). We also derive the energy-minimizing PVSE for minimizing active
contour energies. For shape recovery, we propose to use a PVSE that deforms
the shape while preserving its characteristics. For choosing such a
shape-preserving PVSE, a theory of shape preservability of the PVSE is
established. Experimental results validate the theory and the formulations,
and demonstrate the effectiveness of our method.
Skeletal Representations and Applications
When representing a solid object there are alternatives to the use of
traditional explicit (surface meshes) or implicit (zero crossing of implicit
functions) methods. Skeletal representations encode shape information in a
mixed fashion: they are composed of a set of explicit primitives, yet they are
able to efficiently encode the shape's volume as well as its topology. I will
discuss, in two dimensions, how symmetry can be used to reduce the
dimensionality of the data (from a 2D solid to a 1D curve), and how this
relates to the classical definition of skeletons by Medial Axis Transform.
While the medial axis of a 2D shape is composed of a set of curves, in 3D it
results in a set of sheets connected in a complex fashion. Because of this
complexity, medial skeletons are difficult to use in practical applications.
Curve skeletons address this problem by strictly requiring their geometry to be
one dimensional, resulting in an intuitive yet powerful shape representation.
In this report I will define both medial and curve skeletons and discuss their
mutual relationship. I will also present several algorithms for their
computation and a variety of scenarios where skeletons are employed, with a
special focus on geometry processing and shape analysis.
Comment: 42 pages, SFU Depth Exam
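The 2D Medial Axis Transform described above has a common discrete approximation: keep the ridge of the distance transform, i.e. cells whose distance to the boundary is not exceeded by any 4-neighbour. A coarse grid-based sketch (not one of the report's algorithms) follows:

```python
from collections import deque

def distance_transform(grid):
    """Multi-source BFS distance (4-connectivity steps) from each interior
    cell of a binary grid to the nearest background cell."""
    h, w = len(grid), len(grid[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                dist[y][x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                q.append((ny, nx))
    return dist

def medial_axis(grid):
    """Ridge cells: interior cells whose distance is a local maximum."""
    dist = distance_transform(grid)
    h, w = len(grid), len(grid[0])
    axis = set()
    for y in range(h):
        for x in range(w):
            if grid[y][x] and all(
                not (0 <= y + dy < h and 0 <= x + dx < w)
                or dist[y + dy][x + dx] <= dist[y][x]
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                axis.add((y, x))
    return axis

# For a solid 5x5 square, the ridge is exactly the square's two
# diagonals, matching the classical medial axis of a square.
square = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(5)] + [[0] * 7]
print(sorted(medial_axis(square)))
```

This illustrates the dimensionality reduction the report discusses: a 2D solid collapses to a 1D set of curves carrying its symmetry structure.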