Unsupervised learning of object landmarks by factorized spatial embeddings
Automatically learning the structure of object categories remains an
important open problem in computer vision. In this paper, we propose a novel
unsupervised approach that can discover and learn landmarks in object
categories, thus characterizing their structure. Our approach is based on
factorizing image deformations, as induced by a viewpoint change or an object
deformation, by learning a deep neural network that detects landmarks
consistently with such visual effects. Furthermore, we show that the learned
landmarks establish meaningful correspondences between different object
instances in a category without having to impose this requirement explicitly.
We assess the method qualitatively on a variety of object types, natural and
man-made. We also show that our unsupervised landmarks are highly predictive of
manually-annotated landmarks in face benchmark datasets, and can be used to
regress these with a high degree of accuracy.
Comment: To be published in ICCV 201
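The consistency requirement can be read as an equivariance constraint: if a known deformation g turns image x into g(x), the landmarks detected in g(x) should coincide with the landmarks detected in x after moving them by g. The sketch that follows is a minimal illustration under stated assumptions, not the paper's training code: landmarks are read out of per-landmark score maps with a soft-argmax, g is a known affine map, and the names `soft_argmax`, `affine_warp`, and `equivariance_loss` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def soft_argmax(heatmap: List[List[float]], beta: float = 1.0) -> Point:
    """Read a landmark location out of a score map as the softmax-weighted mean (x, y)."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            w = math.exp(beta * (score - peak))
            total += w
            sum_x += w * x
            sum_y += w * y
    return (sum_x / total, sum_y / total)


def affine_warp(a: float, b: float, c: float, d: float, tx: float, ty: float) -> Callable[[Point], Point]:
    """Hypothetical known deformation g(p) = A p + t with A = [[a, b], [c, d]]."""
    return lambda p: (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)


def equivariance_loss(
    landmarks: List[Point],          # landmarks detected in the original image x
    landmarks_warped: List[Point],   # landmarks detected in the deformed image g(x)
    warp: Callable[[Point], Point],  # the deformation g applied to point coordinates
) -> float:
    """Mean squared distance between detections in g(x) and g applied to detections in x."""
    err = 0.0
    for p, q in zip(landmarks, landmarks_warped):
        gx, gy = warp(p)
        err += (q[0] - gx) ** 2 + (q[1] - gy) ** 2
    return err / len(landmarks)


if __name__ == "__main__":
    g = affine_warp(0.0, -1.0, 1.0, 0.0, 4.0, 0.0)  # 90-degree rotation plus a shift
    detected = [(1.0, 2.0), (3.0, 0.5)]
    perfectly_consistent = [g(p) for p in detected]
    print(equivariance_loss(detected, perfectly_consistent, g))  # 0.0
```

A detector trained to push this loss toward zero over many random deformations places its landmarks consistently with those deformations; the soft-argmax keeps the coordinate read-out differentiable so the loss can be backpropagated.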
Non-rigid registration of 2-D/3-D dynamic data with feature alignment
In this work, we compute the matching between 2D manifolds and between 3D manifolds under temporal constraints, that is, the matching among a time sequence of 2D/3D manifolds. The problem is solved by mapping all the manifolds to a common domain and then building their matching by composing the forward mapping with the inverse mapping. First, we solve the matching problem between 2D manifolds with temporal constraints using a mesh-based registration method. We propose a surface parameterization method to compute the mapping between each 2D manifold and the common 2D planar domain, through which we can compute the matching among the time sequence of deforming geometry data. Compared with previous work, our method is independent of the quality of the mesh elements and more efficient for time-sequence data. We then develop a global intensity-based registration method to solve the matching problem between 3D manifolds with temporal constraints. Our method is based on a 4D (3D+T) free-form B-spline deformation model that has both spatial and temporal smoothness. Compared with previous 4D image registration techniques, our method avoids some local minima; thus it can be solved faster and achieves more accurate landmark point prediction. We demonstrate the efficiency of these methods on real applications: the first is applied to dynamic face registration and texture mapping, and the second to lung tumor motion tracking in medical image analysis. In future work, we are developing a more efficient mesh-based 4D registration method. It can be applied to tumor motion estimation and tracking, which can be used to calculate the actual dose delivered to the lung and surrounding tissues, thereby supporting online treatment in lung cancer radiotherapy.
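A 4D (3D+T) free-form deformation can be illustrated with the standard tensor-product cubic B-spline formulation used in FFD-based registration, with time added as a fourth axis. This is a minimal sketch, assuming a uniform control grid and control-point displacements stored in a sparse dictionary; `ffd_displacement` and `bspline_basis` are hypothetical names, and the similarity metric and optimizer of the actual registration method are omitted.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Index4 = Tuple[int, int, int, int]  # control-point index along (x, y, z, t)
Vec3 = Tuple[float, float, float]   # spatial displacement


def bspline_basis(u: float) -> Tuple[float, float, float, float]:
    """Uniform cubic B-spline basis functions B0..B3 at local coordinate u in [0, 1)."""
    return (
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    )


def ffd_displacement(
    point: Sequence[float],       # (x, y, z, t)
    spacing: Sequence[float],     # control-grid spacing along x, y, z, t
    control: Dict[Index4, Vec3],  # displacement of each control point; missing ones are zero
) -> Vec3:
    """Displacement of a space-time point under a 4D tensor-product cubic B-spline FFD."""
    base: List[int] = []
    weights = []
    for coord, delta in zip(point, spacing):
        s = coord / delta
        cell = math.floor(s)
        base.append(cell - 1)  # first of the 4 control points supporting this axis
        weights.append(bspline_basis(s - cell))
    disp = [0.0, 0.0, 0.0]
    for offs in itertools.product(range(4), repeat=4):  # 4^4 = 256 supporting control points
        w = 1.0
        for axis, o in enumerate(offs):
            w *= weights[axis][o]
        idx = tuple(b + o for b, o in zip(base, offs))
        phi = control.get(idx, (0.0, 0.0, 0.0))
        for c in range(3):
            disp[c] += w * phi[c]
    return (disp[0], disp[1], disp[2])
```

Because the four basis functions sum to one for every u, a constant control field yields a pure translation, and using the same cubic basis along t is what makes the deformation smooth in time as well as in space.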
Effective 3D Geometric Matching for Data Restoration and Its Forensic Application
3D geometric matching is the technique of detecting similar patterns among multiple objects. It is an important and fundamental problem that facilitates many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding, object recognition, and data restoration. For example, 3D scans of an object taken from different angles are matched and stitched together to form its complete geometry. In medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. The problem is challenging and remains unsolved, especially when the similar patterns are (1) small and lacking in geometric saliency, or (2) incomplete due to occlusion during scanning or damage to the data. We study reliable matching algorithms that can tackle these difficulties, and their application to data restoration. Data restoration is the problem of restoring a fragmented or damaged model to its original, complete state. It is a new area with direct applications in many scientific fields, such as forensics and archaeology. In this dissertation, we study novel, effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching, and template matching. We demonstrate their application in an integrated digital pipeline for skull reassembly, skull completion, and facial reconstruction, developed to support the state-of-the-art forensic skull/facial reconstruction pipeline used in law enforcement.
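Stitching scans or fragments together ultimately requires a rigid motion that aligns matched points. The sketch that follows shows the closed-form least-squares solution in 2D for brevity (3D alignment is usually solved with the SVD-based Kabsch method). It assumes correspondences are already known, which is the hard part the dissertation's curve and surface matching algorithms address, and its function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rigid_align_2d(src: List[Point], dst: List[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # center both point sets
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # angle maximizing the sum of q . R(theta) p
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated source centroid onto the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def transform(theta: float, t: Point, p: Point) -> Point:
    """Apply rotation by theta followed by translation t to point p."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def alignment_rms(src: List[Point], dst: List[Point], theta: float, t: Point) -> float:
    """Root-mean-square residual after alignment; small values suggest a plausible match."""
    sq = sum(math.dist(transform(theta, t, p), q) ** 2 for p, q in zip(src, dst))
    return math.sqrt(sq / len(src))
```

In a fragment-matching setting, the residual from `alignment_rms` is one simple way to score competing candidate correspondences.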
Deep Learning-Based Human Pose Estimation: A Survey
Human pose estimation aims to locate the human body parts and build human
body representation (e.g., body skeleton) from input data such as images and
videos. It has drawn increasing attention during the past decade and has been
utilized in a wide range of applications including human-computer interaction,
motion analysis, augmented reality, and virtual reality. Although the recently
developed deep learning-based solutions have achieved high performance in human
pose estimation, there still remain challenges due to insufficient training
data, depth ambiguities, and occlusion. The goal of this survey paper is to
provide a comprehensive review of recent deep learning-based solutions for both
2D and 3D pose estimation via a systematic analysis and comparison of these
solutions based on their input data and inference procedures. More than 240
research papers since 2014 are covered in this survey. Furthermore, 2D and 3D
human pose estimation datasets and evaluation metrics are included.
Quantitative performance comparisons of the reviewed methods on popular
datasets are summarized and discussed. Finally, we conclude with the remaining
challenges, applications, and future research directions. We also provide a
regularly updated project page: https://github.com/zczcwh/DL-HPE
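Two widely used pose estimation metrics are MPJPE (mean per-joint position error, mostly for 3D) and PCK (percentage of correct keypoints, mostly for 2D). The sketch that follows is illustrative only: the root joint index, the reference length, and the `alpha` threshold are assumptions, since these conventions differ between datasets and papers.

```python
import math
from typing import List, Optional, Sequence

Joint = Sequence[float]  # 2D (x, y) or 3D (x, y, z) coordinates


def mpjpe(pred: List[Joint], gt: List[Joint], root: Optional[int] = 0) -> float:
    """Mean per-joint position error; if root is given, both poses are root-aligned first."""
    if root is not None:
        pr, gr = pred[root], gt[root]
        pred = [[a - b for a, b in zip(p, pr)] for p in pred]
        gt = [[a - b for a, b in zip(g, gr)] for g in gt]
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred: List[Joint], gt: List[Joint], ref_length: float, alpha: float = 0.5) -> float:
    """Fraction of joints whose error is within alpha * ref_length."""
    threshold = alpha * ref_length
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)
```

Root alignment removes the global translation, so MPJPE measures only the error in body configuration; setting `root=None` measures absolute position error instead.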
FFD: Fast Feature Detector
Scale-invariance, good localization and robustness to noise and distortions
are the main properties that a local feature detector should possess. Most
existing local feature detectors find too many unstable feature points, which
increase the number of keypoints to be matched and the computational time of
the matching step. In this paper, we show that robust and accurate keypoints
exist in the specific scale-space domain. To this end, we first formulate the
superimposition problem as a mathematical model and then derive a closed-form
solution for multiscale analysis. The model is formulated via
difference-of-Gaussian (DoG) kernels in the continuous scale-space domain, and
it is proved that setting the scale-space pyramid's blurring ratio and
smoothness to 2 and 0.627, respectively, facilitates the detection of reliable
keypoints. To apply the proposed model to discrete images, we
discretize it using the undecimated wavelet transform and the cubic spline
function. Theoretically, the complexity of our method is less than 5% of that
of the popular baseline Scale Invariant Feature Transform (SIFT). Extensive
experimental results show the superiority of the proposed feature detector over
the existing representative hand-crafted and learning-based techniques in
accuracy and computational time. The code and supplementary materials can be
found at https://github.com/mogvision/FFD.
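The abstract discretizes the DoG model with an undecimated wavelet transform and a cubic spline. One standard instance of that pairing is the à trous ("starlet") algorithm with the B3-spline kernel [1, 4, 6, 4, 1]/16, whose detail planes are differences of successive smoothings at dyadic scales. The 1D sketch that follows assumes that standard transform plus a simple extremum test over position and neighboring scales; it does not reproduce FFD's closed-form parameters (the 0.627 smoothness value is not used), and all names are hypothetical.

```python
import math
from typing import List, Tuple

B3 = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # cubic B-spline smoothing kernel


def atrous_smooth(signal: List[float], level: int) -> List[float]:
    """Convolve with the B3 kernel dilated by 2**level (holes between taps), clamping at borders."""
    n, step = len(signal), 2**level
    out = []
    for i in range(n):
        acc = 0.0
        for k, h in zip(range(-2, 3), B3):
            j = min(max(i + k * step, 0), n - 1)
            acc += h * signal[j]
        out.append(acc)
    return out


def starlet(signal: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Undecimated wavelet transform: detail planes w_1..w_J and the coarse residual c_J."""
    details, c = [], list(signal)
    for j in range(levels):
        c_next = atrous_smooth(c, j)
        details.append([a - b for a, b in zip(c, c_next)])  # difference of successive smoothings
        c = c_next
    return details, c


def scale_space_extrema(details: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """(level, position) pairs where |w| is a strict maximum over its 3x3 position/scale neighborhood."""
    found = []
    levels, n = len(details), len(details[0])
    for j in range(levels):
        for i in range(n):
            v = abs(details[j][i])
            if v < threshold:
                continue
            neighbors = [
                abs(details[j + dj][i + di])
                for dj in (-1, 0, 1)
                for di in (-1, 0, 1)
                if (dj, di) != (0, 0) and 0 <= j + dj < levels and 0 <= i + di < n
            ]
            if all(v > u for u in neighbors):
                found.append((j, i))
    return found


if __name__ == "__main__":
    bump = [math.exp(-((i - 20) ** 2) / 8.0) for i in range(41)]  # Gaussian blob with sigma = 2
    planes, _ = starlet(bump, levels=4)
    print(scale_space_extrema(planes, threshold=1e-3))
```

Because each detail plane is a difference of two smoothings, the signal is recovered exactly by summing all detail planes and the coarse residual, and no subsampling is involved, so every scale keeps the full resolution of the input.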
Deepfakes Generation using LSTM based Generative Adversarial Networks
Deep learning has achieved promising results across a wide range of complex task domains. However, recent advances in deep learning have also been used to create software that threatens personal privacy and national security. One such threat is deepfakes: fake images and videos that humans cannot detect as forgeries. Fake speeches by world leaders could even threaten world stability and peace. Apart from malicious uses, deepfakes can also serve positive purposes, such as post-dubbing or language translation in films. The latter was recently used in the most recent Indian election so that politicians' speeches could be converted into many Indian dialects across the country. This work was traditionally done with computer graphics techniques and 3D models, but with advances in deep learning and computer vision, in particular GANs, these earlier methods are being replaced by deep learning methods. This research focuses on using deep neural networks to generate manipulated faces in images and videos.
This master’s thesis develops a novel architecture that can generate a full sequence of video frames given a source image and a target video. We were inspired by NVIDIA's vid2vid and few-shot vid2vid, which learn to map videos from a source domain to a target domain. In our work, we propose a unified model using LSTM-based GANs together with a motion module that uses a keypoint detector to generate dense motion. The generator network uses warping to combine the appearance extracted from the source image with the motion from the target video, both to generate realistic videos and to decouple occlusions. Training is end-to-end, and the keypoints are learned in a self-supervised way. Evaluation is performed on the recently introduced FaceForensics++ and VoxCeleb datasets.
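The warping step described above can be illustrated by its simplest ingredient: turning sparse keypoint correspondences into a dense backward flow and sampling the source image along it with bilinear interpolation. This is a hypothetical, heavily simplified sketch; the thesis learns keypoints and dense motion with networks and adds an LSTM-based generator, none of which is shown. The Gaussian weighting and the names `dense_flow` and `warp` are assumptions.

```python
import math
from typing import List, Tuple

Image = List[List[float]]    # grayscale image indexed as img[y][x]
Point = Tuple[float, float]  # (x, y)


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly sample img at real-valued (x, y); coordinates are clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def dense_flow(
    h: int, w: int, kp_source: List[Point], kp_driving: List[Point], sigma: float = 2.0
) -> List[List[Point]]:
    """Backward flow for every output pixel: a Gaussian-weighted average of keypoint displacements."""
    flow = []
    for y in range(h):
        row = []
        for x in range(w):
            wsum = fx = fy = 0.0
            for (sx, sy), (dx, dy) in zip(kp_source, kp_driving):
                wk = math.exp(-((x - dx) ** 2 + (y - dy) ** 2) / (2 * sigma**2))
                wsum += wk
                fx += wk * (sx - dx)  # offset back to where this driving keypoint sits in the source
                fy += wk * (sy - dy)
            row.append((fx / wsum, fy / wsum) if wsum > 0 else (0.0, 0.0))
        flow.append(row)
    return flow


def warp(source: Image, flow: List[List[Point]]) -> Image:
    """Backward warping: output pixel (x, y) takes the source value at (x, y) + flow[y][x]."""
    return [
        [bilinear(source, x + fx, y + fy) for x, (fx, fy) in enumerate(row)]
        for y, row in enumerate(flow)
    ]
```

Backward warping is used here because every output pixel then receives exactly one value; forward-splatting source pixels would leave holes wherever the motion stretches the face.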