20,445 research outputs found

    Efficient Feature-based Image Registration by Mapping Sparsified Surfaces

    With advances in digital camera technology, the use of high-resolution images and videos has become widespread in modern society. In particular, image and video frame registration is frequently applied in computer graphics and film production. However, conventional registration approaches usually require long computation times for high-resolution images and video frames, which hinders their application in modern industry. In this work, we first propose a new image representation method to accelerate the registration process by triangulating the images effectively. For each high-resolution image or video frame, we compute an optimal coarse triangulation which captures the important features of the image. Then, we apply a surface registration algorithm to obtain a registration map which is used to compute the registration of the high-resolution image. Experimental results suggest that our overall algorithm is efficient and capable of achieving a high compression rate while the accuracy of the registration is well retained when compared with the conventional grid-based approach. Also, the computation time of the registration is significantly reduced using our triangulation-based approach.

    Video Face Editing Using Temporal-Spatial-Smooth Warping

    Editing faces in videos is a popular yet challenging aspect of computer vision and graphics, which encompasses several applications including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. Simply applying image-based warping algorithms to video-based face editing produces temporal incoherence in the synthesized videos because it is impossible to consistently localize facial features in two frames representing two different faces in two different videos (or even two consecutive frames representing the same face in one video). Therefore, high-performance face editing usually requires significant manual manipulation. In this paper we propose a novel temporal-spatial-smooth warping (TSSW) algorithm to effectively exploit the temporal information in two consecutive frames, as well as the spatial smoothness within each frame. TSSW precisely estimates two control lattices, in the horizontal and vertical directions respectively, from the corresponding control lattices in the previous frame, by minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints. The corresponding warping surfaces then precisely map source frames to the target frames. Experimental testing on facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation demonstrates that the proposed approaches can effectively preserve spatial smoothness and temporal coherence in editing facial geometry, skin detail, identity, and expression, outperforming existing face editing methods. In particular, TSSW is robust to subtly inaccurate localization of feature points and is a vast improvement over image-based warping methods.

    3D Trajectory Reconstruction of Dynamic Objects Using Planarity Constraints

    We present a method to reconstruct the three-dimensional trajectory of a moving instance of a known object category in monocular video data. We track the two-dimensional shape of objects at the pixel level, exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion techniques to object and background images to determine, for each frame, camera poses relative to object instances and background structures. By combining object and background camera pose information, we restrict the object trajectory to a one-parameter family of possible solutions. We compute a ground representation by fusing background structures and corresponding semantic segmentations. This allows us to determine an object trajectory consistent with the image observations and the reconstructed environment model. Our method is robust to occlusion and handles temporarily stationary objects. We show qualitative results using drone imagery. Due to the lack of suitable benchmark datasets, we present a new dataset to evaluate the quality of reconstructed three-dimensional object trajectories. The video sequences contain vehicles in urban areas and are rendered using the path-tracing render engine Cycles to achieve realistic results. We perform a quantitative evaluation of the presented approach using this dataset. Our algorithm achieves an average reconstruction-to-ground-truth distance of 0.31 meters. Comment: 9 pages, under review

    High-quality Instance-aware Semantic 3D Map Using RGB-D Camera

    We present a mapping system capable of constructing detailed instance-level semantic models of room-sized indoor environments by means of an RGB-D camera. In this work, we integrate deep-learning-based instance segmentation and classification into a state-of-the-art RGB-D SLAM system. We leverage the pipeline of ElasticFusion [1] as a backbone and propose modifications of the registration cost function. The proposed objective function features a tunable weight for the appearance channel, which can be learned from data. The resulting system is capable of producing accurate semantic maps of room-sized environments, as well as reconstructing highly detailed object-level models. The developed method has been verified through experimental validation on the TUM RGB-D SLAM benchmark and the YCB video dataset. Our results confirm that the proposed system performs favorably in terms of trajectory estimation, surface reconstruction, and segmentation quality in comparison to other state-of-the-art systems.

    Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications

    Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging, and much research is needed on the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion of trends, important questions and future lines of research.

    EgoSampling: Fast-Forward and Stereo for Egocentric Videos

    While egocentric cameras like GoPro are gaining popularity, the videos they capture are long, boring, and difficult to watch from start to end. Fast forwarding (i.e. frame sampling) is a natural choice for faster video browsing. However, this accentuates the shake caused by natural head motion, making the fast-forwarded video useless. We propose EgoSampling, an adaptive frame sampling method that gives more stable fast-forwarded videos. Adaptive frame sampling is formulated as energy minimization, whose optimal solution can be found in polynomial time. In addition, egocentric video taken while walking suffers from the left-right movement of the head as the body weight shifts from one leg to another. We turn this drawback into a feature: stereo video can be created by sampling the frames from the leftmost and rightmost head positions of each step, forming approximate stereo pairs. Comment: in IEEE CVPR 2015, Boston, MA, June 201
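
The abstract above describes adaptive frame sampling as an energy minimization with a polynomial-time optimum. One way such a problem can be solved is as a shortest path over frames by dynamic programming; the following is a minimal sketch under that assumption, not the authors' implementation. The function `sample_frames`, the `max_skip` limit, the toy `positions` signal and the cost in `toy_cost` are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def sample_frames(n_frames: int, max_skip: int,
                  cost: Callable[[int, int], float]) -> List[int]:
    """Pick frames from 0 to n_frames - 1 minimizing the summed transition cost.

    Each selected frame j must follow a selected frame i with 1 <= j - i <= max_skip.
    Runs in O(n_frames * max_skip) time.
    """
    if n_frames <= 0 or max_skip < 1:
        raise ValueError("need n_frames > 0 and max_skip >= 1")
    best = [float("inf")] * n_frames   # best[j]: cheapest energy of a path ending at j
    prev = [-1] * n_frames             # prev[j]: predecessor of j on that path
    best[0] = 0.0
    for j in range(1, n_frames):
        for i in range(max(0, j - max_skip), j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # Walk back from the last frame to recover the selected frames.
    path = []
    j = n_frames - 1
    while j != -1:
        path.append(j)
        j = prev[j]
    return path[::-1]


# Toy data (assumed): a head position that swings left-right on every frame.
positions = [0, 5, 0, 5, 0, 5, 0]
target_skip = 2  # desired speed-up factor


def toy_cost(i: int, j: int) -> float:
    shake = abs(positions[j] - positions[i])   # penalize a jump in viewpoint
    speed = (j - i - target_skip) ** 2         # penalize deviating from the target speed
    return shake + speed


selected = sample_frames(len(positions), max_skip=3, cost=toy_cost)
```

In this toy case the selection keeps only frames taken at the same head position, which is the stabilizing effect the abstract refers to; a real cost would be derived from estimated camera motion rather than a given position signal.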

    IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction

    The majority of the existing methods for non-rigid 3D surface regression from monocular 2D images require an object template or point tracks over multiple frames as an input, and are still far from real-time processing rates. In this work, we present the Isometry-Aware Monocular Generative Adversarial Network (IsMo-GAN), an approach for direct 3D reconstruction from a single image, trained for the deformation model in an adversarial manner on a light-weight synthetic dataset. IsMo-GAN reconstructs surfaces from real images under varying illumination, camera poses, textures and shading at over 250 Hz. In multiple experiments, it consistently outperforms several approaches in reconstruction accuracy, runtime, generalisation to unknown surfaces and robustness to occlusions. In comparison to the state of the art, we reduce the reconstruction error by 10-30%, including the textureless case, and our surfaces show fewer artefacts qualitatively. Comment: 13 pages, 11 figures, 4 tables, 6 sections, 73 references

    Image retargeting via Beltrami representation

    Image retargeting aims to resize an image to one with a prescribed aspect ratio. Simple scaling inevitably introduces unnatural geometric distortions on the important content of the image. In this paper, we propose a simple yet effective method to resize an image, which preserves the geometry of the important content, using the Beltrami representation. Our algorithm allows users to interactively label content regions as well as line structures. Image resizing can then be achieved by warping the image with an orientation-preserving bijective warping map with controlled distortion. The warping map is represented by its Beltrami representation, which captures the local geometric distortion of the map. By carefully prescribing the values of the Beltrami representation, images of different complexity can be effectively resized. Our method does not require solving any optimization problems or tuning any parameters throughout the process. This results in a simple and efficient algorithm to solve the image retargeting problem. Extensive experiments have been carried out, which demonstrate the efficacy of our proposed method. Comment: 13 pages, 13 figures
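
The abstract above says the Beltrami representation captures the local geometric distortion of a warping map. As a rough illustration only, and not the authors' algorithm, the sketch below evaluates the standard Beltrami coefficient, mu = f_zbar / f_z, for a planar linear map. The function name `beltrami_of_linear_map` and the restriction to linear maps are assumptions made to keep the example short.

```python
def beltrami_of_linear_map(a: float, b: float, c: float, d: float) -> complex:
    """Beltrami coefficient of the map (x, y) -> (a*x + b*y, c*x + d*y).

    Writing f = u + i*v with z = x + i*y, the complex derivatives are
        f_z    = ((u_x + v_y) + i*(v_x - u_y)) / 2
        f_zbar = ((u_x - v_y) + i*(v_x + u_y)) / 2
    and here u_x = a, u_y = b, v_x = c, v_y = d.
    """
    f_z = complex(a + d, c - b) / 2
    f_zbar = complex(a - d, c + b) / 2
    if f_z == 0:
        raise ValueError("f_z vanishes, so the coefficient is undefined")
    return f_zbar / f_z


# Identity: no distortion at all.
mu_identity = beltrami_of_linear_map(1, 0, 0, 1)
# Rotation by 90 degrees combined with uniform scaling by 3: still conformal.
mu_rot_scale = beltrami_of_linear_map(0, -3, 3, 0)
# Stretching x by a factor of 2: angles are distorted.
mu_stretch = beltrami_of_linear_map(2, 0, 0, 1)
# A map with negative determinant (it flips orientation).
mu_flip = beltrami_of_linear_map(2, 0, 0, -1)
```

For these linear maps, |f_z|^2 - |f_zbar|^2 equals the determinant a*d - b*c, so |mu| < 1 exactly when the map preserves orientation, and mu = 0 exactly when it is conformal. This is the sense in which prescribing mu controls distortion.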

    Accurate 3D Reconstruction of Dynamic Scenes from Monocular Image Sequences with Severe Occlusions

    The paper introduces an accurate solution to dense orthographic Non-Rigid Structure from Motion (NRSfM) in scenarios with severe occlusions or, likewise, inaccurate correspondences. We integrate a shape prior term into a variational optimisation framework. It allows irregularities of the time-varying structure to be penalised at the per-pixel level if a correspondence quality indicator such as an occlusion tensor is available. We make the realistic assumption that several non-occluded views of the scene are sufficient to estimate an initial shape prior, though the entire observed scene may exhibit non-rigid deformations. Experiments on synthetic and real image data show that the proposed framework significantly outperforms state-of-the-art methods for correspondence establishment in combination with state-of-the-art NRSfM methods. Together with insights into optimisation methods, implementation details for heterogeneous platforms are provided.

    Stereo 3D Object Trajectory Reconstruction

    We present a method to reconstruct the three-dimensional trajectory of a moving instance of a known object category using stereo video data. We track the two-dimensional shape of objects at the pixel level, exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion (SfM) techniques to object and background images to determine, for each frame, initial camera poses relative to object instances and background structures. We refine the initial SfM results by integrating stereo camera constraints using factor graphs. We compute the object trajectory by combining object and background camera pose information. In contrast to stereo matching methods, our approach leverages temporally adjacent views for object point triangulation. As opposed to monocular trajectory reconstruction approaches, our method has no degenerate cases. We evaluate our approach using publicly available video data of vehicles in urban scenes. Comment: Under Review. arXiv admin note: text overlap with arXiv:1711.0613