34 research outputs found

    FishRecGAN: An End to End GAN Based Network for Fisheye Rectification and Calibration

    Full text link
    We propose an end-to-end deep learning approach to rectify fisheye images and simultaneously calibrate camera intrinsic and distortion parameters. Our method consists of two parts: a Quick Image Rectification Module developed with a Pix2Pix GAN and Wasserstein GAN (W-Pix2PixGAN), and a Calibration Module with a CNN architecture. Our Quick Rectification Network performs robust rectification at good resolution, making it suitable for constant calibration in camera-based surveillance equipment. To achieve high-quality calibration, we use the straightened output from the Quick Rectification Module as a guidance-like semantic feature map for the Calibration Module to learn the geometric relationship between the straightened and distorted features. We train and validate our method with a large synthesized dataset labeled with well-simulated parameters applied to a perspective image dataset. Our solution achieves robust performance at high resolution, with a PSNR of 22.343. Comment: 18 pages, 7 figures, 4 tables, accepted by AAIML 202
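    The reported image quality is a PSNR value. As a point of reference, below is a minimal sketch of how PSNR between a rectified output and its ground-truth perspective image is typically computed for 8-bit images; the random arrays stand in for real image data and are not from the paper.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of the same shape."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy example with random 8-bit images (stand-ins for a ground-truth
# perspective image and the network's rectified output).
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
pred = np.clip(gt.astype(np.int16) + np.random.randint(-10, 11, gt.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(gt, pred):.3f} dB")
```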

    Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning

    Full text link
    Cinemagraphs are a compelling way to convey dynamic aspects of a scene. In these media, dynamic and still elements are juxtaposed to create an artistic and narrative experience. Creating a high-quality, aesthetically pleasing cinemagraph requires isolating objects in a semantically meaningful way and then selecting good start times and looping periods for those objects to minimize visual artifacts (such as tearing). To achieve this, we present a new technique that uses object recognition and semantic segmentation as part of an optimization method to automatically create cinemagraphs from videos that are both visually appealing and semantically meaningful. Given a scene with multiple objects, there are many cinemagraphs one could create. Our method evaluates these multiple candidates and presents the best one, as determined by a model trained to predict human preferences in a collaborative way. We demonstrate the effectiveness of our approach with multiple results and a user study. Comment: To appear in ICCV 2017. Total 17 pages including the supplementary material
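    The abstract describes choosing start times and looping periods per object so that artifacts such as tearing are minimized. The sketch below illustrates one simple proxy for that idea, scoring candidate loops of a masked region by how closely the loop's end frame matches its start frame; it is an assumption-laden toy, not the paper's optimization or preference model.

```python
import numpy as np

def loop_seam_cost(frames: np.ndarray, mask: np.ndarray, start: int, period: int) -> float:
    """Mean squared difference, inside `mask`, between the frame at the loop's
    end and the frame at its start; lower means a less visible seam."""
    a = frames[start].astype(np.float64)
    b = frames[start + period].astype(np.float64)
    m = mask.astype(bool)
    return float(np.mean((a[m] - b[m]) ** 2))

def best_loop(frames, mask, min_period=8):
    """Exhaustively score every (start, period) pair and keep the cheapest."""
    n = len(frames)
    candidates = [(s, p) for s in range(n) for p in range(min_period, n - s)]
    return min(candidates, key=lambda sp: loop_seam_cost(frames, mask, *sp))

# Toy example: 40 grayscale frames, loop only the left half of the image.
frames = np.random.rand(40, 64, 64)
mask = np.zeros((64, 64), dtype=bool)
mask[:, :32] = True
print(best_loop(frames, mask))
```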

    ContactGen: Contact-Guided Interactive 3D Human Generation for Partners

    Full text link
    Among various interactions between humans, such as eye contact and gestures, physical interactions by contact can act as an essential moment in understanding human behaviors. Inspired by this fact, given a 3D partner human with the desired interaction label, we introduce a new task of 3D human generation in terms of physical contact. Unlike previous works of interacting with static objects or scenes, a given partner human can have diverse poses and different contact regions according to the type of interaction. To handle this challenge, we propose a novel method of generating interactive 3D humans for a given partner human based on a guided diffusion framework. Specifically, we present a contact prediction module that adaptively estimates potential contact regions between two input humans according to the interaction label. Using the estimated potential contact regions as complementary guidance, we dynamically steer ContactGen to generate interactive 3D humans for a given partner human within a guided diffusion model. We demonstrate ContactGen on the CHI3D dataset, where our method generates physically plausible and diverse poses compared to baseline methods. Comment: Accepted by AAAI 202
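    The method uses predicted contact regions as guidance inside a diffusion sampler. The snippet below is a generic, classifier-guidance-style sketch of nudging each denoising step with the gradient of a contact-compatibility loss; the `denoiser` and `contact_loss_fn` placeholders are hypothetical and not the paper's actual networks or loss.

```python
import torch

def guided_denoise_step(x_t, t, denoiser, contact_loss_fn, guidance_scale=1.0):
    """One reverse-diffusion step where the prediction is nudged by the
    gradient of a contact-compatibility loss (classifier-guidance style)."""
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoiser(x_t, t)                 # network's estimate of the clean sample
    loss = contact_loss_fn(x0_pred)            # penalizes poses violating predicted contacts
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x0_pred - guidance_scale * grad).detach()

# Toy placeholders: a 69-D pose vector, an identity "denoiser", and a loss
# that pulls two hypothetical contact joints toward the origin.
denoiser = lambda x, t: x
contact_loss_fn = lambda x: (x[..., :6] ** 2).sum()
x = torch.randn(1, 69)
for t in reversed(range(10)):
    x = guided_denoise_step(x, t, denoiser, contact_loss_fn, guidance_scale=0.1)
print(x.shape)
```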

    Quasi-Globally Optimal and Real-Time Visual Compass in Manhattan Structured Environments

    No full text
    We present a drift-free visual compass for estimating the three degrees of freedom (DoF) rotational motion of a camera by recognizing structural regularities in a Manhattan world (MW), which posits that the major structures conform to three orthogonal principal directions. Existing Manhattan frame estimation approaches are based on either data sampling or a parameter search, and fail to guarantee accuracy and efficiency simultaneously. To overcome these limitations, we propose a novel approach to hybridize these two strategies, achieving quasi-global optimality and high efficiency. We first compute the two DoF of the camera orientation by detecting and tracking a vertical dominant direction from a depth camera or an IMU, and then search for the optimal third DoF with the image lines through the proposed Manhattan Mine-and-Stab (MnS) approach. Once we find the initial rotation estimate of the camera, we refine the absolute camera orientation by minimizing the average orthogonal distance from the endpoints of the lines to the MW axes. We compare the proposed algorithm with other state-of-the-art approaches on a variety of real-world datasets including data from a drone flying in an urban environment, and demonstrate that the proposed method outperforms them in terms of accuracy, efficiency, and stability. The code is available on the project page: https://github.com/PyojinKim/MWM
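    With the vertical direction already tracked, only one rotational DoF (the yaw about that direction) remains, and the Manhattan frame repeats every 90 degrees. The sketch below shows a brute-force 1-D search over that yaw that scores candidates by how well image line directions align with the rotated horizontal axes; it is a simplified stand-in for the proposed Mine-and-Stab search, with illustrative data.

```python
import numpy as np

def alignment_cost(yaw: float, line_dirs: np.ndarray) -> float:
    """Cost of a candidate yaw: for each unit horizontal line direction, take
    its deviation from the nearest rotated horizontal axis, then sum."""
    c, s = np.cos(yaw), np.sin(yaw)
    axes = np.array([[c, s], [-s, c]])          # rotated x- and y-axes in the plane
    dots = np.abs(line_dirs @ axes.T)           # |dot| near 1 means parallel to that axis
    return float(np.sum(1.0 - dots.max(axis=1)))

def search_third_dof(line_dirs: np.ndarray, num_steps: int = 360) -> float:
    """Exhaustive 1-D search over yaw in [0, pi/2); the Manhattan frame is
    periodic every 90 degrees."""
    candidates = np.linspace(0.0, np.pi / 2, num_steps, endpoint=False)
    costs = [alignment_cost(y, line_dirs) for y in candidates]
    return float(candidates[int(np.argmin(costs))])

# Toy example: noisy line directions roughly aligned with a 20-degree frame.
rng = np.random.default_rng(0)
true_yaw = np.deg2rad(20.0)
angles = true_yaw + rng.choice([0, np.pi / 2], size=50) + rng.normal(0, 0.02, 50)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(np.rad2deg(search_third_dof(dirs)))
```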

    Diffusion-based Signed Distance Fields for 3D Shape Generation

    No full text

    Learning-based reflection-aware virtual point removal for large-scale 3D point clouds

    No full text
    3D point clouds are widely used for robot perception and navigation. LiDAR sensors can provide large-scale 3D point clouds (LS3DPC) with a certain level of accuracy in common environments. However, they often generate virtual points as reflection artifacts associated with reflective surfaces such as glass planes, which may degrade the performance of various robot applications. In this letter, we propose a novel learning-based framework to remove such virtual points from LS3DPCs. We first project the 3D point cloud onto the 2D image domain to investigate the distribution of the LiDAR's echo pulses, which is then used as an input to the glass probability estimation network. Moreover, the 3D feature similarity estimation network exploits deep features to compare the symmetry and geometric similarity between real and virtual points with respect to the estimated glass plane. We provide an LS3DPC dataset with synthetically generated reflection artifacts to train the proposed network. Experimental results show that the proposed method achieves better performance, both qualitatively and quantitatively, than existing state-of-the-art methods of 3D reflection removal.
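    The first stage projects LiDAR returns onto the 2D image domain before estimating glass probability. The sketch below shows a common spherical (range-image) projection under assumed sensor parameters (64 x 1024 resolution, +3 / -25 degree vertical field of view), which are illustrative rather than the paper's settings.

```python
import numpy as np

def spherical_projection(points: np.ndarray, h: int = 64, w: int = 1024,
                         fov_up_deg: float = 3.0, fov_down_deg: float = -25.0):
    """Project an (N, 3) LiDAR point cloud to an h x w range image.
    Returns the range image and per-point (row, col) indices."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                    # elevation
    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    cols = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    rows = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    rows = np.clip(rows, 0, h - 1)
    image = np.zeros((h, w), dtype=np.float32)
    image[rows, cols] = r                       # keep the last-written range per pixel
    return image, rows, cols

pts = np.random.uniform(-20, 20, (10000, 3))
range_img, _, _ = spherical_projection(pts)
print(range_img.shape, float(range_img.max()))
```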

    Pose-Guided 3D Human Generation in Indoor Scene

    No full text
    In this work, we address the problem of scene-aware 3D human avatar generation based on human-scene interactions. In particular, we pay attention to the fact that physical contact between a 3D human and a scene (i.e., physical human-scene interactions) requires geometric alignment to generate a natural 3D human avatar. Motivated by this fact, we present a new 3D human generation framework that considers geometric alignment on potential contact areas between 3D human avatars and their surroundings. In addition, we introduce a compact yet effective human pose classifier that classifies the human pose and provides potential contact areas of the 3D human avatar, which allows us to adaptively apply the geometric alignment loss according to the classified pose. Compared to state-of-the-art methods, our method can generate physically and semantically plausible 3D humans that interact naturally with 3D scenes without additional post-processing. In our evaluations, we achieve more plausible interactions and a greater variety of poses than prior research in both qualitative and quantitative analyses. Project page: https://bupyeonghealer.github.io/phin/
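    A geometric alignment loss on potential contact areas could, for instance, penalize the distance from labeled contact vertices to the nearest scene surface. The sketch below is one such assumption, not the paper's exact formulation; the vertex indices and point clouds are toy data.

```python
import numpy as np

def contact_alignment_loss(body_vertices: np.ndarray,
                           contact_idx: np.ndarray,
                           scene_points: np.ndarray) -> float:
    """Mean distance from each contact vertex on the body mesh to its nearest
    scene point; zero when every contact vertex touches the scene."""
    contacts = body_vertices[contact_idx]                                # (C, 3)
    d = np.linalg.norm(contacts[:, None, :] - scene_points[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

# Toy example: 100 scene points on a "floor" plane, 3 contact vertices on the feet.
scene = np.concatenate([np.random.uniform(-1, 1, (100, 2)), np.zeros((100, 1))], axis=1)
body = np.random.uniform(-1, 1, (6890, 3)) + np.array([0, 0, 1.0])       # body floats above the floor
feet_idx = np.array([10, 20, 30])
print(contact_alignment_loss(body, feet_idx, scene))
```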

    Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation

    No full text
    Stereo-LiDAR fusion is a promising task in that it combines two different types of 3D perception for practical use: dense 3D information from stereo cameras and highly accurate sparse point clouds from LiDAR. However, because of their different modalities and structures, the way sensor data are aligned is the key to successful sensor fusion. To this end, we propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation, called the volumetric propagation network. The key idea of our network is to exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space. Unlike existing fusion strategies, we directly embed point clouds into the volume, which enables us to propagate valid information to nearby voxels and to reduce the uncertainty of correspondences. This allows us to fuse the two input modalities seamlessly and regress a long-range depth map. Our fusion is further enhanced by a newly proposed feature extraction layer for point clouds guided by images: FusionConv. FusionConv extracts point cloud features that consider both semantic (2D image domain) and geometric (3D domain) relations and aids fusion in the volume. Our network achieves state-of-the-art performance on the KITTI and Virtual-KITTI datasets among recent stereo-LiDAR fusion methods.
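    The central operation is embedding sparse LiDAR points directly into a 3D volume aligned with the camera. The sketch below voxelizes camera-frame points into a depth-binned occupancy volume using assumed pinhole intrinsics; it illustrates only the embedding idea and omits the propagation and FusionConv components.

```python
import numpy as np

def embed_points_in_volume(points_cam: np.ndarray, K: np.ndarray,
                           h: int, w: int, depth_bins: np.ndarray) -> np.ndarray:
    """Mark voxels of a (D, H, W) volume that contain projected LiDAR points.
    `points_cam` are (N, 3) points already in the camera frame; `depth_bins`
    are the D bin centers along the camera z-axis."""
    vol = np.zeros((len(depth_bins), h, w), dtype=np.float32)
    valid = points_cam[:, 2] > 0.1                      # keep points in front of the camera
    p = points_cam[valid]
    uv = (K @ p.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    d = np.argmin(np.abs(p[:, 2:3] - depth_bins[None, :]), axis=1)  # nearest depth bin
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    vol[d[inside], v[inside], u[inside]] = 1.0
    return vol

# Toy example with made-up intrinsics and uniformly scattered points.
K = np.array([[700.0, 0, 320.0], [0, 700.0, 240.0], [0, 0, 1.0]])
pts = np.random.uniform([-10, -2, 1], [10, 2, 80], (5000, 3))
volume = embed_points_in_volume(pts, K, h=480, w=640, depth_bins=np.linspace(1, 80, 64))
print(volume.shape, int(volume.sum()))
```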

    Hierarchical 3D line restoration based on angular proximity in structured environments

    No full text
    We present a method based on hierarchical clustering to restore the 3D lines of structured environments. Restoring noisy 3D lines is a challenging problem in previous approaches because it is difficult to define a similarity measure that discriminates one line from the others. Our motivation for overcoming this difficulty is that most structured scenes consist of sets of parallel 3D lines sharing the same angular proximity, which provides a hierarchical similarity measure for structured 3D lines. Accordingly, our restoration method performs clustering hierarchically, first at the angular level and then at the distance level. The 3D line restoration is then achieved by finding the center of each cluster. The framework also aligns the clustered 3D lines along their associated angular directions. We compare the proposed algorithm with methods that use no knowledge of the angular information, and demonstrate its effectiveness through real-world experiments. © 2013 IEEE.
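    The two-stage grouping, first by angle and then by distance, can be illustrated with lines parameterized by a direction angle and a perpendicular offset. The sketch below uses greedy 1-D clustering with illustrative thresholds and returns each cluster's mean as the restored line; it is a simplified 2-D analogue, not the paper's 3D implementation.

```python
import numpy as np

def hierarchical_line_clusters(angles, offsets, ang_thresh=np.deg2rad(5), dist_thresh=0.1):
    """Group lines that share a direction (within ang_thresh), then split each
    group by offset (within dist_thresh); return the mean (angle, offset) of
    every cluster as the restored line."""
    order = np.argsort(angles)
    restored = []
    # Stage 1: angular clusters (greedy scan over sorted angles).
    groups, current = [], [order[0]]
    for i in order[1:]:
        if angles[i] - angles[current[-1]] < ang_thresh:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    # Stage 2: distance clusters inside each angular cluster.
    for g in groups:
        g = sorted(g, key=lambda i: offsets[i])
        sub = [g[0]]
        for i in g[1:]:
            if offsets[i] - offsets[sub[-1]] < dist_thresh:
                sub.append(i)
            else:
                restored.append((np.mean(angles[sub]), np.mean(offsets[sub])))
                sub = [i]
        restored.append((np.mean(angles[sub]), np.mean(offsets[sub])))
    return restored

# Toy example: noisy copies of two parallel lines plus one at a different angle.
angles = np.array([0.00, 0.01, 0.02, 0.01, 1.57, 1.58])
offsets = np.array([1.00, 1.02, 2.00, 2.01, 0.50, 0.52])
print(hierarchical_line_clusters(angles, offsets))
```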

    Unified 3D Mesh Recovery of Humans and Animals by Learning Animal Exercise

    No full text