2,644 research outputs found

    Deep Generative Modeling of LiDAR Data

    Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map. Our approach can generate high quality samples, while simultaneously learning a meaningful latent representation of the data. We demonstrate significant improvements against state-of-the-art point cloud generation methods. Furthermore, we propose a novel data representation that augments the 2D signal with absolute positional information. We show that this helps robustness to noisy and imputed input; the learned model can recover the underlying lidar scan from seemingly uninformative data. Comment: Presented at IROS 201
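
    The idea of unravelling a scan into a 2D point map can be illustrated with a minimal sketch. It assumes a spherical projection in which rows index elevation and columns index azimuth; the function name `unravel_scan`, the grid size, and the vertical field of view are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def unravel_scan(points, height=40, width=256, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) lidar point cloud onto a (height, width, 3) grid.

    Hypothetical helper: grid size and field of view are assumed values.
    """
    dist = np.linalg.norm(points, axis=1)
    points, dist = points[dist > 0], dist[dist > 0]  # drop points at the origin
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    azimuth = np.arctan2(y, x)        # horizontal angle in [-pi, pi]
    elevation = np.arcsin(z / dist)   # vertical angle

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    # Columns follow azimuth; rows follow elevation, top row = highest beam.
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = ((fov_up - elevation) / (fov_up - fov_down) * height).astype(int)
    row = np.clip(row, 0, height - 1)  # points outside the FOV land on edge rows

    grid = np.zeros((height, width, 3), dtype=np.float32)
    grid[row, col] = points            # colliding points: only one is kept
    return grid
```

    Storing the (x, y, z) coordinates in each cell, rather than range alone, is one way to keep absolute positional information in the 2D signal; whether this matches the paper's representation is an assumption.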

    Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models

    Given a large number of real photos for training, convolutional neural networks show excellent performance on object recognition tasks. However, collecting data is tedious and the available backgrounds are limited, which makes it hard to establish a perfect database. In this paper, our generative model, trained with synthetic images rendered from 3D models, reduces the data-collection workload and the limitations on conditions. Our structure is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network based on a multi-triplet cost function. The latter avoids over-fitting on monotone surfaces and fully utilizes pose information by establishing a sphere-like distribution of descriptors in each category, which helps recognition on regular photos according to the poses, lighting conditions, backgrounds, and category information of the rendered images. Firstly, our conjugate structure, called generative model with metric learning, utilizes additional foreground object channels generated from Bayesian rendering as the joint of the two sub-networks. A multi-triplet cost function based on poses is used for metric learning, which makes it possible to train a category classifier purely on synthetic data. Secondly, we design a coordinate training strategy that uses adaptive noise as corruption on input images, helping both sub-networks benefit from each other and avoiding inharmonious parameter tuning due to the different convergence speeds of the two sub-networks. Our structure achieves state-of-the-art accuracy of over 50% on the ShapeNet database despite the data migration obstacle from synthetic images to real photos. This pipeline makes it possible to perform recognition on real images based only on 3D models. Comment: 14 page

    How to Train Your Agent to Read and Write

    Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (e.g., students) to fully grasp this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit the potential knowledge clues to write novel papers. Although there have been existing works focusing on summarizing (i.e., reading) the knowledge in a given text or generating (i.e., writing) a text based on the given knowledge, the ability of simultaneously reading and writing is still under development. Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a Reader that can extract knowledge graphs (KGs) from input paragraphs and discover potential knowledge, a graph-to-text Writer that generates a novel paragraph, and a Reviewer that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms considered baselines and several state-of-the-art methods on AGENDA and M-AGENDA datasets. Our code and supplementary material are released at https://github.com/menggehe/DRAW

    Age Progression and Regression with Spatial Attention Modules

    Age progression and regression refer to aesthetically rendering a given face image to present effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, there are two major problems: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper, we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. Particularly, since face aging and rejuvenation are largely different in terms of image translation patterns, we model these two processes using two separate generators, each dedicated to one age changing process. In addition, we exploit spatial attention mechanisms to limit image modifications to regions closely related to age changes, so that images with high visual fidelity can be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model to synthesize lifelike face images at desired ages with personalized features well preserved, while keeping age-irrelevant regions unchanged.
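
    One common way to limit modifications to selected regions is to blend the generator output with the input image through a per-pixel attention mask. The sketch below assumes that formulation; the abstract does not state the exact blending rule, and the name `blend_with_attention` is hypothetical.

```python
import numpy as np

def blend_with_attention(source, generated, attention):
    """Blend two (H, W, C) images using an (H, W) attention mask in [0, 1].

    Assumed formulation: mask 1 takes the generated pixel, mask 0 keeps
    the source pixel, so regions with low attention stay unchanged.
    """
    mask = np.clip(attention, 0.0, 1.0)[..., None]  # broadcast over channels
    return mask * generated + (1.0 - mask) * source
```

    With this rule, pixels where the mask is zero are copied from the input unchanged, which is one way age-irrelevant regions could be preserved.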

    Multi-crop Contrastive Learning for Unsupervised Image-to-Image Translation

    Recently, image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, in previous work the negatives are sampled from the input feature space, which makes them lack diversity. Moreover, in the latent space of the embeddings, previous methods ignore domain consistency between the generated image and the real images of the target domain. In this paper, we propose a novel contrastive learning framework for unpaired image-to-image translation, called MCCUT. We use multi-crop views to generate the negatives via center-crop and random-crop, which improves both the diversity and the quality of the negatives. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss function, which encourages the generated images to be close to the real images in the embedding space of the same domain. Furthermore, we present a dual coordinate channel attention network that embeds positional information into SENet, which we call the DCSE module. We employ the DCSE module in the design of the generator, which makes the generator pay more attention to channels with greater weight. In many image-to-image translation tasks, our method achieves state-of-the-art results, and its advantages are demonstrated through extensive comparison experiments and ablation research.
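
    The two crop types named above can be sketched as follows. This is a minimal illustration of center-crop and random-crop on an image array only; how MCCUT turns the crops into negatives is not described in the abstract, and the function names and the `size` parameter are assumptions.

```python
import numpy as np

def center_crop(image, size):
    """Return the size x size patch centered in an (H, W, ...) image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def random_crop(image, size, rng):
    """Return a size x size patch at a uniformly random valid position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)   # upper bound is exclusive
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]
```

    A caller would pass a generator such as `np.random.default_rng(0)` as `rng` so that the random crops are reproducible.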