Deep Generative Modeling of LiDAR Data
Building models capable of generating structured output is a key challenge
for AI and robotics. While generative models have been explored on many types
of data, little work has been done on synthesizing lidar scans, which play a
key role in robot mapping and localization. In this work, we show that one can
adapt deep generative models for this task by unravelling lidar scans into a 2D
point map. Our approach can generate high quality samples, while simultaneously
learning a meaningful latent representation of the data. We demonstrate
significant improvements against state-of-the-art point cloud generation
methods. Furthermore, we propose a novel data representation that augments the
2D signal with absolute positional information. We show that this helps
robustness to noisy and imputed input; the learned model can recover the
underlying lidar scan from seemingly uninformative data.
Comment: Presented at IROS 201
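The core trick described here, unravelling a scan into a 2D grid and appending absolute positional channels, is easy to sketch. Below is a minimal NumPy illustration; the grid resolution, binning scheme, and channel layout are assumptions for illustration, not the paper's exact implementation (in practice the angle tables would come from the sensor's calibration).

```python
import numpy as np

def lidar_to_2d_map(points, h=64, w=512):
    """Unravel an (N, 3) xyz point cloud into an (H, W) range image,
    binning points by elevation (rows) and azimuth (columns)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                      # [-pi, pi]
    elevation = np.arcsin(z / np.maximum(r, 1e-8))  # [-pi/2, pi/2]

    # Map angles to integer grid coordinates.
    col = ((azimuth + np.pi) / (2 * np.pi) * w).astype(int) % w
    row = ((elevation - elevation.min()) /
           (elevation.max() - elevation.min() + 1e-8) * (h - 1)).astype(int)

    grid = np.zeros((h, w), dtype=np.float32)
    grid[row, col] = r          # last point in a cell wins (sketch only)
    return grid

def augment_with_xyz(grid, h=64, w=512):
    """Append absolute positional channels: the approximate xyz
    coordinates implied by each cell's (row, col) angles and range."""
    azimuth = (np.arange(w) / w) * 2 * np.pi - np.pi
    elevation = np.linspace(-np.pi / 2, np.pi / 2, h)
    el, az = np.meshgrid(elevation, azimuth, indexing="ij")
    x = grid * np.cos(el) * np.cos(az)
    y = grid * np.cos(el) * np.sin(az)
    z = grid * np.sin(el)
    return np.stack([grid, x, y, z], axis=0)  # (4, H, W) input tensor

scan = np.random.randn(1000, 3) * 10          # stand-in for a real scan
inp = augment_with_xyz(lidar_to_2d_map(scan))
print(inp.shape)  # (4, 64, 512)
```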
Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models
Given a large number of real photos for training, convolutional neural networks show excellent performance on object recognition tasks. However, collecting such data is tedious, and the available backgrounds are limited, which makes it hard to build a comprehensive database. In this paper, we present a generative model trained on synthetic images rendered from 3D models, which reduces the workload of data collection and the restrictions on capture conditions. Our architecture consists of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network trained with a multi-triplet cost function. The multi-triplet cost avoids over-fitting on monotone surfaces and fully exploits pose information by establishing a sphere-like distribution of descriptors within each category, which aids recognition of regular photos given the pose, lighting, background, and category information of the rendered images. First, our conjugate structure, a generative model with metric learning, uses additional foreground object channels generated by Bayesian rendering as the junction between the two sub-networks; the pose-based multi-triplet cost function used for metric learning makes it possible to train a category classifier purely on synthetic data. Second, we design a coordinated training strategy in which adaptive noise corrupts the input images, helping the two sub-networks benefit from each other and avoiding the imbalanced parameter tuning caused by their different convergence speeds. Our architecture achieves state-of-the-art accuracy of over 50% on the ShapeNet database despite the domain gap between synthetic images and real photos. This pipeline makes it possible to recognize real images based only on 3D models.
Comment: 14 page
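The pose-based multi-triplet idea can be sketched in a few lines. The following PyTorch snippet is an illustrative stand-in, assuming each batch item carries a category label and a scalar pose angle; the paper's exact triplet construction, margin, and pose threshold are not given in the abstract.

```python
import torch
import torch.nn.functional as F

def multi_triplet_loss(emb, labels, poses, margin=0.2, pose_thresh=30.0):
    """Sketch of a pose-aware multi-triplet cost: for each anchor,
    positives share the category and a nearby pose (ignoring angle
    wrap-around for brevity), negatives come from other categories.
    Averages the triplet hinge over all valid combinations."""
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                         # (B, B) pairwise
    same_cat = labels[:, None] == labels[None, :]
    near_pose = (poses[:, None] - poses[None, :]).abs() < pose_thresh
    pos_mask = same_cat & near_pose & ~torch.eye(len(emb), dtype=torch.bool)
    neg_mask = ~same_cat

    # Hinge d(a, p) - d(a, n) + margin for every (a, p, n) combination.
    loss = (dist[:, :, None] - dist[:, None, :] + margin).clamp(min=0)
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]
    return loss[valid].mean() if valid.any() else emb.sum() * 0.0

emb = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
poses = torch.rand(16) * 360
multi_triplet_loss(emb, labels, poses).backward()
```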
How to Train Your Agent to Read and Write
Reading and writing research papers is among the core skills that a qualified researcher must master. However, it is difficult for new researchers (e.g., students) to fully grasp this skill. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit potential knowledge clues to write novel papers. Although existing work focuses on summarizing (i.e., reading) the knowledge in a given text or generating (i.e., writing) a text based on given knowledge, the ability to read and write simultaneously is still underdeveloped. Typically, this requires an agent to fully understand the knowledge in the given text materials and to generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a Reader that extracts knowledge graphs (KGs) from input paragraphs and discovers potential knowledge, a graph-to-text Writer that generates a novel paragraph, and a Reviewer that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms the considered baselines and several state-of-the-art methods on the AGENDA and M-AGENDA datasets. Our code and supplementary material are released at https://github.com/menggehe/DRAW
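The Reader-Writer-Reviewer decomposition is straightforward to express as a pipeline. The toy Python sketch below only fixes the interfaces; the pattern-matching extractor, template writer, and proxy scores are hypothetical placeholders, since each stage in the paper is a learned model.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def reader(paragraphs: List[str]) -> List[Triple]:
    """Toy stand-in for the Reader: pull '<A> is <B>' patterns as
    (A, 'is', B) triples. The real Reader is a learned KG extractor
    that also predicts potential (missing) knowledge."""
    triples = []
    for text in paragraphs:
        for sent in text.split("."):
            parts = sent.strip().split(" is ")
            if len(parts) == 2:
                triples.append((parts[0], "is", parts[1]))
    return triples

def writer(graph: List[Triple]) -> str:
    """Toy stand-in for the graph-to-text Writer: verbalize triples."""
    return " ".join(f"{s} {r} {o}." for s, r, o in graph)

def reviewer(paragraph: str) -> Dict[str, float]:
    """Toy stand-in for the Reviewer: the paper scores three aspects;
    here we fake them with cheap proxies."""
    words = paragraph.split()
    return {"length": float(len(words)),
            "lexical_diversity": len(set(words)) / max(len(words), 1),
            "ends_with_period": float(paragraph.endswith("."))}

docs = ["A knowledge graph is a set of triples. DRAW is a reader-writer network."]
draft = writer(reader(docs))
print(draft, reviewer(draft), sep="\n")
```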
Age Progression and Regression with Spatial Attention Modules
Age progression and regression refers to aesthetically rendering a given face image to present the effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, two major problems remain: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. In particular, since face aging and rejuvenation differ substantially in their image translation patterns, we model the two processes with two separate generators, each dedicated to one age-changing process. In addition, we exploit spatial attention mechanisms to limit image modifications to regions closely related to age changes, so that images with high visual fidelity can be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model to synthesize lifelike face images at desired ages with personalized features well preserved, while keeping age-irrelevant regions unchanged.
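The attention-guided modification described above commonly reduces to a mask-blend of the generator output with the input. Below is a minimal PyTorch sketch under that assumption; the backbone layers and sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatialAttentionGenerator(nn.Module):
    """Sketch of one age-mapping generator: alongside the translated
    image it predicts a single-channel attention mask, and only the
    attended regions of the input are modified."""
    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)     # translated image
        self.to_mask = nn.Sequential(                    # attention mask
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        h = self.backbone(x)
        generated, mask = torch.tanh(self.to_rgb(h)), self.to_mask(h)
        # Age-irrelevant regions (mask near 0) stay equal to the input.
        return mask * generated + (1 - mask) * x

# Two separate generators, one per age-changing direction.
g_age, g_rejuvenate = SpatialAttentionGenerator(), SpatialAttentionGenerator()
face = torch.rand(1, 3, 128, 128) * 2 - 1
older = g_age(face)                                      # (1, 3, 128, 128)
```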
Multi-crop Contrastive Learning for Unsupervised Image-to-Image Translation
Recently, image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, in previous work the negatives are sampled from the input feature space, which leaves them lacking in diversity. Moreover, in the latent space of the embeddings, previous methods ignore domain consistency between the generated image and the real images of the target domain. In this paper, we propose a novel contrastive learning framework for unpaired image-to-image translation, called MCCUT. We use multi-crop views to generate negatives via center-crop and random-crop, which improves the diversity of the negatives and at the same time increases their quality. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss function, which encourages the generated images to be close to the real images in the embedding space of the same domain. Furthermore, we present a dual coordinate channel attention network, built by embedding positional information into SENet, which we call the DCSE module. We employ the DCSE module in the design of the generator, which makes the generator pay more attention to channels with greater weights. Our method achieves state-of-the-art results on many image-to-image translation tasks, and its advantages are demonstrated through extensive comparison experiments and ablation studies.
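The abstract describes DCSE as SENet-style channel attention enriched with positional information. The PyTorch block below is a plausible sketch of that combination, pooling along height and width separately before gating channels; the paper's actual DCSE design may differ.

```python
import torch
import torch.nn as nn

class DCSEBlock(nn.Module):
    """Plausible sketch of a 'dual coordinate' SE block: squeeze along
    height and width separately (keeping positional information), then
    gate channels with a shared bottleneck MLP."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1))

    def forward(self, x):
        # Direction-aware squeeze: (B, C, H, 1) and (B, C, 1, W).
        h_pool = x.mean(dim=3, keepdim=True)
        w_pool = x.mean(dim=2, keepdim=True)
        gate = torch.sigmoid(self.mlp(h_pool) + self.mlp(w_pool))
        return x * gate  # broadcasts to a per-location channel gate

feat = torch.randn(2, 64, 32, 32)
out = DCSEBlock(64)(feat)   # same shape, channels reweighted per location
```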