Unsupervised Diverse Colorization via Generative Adversarial Networks
Colorization of grayscale images has been a hot topic in computer vision.
Previous research mainly focuses on producing a colored image to match the
original one. However, since many colors share the same gray value, an input
grayscale image can be colored in many diverse ways while remaining realistic. In
this paper, we design a novel solution for unsupervised diverse colorization.
Specifically, we leverage conditional generative adversarial networks to model
the distribution of real-world item colors, in which we develop a fully
convolutional generator with multi-layer noise to enhance diversity, with
multi-layer condition concatenation to maintain realism, and with stride-1
convolutions throughout to preserve spatial information. With this
architecture, the model
yields highly competitive performance on the open LSUN bedroom dataset. A
Turing test with 80 human participants further indicates that our generated
color schemes are highly convincing.
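
To make the described architecture concrete, below is a minimal PyTorch sketch
of a stride-1, fully convolutional generator that re-concatenates the grayscale
condition and a fresh noise map at every stage. Channel widths, the stage
count, and the ab-chroma output are illustrative assumptions, not the paper's
exact configuration.

```python
import torch
import torch.nn as nn

class DiverseColorGenerator(nn.Module):
    """Stride-1, fully convolutional generator: every stage re-concatenates
    the grayscale condition (for realism) and a fresh noise map (for
    diversity), and spatial resolution is never reduced. Widths are
    illustrative, not the paper's configuration."""

    def __init__(self, width=64, stages=4):
        super().__init__()
        self.stem = nn.Conv2d(1 + 1, width, kernel_size=3, padding=1)  # gray + noise
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(width + 1 + 1, width, 3, padding=1),  # feats + gray + noise
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
            )
            for _ in range(stages)
        )
        self.head = nn.Conv2d(width, 2, 3, padding=1)  # predict ab chroma channels

    def forward(self, gray):
        n, _, h, w = gray.shape
        z = torch.randn(n, 1, h, w, device=gray.device)
        x = torch.relu(self.stem(torch.cat([gray, z], dim=1)))
        for block in self.blocks:
            z = torch.randn(n, 1, h, w, device=gray.device)  # multi-layer noise
            x = block(torch.cat([x, gray, z], dim=1))        # multi-layer condition
        return torch.tanh(self.head(x))  # chroma in [-1, 1]

# Two passes over the same grayscale input yield two different colorizations.
g = DiverseColorGenerator()
gray = torch.rand(2, 1, 64, 64)
ab_a, ab_b = g(gray), g(gray)
```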
L-DYNO: Framework to Learn Consistent Visual Features Using Robot's Motion
Historically, feature-based approaches have been used extensively for
camera-based robot perception tasks such as localization, mapping, tracking,
and others. Several of these approaches also incorporate other sensors (inertial
sensing, for example) to perform joint state estimation. Our work rethinks
this approach; we present a representation learning mechanism that identifies
visual features that best correspond to robot motion as estimated by an
external signal. Specifically, we utilize the robot's transformations as
estimated by an external signal (inertial sensing, for example) and attend to
the regions of image space most consistent with that signal. We use a pairwise
consistency metric as a representation to keep the visual features consistent
through a sequence with the robot's relative pose transformations. This
approach enables us to incorporate information from the robot's perspective
instead of solely relying on the image attributes. We evaluate our approach on
real-world datasets such as KITTI & EuRoC and compare the refined features with
existing feature descriptors. We also evaluate our method in a real-robot
experiment. We observe an average 49% reduction in the image search space
without compromising the trajectory estimation accuracy. Our method reduces the
execution time of visual odometry by 4.3% and also reduces reprojection errors.
We demonstrate the need to select only the most important features and show
our method's competitiveness against various feature detection baselines.
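
To illustrate the idea of scoring image features against an externally
measured pose, the sketch below ranks putative matches by their epipolar
consistency with an inertially estimated rotation and translation, then keeps
only the most consistent ones. This hand-crafted geometric score is a stand-in
for the paper's learned attention and pairwise consistency metric, not the
actual method.

```python
import numpy as np

def skew(t):
    """3x3 skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_consistency(pts1, pts2, R, t):
    """Score each match (x1, x2) in normalized camera coordinates by
    |x2^T E x1| with E = [t]_x R; smaller means more consistent with
    the externally supplied relative pose (e.g. from inertial sensing)."""
    E = skew(t / np.linalg.norm(t)) @ R
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coordinates
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.abs(np.einsum('ni,ij,nj->n', x2, E, x1))

def keep_consistent(pts1, pts2, R, t, keep_ratio=0.5):
    """Retain only the matches most consistent with the pose,
    shrinking the search space for downstream odometry."""
    scores = epipolar_consistency(pts1, pts2, R, t)
    order = np.argsort(scores)
    return order[:max(1, int(keep_ratio * len(order)))]

# Synthetic check: identity rotation, sideways translation, matches that
# shift only along x are perfectly consistent (score 0) and are kept.
rng = np.random.default_rng(0)
pts1 = rng.normal(size=(100, 2))
pts2 = pts1 + np.array([0.01, 0.0])
idx = keep_consistent(pts1, pts2, np.eye(3), np.array([1.0, 0.0, 0.0]))
```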
MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation
The low-level spatial detail information and high-level semantic abstract
information are both essential to the semantic segmentation task. The features
extracted by the deep network can obtain rich semantic information, while a lot
of spatial information is lost. However, how to recover spatial detail
information effectively and fuse it with high-level semantics has not been well
addressed so far. In this paper, we propose a new architecture based on
Bilateral Segmentation Network (BiSeNet) called Multi-scale Covariance Feature
Fusion Network (MCFNet). Specifically, this network introduces a new feature
refinement module and a new feature fusion module. Furthermore, a gating unit
named L-Gate is proposed to filter out invalid information and fuse multi-scale
features. We evaluate our proposed model on Cityscapes, CamVid datasets and
compare it with the state-of-the-art methods. Extensive experiments show that
our method achieves competitive results. On Cityscapes, we achieve 75.5% mIoU
at a speed of 151.3 FPS.
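
A minimal sketch of a gating unit in the spirit of the L-Gate described above:
a per-pixel sigmoid gate, predicted from both streams, blends low-level
spatial detail with upsampled high-level semantics. The exact structure of
L-Gate and of MCFNet's covariance-based fusion may differ; this only
illustrates the gated multi-scale fusion pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Per-pixel gate deciding how much low-level detail versus
    high-level semantics to pass through (illustrative design)."""

    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.low_proj = nn.Conv2d(low_ch, out_ch, 1)
        self.high_proj = nn.Conv2d(high_ch, out_ch, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Upsample coarse semantic features to the detail resolution.
        high = F.interpolate(self.high_proj(high), size=low.shape[-2:],
                             mode='bilinear', align_corners=False)
        low = self.low_proj(low)
        g = self.gate(torch.cat([low, high], dim=1))  # per-pixel gate in (0, 1)
        return g * low + (1 - g) * high               # gated multi-scale fusion

# Fuse a 1/8-resolution semantic map with a 1/4-resolution detail map.
fuse = GatedFusion(low_ch=64, high_ch=256, out_ch=128)
out = fuse(torch.rand(1, 64, 64, 64), torch.rand(1, 256, 32, 32))
```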
FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data
Surface reconstruction from multi-view images is a challenging task, with
solutions often requiring a large number of sampled images with high overlap.
We seek to develop a method for few-view reconstruction, for the case of the
human foot. To solve this task, we must extract rich geometric cues from RGB
images before carefully fusing them into a final 3D object. Our FOUND approach
tackles this with four main contributions: (i) SynFoot, a synthetic dataset of
50,000 photorealistic foot images, paired with ground truth surface normals and
keypoints; (ii) an uncertainty-aware surface normal predictor trained on our
synthetic dataset; (iii) an optimization scheme for fitting a generative foot
model to a series of images; and (iv) a benchmark dataset of calibrated images
and high resolution ground truth geometry. We show that our normal predictor
outperforms all off-the-shelf equivalents significantly on real images, and our
optimization scheme outperforms state-of-the-art photogrammetry pipelines,
especially for a few-view setting. We release our synthetic dataset and
baseline 3D scans to the research community.
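
One common way to make a normal predictor uncertainty-aware is a
heteroscedastic-style loss in which the network also predicts a per-pixel
log-variance that attenuates the angular error. The sketch below assumes that
formulation, which may differ from the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_normal_loss(pred_normals, log_var, gt_normals):
    """Per-pixel angular error attenuated by a predicted log-variance,
    plus a penalty that stops the network from declaring every pixel
    uncertain (assumed heteroscedastic formulation).

    pred_normals, gt_normals: (N, 3, H, W) normal vectors
    log_var:                  (N, 1, H, W) predicted log-variance
    """
    pred = F.normalize(pred_normals, dim=1)
    gt = F.normalize(gt_normals, dim=1)
    cos = (pred * gt).sum(dim=1, keepdim=True).clamp(-1 + 1e-6, 1 - 1e-6)
    angular_err = torch.acos(cos)  # radians, per pixel
    return (torch.exp(-log_var) * angular_err + log_var).mean()

# Random tensors standing in for network output and ground truth labels.
pred = torch.randn(2, 3, 32, 32)
logv = torch.zeros(2, 1, 32, 32, requires_grad=True)
gt = torch.randn(2, 3, 32, 32)
uncertainty_aware_normal_loss(pred, logv, gt).backward()
```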
DiffMatch: Diffusion Model for Dense Matching
The objective for establishing dense correspondence between paired images
consists of two terms: a data term and a prior term. While conventional
techniques focused on defining hand-designed prior terms, which are difficult
to formulate, recent approaches have focused on learning the data term with
deep neural networks without explicitly modeling the prior, assuming that the
model itself has the capacity to learn an optimal prior from a large-scale
dataset. The performance improvement was obvious; however, these approaches often fail to
address inherent ambiguities of matching, such as textureless regions,
repetitive patterns, and large displacements. To address this, we propose
DiffMatch, a novel conditional diffusion-based framework designed to explicitly
model both the data and prior terms. Unlike previous approaches, this is
accomplished by leveraging a conditional denoising diffusion model. DiffMatch
consists of two main components: a conditional denoising diffusion module and
a cost injection module. We stabilize the training process and reduce memory
usage with a stage-wise training strategy. Furthermore, to boost performance,
we introduce an inference technique that finds a better path to the accurate
matching field. Our experimental results demonstrate significant performance
improvements of our method over existing approaches, and the ablation studies
validate our design choices along with the effectiveness of each component.
Project page is available at https://ku-cvlab.github.io/DiffMatch/.
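
As a rough sketch of the core idea, the code below runs plain DDPM ancestral
sampling of a 2-channel matching (flow) field conditioned on injected cost
features through a toy denoiser. The denoiser architecture, noise schedule,
and conditioning scheme are illustrative assumptions, not DiffMatch's actual
modules.

```python
import torch
import torch.nn as nn

class FlowDenoiser(nn.Module):
    """Toy stand-in for a conditional denoising module: predicts the
    noise in a 2-channel flow field given the noisy field, a timestep
    value, and injected cost features."""

    def __init__(self, cost_ch=32, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 + cost_ch + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2, 3, padding=1),
        )

    def forward(self, noisy_flow, t_frac, cost_feat):
        n, _, h, w = noisy_flow.shape
        t_map = torch.full((n, 1, h, w), t_frac, device=noisy_flow.device)
        return self.net(torch.cat([noisy_flow, cost_feat, t_map], dim=1))

@torch.no_grad()
def ddpm_sample(model, cost_feat, steps=50):
    """Standard DDPM ancestral sampling of a flow field conditioned on
    cost features (linear beta schedule, simplified for illustration)."""
    n, _, h, w = cost_feat.shape
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(n, 2, h, w)
    for t in reversed(range(steps)):
        eps = model(x, t / steps, cost_feat)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

# Sample a 32x32 matching field from random stand-in cost features.
flow = ddpm_sample(FlowDenoiser(), torch.rand(1, 32, 32, 32))
```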
EVOLIN Benchmark: Evaluation of Line Detection and Association
Lines are interesting geometric features commonly seen in indoor and urban
environments. A complete benchmark for evaluating lines from a sequential
stream of images across all stages (line detection, line association, and
pose error) has been missing. To fill this gap, we present a complete and
exhaustive benchmark for visual lines in a SLAM front-end, for both RGB and
RGB-D, providing a plethora of complementary metrics. We have also labelled
data from well-known SLAM datasets so that poses and accurately annotated
lines are available in one place. In particular, we have evaluated 17 line
detection algorithms and 5 line association methods, along with the resulting
pose error for aligning a pair of frames under several detector-association
combinations. We have packaged all methods and evaluation metrics and made
them publicly available at https://prime-slam.github.io/evolin/.
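
For the pose-error stage, metrics of the following kind are typical: rotation
error as the geodesic angle between rotations, and translation error as the
angle between scale-free translation directions. The sketch below uses these
standard definitions, which may differ from EVOLIN's exact metrics.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Relative-pose error: rotation error as the geodesic angle between
    the two rotations, translation error as the angle between the
    (scale-free) translation directions. Both returned in degrees."""
    dR = R_est.T @ R_gt
    rot_err = np.degrees(np.arccos(np.clip((np.trace(dR) - 1) / 2, -1, 1)))
    u = t_est / np.linalg.norm(t_est)
    v = t_gt / np.linalg.norm(t_gt)
    trans_err = np.degrees(np.arccos(np.clip(u @ v, -1, 1)))
    return rot_err, trans_err

def rot_z(deg):
    """Rotation about the z-axis by the given angle in degrees."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# A 5-degree yaw error with a matching translation direction: (~5.0, 0.0).
print(pose_errors(rot_z(5), np.array([1, 0, 0.0]),
                  rot_z(0), np.array([1, 0, 0.0])))
```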