Negative Results in Computer Vision: A Perspective
A negative result occurs when the outcome of an experiment or model is not what
was expected, or when a hypothesis does not hold. Although often overlooked by
the scientific community, negative results are results, and they carry value.
While this topic has been discussed extensively in other fields, such as the
social sciences and biosciences, it has received less attention in the computer
vision community. The unique characteristics of computer vision, particularly
its experimental nature, call for special treatment of this matter. In this
paper, I will address what makes negative results important, how they should be
disseminated and incentivized, and what lessons can be learned from cognitive
vision research in this regard. Further, I will discuss issues such as the
interaction between computer vision and human vision, experimental design and
statistical hypothesis testing, explanatory versus predictive modeling,
performance evaluation, model comparison, and computer vision research culture.
Cross-View Image Synthesis using Conditional GANs
Learning to generate natural scenes has always been a challenging task in
computer vision. It is even more difficult when the generation is conditioned
on images with drastically different views, mainly because understanding,
relating, and transforming appearance and semantic information across the
views is not trivial. In this paper, we attempt to solve the novel problem of
cross-view image synthesis, aerial to street-view and vice versa, using
conditional generative adversarial networks (cGANs). Two new architectures,
called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq), are proposed
to generate scenes at resolutions of 64x64 and 256x256 pixels. The X-Fork
architecture has a single discriminator and a single generator; the generator
hallucinates both the image and its semantic segmentation in the target view.
The X-Seq architecture uses two cGANs: the first generates the target image,
which is then fed to the second cGAN to generate its corresponding semantic
segmentation map. The feedback from the second cGAN helps the first produce
sharper images. Both proposed architectures learn to generate natural images
as well as their semantic segmentation maps. The proposed methods capture and
preserve the true semantics of objects in the source and target views better
than the traditional image-to-image translation method, which considers only
the visual appearance of the scene. Extensive qualitative and quantitative
evaluations demonstrate the effectiveness of our frameworks, compared to two
state-of-the-art methods, for natural scene generation across drastically
different views.

Comment: Accepted at CVPR 201
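The forked structure of X-Fork, a shared encoder whose features feed two
decoder branches, can be illustrated with a short sketch. The following
PyTorch code is a minimal illustration under assumed details: the layer
depths, channel widths, and the 13 segmentation classes are placeholders,
not the paper's exact architecture, and the cGAN discriminator and training
loop are omitted.

```python
# Minimal sketch of an X-Fork-style generator: one shared encoder,
# two decoder heads (target-view image and segmentation logits).
# All layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class XForkGenerator(nn.Module):
    def __init__(self, in_ch=3, img_ch=3, seg_classes=13, base=64):
        super().__init__()
        # Shared encoder: downsample the source view to a latent feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
        )

        def decoder(out_ch):
            # Upsample the shared features back to the input resolution.
            return nn.Sequential(
                nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
                nn.BatchNorm2d(base),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(base, out_ch, 4, stride=2, padding=1),
            )

        # The "fork": two heads on top of the same encoder.
        self.img_head = decoder(img_ch)        # target-view RGB image
        self.seg_head = decoder(seg_classes)   # per-pixel class logits

    def forward(self, x):
        feats = self.encoder(x)
        return torch.tanh(self.img_head(feats)), self.seg_head(feats)

# Usage: a 64x64 aerial image in; a street-view image and its
# segmentation logits out.
g = XForkGenerator()
aerial = torch.randn(1, 3, 64, 64)
street_img, street_seg = g(aerial)
print(street_img.shape, street_seg.shape)  # (1, 3, 64, 64), (1, 13, 64, 64)
```

An X-Seq sketch would instead chain two complete cGANs: the first maps the
source view to the target-view image, and the second maps that generated
image to its segmentation map.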