102,109 research outputs found
Multiple image view synthesis for free viewpoint video applications
Interactive audio-visual (AV) applications such as free viewpoint video (FVV) aim to enable unrestricted spatio-temporal navigation within multiple camera environments. Current virtual viewpoint view synthesis solutions for FVV are either purely image-based implying large information redundancy; or involve reconstructing complex 3D models of the scene. In this paper we present a new multiple image view synthesis algorithm that only requires camera parameters and disparity maps. The multi-view synthesis (MVS) approach can be used in any multi-camera environment and is scalable as virtual views can be created given 1 to N of the available video inputs, providing a means to gracefully handle scenarios where camera inputs decrease or increase over time. The algorithm identifies and selects only the best quality surface areas from available reference images, thereby reducing perceptual errors in virtual view reconstruction. Experimental results are presented and verified using both objective (PSNR) and subjective comparisons
Solving Visual Madlibs with Multiple Cues
This paper focuses on answering fill-in-the-blank style multiple choice
questions from the Visual Madlibs dataset. Previous approaches to Visual
Question Answering (VQA) have mainly used generic image features from networks
trained on the ImageNet dataset, despite the wide scope of questions. In
contrast, our approach employs features derived from networks trained for
specialized tasks of scene classification, person activity prediction, and
person and object attribute prediction. We also present a method for selecting
sub-regions of an image that are relevant for evaluating the appropriateness of
a putative answer. Visual features are computed both from the whole image and
from local regions, while sentences are mapped to a common space using a simple
normalized canonical correlation analysis (CCA) model. Our results show a
significant improvement over the previous state of the art, and indicate that
answering different question types benefits from examining a variety of image
cues and carefully choosing informative image sub-regions
Using image morphing for memory-efficient impostor rendering on GPU
Real-time rendering of large animated crowds consisting thousands of virtual humans is important for several applications including simulations, games and interactive walkthroughs; but cannot be performed using complex polygonal models at interactive frame rates. For that reason, several methods using large numbers of pre-computed image-based representations, which are called as impostors, have been proposed. These methods take the advantage of existing programmable graphics hardware to compensate the computational expense while maintaining the visual fidelity. Making the number of different virtual humans, which can be rendered in real-time, not restricted anymore by the required computational power but by the texture memory consumed for the variety and discretization of their animations. In this work, we proposed an alternative method that reduces the memory consumption by generating compelling intermediate textures using image-morphing techniques. In order to demonstrate the preserved perceptual quality of animations, where half of the key-frames were rendered using the proposed methodology, we have implemented the system using the graphical processing unit and obtained promising results at interactive frame rates
Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
Remote sensing (RS) image retrieval is of great significant for geological
information mining. Over the past two decades, a large amount of research on
this task has been carried out, which mainly focuses on the following three
core issues: feature extraction, similarity metric and relevance feedback. Due
to the complexity and multiformity of ground objects in high-resolution remote
sensing (HRRS) images, there is still room for improvement in the current
retrieval approaches. In this paper, we analyze the three core issues of RS
image retrieval and provide a comprehensive review on existing methods.
Furthermore, for the goal to advance the state-of-the-art in HRRS image
retrieval, we focus on the feature extraction issue and delve how to use
powerful deep representations to address this task. We conduct systematic
investigation on evaluating correlative factors that may affect the performance
of deep features. By optimizing each factor, we acquire remarkable retrieval
results on publicly available HRRS datasets. Finally, we explain the
experimental phenomenon in detail and draw conclusions according to our
analysis. Our work can serve as a guiding role for the research of
content-based RS image retrieval
Near real-time stereo vision system
The apparatus for a near real-time stereo vision system for use with a robotic vehicle is described. The system is comprised of two cameras mounted on three-axis rotation platforms, image-processing boards, a CPU, and specialized stereo vision algorithms. Bandpass-filtered image pyramids are computed, stereo matching is performed by least-squares correlation, and confidence ranges are estimated by means of Bayes' theorem. In particular, Laplacian image pyramids are built and disparity maps are produced from the 60 x 64 level of the pyramids at rates of up to 2 seconds per image pair. The first autonomous cross-country robotic traverses (of up to 100 meters) have been achieved using the stereo vision system of the present invention with all computing done onboard the vehicle. The overall approach disclosed herein provides a unifying paradigm for practical domain-independent stereo ranging
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g. during winter), weather
condition (e.g. in a cloudy day) or time of the day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while preserving the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it allows to simultaneously manipulate a given scene
according to a diverse set of transient attributes within a single model,
eliminating the need of training multiple networks per each translation task.
Our comprehensive set of qualitative and quantitative results demonstrate the
effectiveness of our approach against the competing methods.Comment: Accepted for publication in ACM Transactions on Graphic
- …