32,169 research outputs found
Building with Drones: Accurate 3D Facade Reconstruction using MAVs
Automatic reconstruction of 3D models from images using multi-view
Structure-from-Motion methods has been one of the most fruitful outcomes of
computer vision. These advances combined with the growing popularity of Micro
Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools
ubiquitous for large number of Architecture, Engineering and Construction
applications among audiences, mostly unskilled in computer vision. However, to
obtain high-resolution and accurate reconstructions from a large-scale object
using SfM, there are many critical constraints on the quality of image data,
which often become sources of inaccuracy as the current 3D reconstruction
pipelines do not facilitate the users to determine the fidelity of input data
during the image acquisition. In this paper, we present and advocate a
closed-loop interactive approach that performs incremental reconstruction in
real-time and gives users an online feedback about the quality parameters like
Ground Sampling Distance (GSD), image redundancy, etc on a surface mesh. We
also propose a novel multi-scale camera network design to prevent scene drift
caused by incremental map building, and release the first multi-scale image
sequence dataset as a benchmark. Further, we evaluate our system on real
outdoor scenes, and show that our interactive pipeline combined with a
multi-scale camera network approach provides compelling accuracy in multi-view
reconstruction tasks when compared against the state-of-the-art methods.Comment: 8 Pages, 2015 IEEE International Conference on Robotics and
Automation (ICRA '15), Seattle, WA, US
ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
pixel. In this paper we propose treating nouns as object labels and adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interests enables a novel and natural interaction modality that can
possibly be used to interact with new generation devices (e.g. smart phones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the tradeoffs
compared to traditional mouse based interactions, results are reported for both
a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit
Active Image-based Modeling with a Toy Drone
Image-based modeling techniques can now generate photo-realistic 3D models
from images. But it is up to users to provide high quality images with good
coverage and view overlap, which makes the data capturing process tedious and
time consuming. We seek to automate data capturing for image-based modeling.
The core of our system is an iterative linear method to solve the multi-view
stereo (MVS) problem quickly and plan the Next-Best-View (NBV) effectively. Our
fast MVS algorithm enables online model reconstruction and quality assessment
to determine the NBVs on the fly. We test our system with a toy unmanned aerial
vehicle (UAV) in simulated, indoor and outdoor experiments. Results show that
our system improves the efficiency of data acquisition and ensures the
completeness of the final model.Comment: To be published on International Conference on Robotics and
Automation 2018, Brisbane, Australia. Project Page:
https://huangrui815.github.io/active-image-based-modeling/ The author's
personal page: http://www.sfu.ca/~rha55
Towards a new generation of transport services adapted to multimedia application
Une connexion d'ordre et de fiabilité partiels (POC, partial order connection) est une connexion de transport autorisée à perdre certains objets mais également à les délivrer dans un ordre éventuellement différent de celui d'émission. L'approche POC établit un lien conceptuel entre les protocoles sans connexion au mieux et les protocoles fiables avec connexion. Le concept de POC est motivé par le fait que dans les réseaux hétérogènes sans connexion tels qu'Internet, les paquets transmis sont susceptibles de se perdre et d'arriver en désordre, entraînant alors une réduction des performances des protocoles usuels. De plus, on montre qu'un protocole associé au transport d'un flux multimédia permet une réduction très sensible de l'utilisation des ressources de communication et de mémorisation ainsi qu'une diminution du temps de transit moyen. Dans cet article, une extension temporelle de POC, nommée TPOC (POC temporisé), est introduite. Elle constitue un cadre conceptuel permettant la prise en compte des exigences de qualité de service des applications multimédias réparties. Une architecture offrant un service TPOC est également introduite et évaluée dans le cadre du transport de vidéo MPEG. Il est ainsi démontré que les connexions POC comblent, non seulement le fossé conceptuel entre les protocoles sans connexion et avec connexion, mais aussi qu'ils surpassent les performances des ces derniers lorsque des données multimédias (telles que la vidéo MPEG) sont transportées
A Survey on Bayesian Deep Learning
A comprehensive artificial intelligence system needs to not only perceive the
environment with different `senses' (e.g., seeing and hearing) but also infer
the world's conditional (or even causal) relations and corresponding
uncertainty. The past decade has seen major advances in many perception tasks
such as visual object recognition and speech recognition using deep learning
models. For higher-level inference, however, probabilistic graphical models
with their Bayesian nature are still more powerful and flexible. In recent
years, Bayesian deep learning has emerged as a unified probabilistic framework
to tightly integrate deep learning and Bayesian models. In this general
framework, the perception of text or images using deep learning can boost the
performance of higher-level inference and in turn, the feedback from the
inference process is able to enhance the perception of text or images. This
survey provides a comprehensive introduction to Bayesian deep learning and
reviews its recent applications on recommender systems, topic models, control,
etc. Besides, we also discuss the relationship and differences between Bayesian
deep learning and other related topics such as Bayesian treatment of neural
networks.Comment: To appear in ACM Computing Surveys (CSUR) 202
A trust-region method for stochastic variational inference with applications to streaming data
Stochastic variational inference allows for fast posterior inference in
complex Bayesian models. However, the algorithm is prone to local optima which
can make the quality of the posterior approximation sensitive to the choice of
hyperparameters and initialization. We address this problem by replacing the
natural gradient step of stochastic varitional inference with a trust-region
update. We show that this leads to generally better results and reduced
sensitivity to hyperparameters. We also describe a new strategy for variational
inference on streaming data and show that here our trust-region method is
crucial for getting good performance.Comment: in Proceedings of the 32nd International Conference on Machine
Learning, 201
- …