32,169 research outputs found

    Building with Drones: Accurate 3D Facade Reconstruction using MAVs

    Automatic reconstruction of 3D models from images using multi-view Structure-from-Motion (SfM) methods has been one of the most fruitful outcomes of computer vision. These advances, combined with the growing popularity of Micro Aerial Vehicles (MAVs) as an autonomous imaging platform, have made 3D vision tools ubiquitous for a large number of Architecture, Engineering and Construction applications among audiences mostly unskilled in computer vision. However, obtaining high-resolution and accurate reconstructions of a large-scale object using SfM places many critical constraints on the quality of the image data. These constraints often become sources of inaccuracy, because current 3D reconstruction pipelines do not help users determine the fidelity of the input data during image acquisition. In this paper, we present and advocate a closed-loop interactive approach that performs incremental reconstruction in real time and gives users online feedback on quality parameters such as Ground Sampling Distance (GSD) and image redundancy, displayed on a surface mesh. We also propose a novel multi-scale camera network design to prevent scene drift caused by incremental map building, and release the first multi-scale image sequence dataset as a benchmark. Further, we evaluate our system on real outdoor scenes, and show that our interactive pipeline combined with a multi-scale camera network approach provides compelling accuracy in multi-view reconstruction tasks when compared against state-of-the-art methods. Comment: 8 Pages, 2015 IEEE International Conference on Robotics and Automation (ICRA '15), Seattle, WA, US
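
    The abstract above names Ground Sampling Distance as a quality parameter but does not define it. As a rough illustration only (not the paper's implementation), the sketch below uses the standard pinhole-camera relation GSD = distance × pixel pitch / focal length for a surface viewed head-on. The function name and the camera values are hypothetical and chosen purely for the example.

```python
def ground_sampling_distance(distance_m, focal_length_mm, sensor_width_mm, image_width_px):
    """Approximate GSD in metres per pixel for a fronto-parallel surface.

    Pinhole-camera relation: GSD = distance * pixel_pitch / focal_length.
    All parameter names are illustrative assumptions, not taken from the paper.
    """
    if min(distance_m, focal_length_mm, sensor_width_mm, image_width_px) <= 0:
        raise ValueError("all inputs must be positive")
    # Physical size of one pixel on the sensor, in millimetres.
    pixel_pitch_mm = sensor_width_mm / image_width_px
    # The millimetre units cancel, leaving metres per pixel.
    return distance_m * pixel_pitch_mm / focal_length_mm


# Hypothetical camera: 6.17 mm wide sensor, 4.5 mm lens, 4000 px wide image,
# flown 10 m from the facade.
gsd = ground_sampling_distance(10.0, 4.5, 6.17, 4000)
print(f"{gsd * 1000:.2f} mm per pixel")  # prints "3.43 mm per pixel"
```

    Halving the distance to the facade halves the GSD, which is one reason a multi-scale set of views can capture finer detail than a single far-range pass.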

    ImageSpirit: Verbal Guided Image Parsing

    Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that could be used to interact with new-generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images of varying complexity. To help understand the tradeoffs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study. Comment: http://mmcheng.net/imagespirit

    Active Image-based Modeling with a Toy Drone

    Image-based modeling techniques can now generate photo-realistic 3D models from images. However, it is up to users to provide high-quality images with good coverage and view overlap, which makes the data capturing process tedious and time-consuming. We seek to automate data capturing for image-based modeling. The core of our system is an iterative linear method to solve the multi-view stereo (MVS) problem quickly and plan the Next-Best-View (NBV) effectively. Our fast MVS algorithm enables online model reconstruction and quality assessment to determine the NBVs on the fly. We test our system with a toy unmanned aerial vehicle (UAV) in simulated, indoor and outdoor experiments. Results show that our system improves the efficiency of data acquisition and ensures the completeness of the final model. Comment: To be published at the International Conference on Robotics and Automation 2018, Brisbane, Australia. Project page: https://huangrui815.github.io/active-image-based-modeling/ Author's personal page: http://www.sfu.ca/~rha55

    Towards a new generation of transport services adapted to multimedia application

    A partial order and partial reliability connection (POC, partial order connection) is a transport connection that is allowed to lose some objects and also to deliver them in an order that may differ from the order in which they were sent. The POC approach establishes a conceptual link between best-effort connectionless protocols and reliable connection-oriented protocols. The POC concept is motivated by the fact that in heterogeneous connectionless networks such as the Internet, transmitted packets may be lost or arrive out of order, which reduces the performance of conventional protocols. Moreover, it is shown that such a protocol, when used to transport a multimedia stream, allows a very significant reduction in the use of communication and storage resources, as well as a decrease in the mean transit time. In this article, a temporal extension of POC, named TPOC (timed POC), is introduced. It provides a conceptual framework for taking into account the quality-of-service requirements of distributed multimedia applications. An architecture offering a TPOC service is also introduced and evaluated in the context of MPEG video transport. It is thus demonstrated that POC connections not only bridge the conceptual gap between connectionless and connection-oriented protocols, but also outperform them when multimedia data (such as MPEG video) are transported.

    A Survey on Bayesian Deep Learning

    A comprehensive artificial intelligence system needs to not only perceive the environment with different 'senses' (e.g., seeing and hearing) but also infer the world's conditional (or even causal) relations and the corresponding uncertainty. The past decade has seen major advances in many perception tasks, such as visual object recognition and speech recognition, using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework to tightly integrate deep learning and Bayesian models. In this general framework, the perception of text or images using deep learning can boost the performance of higher-level inference and, in turn, the feedback from the inference process can enhance the perception of text or images. This survey provides a comprehensive introduction to Bayesian deep learning and reviews its recent applications to recommender systems, topic models, control, etc. We also discuss the relationship and differences between Bayesian deep learning and other related topics, such as the Bayesian treatment of neural networks. Comment: To appear in ACM Computing Surveys (CSUR) 202

    A trust-region method for stochastic variational inference with applications to streaming data

    Stochastic variational inference allows for fast posterior inference in complex Bayesian models. However, the algorithm is prone to local optima, which can make the quality of the posterior approximation sensitive to the choice of hyperparameters and initialization. We address this problem by replacing the natural gradient step of stochastic variational inference with a trust-region update. We show that this leads to generally better results and reduced sensitivity to hyperparameters. We also describe a new strategy for variational inference on streaming data and show that here our trust-region method is crucial for getting good performance. Comment: In Proceedings of the 32nd International Conference on Machine Learning, 201