1,107 research outputs found

    Global Auto-regressive Depth Recovery via Iterative Non-local Filtering

    Get PDF
    Existing depth sensing techniques have many shortcomings in terms of resolution, completeness, and accuracy. The performance of 3-D broadcasting systems is therefore limited by the challenges of capturing high-resolution depth data. In this paper, we present a novel framework for obtaining high-quality depth images and multi-view depth videos from simple acquisition systems. We first propose a single depth image recovery algorithm based on auto-regressive (AR) correlations. A fixed-point iteration algorithm under the global AR modeling is derived to efficiently solve the large-scale quadratic programming. Each iteration is equivalent to a nonlocal filtering process with a residue feedback. Then, we extend our framework to an AR-based multi-view depth video recovery framework, where each depth map is recovered from low-quality measurements with the help of the corresponding color image, depth maps from neighboring views, and depth maps of temporally adjacent frames. AR coefficients on nonlocal spatiotemporal neighborhoods in the algorithm are designed to improve the recovery performance. We further discuss the connections between our model and other methods like graph-based tools, and demonstrate that our algorithms enjoy the advantages of both global and local methods. Experimental results on both the Middleburry datasets and other captured datasets finally show that our method is able to improve the performances of depth images and multi-view depth videos recovery compared with state-of-the-art approaches

    Intelligent strategies for mobile robotics in laboratory automation

    Get PDF
    In this thesis a new intelligent framework is presented for the mobile robots in laboratory automation, which includes: a new multi-floor indoor navigation method is presented and an intelligent multi-floor path planning is proposed; a new signal filtering method is presented for the robots to forecast their indoor coordinates; a new human feature based strategy is proposed for the robot-human smart collision avoidance; a new robot power forecasting method is proposed to decide a distributed transportation task; a new blind approach is presented for the arm manipulations for the robots

    Explicit Edge Inconsistency Evaluation Model for Color-Guided Depth Map Enhancement

    Full text link
    © 2016 IEEE. Color-guided depth enhancement is used to refine depth maps according to the assumption that the depth edges and the color edges at the corresponding locations are consistent. In methods on such low-level vision tasks, the Markov random field (MRF), including its variants, is one of the major approaches that have dominated this area for several years. However, the assumption above is not always true. To tackle the problem, the state-of-the-art solutions are to adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the efforts of the guidance from the color image for depth enhancement, leading to various defects such as texture-copy artifacts and blurring depth edges. In this paper, we propose a quantitative measurement on such inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results compared with the benchmark and state-of-the-art methods on the Middlebury ToF-Mark, and NYU data sets

    Fifty Years of Noise Modeling and Mitigation in Power-Line Communications.

    Get PDF
    Building on the ubiquity of electric power infrastructure, power line communications (PLC) has been successfully used in diverse application scenarios, including the smart grid and in-home broadband communications systems as well as industrial and home automation. However, the power line channel exhibits deleterious properties, one of which is its hostile noise environment. This article aims for providing a review of noise modeling and mitigation techniques in PLC. Specifically, a comprehensive review of representative noise models developed over the past fifty years is presented, including both the empirical models based on measurement campaigns and simplified mathematical models. Following this, we provide an extensive survey of the suite of noise mitigation schemes, categorizing them into mitigation at the transmitter as well as parametric and non-parametric techniques employed at the receiver. Furthermore, since the accuracy of channel estimation in PLC is affected by noise, we review the literature of joint noise mitigation and channel estimation solutions. Finally, a number of directions are outlined for future research on both noise modeling and mitigation in PLC

    3D Face Recognition

    Get PDF

    Computer vision beyond the visible : image understanding through language

    Get PDF
    In the past decade, deep neural networks have revolutionized computer vision. High performing deep neural architectures trained for visual recognition tasks have pushed the field towards methods relying on learned image representations instead of hand-crafted ones, in the seek of designing end-to-end learning methods to solve challenging tasks, ranging from long-lasting ones such as image classification to newly emerging tasks like image captioning. As this thesis is framed in the context of the rapid evolution of computer vision, we present contributions that are aligned with three major changes in paradigm that the field has recently experienced, namely 1) the power of re-utilizing deep features from pre-trained neural networks for different tasks, 2) the advantage of formulating problems with end-to-end solutions given enough training data, and 3) the growing interest of describing visual data with natural language rather than pre-defined categorical label spaces, which can in turn enable visual understanding beyond scene recognition. The first part of the thesis is dedicated to the problem of visual instance search, where we particularly focus on obtaining meaningful and discriminative image representations which allow efficient and effective retrieval of similar images given a visual query. Contributions in this part of the thesis involve the construction of sparse Bag-of-Words image representations from convolutional features from a pre-trained image classification neural network, and an analysis of the advantages of fine-tuning a pre-trained object detection network using query images as training data. The second part of the thesis presents contributions to the problem of image-to-set prediction, understood as the task of predicting a variable-sized collection of unordered elements for an input image. We conduct a thorough analysis of current methods for multi-label image classification, which are able to solve the task in an end-to-end manner by simultaneously estimating both the label distribution and the set cardinality. Further, we extend the analysis of set prediction methods to semantic instance segmentation, and present an end-to-end recurrent model that is able to predict sets of objects (binary masks and categorical labels) in a sequential manner. Finally, the third part of the dissertation takes insights learned in the previous two parts in order to present deep learning solutions to connect images with natural language in the context of cooking recipes and food images. First, we propose a retrieval-based solution in which the written recipe and the image are encoded into compact representations that allow the retrieval of one given the other. Second, as an alternative to the retrieval approach, we propose a generative model to predict recipes directly from food images, which first predicts ingredients as sets and subsequently generates the rest of the recipe one word at a time by conditioning both on the image and the predicted ingredients.En l'última dècada, les xarxes neuronals profundes han revolucionat el camp de la visió per computador. Els resultats favorables obtinguts amb arquitectures neuronals profundes entrenades per resoldre tasques de reconeixement visual han causat un canvi de paradigma cap al disseny de mètodes basats en representacions d'imatges apreses de manera automàtica, deixant enrere les tècniques tradicionals basades en l'enginyeria de representacions. Aquest canvi ha permès l'aparició de tècniques basades en l'aprenentatge d'extrem a extrem (end-to-end), capaces de resoldre de manera efectiva molts dels problemes tradicionals de la visió per computador (e.g. classificació d'imatges o detecció d'objectes), així com nous problemes emergents com la descripció textual d'imatges (image captioning). Donat el context de la ràpida evolució de la visió per computador en el qual aquesta tesi s'emmarca, presentem contribucions alineades amb tres dels canvis més importants que la visió per computador ha experimentat recentment: 1) la reutilització de representacions extretes de models neuronals pre-entrenades per a tasques auxiliars, 2) els avantatges de formular els problemes amb solucions end-to-end entrenades amb grans bases de dades, i 3) el creixent interès en utilitzar llenguatge natural en lloc de conjunts d'etiquetes categòriques pre-definits per descriure el contingut visual de les imatges, facilitant així l'extracció d'informació visual més enllà del reconeixement de l'escena i els elements que la composen La primera part de la tesi està dedicada al problema de la cerca d'imatges (image retrieval), centrada especialment en l'obtenció de representacions visuals significatives i discriminatòries que permetin la recuperació eficient i efectiva d'imatges donada una consulta formulada amb una imatge d'exemple. Les contribucions en aquesta part de la tesi inclouen la construcció de representacions Bag-of-Words a partir de descriptors locals obtinguts d'una xarxa neuronal entrenada per classificació, així com un estudi dels avantatges d'utilitzar xarxes neuronals per a detecció d'objectes entrenades utilitzant les imatges d'exemple, amb l'objectiu de millorar les capacitats discriminatòries de les representacions obtingudes. La segona part de la tesi presenta contribucions al problema de predicció de conjunts a partir d'imatges (image to set prediction), entès com la tasca de predir una col·lecció no ordenada d'elements de longitud variable donada una imatge d'entrada. En aquest context, presentem una anàlisi exhaustiva dels mètodes actuals per a la classificació multi-etiqueta d'imatges, que són capaços de resoldre la tasca de manera integral calculant simultàniament la distribució probabilística sobre etiquetes i la cardinalitat del conjunt. Seguidament, estenem l'anàlisi dels mètodes de predicció de conjunts a la segmentació d'instàncies semàntiques, presentant un model recurrent capaç de predir conjunts d'objectes (representats per màscares binàries i etiquetes categòriques) de manera seqüencial. Finalment, la tercera part de la tesi estén els coneixements apresos en les dues parts anteriors per presentar solucions d'aprenentatge profund per connectar imatges amb llenguatge natural en el context de receptes de cuina i imatges de plats cuinats. En primer lloc, proposem una solució basada en algoritmes de cerca, on la recepta escrita i la imatge es codifiquen amb representacions compactes que permeten la recuperació d'una donada l'altra. En segon lloc, com a alternativa a la solució basada en algoritmes de cerca, proposem un model generatiu capaç de predir receptes (compostes pels seus ingredients, predits com a conjunts, i instruccions) directament a partir d'imatges de menjar.Postprint (published version
    corecore