28 research outputs found

    Depth-aware convolutional neural networks for accurate 3D pose estimation in RGB-D images

    Most recent approaches to 3D pose estimation from RGB-D images address the problem in a two-stage pipeline. First, they learn a classifier, typically a random forest, to predict the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized with respect to the object pose. In this paper, we focus on the first stage of the problem and propose a novel classifier based on a depth-aware Convolutional Neural Network. This classifier learns a scale-adaptive regression model that yields very accurate pixel-level predictions, allowing the pose to be estimated with a simple RANSAC-based scheme, with no need to optimize complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
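The final stage described above, recovering the pose from per-pixel predictions with a simple RANSAC scheme, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: with depth available, each pixel yields a 3D camera-space point plus a predicted 3D object-surface coordinate, so a rigid pose can be hypothesized from minimal samples via the Kabsch algorithm. `kabsch`, `ransac_pose` and the threshold values are hypothetical names and settings.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing ||R @ p + t - q|| over pairs."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def ransac_pose(obj_pts, cam_pts, iters=200, thresh=0.01, rng=None):
    """Hypothesize poses from minimal 3-point samples, keep the
    hypothesis with the most inliers, refit on all its inliers."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(obj_pts)
    best_R, best_t, best_inl = None, None, 0
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)
        R, t = kabsch(obj_pts[idx], cam_pts[idx])
        err = np.linalg.norm((obj_pts @ R.T + t) - cam_pts, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inl:
            best_inl = inliers.sum()
            best_R, best_t = kabsch(obj_pts[inliers], cam_pts[inliers])
    return best_R, best_t
```

Exact inliers dominate the vote, so even with a substantial outlier fraction the refit over the consensus set recovers the pose.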

    Real time vehicle recognition: a novel method for road detection

    Knowing the location of the road is one of the most common ways to ease vehicle detection in intelligent traffic systems. For this purpose we propose a vehicle recognition algorithm that performs real-time automatic detection of the zones that vehicles occupy. The algorithm is capable of operating under extreme conditions such as low resolution, low capture angles and grayscale images.
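The abstract does not specify the detection algorithm; as a toy illustration of finding the zones vehicles occupy in grayscale footage, a median background model is a common baseline and purely an assumption here:

```python
import numpy as np

def occupied_zones(frames, current, thresh=30):
    """Toy occupied-zone detector for grayscale traffic footage.

    frames:  stack of recent frames (N, H, W), uint8, mostly empty road
    current: frame to analyse (H, W), uint8
    Returns a boolean mask of pixels that deviate from the per-pixel
    median background by more than `thresh` grey levels.
    """
    background = np.median(frames, axis=0)
    diff = np.abs(current.astype(np.int16) - background.astype(np.int16))
    return diff > thresh
```

The median is robust to vehicles passing through some of the reference frames, which suits the low-quality conditions the abstract mentions.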

    Matchability prediction for full-search template matching algorithms

    While recent approaches have shown that it is possible to perform template matching by exhaustively scanning the parameter space, the resulting algorithms are still quite demanding. In this paper we alleviate the computational load of these algorithms by proposing an efficient approach for predicting the matchability of a template before the matching is actually performed, avoiding large amounts of unnecessary computation. We learn the matchability of templates using dense convolutional neural network descriptors that do not require ad hoc criteria to characterize a template. By using deep-learned descriptions of patches we are able to predict matchability over the whole image quite reliably. We also show that no scene-specific training data is required to solve problems such as panorama stitching, which usually require data from the scene in question. Due to the highly parallelizable nature of this task, we offer an efficient technique with a negligible computational cost at test time.
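A minimal full-search template matcher of the kind the paper targets, plus a naive texture-based matchability proxy. The paper learns matchability from CNN descriptors; the `matchability` proxy below is only a stand-in to make the idea concrete, and both function names are illustrative:

```python
import numpy as np

def ncc_search(image, template):
    """Exhaustive normalized cross-correlation over all translations."""
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.linalg.norm(t)
    best, best_pos = -1.0, (0, 0)
    H, W = image.shape
    for y in range(H - th + 1):
        for x in range(W - tw + 1):
            w = image[y:y + th, x:x + tw]
            wc = w - w.mean()
            denom = np.linalg.norm(wc) * tnorm
            if denom < 1e-9:
                continue                      # flat window, NCC undefined
            score = float((wc * t).sum() / denom)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

def matchability(template):
    """Toy proxy: texture-less templates are poor matching candidates,
    so their matching can be skipped before the expensive search."""
    return float(template.std())
```

In the paper's setting, templates whose predicted matchability falls below a threshold would simply never enter the exhaustive search above.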

    Learning depth-aware deep representations for robotic perception

    Exploiting RGB-D data by means of Convolutional Neural Networks (CNNs) is at the core of a number of robotics applications, including object detection, scene semantic segmentation and grasping. Most existing approaches, however, exploit RGB-D data by simply considering depth as an additional input channel for the network. In this paper we show that the performance of deep architectures can be boosted by introducing DaConv, a novel, general-purpose CNN block which exploits depth to learn scale-aware feature representations. We demonstrate the benefits of DaConv on a variety of robotics-oriented tasks involving affordance detection, object coordinate regression and contour detection in RGB-D images. In each of these experiments we show the potential of the proposed block and how it can be readily integrated into existing CNN architectures.
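One possible intuition for depth-driven scale awareness, sketched with plain pooling: the spatial support shrinks with distance so that the operation covers a roughly constant metric area. This is a toy interpretation, not the published DaConv design; the scale law `radius = f * s / depth` and the parameter names are assumptions:

```python
import numpy as np

def depth_aware_pool(feat, depth, f=50.0, s=1.0):
    """Toy depth-aware average pooling over a single-channel map.

    For each pixel, the pooling radius is inversely proportional to its
    depth, so nearby (large-looking) structures are pooled over wider
    windows than distant (small-looking) ones.
    """
    H, W = feat.shape
    out = np.empty_like(feat)
    for y in range(H):
        for x in range(W):
            r = max(1, int(round(f * s / max(depth[y, x], 1e-3))))
            y0, y1 = max(0, y - r), min(H, y + r + 1)
            x0, x1 = max(0, x - r), min(W, x + r + 1)
            out[y, x] = feat[y0:y1, x0:x1].mean()
    return out
```

A learnable block would apply the same idea to convolution filters rather than averaging, but the depth-to-support coupling is the core of the mechanism.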

    Estimación monocular y eficiente de la pose usando modelos 3D complejos

    Work presented at the XXXV Jornadas de Automática, held in Valencia, September 3 to 5, 2014. Winner of the Infaimon award for best vision paper. This paper presents a robust and efficient method for estimating the pose of a camera. The proposed method assumes prior knowledge of a 3D model of the environment and compares a new input image only against a small set of similar images previously selected by a Bag of Visual Words algorithm. This avoids the high computational cost of matching the 2D points of the input image against all the 3D points of a complex model, which in our case contains more than 100,000 points. The pose is then estimated from these 2D-3D correspondences using a novel PnP algorithm that rejects outliers without resorting to RANSAC and is between 10 and 100 times faster than methods that use it. This work was funded in part by the projects RobTaskCoop DPI2010-17112, ERA-Net Chistera ViSen PCIN-2013-047, and the EU project ARCAS FP7-ICT-2011-287617.
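The retrieval step, selecting a small set of similar database images with a Bag of Visual Words model before any 2D-3D matching, might look like the following minimal sketch. The vocabulary, descriptor dimensions and function names are illustrative, not taken from the paper:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary and
    return an L2-normalized word-frequency histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                      # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def retrieve(query_desc, db_descs, vocabulary, k=3):
    """Rank database images by cosine similarity of BoVW histograms."""
    q = bow_histogram(query_desc, vocabulary)
    sims = [q @ bow_histogram(d, vocabulary) for d in db_descs]
    return np.argsort(sims)[::-1][:k]
```

Only the 3D points seen in the top-ranked images then need to be matched against the input image, which is what keeps the method efficient on models with over 100,000 points.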

    Efficient monocular pose estimation for complex 3D models

    Work presented at ICRA, held in Seattle (US), May 26 to 30, 2015. We propose a robust and efficient method to estimate the pose of a camera with respect to complex 3D textured models of the environment that can potentially contain more than 100,000 points. To tackle this problem we follow a top-down approach in which we combine high-level deep network classifiers with low-level geometric approaches to come up with a solution that is fast, robust and accurate. Given an input image, we initially use a pre-trained deep network to compute a rough estimate of the camera pose. This initial estimate constrains the number of 3D model points that can be seen from the camera viewpoint. We then establish 2D-to-3D correspondences between the 2D detected image features and these potentially visible points of the model. Accurate pose estimation is finally obtained from the 2D-to-3D correspondences using a novel PnP algorithm that rejects outliers without the need to use a RANSAC strategy, and which is between 10 and 100 times faster than other methods that use it. Two real experiments dealing with very large and complex 3D models demonstrate the effectiveness of the approach. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under the ERA-Net Chistera project ViSen PCIN-2013-047, PAU+ DPI2011-27510 and ROBOT-INT-COOP DPI2013-42458-P, and by the EU project ARCAS FP7-ICT-2011-28761.
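The pruning step, using a rough pose estimate to restrict the model to potentially visible points, can be sketched as a simple frustum test. Occlusion handling and the deep-network pose prior are omitted, and all names are illustrative:

```python
import numpy as np

def potentially_visible(points, K, R, t, width, height, max_depth=np.inf):
    """Keep model points that project inside the image of a rough
    camera pose (R, t) with intrinsics K. Coarse test only: points
    occluded by other geometry are not filtered out here."""
    cam = points @ R.T + t                    # world -> camera coordinates
    in_front = cam[:, 2] > 0
    proj = cam @ K.T
    safe_z = np.where(in_front, cam[:, 2], 1.0)   # avoid division by zero
    uv = proj[:, :2] / safe_z[:, None]
    return (in_front
            & (uv[:, 0] >= 0) & (uv[:, 0] < width)
            & (uv[:, 1] >= 0) & (uv[:, 1] < height)
            & (cam[:, 2] < max_depth))
```

Running the expensive 2D-to-3D matching only on the surviving subset is what makes a 100,000-point model tractable.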

    MSClique: Multiple structure discovery through the maximum weighted clique problem

    We present a novel approach for feature correspondence and multiple structure discovery in computer vision. In contrast to existing methods, we exploit the fact that point sets on the same structure usually lie close to each other, thus forming clusters in the image. Given a pair of input images, we initially extract points of interest and build hierarchical representations by agglomerative clustering. We then solve a maximum weighted clique problem to find the set of corresponding clusters with the maximum number of inliers, representing the multiple structures at the correct scales. Our method is parameter-free and only needs two sets of points along with their tentative correspondences, thus being extremely easy to use. We demonstrate the effectiveness of our method in multiple-structure fitting experiments on both publicly available and in-house datasets. As shown in the experiments, our approach finds a higher number of structures containing fewer outliers compared to state-of-the-art methods.
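For small graphs, a maximum weighted clique over candidate cluster correspondences can be found exhaustively. The paper's formulation and solver are more elaborate; this brute-force sketch only makes the combinatorial problem concrete, with illustrative names throughout:

```python
from itertools import combinations

def max_weighted_clique(weights, edges, nodes):
    """Exhaustive maximum-weight clique search.

    weights: dict node -> weight (e.g. inlier count of a candidate
             cluster correspondence)
    edges:   set of frozenset({a, b}) pairs marking mutually
             compatible candidates
    Feasible only for small graphs: the search is exponential in |nodes|.
    """
    best, best_w = (), 0.0
    for r in range(1, len(nodes) + 1):
        for cand in combinations(nodes, r):
            if all(frozenset((a, b)) in edges
                   for a, b in combinations(cand, 2)):
                w = sum(weights[n] for n in cand)
                if w > best_w:
                    best, best_w = cand, w
    return set(best), best_w
```

In the paper's setting, each selected clique member corresponds to one discovered structure at its correct scale.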

    3D pose estimation in complex environments

    Although there has been remarkable progress in the pose estimation literature, there are still a number of limitations when existing algorithms must be applied in everyday applications, especially in uncontrolled environments. This thesis has addressed some of these limitations: computing the pose for uncalibrated cameras, computing the pose without knowing the correspondence between 2D and 3D points, computing the pose when the points of interest are unreliable, and computing the pose using only depth data. The problems addressed, and consequently their contributions, have been analyzed in order of increasing complexity. At each new stage of the doctoral thesis, the restrictions imposed on obtaining the 3D camera pose increased. The thesis consists of four parts, over which we define the contributions made to the field of Computer Vision. The first contribution focuses on providing a technique for obtaining the pose of an uncalibrated camera that is more robust and accurate than existing approaches. By reformulating the equations used in calibrated perspective methods and studying their numerical stability, we obtained an extended formulation that offers a closed-form solution to the problem with increased stability in the presence of noise compared to the state of the art. The second contribution addresses the fact that most algorithms assume a given set of 2D-3D correspondences, a task that usually involves the extraction and matching of points of interest. In this thesis we have developed an algorithm that jointly solves the estimation of correspondences between points and the estimation of the camera pose, all in an uncalibrated context. By solving both problems together, the steps taken can be optimized much better than by solving them separately. The articles published as a result of this work show the advantages inherent in this approach.
    The third contribution provides a solution for estimating the pose of the camera in extreme situations where the image quality is severely deteriorated. This is possible through the use of learning techniques applied to high-quality data and 3D models of the environment and the objects. The approach is based on the notion that, by learning from high-quality data, we can obtain detectors able to recognize objects in the worst circumstances, because they know in depth what defines the object in question. The fourth contribution is a pose estimation method that does not require color information, only depth. By defining a local volumetric appearance, performing dense feature extraction over the depth image, and then minimizing an energy that takes into account the pairwise terms between individual features, we obtain accuracy comparable to state-of-the-art methods while requiring an order of magnitude less time per image. Together, these contributions to 3D pose estimation have improved tools for 3D reconstruction, robotic vision and relocalization in 3D maps. All contributions have been published in international journals and conferences of recognized scientific prestige in the area.
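As background for the uncalibrated setting discussed in the first contribution, a classical baseline is the Direct Linear Transform (DLT), which estimates the full 3x4 projection matrix from 2D-3D correspondences without known intrinsics. This is the textbook method, not the thesis' extended formulation:

```python
import numpy as np

def dlt_projection(X, x):
    """Direct Linear Transform: estimate a 3x4 projection matrix P,
    up to scale, from n >= 6 correspondences between 3D points
    X (n, 3) and 2D image points x (n, 2)."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        # two rows per correspondence, from the cross-product constraint
        A.append([0.0] * 4 + [-c for c in Xh] + [v * c for c in Xh])
        A.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
    # P is the right singular vector of A with smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```

Because P absorbs the intrinsics, no calibration is needed; its sensitivity to noise is precisely the kind of limitation the thesis' extended, numerically stable formulation targets.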

    Camera Pose Estimation in Complex Environments

    Although there has been remarkable progress in the pose estimation literature, there are still a number of limitations when existing algorithms must be applied in everyday applications, especially in uncontrolled environments. This thesis has addressed some of these limitations: computing the pose for uncalibrated cameras, computing the pose without knowing the correspondence between 2D and 3D points, computing the pose when the points of interest are unreliable, and computing the pose using only depth data. The problems addressed, and consequently their contributions, have been analyzed in order of increasing complexity.