516 research outputs found

    Dense 3D Face Correspondence

    Full text link
    We present an algorithm that automatically establishes dense correspondences between a large number of 3D faces. Starting from automatically detected sparse correspondences on the outer boundary of 3D faces, the algorithm triangulates existing correspondences and expands them iteratively by matching points of distinctive surface curvature along the triangle edges. After exhausting keypoint matches, further correspondences are established by generating evenly distributed points within triangles by evolving level set geodesic curves from the centroids of large triangles. A deformable model (K3DM) is constructed from the dense corresponded faces and an algorithm is proposed for morphing the K3DM to fit unseen faces. This algorithm iterates between rigid alignment of an unseen face followed by regularized morphing of the deformable model. We have extensively evaluated the proposed algorithms on synthetic data and real 3D faces from the FRGCv2, Bosphorus, BU3DFE and UND Ear databases using quantitative and qualitative benchmarks. Our algorithm achieved dense correspondences with a mean localisation error of 1.28mm on synthetic faces and detected 1414 anthropometric landmarks on unseen real faces from the FRGCv2 database with 3mm precision. Furthermore, our deformable model fitting algorithm achieved 98.5% face recognition accuracy on the FRGCv2 and 98.6% on Bosphorus database. Our dense model is also able to generalize to unseen datasets.Comment: 24 Pages, 12 Figures, 6 Tables and 3 Algorithm

    Towards Robust Visual Localization in Challenging Conditions

    Get PDF
    Visual localization is a fundamental problem in computer vision, with a multitude of applications in robotics, augmented reality and structure-from-motion. The basic problem is to, based on one or more images, figure out the position and orientation of the camera which captured these images relative to some model of the environment. Current visual localization approaches typically work well when the images to be localized are captured under similar conditions compared to those captured during mapping. However, when the environment exhibits large changes in visual appearance, due to e.g. variations in weather, seasons, day-night or viewpoint, the traditional pipelines break down. The reason is that the local image features used are based on low-level pixel-intensity information, which is not invariant to these transformations: when the environment changes, this will cause a different set of keypoints to be detected, and their descriptors will be different, making the long-term visual localization problem a challenging one. In this thesis, five papers are included, which present work towards solving the problem of long-term visual localization. Two of the articles present ideas for how semantic information may be included to aid in the localization process: one approach relies only on the semantic information for visual localization, and the other shows how the semantics can be used to detect outlier feature correspondences. The third paper considers how the output from a monocular depth-estimation network can be utilized to extract features that are less sensitive to viewpoint changes. The fourth article is a benchmark paper, where we present three new benchmark datasets aimed at evaluating localization algorithms in the context of long-term visual localization. Lastly, the fifth article considers how to perform convolutions on spherical imagery, which in the future might be applied to learning local image features for the localization problem

    Shape Retrieval Methods for Architectural 3D Models

    Get PDF
    This thesis introduces new methods for content-based retrieval of architecture-related 3D models. We thereby consider two different overall types of architectural 3D models. The first type consists of context objects that are used for detailed design and decoration of 3D building model drafts. This includes e.g. furnishing for interior design or barriers and fences for forming the exterior environment. The second type consists of actual building models. To enable efficient content-based retrieval for both model types that is tailored to the user requirements of the architectural domain, type-specific algorithms must be developed. On the one hand, context objects like furnishing that provide similar functions (e.g. seating furniture) often share a similar shape. Nevertheless they might be considered to belong to different object classes from an architectural point of view (e.g. armchair, elbow chair, swivel chair). The differentiation is due to small geometric details and is sometimes only obvious to an expert from the domain. Building models on the other hand are often distinguished according to the underlying floor- and room plans. Topological floor plan properties for example serve as a starting point for telling apart residential and commercial buildings. The first contribution of this thesis is a new meta descriptor for 3D retrieval that combines different types of local shape descriptors using a supervised learning approach. The approach enables the differentiation of object classes according to small geometric details and at the same time integrates expert knowledge from the field of architecture. We evaluate our approach using a database containing arbitrary 3D models as well as on one that only consists of models from the architectural domain. We then further extend our approach by adding a sophisticated shape descriptor localization strategy. Additionally, we exploit knowledge about the spatial relationship of object components to further enhance the retrieval performance. In the second part of the thesis we introduce attributed room connectivity graphs (RCGs) as a means to characterize a 3D building model according to the structure of its underlying floor plans. We first describe how RCGs are inferred from a given building model and discuss how substructures of this graph can be queried efficiently. We then introduce a new descriptor denoted as Bag-of-Attributed-Subgraphs that transforms attributed graphs into a vector-based representation using subgraph embeddings. We finally evaluate the retrieval performance of this new method on a database consisting of building models with different floor plan types. All methods presented in this thesis are aimed at an as automated as possible workflow for indexing and retrieval such that only minimum human interaction is required. Accordingly, only polygon soups are required as inputs which do not need to be manually repaired or structured. Human effort is only needed for offline groundtruth generation to enable supervised learning and for providing information about the orientation of building models and the unit of measurement used for modeling

    Place and Object Recognition for Real-time Visual Mapping

    Get PDF
    Este trabajo aborda dos de las principales dificultades presentes en los sistemas actuales de localización y creación de mapas de forma simultánea (del inglés Simultaneous Localization And Mapping, SLAM): el reconocimiento de lugares ya visitados para cerrar bucles en la trajectoria y crear mapas precisos, y el reconocimiento de objetos para enriquecer los mapas con estructuras de alto nivel y mejorar la interación entre robots y personas. En SLAM visual, las características que se extraen de las imágenes de una secuencia de vídeo se van acumulando con el tiempo, haciendo más laboriosos dos de los aspectos de la detección de bucles: la eliminación de los bucles incorrectos que se detectan entre lugares que tienen una apariencia muy similar, y conseguir un tiempo de ejecución bajo y factible en trayectorias largas. En este trabajo proponemos una técnica basada en vocabularios visuales y en bolsas de palabras para detectar bucles de manera robusta y eficiente, centrándonos en dos ideas principales: 1) aprovechar el origen secuencial de las imágenes de vídeo, y 2) hacer que todo el proceso pueda funcionar a frecuencia de vídeo. Para beneficiarnos del origen secuencial de las imágenes, presentamos una métrica de similaridad normalizada para medir el parecido entre imágenes e incrementar la distintividad de las detecciones correctas. A su vez, agrupamos los emparejamientos de imágenes candidatas a ser bucle para evitar que éstas compitan cuando realmente fueron tomadas desde el mismo lugar. Finalmente, incorporamos una restricción temporal para comprobar la coherencia entre detecciones consecutivas. La eficiencia se logra utilizando índices inversos y directos y características binarias. Un índice inverso acelera la comparación entre imágenes de lugares, y un índice directo, el cálculo de correspondencias de puntos entre éstas. Por primera vez, en este trabajo se han utilizado características binarias para detectar bucles, dando lugar a una solución viable incluso hasta para decenas de miles de imágenes. Los bucles se verifican comprobando la coherencia de la geometría de las escenas emparejadas. Para ello utilizamos varios métodos robustos que funcionan tanto con una como con múltiples cámaras. Presentamos resultados competitivos y sin falsos positivos en distintas secuencias, con imágenes adquiridas tanto a alta como a baja frecuencia, con cámaras frontales y laterales, y utilizando el mismo vocabulario y la misma configuración. Con descriptores binarios, el sistema completo requiere 22 milisegundos por imagen en una secuencia de 26.300 imágenes, resultando un orden de magnitud más rápido que otras técnicas actuales. Se puede utilizar un algoritmo similar al de reconocimiento de lugares para resolver el reconocimiento de objetos en SLAM visual. Detectar objetos en este contexto es particularmente complicado debido a que las distintas ubicaciones, posiciones y tamaños en los que se puede ver un objeto en una imagen son potencialmente infinitos, por lo que suelen ser difíciles de distinguir. Además, esta complejidad se multiplica cuando la comparación ha de hacerse contra varios objetos 3D. Nuestro esfuerzo en este trabajo está orientado a: 1) construir el primer sistema de SLAM visual que puede colocar objectos 3D reales en el mapa, y 2) abordar los problemas de escalabilidad resultantes al tratar con múltiples objetos y vistas de éstos. En este trabajo, presentamos el primer sistema de SLAM monocular que reconoce objetos 3D, los inserta en el mapa y refina su posición en el espacio 3D a medida que el mapa se va construyendo, incluso cuando los objetos dejan de estar en el campo de visión de la cámara. Esto se logra en tiempo real con modelos de objetos compuestos por información tridimensional y múltiples imágenes representando varios puntos de vista del objeto. Después nos centramos en la escalabilidad de la etapa del reconocimiento de los objetos 3D. Presentamos una técnica rápida para segmentar imágenes en regiones de interés para detectar objetos pequeños o lejanos. Tras ello, proponemos sustituir el modelo de objetos de vistas independientes por un modelado con una única bolsa de palabras de características binarias asociadas a puntos 3D. Creamos también una base de datos que incorpora índices inversos y directos para aprovechar sus ventajas a la hora de recuperar rápidamente tanto objetos candidatos a ser detectados como correspondencias de puntos, tal y como hacían en el caso de la detección de bucles. Los resultados experimentales muestran que nuestro sistema funciona en tiempo real en un entorno de escritorio con cámara en mano y en una habitación con una cámara montada sobre un robot autónomo. Las mejoras en el proceso de reconocimiento obtienen resultados satisfactorios, sin detecciones erróneas y con un tiempo de ejecución medio de 28 milisegundos por imagen con una base de datos de 20 objetos 3D

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    Get PDF
    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This empowers support vector shape (SVS) with multifold advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representation for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called as Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well-known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and KSVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on the kernel methods. One of the key features of our method is that it is computationally efficient as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with the situation where the data that we use to train our models are different from that presented during the testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as traversing deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer {DASH-N} has an edge over the single-layer DASH-N

    Similarity reasoning for local surface analysis and recognition

    Get PDF
    This thesis addresses the similarity assessment of digital shapes, contributing to the analysis of surface characteristics that are independent of the global shape but are crucial to identify a model as belonging to the same manufacture, the same origin/culture or the same typology (color, common decorations, common feature elements, compatible style elements, etc.). To face this problem, the interpretation of the local surface properties is crucial. We go beyond the retrieval of models or surface patches in a collection of models, facing the recognition of geometric patterns across digital models with different overall shape. To address this challenging problem, the use of both engineered and learning-based descriptions are investigated, building one of the first contributions towards the localization and identification of geometric patterns on digital surfaces. Finally, the recognition of patterns adds a further perspective in the exploration of (large) 3D data collections, especially in the cultural heritage domain. Our work contributes to the definition of methods able to locally characterize the geometric and colorimetric surface decorations. Moreover, we showcase our benchmarking activity carried out in recent years on the identification of geometric features and the retrieval of digital models completely characterized by geometric or colorimetric patterns

    Localization, Mapping and SLAM in Marine and Underwater Environments

    Get PDF
    The use of robots in marine and underwater applications is growing rapidly. These applications share the common requirement of modeling the environment and estimating the robots’ pose. Although there are several mapping, SLAM, target detection and localization methods, marine and underwater environments have several challenging characteristics, such as poor visibility, water currents, communication issues, sonar inaccuracies or unstructured environments, that have to be considered. The purpose of this Special Issue is to present the current research trends in the topics of underwater localization, mapping, SLAM, and target detection and localization. To this end, we have collected seven articles from leading researchers in the field, and present the different approaches and methods currently being investigated to improve the performance of underwater robots
    • …
    corecore