107 research outputs found
Biologically motivated keypoint detection for RGB-D data
With the emerging interest in active vision, computer vision researchers have become increasingly concerned with the mechanisms of attention. Consequently, several computational models of visual attention, inspired by the human visual system, have been developed, aiming at the detection of regions of interest in images.
This thesis is focused on selective visual attention, which provides a mechanism for the brain to focus computational resources on one object at a time, guided by low-level image properties (bottom-up attention). The task of recognizing objects in different locations is achieved
by focusing on different locations, one at a time. Given the computational requirements of the
models proposed, the research in this area has been mainly of theoretical interest. More recently,
psychologists, neurobiologists and engineers have developed collaborations, and this has resulted in considerable benefits. The first objective of this doctoral work is to bring together concepts and ideas from these different research areas, providing a study of the biological research on the human visual system and a discussion of the interdisciplinary knowledge in this area, as well as the state of the art on (bottom-up) computational models of visual attention. Engineers commonly refer to visual attention as saliency: when people fixate on a particular region of an image, it is because that region is salient. In this research work, saliency methods are presented according to their classification (biologically plausible, computational or hybrid) and in chronological order.
A few salient structures can be used for applications such as object registration, retrieval or data simplification, and these structures can be treated as keypoints when the aim is object recognition. Generally, object recognition algorithms use a large number of descriptors extracted over a dense set of points, which carries a very high computational cost and prevents real-time processing. To avoid this computational complexity, features have to be extracted from a small set of points, usually called keypoints. The use of keypoint-based detectors reduces both the processing time and the redundancy in the data. Local descriptors extracted from images have been extensively reported in the computer vision literature. Since there is a large set of keypoint detectors, a comparative evaluation between them is needed. Accordingly, we describe 2D and 3D keypoint detectors and 3D descriptors, and evaluate existing 3D keypoint detectors on a publicly available point cloud library with real 3D objects. The invariance of the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations. This evaluation reports the robustness of a particular detector to changes of point of view, using the absolute and relative repeatability rates as criteria. In our experiments, the method that achieved the best repeatability rate was the ISS3D method.
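The repeatability criterion used in this evaluation can be sketched as follows, assuming NumPy and a brute-force nearest-neighbour search; the helper names are illustrative stand-ins, not the thesis implementation. A keypoint detected on the original cloud counts as repeated if, after applying the known ground-truth transform, some keypoint detected on the transformed cloud lies within a distance tolerance.

```python
import numpy as np

def rotation_z(theta):
    """Rigid rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def repeatability(kp_src, kp_dst, R, t, eps=0.01):
    """Absolute and relative repeatability of a keypoint detector.

    kp_src: (N,3) keypoints detected on the original cloud.
    kp_dst: (M,3) keypoints detected on the transformed cloud.
    R, t:   the known ground-truth transform applied to the cloud.
    eps:    tolerance; a transformed source keypoint is 'repeated'
            if some destination keypoint lies within eps of it.
    """
    mapped = kp_src @ R.T + t  # ground-truth mapping of the source keypoints
    d = np.linalg.norm(mapped[:, None, :] - kp_dst[None, :, :], axis=2)
    repeated = int((d.min(axis=1) <= eps).sum())
    return repeated, repeated / len(kp_src)  # absolute count, relative rate
```

The absolute rate is the raw count of repeated keypoints; the relative rate normalizes it by the number of keypoints detected on the original cloud.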
The analysis of the human visual system and of biologically inspired saliency-map detectors led to the idea of extending a keypoint detector with the color information processed in the retina. This proposal produced a 2D keypoint detector inspired by the behavior of the early visual system. Our method is a color extension of the BIMP keypoint detector, in which we include both the color and intensity channels of an image: color information is included in a biologically plausible way, and multi-scale image features are combined into a single keypoint map. This detector is compared against state-of-the-art detectors and found to be particularly well suited for tasks such as category and object recognition. The recognition process is performed by comparing the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations to 3D space. The evaluation allowed us to identify the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector with the SHOTCOLOR descriptor, good category and object recognition rates were obtained, and the best results were obtained with the PFHRGB descriptor.
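The recognition step described above, comparing descriptors extracted at keypoint locations, can be illustrated with a minimal sketch. The `recognize` helper and its nearest-neighbour voting scheme are assumptions for illustration, not the actual pipeline; in practice real descriptors such as SHOTCOLOR or PFHRGB would take the place of the plain vectors used here.

```python
import numpy as np

def recognize(query_desc, gallery):
    """Match descriptors extracted at keypoint locations of an unknown
    object against a gallery of labelled objects.

    query_desc: (N,D) descriptors of the unknown object.
    gallery:    dict mapping label -> (M,D) descriptors of a known object.
    Returns the label whose descriptors best explain the query.
    """
    scores = {}
    for label, desc in gallery.items():
        # Distance from each query descriptor to its nearest gallery descriptor.
        d = np.linalg.norm(query_desc[:, None, :] - desc[None, :, :], axis=2)
        scores[label] = d.min(axis=1).mean()  # lower mean distance = better match
    return min(scores, key=scores.get)
```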
A 3D recognition system involves the choice of a keypoint detector and a descriptor, so a new method for the detection of 3D keypoints on point clouds is presented, and a benchmark is performed between each pair of 3D keypoint detector and 3D descriptor to evaluate their performance on object and category recognition. These evaluations are done on a public database of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture of the primate visual system: the 3D keypoints are extracted based on a bottom-up 3D saliency map, a map that encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination across different modalities) of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or "keypoint location") is sequentially directed to the most salient points in this map. Inhibiting this location automatically allows the system to attend to the next most salient location. The main conclusions are: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the best result in 32 of the evaluated metrics in the category and object recognition experiments, whereas the second-best detector obtained the best result in only 8 of these metrics. The only drawback is the computational time, since BIK-BUS is slower than the other detectors. Given that the differences in recognition performance, size and time requirements are large, the selection of the keypoint detector and descriptor has to be matched to the desired task, and we give some directions to facilitate this choice.
After proposing the 3D keypoint detector, the research focused on a robust detection and
tracking method for 3D objects, using keypoint information in a particle filter. This method consists of three distinct steps: segmentation, tracking initialization and tracking. The segmentation is performed to remove all the background information, reducing the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector: the information about the object that we want to follow is given by the extracted keypoints. The particle filter then tracks the keypoints, so that we can predict where they will be in the next frame. One of the problems in a recognition system is the computational cost of keypoint detectors, and this is the problem we intend to address. The experiments with the PFBIK-Tracking method are done indoors, in an office/home environment where personal robots are expected to operate. We quantitatively evaluate the stability of the overall tracking method using a "Tracking Error", computed from the keypoint and particle centroids. Comparing our system with the tracking method available in the Point Cloud Library, we achieve better results, with a much smaller number of points and less computational time. Our method is faster and more robust to occlusion
when compared to the OpenniTracker.
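The predict/update/resample cycle of a particle filter tracking a keypoint centroid can be sketched as follows. The Gaussian motion and observation models and all parameter values are assumptions for illustration, not the PFBIK-Tracking implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def particle_filter_step(particles, observed_centroid,
                         motion_sigma=0.05, obs_sigma=0.1):
    """One predict/update/resample cycle tracking a keypoint centroid.

    particles:         (P,3) hypothesised 3D positions of the tracked object.
    observed_centroid: (3,) centroid of the keypoints detected in the frame.
    """
    # Predict: diffuse the particles with Gaussian motion noise.
    particles = particles + rng.normal(scale=motion_sigma, size=particles.shape)
    # Update: weight each particle by its agreement with the observation.
    d2 = np.sum((particles - observed_centroid) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / obs_sigma ** 2)
    w /= w.sum()
    # Resample: draw particles proportionally to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    particles = particles[idx]
    return particles, particles.mean(axis=0)  # new particle set, state estimate
```

Repeating this step frame after frame makes the particle cloud follow the observed keypoint centroid, which is how the filter predicts where the keypoints will be in the next frame.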
Automatic Alignment of 3D Multi-Sensor Point Clouds
Automatic 3D point cloud alignment is a major research topic in photogrammetry, computer vision and computer graphics. In this research, two keypoint feature matching approaches have been developed and proposed for the automatic alignment of 3D point clouds, which have been acquired from different sensor platforms and are in different 3D conformal coordinate systems.
The first proposed approach is based on 3D keypoint feature matching. First, surface curvature information is utilized for scale-invariant 3D keypoint extraction. Adaptive non-maxima suppression (ANMS) is then applied to retain the most distinct and well-distributed set of keypoints. Afterwards, every keypoint is characterized by a scale-, rotation- and translation-invariant 3D surface descriptor, called the radial geodesic distance-slope histogram. Similar keypoint descriptors on the source and target datasets are then matched using bipartite graph matching, followed by a modified-RANSAC for outlier removal.
The second proposed method is based on 2D keypoint matching performed on height map images of the 3D point clouds. Height map images are generated by projecting the 3D point clouds onto a planimetric plane. Afterwards, a multi-scale wavelet 2D keypoint detector with ANMS is proposed to extract keypoints on the height maps. Then, a scale, rotation and translation-invariant 2D descriptor referred to as the Gabor, Log-Polar-Rapid Transform descriptor is computed for all keypoints. Finally, source and target height map keypoint correspondences are determined using a bi-directional nearest neighbour matching, together with the modified-RANSAC for outlier removal.
Each method is assessed on multi-sensor, urban and non-urban 3D point cloud datasets. Results show that unlike the 3D-based method, the height map-based approach is able to align source and target datasets with differences in point density, point distribution and missing point data. Findings also show that the 3D-based method obtained lower transformation errors and a greater number of correspondences when the source and target have similar point characteristics. The 3D-based approach attained absolute mean alignment differences in the range of 0.23m to 2.81m, whereas the height map approach had a range from 0.17m to 1.21m. These differences meet the proximity requirements of the data characteristics and the further application of fine co-registration approaches.
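The "modified-RANSAC for outlier removal" step used by both approaches can be illustrated with a plain RANSAC sketch over putative keypoint correspondences, assuming NumPy; the modification itself is not specified here, so this is the standard algorithm with a Kabsch minimal solver and illustrative names.

```python
import numpy as np

def rigid_from_pairs(src, dst):
    """Least-squares rigid transform (Kabsch) mapping src points onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_align(src, dst, iters=200, eps=0.05, seed=0):
    """RANSAC over putative keypoint correspondences (src[i] <-> dst[i])."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        R, t = rigid_from_pairs(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = int((err <= eps).sum())
        if inliers > best_inliers:
            best, best_inliers = (R, t), inliers
    return best, best_inliers
```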
Stereo Visual SLAM for Mobile Robots Navigation
This thesis focuses on combining the fields of mobile robotics and computer vision, with the goal of developing methods that allow a mobile robot to localize itself within its environment while building a map of it, using only a set of images as input. This problem is known as visual SLAM ("Simultaneous Localization And Mapping") and remains an open topic despite the great research effort made in recent years. Specifically, in this thesis we use stereo cameras to capture two images simultaneously from slightly different positions, thus directly providing 3D information.
Among robot localization problems, this thesis addresses two of them: robot tracking and simultaneous localization and mapping (SLAM).
The first does not take the map of the environment into account; instead, it computes the robot's trajectory by incrementally composing the estimates of its motion between consecutive time instants. When images are used to compute this trajectory, the problem is called "visual odometry", and it is easier to solve than visual SLAM; in fact, it is often integrated as part of a complete SLAM system. This thesis contributes two visual odometry systems: one based on an efficient closed-form solution, and the other based on a non-linear optimization process that implements a new method for fast outlier detection and removal.
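Visual odometry as described above reduces to composing per-frame motion estimates into a global pose, which is why its error accumulates over time. A minimal sketch with homogeneous 4x4 transforms (the helper names are illustrative, not the thesis code):

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from a rotation and a translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def integrate_odometry(increments):
    """Visual odometry: the global pose is the running product of the
    per-frame motion estimates, so estimation error accumulates."""
    pose = np.eye(4)
    trajectory = [pose]
    for T in increments:
        pose = pose @ T  # compose the next frame-to-frame motion
        trajectory.append(pose)
    return trajectory
```

Each increment is expressed in the robot's current frame, so a rotation changes the direction in which subsequent translations move the global position.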
SLAM methods, in turn, also address the construction of a map of the environment, with the goal of noticeably improving the robot's localization and thus avoiding the error accumulation incurred by visual odometry. In addition, the built map can be used to handle demanding situations, such as recovering the localization after the robot gets lost or performing global localization. This thesis presents two complete visual SLAM systems: one implemented within the framework of non-parametric probabilistic filters, and the other based on a new relative bundle adjustment method that has been integrated with some recent computer vision techniques.
Another contribution of this thesis is the publication of two datasets containing stereo images captured in unmodified urban environments, together with a GPS-based estimate of the robot's actual path (the "ground truth"). These datasets serve as a testbed for validating visual odometry and SLAM methods.
A Fast Modal Space Transform for Robust Nonrigid Shape Retrieval
Nonrigid or deformable 3D objects are common in many application domains. Retrieval of such objects in large databases based on shape similarity is still a challenging problem. In this paper, we take advantage of functional operators as characterizations of shape deformation, and further propose a framework to design novel shape signatures for encoding nonrigid geometries. Our approach constructs a context-aware integral kernel operator on a manifold, then applies modal analysis to map this operator into a low-frequency functional representation, called the fast functional transform, and finally computes its spectrum as the shape signature. In a nutshell, our method is fast, isometry-invariant, discriminative, smooth and numerically stable with respect to multiple types of perturbations. Experimental results demonstrate that our new shape signature for nonrigid objects can outperform all methods participating in the nonrigid track of the SHREC'11 contest. It is also the second-best performing method in the real human model track of SHREC'14.
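As a rough illustration of a spectral, isometry-invariant shape signature, the sketch below uses the eigenvalues of a k-NN graph Laplacian built over a point sampling of the shape. This is a simplified stand-in for the paper's context-aware integral kernel operator and modal analysis, not the actual method; the function name and parameters are assumptions.

```python
import numpy as np

def laplacian_spectrum_signature(points, k=6, n_eigs=8):
    """Simplified spectral shape signature: the low eigenvalues of a k-NN
    graph Laplacian. Because the graph depends only on pairwise distances,
    the signature is invariant to rigid motion of the point set."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:  # k nearest neighbours (skip self)
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W           # combinatorial graph Laplacian
    return np.linalg.eigvalsh(L)[:n_eigs]    # low-frequency spectrum
```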
Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts
Shapes are generally used to convey meaning. They are used in video games, films and other multimedia, in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes through the lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match what they expect others to choose but cannot communicate with others to determine an answer. We study whole-mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind Image Specificity is that different images evoke different descriptions, but 'specific' images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned, and we predict them for fonts and 3D shapes, respectively, using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes, and we demonstrate that font Specificity and the Schelling Meshes concept are useful for visualisation, clustering and search applications. Overall, we find that each concept represents similarities between their respective type of shape, even when there are discontinuities between the shape geometries themselves. The 'context' of these similarities lies in some kind of abstract or subjective meaning which is consistent among different people.
Analysis of 3D objects at multiple scales (application to shape matching)
Over the last decades, the evolution of acquisition techniques has led to the generalization of detailed 3D objects, represented as huge point sets composed of millions of vertices. The complexity of the involved data often requires analyzing them for the extraction and characterization of pertinent structures, which are potentially defined at multiple scales. Among the wide variety of methods proposed to analyze digital signals, scale-space analysis is today a standard for the study of 2D curves and images. However, its adaptation to 3D data leads to instabilities and requires connectivity information, which is not directly available when dealing with point sets.
In this thesis, we present a new multi-scale analysis framework that we call the Growing Least Squares (GLS). It consists of a robust local geometric descriptor that can be evaluated on point sets at multiple scales using an efficient second-order fitting procedure. We propose to analytically differentiate this descriptor to extract continuously the pertinent structures in scale-space. We show that this representation and the associated toolbox define an efficient way to analyze 3D objects represented as point sets at multiple scales, and we demonstrate its relevance in various application scenarios.
A challenging application is the analysis of acquired 3D objects coming from the Cultural Heritage field. In this thesis, we study a real-world dataset composed of the fragments of the statues that were surrounding the legendary Alexandria Lighthouse. In particular, we focus on the problem of fractured object reassembly, consisting of few fragments (up to about ten), but with missing parts due to erosion or deterioration. We propose a semi-automatic formalism to combine both the archaeologist's knowledge and the accuracy of geometric matching algorithms during the reassembly process. We use it to design two systems, and we show their efficiency in concrete cases.
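The core GLS idea, a local second-order fit evaluated at multiple growing scales, can be caricatured as follows. This sketch fits a plane (via the neighbourhood covariance) rather than the algebraic sphere used by GLS, and the function and parameter names are illustrative assumptions.

```python
import numpy as np

def multiscale_flatness(points, center, scales):
    """GLS-like multi-scale descriptor sketch: for each scale, gather the
    neighbourhood of `center` and record the smallest covariance eigenvalue,
    i.e. the residual variance off the best-fit plane (0 when the
    neighbourhood is perfectly flat)."""
    out = []
    for r in scales:
        nb = points[np.linalg.norm(points - center, axis=1) <= r]
        cov = np.cov(nb.T)                      # 3x3 neighbourhood covariance
        out.append(np.linalg.eigvalsh(cov)[0])  # variance off the plane
    return np.array(out)
```

How this value varies across scales (and where its derivative in scale is large) is the kind of information the analytically differentiated GLS descriptor exposes.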
- …