
    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and in applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they can learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating existing works through both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
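    The core idea above can be made concrete with a short sketch. The following is a minimal illustration of the data-driven principle, not any specific method from the survey: a query shape is labeled by aggregating votes from its nearest neighbors in a labeled collection, with the choice of shape descriptor left abstract.

        # Minimal sketch of the data-driven idea: an individual shape is
        # analyzed by aggregating information from a labeled collection,
        # here via nearest-neighbor voting in descriptor space.
        # The descriptor itself (inputs are precomputed vectors) is an
        # assumption; any global shape descriptor would do.
        import numpy as np

        def classify_shape(query_desc, collection_descs, collection_labels, k=5):
            """Vote among the k nearest shapes in descriptor space."""
            dists = np.linalg.norm(collection_descs - query_desc, axis=1)
            nearest = np.argsort(dists)[:k]
            votes = [collection_labels[i] for i in nearest]
            return max(set(votes), key=votes.count)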

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of those changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, or high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to exploit the outstanding properties of event cameras. We present event cameras from their working principle, through the sensors that are actually available, to the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Finally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
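    As a rough illustration of the event representation described above, the sketch below accumulates a stream of (time, location, polarity) events into a frame-like image that conventional algorithms can consume. The field names and the fixed-window accumulation scheme are assumptions for illustration, not the survey's notation.

        # Each event carries a timestamp, a pixel location and a polarity
        # (the sign of the brightness change). Summing polarities per pixel
        # over a short time window yields a frame-like image.
        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class Event:
            t: float       # timestamp in seconds (microsecond resolution in practice)
            x: int         # pixel column
            y: int         # pixel row
            polarity: int  # +1 for a brightness increase, -1 for a decrease

        def accumulate(events, height, width, t_start, t_end):
            """Sum event polarities per pixel over [t_start, t_end)."""
            frame = np.zeros((height, width), dtype=np.int32)
            for e in events:
                if t_start <= e.t < t_end:
                    frame[e.y, e.x] += e.polarity
            return frame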

    Indoor localization and navigation for blind persons using visual landmarks and a GIS

    In an unfamiliar environment we spot and explore all available information which might guide us to a desired location. This largely unconscious processing is done by our trained sensory and cognitive systems. These recognize and memorize sets of landmarks which allow us to create a mental map of the environment, and this map enables us to navigate by exploiting the few most important landmarks stored in our memory. We present a system which integrates a geographic information system (GIS) of a building with visual landmarks for localizing the user in the building and for tracing and validating a route for the user's navigation. The developed system thus complements the white cane, improving the user's autonomy during indoor navigation. Although designed for visually impaired persons, the system can be used by any person for wayfinding in a complex building.
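    A hypothetical sketch of the route-tracing step: if the building GIS is modeled as a graph whose nodes are visual landmarks and whose edges are walkable segments, a shortest-path search yields the landmark sequence that guides the user. The graph layout and function names are illustrative assumptions, not the paper's implementation.

        # Dijkstra's algorithm over a landmark graph: returns the sequence
        # of landmarks from start to goal and the total walking distance.
        import heapq

        def trace_route(graph, start, goal):
            """graph: {landmark: [(neighbor, distance_m), ...]}"""
            queue = [(0.0, start, [start])]
            visited = set()
            while queue:
                cost, node, path = heapq.heappop(queue)
                if node == goal:
                    return path, cost
                if node in visited:
                    continue
                visited.add(node)
                for neighbor, dist in graph.get(node, []):
                    if neighbor not in visited:
                        heapq.heappush(queue, (cost + dist, neighbor, path + [neighbor]))
            return None, float("inf")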

    In-hand object detection and tracking using 2D and 3D information

    As robots are increasingly introduced into human-inhabited areas, they will need a perception system able to detect the actions that the humans around them are performing. This information is crucial for acting appropriately in a changing environment. Humans use different objects and tools in their tasks, so one of the most useful cues for recognizing an action is the object the person is using: if a person is holding a book, for example, they are probably reading. This thesis presents a system that tracks the user's hand and learns and recognizes the object being held. When instructed to learn, the software extracts key information about the object and stores it under a unique identification number for later recognition. When the user triggers recognition mode, the system compares the current object's information with the previously stored data and outputs the best match. The system uses both 2D and 3D descriptors to improve the recognition stage. To reduce noise, two separate matching procedures for 2D and 3D each output a preliminary prediction at a rate of 30 predictions per second; a weighted average is then taken over these 30 predictions for both 2D and 3D to obtain the system's final prediction. The experiments carried out to validate the system show that it can recognize objects from a pool of 6 different objects with an F1 score near 80% in each case, and that it performs better when it combines 2D and 3D descriptors than when it uses either separately. Performance tests show that the system runs in real time with minimum requirements of roughly one physical core (at 2.4 GHz) and less than 1 GB of RAM. The software can also be deployed as a distributed system, since the bandwidth measurements carried out disclose a maximum bandwidth below 7 MB/s. This system is, to the best of my knowledge, the first to implement an in-hand object learning and recognition algorithm using both 2D and 3D information. The use of both types of data, together with a posterior decision step, improves the robustness and the accuracy of the system. The software developed in this thesis is intended as a building block for further research on the topic, towards a more natural human-robot interaction and understanding. Giving robots a human-like interaction with their environment is a crucial step towards their complete autonomy and acceptance in human areas.
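    A minimal sketch of the decision step described above, under assumed score scales and equal weights: the 2D and 3D matchers each emit 30 preliminary (object id, score) predictions per second, and a weighted vote over that one-second window yields the final prediction.

        # Fuse one second of preliminary predictions from the 2D and 3D
        # matchers into a single object id. Weights and score scale are
        # illustrative assumptions.
        from collections import defaultdict

        def fuse_predictions(preds_2d, preds_3d, w2d=0.5, w3d=0.5):
            """preds_*: list of 30 (object_id, score) pairs from each matcher."""
            totals = defaultdict(float)
            for object_id, score in preds_2d:
                totals[object_id] += w2d * score
            for object_id, score in preds_3d:
                totals[object_id] += w3d * score
            return max(totals, key=totals.get)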

    Multi-scale lines and edges in V1 and beyond: brightness, object categorization and recognition, and consciousness

    In this paper we present an improved model for line and edge detection in cortical area V1. The model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested, in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.
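    As a hedged illustration of the model's building blocks (the parameter values below are free choices, unlike the paper's parameter-free scheme): simple cells are commonly modeled as a quadrature pair of Gabor filters, and a complex cell response as the modulus of that pair, computed at each scale.

        # Simple cells: even/odd (quadrature) Gabor filters at a given
        # scale and orientation. Complex cell: modulus of the pair.
        import numpy as np
        from scipy.ndimage import convolve

        def gabor_pair(sigma, theta, wavelength):
            size = int(6 * sigma) | 1          # odd kernel size
            half = size // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)
            env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
            return (env * np.cos(2 * np.pi * xr / wavelength),
                    env * np.sin(2 * np.pi * xr / wavelength))

        def complex_cell(image, sigma, theta, wavelength):
            even, odd = gabor_pair(sigma, theta, wavelength)
            re = convolve(image, even)         # even simple cell response
            im = convolve(image, odd)          # odd simple cell response
            return np.hypot(re, im)            # modulus of the quadrature pair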

    A Multicamera System for Gesture Tracking With Three Dimensional Hand Pose Estimation

    The goal of any visual tracking system is to successfully detect and then follow an object of interest through a sequence of images. The difficulty of tracking an object depends on its dynamics, motion and characteristics, as well as on the environment. For example, tracking an articulated, self-occluding object such as a signing hand has proven to be a very difficult problem. The focus of this work is on tracking and pose estimation with applications to hand gesture interpretation. An approach that integrates the simplicity of a region tracker with single-hand 3D pose estimation methods is presented. Additionally, this work delves into the pose estimation problem. This is accomplished both by analyzing hand templates composed of their morphological skeleton and by addressing the skeleton's inherent instability. Ligature points along the skeleton are flagged in order to determine their effect on skeletal instabilities. Tested on real data, the analysis finds that flagging ligature points proportionally increases the match strength of high-similarity image-template pairs by about 6%. The effectiveness of this approach is further demonstrated in a real-time multicamera hand tracking system that tracks hand gestures through three-dimensional space and estimates the three-dimensional pose of the hand.
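    An illustrative sketch of skeleton-based template comparison, not the thesis' implementation: each binary hand silhouette is reduced to its morphological skeleton, and two skeletons are compared with a simple overlap score. The ligature-point flagging that stabilizes the skeleton is omitted here, and the similarity measure is an assumption.

        # Reduce two binary hand masks to their morphological skeletons
        # and score their agreement with intersection-over-union.
        import numpy as np
        from skimage.morphology import skeletonize

        def skeleton_similarity(mask_a, mask_b):
            """mask_*: boolean hand silhouettes of equal shape."""
            skel_a = skeletonize(mask_a)
            skel_b = skeletonize(mask_b)
            overlap = np.logical_and(skel_a, skel_b).sum()
            union = np.logical_or(skel_a, skel_b).sum()
            return overlap / union if union else 0.0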

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines, using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, and (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.
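    The three strategies can be summarized in a schematic pipeline. In the sketch below, detect_face, init_tracker and localise_landmarks are placeholders for off-the-shelf components, and the confidence-based fallback is an assumed hybrid rule, not the paper's exact procedure.

        # Strategy "a": re-detect the face in every frame.
        # Strategy "b": detect once, then follow with a model-free tracker.
        # Strategy "c": like "b", but fall back to detection when the
        # tracker's confidence drops below a threshold.
        def track_landmarks(frames, strategy, detect_face, init_tracker,
                            localise_landmarks, conf_threshold=0.5):
            results, tracker = [], None
            for frame in frames:
                if strategy == "a" or tracker is None:
                    box = detect_face(frame)                # (re-)detect the face
                    if strategy != "a":
                        tracker = init_tracker(frame, box)  # start model-free tracking
                else:
                    box, conf = tracker.update(frame)       # follow the face
                    if strategy == "c" and conf < conf_threshold:
                        box = detect_face(frame)            # hybrid fallback
                        tracker = init_tracker(frame, box)
                results.append(localise_landmarks(frame, box))
            return results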