Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
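The survey abstract above describes the output of an event camera as an asynchronous stream of per-pixel events, each encoding time, location and the sign (polarity) of a brightness change. As a minimal sketch of that data model (not code from the survey; the names `Event` and `accumulate` are illustrative), one common first processing step is to sum event polarities per pixel over a short time window into a brightness-change map:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One camera event: pixel location, timestamp in microseconds, polarity (+1 or -1)."""
    x: int
    y: int
    t_us: int
    polarity: int


def accumulate(events: List[Event], width: int, height: int,
               t0_us: int, t1_us: int) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t0_us, t1_us).

    The result is a 2D map of net brightness change, a simple frame-like
    representation often used before applying conventional vision algorithms.
    """
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        if t0_us <= ev.t_us < t1_us:
            frame[ev.y][ev.x] += ev.polarity
    return frame
```

The microsecond timestamps reflect the high temporal resolution the abstract mentions; real pipelines would use array-based batches rather than per-event Python objects.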
Indoor localization and navigation for blind persons using visual landmarks and a GIS
In an unfamiliar environment we spot and explore all available information which might guide us to a desired location. This largely unconscious processing is done by our trained sensory and cognitive systems. These recognize and memorize sets of landmarks which allow us to create a mental map of the environment, and this map enables us to navigate by exploiting very few but the most important landmarks stored in our memory. We present a system which integrates a geographic information system of a building with visual landmarks for localizing the user in the building and for tracing and validating a route for the user's navigation. Hence, the developed system complements the white cane for improving the user's autonomy during indoor navigation. Although designed for visually impaired persons, the system can be used by any person for wayfinding in a complex building.
In-hand object detection and tracking using 2D and 3D information
As robots are increasingly introduced into human-inhabited areas, they need a perception system able to detect the actions the humans around them are performing. This information is crucial in order to act accordingly in this changing environment. Humans use different objects and tools in various tasks, so one of the most useful cues for recognizing an action is the object the person is using. For example, if a person is holding a book, he is probably reading. Information about the objects humans are holding therefore helps determine the activities they are performing.

This thesis presents a system that is able to track the user's hand and learn and recognize the object being held. When instructed to learn, the software extracts key information about the object and stores it with a unique identification number for later recognition. If the user triggers the recognition mode, the system compares the current object's information with the data previously stored and outputs the best match.

The system uses both 2D and 3D descriptors to improve the recognition stage. To reduce noise, two separate matching procedures for 2D and 3D each output a preliminary prediction at a rate of 30 predictions per second. Finally, a weighted average is computed over these 30 predictions from both 2D and 3D to obtain the final prediction of the system.

The experiments carried out to validate the system reveal that it is capable of recognizing objects from a pool of 6 different objects with an F1 score near 80% in each case. The experiments demonstrate that the system performs better when it combines the information from the 2D and 3D descriptors than when it uses the 2D or 3D descriptors separately. The performance tests show that the system is able to run in real time with minimum computer requirements of roughly one physical core (at 2.4 GHz) and less than 1 GB of RAM. Moreover, it is possible to implement the software as a distributed system, since the bandwidth measurements carried out show a maximum bandwidth below 7 MB/s.

This system is, to the best of my knowledge, the first to implement an in-hand object learning and recognition algorithm using 2D and 3D information. The introduction of both types of data and the inclusion of a posterior decision step improves the robustness and accuracy of the system. The software developed in this thesis is intended to serve as a building block for further research on the topic, in order to create a more natural human-robot interaction and understanding. Giving robots a human-like way of interacting with the environment is a crucial step towards their complete autonomy and acceptance in human areas.
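The posterior decision step described in the abstract fuses preliminary 2D and 3D predictions into one final answer via a weighted average. A minimal sketch of one plausible reading of that fusion (the function name `fuse_predictions`, the `(object_id, confidence)` pair format and the weights are assumptions, not details from the thesis):

```python
from collections import defaultdict
from typing import Iterable, Tuple


def fuse_predictions(preds_2d: Iterable[Tuple[int, float]],
                     preds_3d: Iterable[Tuple[int, float]],
                     w2d: float = 0.5, w3d: float = 0.5) -> int:
    """Weighted vote over (object_id, confidence) pairs from the 2D and 3D matchers.

    Each matcher contributes a batch of preliminary predictions (e.g. the
    30 per-second predictions mentioned in the abstract); the object with the
    highest weighted accumulated confidence wins.
    """
    scores: defaultdict = defaultdict(float)
    for obj, conf in preds_2d:
        scores[obj] += w2d * conf
    for obj, conf in preds_3d:
        scores[obj] += w3d * conf
    return max(scores, key=scores.get)
```

Accumulating over a batch rather than trusting any single frame is what gives the combined 2D+3D system its noise robustness relative to either descriptor alone.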
Multi-scale lines and edges in V1 and beyond: brightness, object categorization and recognition, and consciousness
In this paper we present an improved model for line and edge detection in cortical area V1. This model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.
A Multicamera System for Gesture Tracking With Three Dimensional Hand Pose Estimation
The goal of any visual tracking system is to successfully detect and then follow an object of interest through a sequence of images. The difficulty of tracking an object depends on the dynamics, the motion and the characteristics of the object as well as on the environment. For example, tracking an articulated, self-occluding object such as a signing hand has proven to be a very difficult problem. The focus of this work is on tracking and pose estimation with applications to hand gesture interpretation. An approach that attempts to integrate the simplicity of a region tracker with single-hand 3D pose estimation methods is presented. Additionally, this work delves into the pose estimation problem. This is accomplished by analyzing hand templates composed of their morphological skeleton and addressing the skeleton's inherent instability. Ligature points along the skeleton are flagged in order to determine their effect on skeletal instabilities. Tested on real data, the analysis finds that flagging ligature points proportionally increases the match strength of high-similarity image-template pairs by about 6%. The effectiveness of this approach is further demonstrated in a real-time multicamera hand tracking system that tracks hand gestures through three-dimensional space as well as estimates the three-dimensional pose of the hand.
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship
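Strategy (c) in the abstract above combines face detection, model-free tracking and landmark localisation into a hybrid pipeline. A minimal sketch of one common way to wire such a hybrid loop (this is an illustrative reading, not the paper's implementation; `detect_face`, `track_face` and `fit_landmarks` are placeholder callables supplied by the user):

```python
from typing import Callable, List, Optional, Sequence


def track_sequence(frames: Sequence,
                   detect_face: Callable,
                   track_face: Callable,
                   fit_landmarks: Callable) -> List[Optional[object]]:
    """Hybrid deformable face tracking loop.

    Initialise (and re-initialise after tracking failure) with a face
    detector, otherwise propagate the bounding box with a model-free
    tracker, then fit facial landmarks inside the current box.
    `track_face` is assumed to return None when it loses the target.
    """
    box = None
    results: List[Optional[object]] = []
    for frame in frames:
        if box is None:
            box = detect_face(frame)            # (re-)initialise via detection
        else:
            box = track_face(frame, box)        # cheap model-free tracking
            if box is None:                     # tracker lost the face
                box = detect_face(frame)        # fall back to detection
        results.append(fit_landmarks(frame, box) if box is not None else None)
    return results
```

Strategies (a) and (b) from the abstract are the two degenerate cases of this loop: always detecting, or always tracking after a single initial detection.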