The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among other things, the most commonly used
features, methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
Augmented indoor hybrid maps using catadioptric vision
This Master's thesis presents a new method for building semantic maps from sequences of omnidirectional images. The goal is to design the top level of a hierarchical map, a semantic map or augmented topological map, by exploiting and adapting this type of camera. The image sequence is segmented by distinguishing between Places and Transitions, with special emphasis on detecting the Transitions, since they contribute very useful and important information to the map. Places are further classified into corridors and rooms of different types. Transitions are divided into doors, door jambs, stairs and elevators, which are the main types of Transitions found in indoor scenarios. Only global image descriptors, specifically Gist, are used to segment the space into these types of areas. The main advantage of such descriptors is their greater efficiency and compactness compared with local descriptors. In addition, to maintain the spatio-temporal consistency of the image sequence, a probabilistic model is used: a Hidden Markov Model (HMM). Despite the simplicity of the method, the results show that it can segment the image sequence into clusters that are meaningful to people. All experiments were carried out using our new data set of omnidirectional images, captured with a helmet-mounted camera, so the sequence follows the motion of a person moving through a building. The data set is publicly available on the Internet so that it can be used in other research.
An Outlook into the Future of Egocentric Vision
What will the future be? We wonder! In this survey, we explore the gap
between current research in egocentric vision and the ever-anticipated future,
where wearable computing, with outward facing cameras and digital overlays, is
expected to be integrated in our everyday lives. To understand this gap, the
article starts by envisaging the future through character-based stories,
showcasing through examples the limitations of current technology. We then
provide a mapping between this future and previously defined research tasks.
For each task, we survey its seminal works, current state-of-the-art
methodologies and available datasets, then reflect on shortcomings that limit
its applicability to future research. Note that this survey focuses on software
models for egocentric vision, independent of any specific hardware. The paper
concludes with recommendations for areas of immediate exploration so as to
unlock our path to the future always-on, personalised and life-enhancing
egocentric vision.
Comment: We invite comments, suggestions and corrections here:
https://openreview.net/forum?id=V3974SUk1
Fireground location understanding by semantic linking of visual objects and building information models
This paper presents an outline for improved localization and situational awareness in fire emergency situations based on semantic technology and computer vision techniques. The novelty of our methodology lies in the semantic linking of video object recognition results from visual and thermal cameras with Building Information Models (BIM). The current limitations and possibilities of certain building information streams in the context of fire safety or fire incident management are addressed in this paper. Furthermore, our data management tools match higher-level semantic metadata descriptors of BIM and deep-learning based visual object recognition and classification networks. Based on these matches, estimates of camera, object and event positions in the BIM model can be generated, transforming it from a static source of information into a rich, dynamic data provider. Previous work has already investigated the possibility of linking BIM and low-cost point sensors for fireground understanding, but these approaches did not take into account the benefits of video analysis and recent developments in semantics and feature learning research. Finally, the strengths of the proposed approach compared to the state of the art are its (semi-)automatic workflow, generic and modular setup, and multi-modal strategy, which make it possible to automatically create situational awareness, improve localization and facilitate overall understanding of the fire.