174 research outputs found
Graph-Based Classification of Omnidirectional Images
Omnidirectional cameras are widely used in such areas as robotics and virtual
reality as they provide a wide field of view. Their images are often processed
with classical methods, which might unfortunately lead to non-optimal solutions
as these methods are designed for planar images that have different geometrical
properties than omnidirectional ones. In this paper we study the image
classification task by taking into account the specific geometry of
omnidirectional cameras with graph-based representations. In particular, we
extend deep learning architectures to data on graphs and propose a principled
way of graph construction such that convolutional filters respond similarly to
the same pattern at different positions of the image, regardless of lens
distortions. Our experiments show that the proposed method outperforms current
techniques for the omnidirectional image classification problem.
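The graph construction described above can be sketched as follows. This is an illustrative toy, not the paper's actual method: it builds a pixel graph over an equirectangular image whose horizontal edge weights shrink with the cosine of the latitude, so that graph filters see comparable geodesic neighbourhoods at every elevation. The function name and exact weighting are assumptions.

```python
import numpy as np

def equirectangular_graph(h, w):
    """Build a weighted pixel graph for an h x w equirectangular image.

    Horizontal edge weights scale with cos(latitude), compensating the
    stretching of rows near the poles (hypothetical weighting; the paper's
    exact construction may differ).
    """
    lat = np.linspace(-np.pi / 2, np.pi / 2, h)       # latitude of each row
    edges = {}                                        # (node_a, node_b) -> weight
    for i in range(h):
        for j in range(w):
            node = i * w + j
            # horizontal neighbour; wraps around the 360-degree seam
            right = i * w + (j + 1) % w
            edges[(node, right)] = float(np.cos(lat[i]))
            # vertical neighbour: uniform geodesic step
            if i + 1 < h:
                edges[(node, (i + 1) * w + j)] = 1.0
    return edges

g = equirectangular_graph(4, 8)
```

Weights near the poles approach zero, so a filter aggregating over this graph effectively ignores the heavily distorted horizontal neighbours there.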
Creation and maintenance of visual incremental maps and hierarchical localization.
Over the last few years, the presence of mobile robotics has increased
considerably in a wide variety of environments. It is common to find robots
that carry out repetitive and specific applications; they can also be used to
work in dangerous environments and to perform precise tasks. These robots can
be found in a variety of social settings, such as industrial, household,
educational and health scenarios. For that reason, they require specific and
continuous research and improvement work. In particular, autonomous mobile
robots require very precise technology to perform tasks without human
assistance.
To perform tasks autonomously, the robots must be able to navigate in an
unknown environment. For that reason, the autonomous mobile robots must be
able to address the mapping and localization tasks: they must create a model of
the environment and estimate their position and orientation.
This PhD thesis proposes and analyses different methods to carry out the map
creation and the localization tasks in indoor environments. To address these
tasks only visual information is used, specifically, omnidirectional images, with a
360º field of view. Throughout the chapters of this document, solutions for
autonomous navigation tasks are proposed; they rely on transformations of the
images captured by a vision system mounted on the robot.
Firstly, the thesis focuses on the study of the global appearance descriptors in
the localization task. Global appearance descriptors are algorithms that
transform an image globally into a unique vector. In this work, a thorough
comparative study is performed: different global appearance descriptors are
used along with omnidirectional images, and the results are compared. The main
goal is to obtain an optimized algorithm to estimate the robot position and
orientation in real indoor environments. The experiments take place under real
conditions, so visual changes in the scenes can occur, such as camera defects,
movements of furniture or people, and changes in the lighting conditions. The
computational cost is also studied: the robot has to localize itself
accurately, but the estimation must also be fast enough.
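The localization-by-global-appearance pipeline can be illustrated with a minimal sketch. The descriptor below (an intensity histogram concatenated with per-column means) is a hypothetical stand-in for the descriptors compared in the thesis; localization then reduces to a nearest-neighbour search over the stored map descriptors.

```python
import numpy as np

def global_descriptor(img, bins=16):
    """Collapse an image into a single unit-norm vector (a toy stand-in
    for the global-appearance descriptors compared in the thesis)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    col_profile = img.mean(axis=0)          # average intensity per column
    d = np.concatenate([hist, col_profile])
    return d / (np.linalg.norm(d) + 1e-12)  # unit norm -> cosine comparison

def localize(query, map_descriptors):
    """Return the index of the most similar stored descriptor."""
    sims = [float(m @ query) for m in map_descriptors]
    return int(np.argmax(sims))

rng = np.random.default_rng(0)
images = [rng.random((8, 32)) for _ in range(5)]     # toy panoramic map
mapd = [global_descriptor(im) for im in images]
best = localize(global_descriptor(images[3]), mapd)  # query image 3
```

Because descriptors are unit vectors, the dot product is a cosine similarity, which keeps the comparison cheap, one of the reasons the computational cost of such descriptors is attractive.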
Additionally, a second application, whose goal is to carry out incremental
mapping in indoor environments, is presented. This application uses the best
global appearance descriptors studied in the localization task, but this time
they are constructed with the purpose of solving the mapping problem using an
incremental clustering technique. The application clusters batches of images
that are visually similar; every group of images, or cluster, is expected to
identify a zone of the environment. The shape and size of the clusters can
vary while the robot visits the different rooms. Nowadays, different
algorithms can be used to obtain the clusters, but these solutions usually
only work properly 'offline', starting from the whole set of data to cluster.
The main idea of this study is to obtain the map incrementally while the robot
explores the new environment. Carrying out the mapping incrementally while the
robot is still visiting the area is very interesting, since having the map
separated into nodes with similarity relationships between them can be used
subsequently for hierarchical localization tasks, and also to recognize
environments already visited in the model.
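A minimal sketch of the incremental clustering idea: each new descriptor either joins the most similar existing zone (updating its running-mean centroid) or opens a new zone, so the map grows while the robot explores. The threshold rule and class interface are assumptions, not the thesis's exact method.

```python
import numpy as np

class IncrementalClusterer:
    """Online grouping of image descriptors into zones (illustrative
    sketch; the thesis's incremental clustering may differ in detail)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.centroids = []          # running-mean descriptor per zone
        self.counts = []

    def add(self, d):
        d = d / (np.linalg.norm(d) + 1e-12)
        if self.centroids:
            sims = [float(c @ d) / (np.linalg.norm(c) + 1e-12)
                    for c in self.centroids]
            k = int(np.argmax(sims))
            if sims[k] >= self.threshold:          # similar enough: same zone
                self.counts[k] += 1
                self.centroids[k] += (d - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(d.copy())            # otherwise open a new zone
        self.counts.append(1)
        return len(self.centroids) - 1

clus = IncrementalClusterer(threshold=0.95)
zone_a = np.array([1.0, 0.0, 0.0])                 # toy zone descriptors
zone_b = np.array([0.0, 1.0, 0.0])
labels = [clus.add(v) for v in (zone_a, zone_a, zone_b, zone_b)]
```

Unlike offline clustering, no image needs to be revisited: the zone structure is available at every step of the exploration, which is what makes it usable for hierarchical localization while mapping.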
Finally, this PhD thesis includes an analysis of deep learning techniques for
localization tasks. In particular, Siamese networks have been studied. Siamese
networks are based on classic convolutional networks, but they permit evaluating
two images simultaneously. These networks output a similarity value between the
input images, and that information can be used for the localization tasks.
Throughout this work the technique is presented, the possible architectures are
analysed and the results after the experiments are shown and compared. Using
the Siamese networks, localization in real operating conditions and
environments is solved, focusing on improving the performance against
illumination changes in the scene. During the experiments, the room retrieval
problem, hierarchical localization and absolute localization have been
solved.
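The Siamese comparison described above can be sketched in a few lines: both images pass through the same shared branch, and the distance between the two embeddings is converted into a similarity score. The single linear layer below stands in for a real CNN branch; the names and the exp(-d) similarity are illustrative assumptions.

```python
import numpy as np

def embed(img, w):
    """Shared branch of a Siamese network: both images pass through the
    SAME weights (a single linear layer here, standing in for a CNN)."""
    return np.tanh(w @ img.ravel())

def siamese_similarity(img_a, img_b, w):
    """Similarity in (0, 1]; 1.0 means identical embeddings."""
    d = np.linalg.norm(embed(img_a, w) - embed(img_b, w))
    return float(np.exp(-d))

rng = np.random.default_rng(1)
w = rng.standard_normal((16, 64)) / 8.0   # shared weights of both branches
a = rng.random((8, 8))
same = siamese_similarity(a, a, w)                  # identical pair
other = siamese_similarity(a, rng.random((8, 8)), w)  # different pair
```

Weight sharing is the key design choice: because the two branches are the same function, the similarity score depends only on image content, which is what lets the network be trained to ignore nuisance factors such as illumination changes.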
Mobile Robots Navigation
Mobile robot navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation by using the sensory information perceived; (iv) localization, as the strategy to estimate the robot position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location, optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses those activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within 7 categories, described next.
Visual Place Recognition under Severe Viewpoint and Appearance Changes
Over the last decade, the eagerness of the robotics and computer vision research communities has unfolded extensive advancements in long-term robotic vision. Visual localization is a constituent of this active research domain: the ability of an agent to correctly localize itself while simultaneously mapping the environment, technically termed Simultaneous Localization and Mapping (SLAM).
Visual Place Recognition (VPR), a core component of SLAM, is a well-known paradigm. In layman's terms, at a certain place/location within an environment, a robot needs to decide whether it is a place it has experienced before. Visual Place Recognition utilizing Convolutional Neural Networks (CNNs) has made a major contribution in the last few years. However, image retrieval-based VPR becomes more challenging when the same places undergo strong viewpoint and seasonal transitions. This thesis concentrates on improving the retrieval performance of a VPR system, generally targeting place correspondence.
Despite the remarkable performance of state-of-the-art deep CNNs for VPR, their significant computation and memory overheads limit their practical deployment on resource-constrained mobile robots. This thesis investigates the utility of shallow CNNs for power-efficient VPR applications. The proposed VPR frameworks focus on novel image regions that can contribute to recognizing places under dubious environment and viewpoint variations.
Employing challenging place recognition benchmark datasets, this thesis further illustrates and evaluates the robustness of shallow CNN-based regional features against viewpoint and appearance changes coupled with dynamic instances, such as pedestrians, vehicles, etc. Finally, the presented computation-efficient and lightweight VPR methodologies show a boost in matching performance, in terms of the Area under Precision-Recall curve (AUC-PR), over state-of-the-art deep neural network based place recognition and SLAM algorithms.
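The AUC-PR metric used to compare VPR methods can be computed as follows; this is a standard step-wise integration of the precision-recall curve over ranked match scores, not code from the thesis.

```python
import numpy as np

def auc_pr(scores, labels):
    """Area under the precision-recall curve for ranked place matches.

    scores: match confidence per query; labels: 1 if the retrieved place
    is correct, 0 otherwise. Uses simple step-wise integration.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # best score first
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)                                     # true positives so far
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / max(int(y.sum()), 1)
    area, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):                   # sum p * delta(recall)
        area += p * (r - prev_r)
        prev_r = r
    return float(area)

# toy example: both correct matches ranked first -> perfect AUC-PR of 1.0
perfect = auc_pr([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```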
Information-theoretic environment modeling for mobile robot localization
To enhance robotic computational efficiency without degrading accuracy, it is imperative to fit the right and exact amount of information, in its simplest form, to the investigated task. This thesis conforms to this reasoning in environment model building and robot localization. It puts forth an approach towards building maps and localizing a mobile robot efficiently with respect to unknown, unstructured and moderately dynamic environments. For this, the environment is modeled on an information-theoretic basis, more specifically in terms of its transmission property. Subsequently, the presented environment model, which does not specifically adhere to classical geometric modeling, succeeds in solving environment disambiguation effectively.
The proposed solution lays out a two-level hierarchical structure for localization. The structure makes use of extracted features, which are stored in two different resolutions in a single hybrid feature-map. This enables dual coarse-topological and fine-geometric localization modalities.
The first level in the hierarchy describes the environment topologically, where a defined set of places is described by a probabilistic feature representation. A conditional entropy-based criterion is proposed to quantify the transinformation between the feature and the place domains. This criterion provides a double benefit of pruning the large dimensional feature space, and at the same time selecting the best discriminative features that overcome environment aliasing problems. Features with the highest transinformation are filtered and compressed to form a coarse resolution feature-map (codebook). Localization at this level is conducted through place matching.
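The transinformation criterion can be illustrated with a standard mutual-information estimate between a discretized feature and the place labels: a discriminative feature scores high and survives the pruning, while an uninformative one scores near zero. The binning and estimator details below are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def transinformation(feature_vals, places, bins=4):
    """Mutual information I(F; P) between one discretized feature and the
    place labels, in bits (a plain plug-in MI estimate)."""
    edges = np.histogram_bin_edges(feature_vals, bins)
    f = np.digitize(feature_vals, edges[1:-1])      # discretize the feature
    mi = 0.0
    for fv in np.unique(f):
        for pv in np.unique(places):
            p_fp = np.mean((f == fv) & (places == pv))
            p_f, p_p = np.mean(f == fv), np.mean(places == pv)
            if p_fp > 0:
                mi += p_fp * np.log2(p_fp / (p_f * p_p))
    return mi

places = np.array([0, 0, 0, 1, 1, 1])
discriminative = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.9])  # tracks the place
noisy = np.array([0.5, 0.1, 0.9, 0.5, 0.1, 0.9])           # independent of place
hi = transinformation(discriminative, places)
lo = transinformation(noisy, places)
```

Features are then ranked by this score; keeping only the top-scoring ones both prunes the feature space and selects the descriptors that best disambiguate aliased places, as described above.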
In the second level of the hierarchy, the map is viewed in high resolution, as consisting of non-compressed entropy-processed features. These features are additionally tagged with their position information. Given the topological place identified by the first level, fine localization at the second level is executed using feature triangulation. To enhance the triangulation accuracy, redundant features are used and two metric evaluation criteria are employed: one for dynamic-feature and mismatch detection, and another for feature selection.
The proposed approach and methods have been tested in realistic indoor environments using a vision sensor and the Scale Invariant Feature Transform local feature extraction. Through experiments, it is demonstrated that an information-theoretic modeling approach is highly efficient in attaining combined accuracy and computational efficiency performances for localization. It has also been proven that the approach is capable of modeling environments with a high degree of unstructuredness, perceptual aliasing, and dynamic variations (illumination conditions; scene dynamics). The merit of employing this modeling type is that environment features are evaluated quantitatively, while at the same time qualitative conclusions are generated about feature selection and performance in a robot localization task. In this way, the accuracy of localization can be adapted in accordance with the available resources.
The experimental results also show that the hybrid topological-metric map provides sufficient information to localize a mobile robot on two scales, independent of the robot motion model. The codebook exhibits fast and accurate topological localization at significant compression ratios. The hierarchical localization framework demonstrates robustness and optimized space and time complexities. This, in turn, provides scalability to large-environment applications and adequacy for real-time employment.
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
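The layered-graph idea behind a DSG can be sketched as a small data structure: nodes carry an abstraction layer (agent, object, place, room, building) and edges carry relations such as containment. The class below is purely illustrative and is not Kimera's actual API.

```python
class SceneGraph:
    """Toy layered scene graph: nodes at different abstraction levels,
    edges encoding spatial relations (illustrative, not Kimera's API)."""

    LAYERS = ("agent", "object", "place", "room", "building")

    def __init__(self):
        self.nodes = {}        # node id -> abstraction layer
        self.edges = []        # (source, relation, destination)

    def add_node(self, node_id, layer):
        assert layer in self.LAYERS, "unknown abstraction layer"
        self.nodes[node_id] = layer

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def children(self, node_id, relation="contains"):
        """Nodes one abstraction step below, e.g. objects inside a room."""
        return [d for s, r, d in self.edges if s == node_id and r == relation]

dsg = SceneGraph()
dsg.add_node("building_1", "building")
dsg.add_node("kitchen", "room")
dsg.add_node("mug", "object")
dsg.add_edge("building_1", "contains", "kitchen")
dsg.add_edge("kitchen", "contains", "mug")
```

Hierarchical semantic path-planning then becomes graph traversal: plan coarsely over rooms first, then refine within the winning room's contained places and objects.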
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based
on learning a linear mapping between distance matrices computed in the input
and output data spaces, where distances are calculated using a subset of points
called reference points. Its simple formulation has attracted several recent
works on extensions and applications. In this paper, we aim to address some
open questions related to the MLM. First, we detail theoretical aspects that
assure the interpolation and universal approximation capabilities of the MLM,
which were previously only empirically verified. Second, we identify the task
of selecting reference points as having major importance for the MLM's
generalization capability. Several clustering-based methods for reference point
selection in regression scenarios are then proposed and analyzed. Based on an
extensive empirical evaluation, we conclude that the evaluated methods are both
scalable and useful. Specifically, for a small number of reference points, the
clustering-based methods outperformed the standard random selection of the
original MLM formulation.
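The MLM formulation can be sketched directly from the description above: distances from the training points to the reference points are assembled into matrices D_x and D_y, and a linear map B between them is fit by least squares. The sketch below exercises only this distance-regression step on a toy linear target; the final multilateration step that recovers the output point from the predicted distances is omitted for brevity, and the function names are assumptions.

```python
import numpy as np

def mlm_fit(X, Y, ref_idx):
    """Minimal Learning Machine: learn the linear map B between the
    input- and output-space distance matrices to the reference points."""
    Rx, Ry = X[ref_idx], Y[ref_idx]
    Dx = np.linalg.norm(X[:, None, :] - Rx[None, :, :], axis=2)  # input distances
    Dy = np.linalg.norm(Y[:, None, :] - Ry[None, :, :], axis=2)  # output distances
    B = np.linalg.pinv(Dx) @ Dy        # least-squares distance regression
    return B, Rx, Ry

def mlm_output_distances(x, B, Rx):
    """Step 1 of MLM prediction: estimate the distances from the unknown
    output to the output-space reference points."""
    return np.linalg.norm(x[None, :] - Rx, axis=1) @ B

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (60, 2))
Y = X * 2.0 + 1.0                                  # toy linear target
B, Rx, Ry = mlm_fit(X, Y, np.arange(0, 60, 6))     # every 6th point as reference
x_new = np.array([0.25, -0.4])
dy_hat = mlm_output_distances(x_new, B, Rx)
true_dy = np.linalg.norm((x_new * 2.0 + 1.0)[None, :] - Ry, axis=1)
```

The reference-point subset `ref_idx` is exactly the design choice the paper studies: here it is a naive stride, whereas the clustering-based selection methods replace it with centroids of input-space clusters.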
Survey on Leveraging Uncertainty Estimation Towards Trustworthy Deep Neural Networks: The Case of Reject Option and Post-training Processing
Although neural networks (especially deep neural networks) have achieved
better-than-human performance in many fields, their real-world
deployment is still questionable due to the lack of awareness about the
limitation in their knowledge. To incorporate such awareness in the machine
learning model, prediction with reject option (also known as selective
classification or classification with abstention) has been proposed in
literature. In this paper, we present a systematic review of the prediction
with the reject option in the context of various neural networks. To the best
of our knowledge, this is the first study focusing on this aspect of neural
networks. Moreover, we discuss different novel loss functions related to the
reject option and post-training processing (if any) of network output for
generating suitable measurements for knowledge awareness of the model. Finally,
we address the application of the reject option in reducing prediction time
for real-time problems and present a comprehensive summary of the techniques
related to the reject option in the context of an extensive variety of neural
networks. Our code is available on GitHub:
https://github.com/MehediHasanTutul/Reject_option
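The simplest post-training rejection rule surveyed, thresholding the top softmax probability, can be sketched as follows; the threshold value and function names are illustrative, and the learned loss-function variants discussed above replace this fixed rule with a trained one.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_with_reject(logits, threshold=0.7):
    """Selective classification: abstain (return -1) whenever the model's
    top softmax probability falls below the confidence threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    top = probs.max(axis=-1)                  # model's confidence per sample
    labels = probs.argmax(axis=-1)
    return np.where(top >= threshold, labels, -1)

confident = [4.0, 0.0, 0.0]    # clearly class 0 -> accepted
uncertain = [0.2, 0.1, 0.0]    # nearly uniform  -> rejected
preds = predict_with_reject(np.array([confident, uncertain]))
```

Rejected samples can then be routed to a slower model or a human, which is how the reject option reduces average prediction time in real-time settings while keeping accuracy on the accepted subset high.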