174 research outputs found
Graph-Based Classification of Omnidirectional Images
Omnidirectional cameras are widely used in such areas as robotics and virtual
reality as they provide a wide field of view. Their images are often processed
with classical methods, which might unfortunately lead to non-optimal solutions
as these methods are designed for planar images that have different geometrical
properties than omnidirectional ones. In this paper we study the image
classification task by taking into account the specific geometry of
omnidirectional cameras with graph-based representations. In particular, we
extend deep learning architectures to data on graphs and propose a principled
way of graph construction such that convolutional filters respond similarly to
the same pattern at different positions of the image, regardless of lens
distortions. Our experiments show that the proposed method outperforms current
techniques for the omnidirectional image classification problem.
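The graph construction described above can be sketched as follows. This is an illustrative toy, not the paper's actual method: it builds a pixel graph over an equirectangular image whose horizontal edge weights shrink with the cosine of the latitude, so that graph filters see comparable geodesic neighbourhoods at every elevation. The function name and exact weighting are assumptions.

```python
import numpy as np

def equirectangular_graph(h, w):
    """Build a weighted pixel graph for an h x w equirectangular image.

    Horizontal edge weights scale with cos(latitude), compensating the
    stretching of rows near the poles (hypothetical weighting; the paper's
    exact construction may differ).
    """
    lat = np.linspace(-np.pi / 2, np.pi / 2, h)       # latitude of each row
    edges = {}                                        # (node_a, node_b) -> weight
    for i in range(h):
        for j in range(w):
            node = i * w + j
            # horizontal neighbour; wraps around the 360-degree seam
            right = i * w + (j + 1) % w
            edges[(node, right)] = float(np.cos(lat[i]))
            # vertical neighbour: uniform geodesic step
            if i + 1 < h:
                edges[(node, (i + 1) * w + j)] = 1.0
    return edges

g = equirectangular_graph(4, 8)
```

Weights near the poles approach zero, so a filter aggregating over this graph effectively ignores the heavily distorted horizontal neighbours there.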
Creation and maintenance of visual incremental maps and hierarchical localization.
Over the last few years, the presence of mobile robotics has increased
considerably in a wide variety of environments. It is common to find robots
that carry out repetitive and specific applications; they can also be used to
work in dangerous environments and to perform precise tasks. These robots can
be found in a variety of social settings, such as industrial, household,
educational and health scenarios. For that reason, they require specific and
continuous research and improvement work. In particular, autonomous mobile
robots require very precise technology to perform tasks without human
assistance.
To perform tasks autonomously, the robots must be able to navigate in an
unknown environment. For that reason, the autonomous mobile robots must be
able to address the mapping and localization tasks: they must create a model of
the environment and estimate their position and orientation.
This PhD thesis proposes and analyses different methods to carry out the map
creation and the localization tasks in indoor environments. To address these
tasks only visual information is used, specifically, omnidirectional images, with a
360º field of view. Throughout the chapters of this document, solutions for
autonomous navigation tasks are proposed; they rely on transformations of the
images captured by a vision system mounted on the robot.
Firstly, the thesis focuses on the study of the global appearance descriptors in
the localization task. Global appearance descriptors are algorithms that
transform an image globally into a unique vector. In this work, a thorough
comparative study is performed: different global appearance descriptors are
used along with omnidirectional images, and the results are compared. The main
goal is to obtain an optimized algorithm to estimate the robot position and
orientation in real indoor environments. The experiments take place under real
conditions, so visual changes in the scenes can occur, such as camera defects,
movements of furniture or people, and changes in the lighting conditions. The
computational cost is also studied: the robot has to localize itself
accurately, but the estimation must also be fast enough.
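The localization-by-global-appearance pipeline can be illustrated with a minimal sketch. The descriptor below (an intensity histogram concatenated with per-column means) is a hypothetical stand-in for the descriptors compared in the thesis; localization then reduces to a nearest-neighbour search over the stored map descriptors.

```python
import numpy as np

def global_descriptor(img, bins=16):
    """Collapse an image into a single unit-norm vector (a toy stand-in
    for the global-appearance descriptors compared in the thesis)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    col_profile = img.mean(axis=0)          # average intensity per column
    d = np.concatenate([hist, col_profile])
    return d / (np.linalg.norm(d) + 1e-12)  # unit norm -> cosine comparison

def localize(query, map_descriptors):
    """Return the index of the most similar stored descriptor."""
    sims = [float(m @ query) for m in map_descriptors]
    return int(np.argmax(sims))

rng = np.random.default_rng(0)
images = [rng.random((8, 32)) for _ in range(5)]     # toy panoramic map
mapd = [global_descriptor(im) for im in images]
best = localize(global_descriptor(images[3]), mapd)  # query image 3
```

Because descriptors are unit vectors, the dot product is a cosine similarity, which keeps the comparison cheap, one of the reasons the computational cost of such descriptors is attractive.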
Additionally, a second application, whose goal is to carry out incremental
mapping in indoor environments, is presented. This application uses the best
global appearance descriptors studied in the localization task, but this time
they are constructed with the purpose of solving the mapping problem using an
incremental clustering technique. The application clusters batches of images
that are visually similar; every group of images, or cluster, is expected to
identify a zone of the environment. The shape and size of the clusters can
vary while the robot visits the different rooms. Nowadays, different
algorithms can be used to obtain the clusters, but these solutions usually
only work properly 'offline', starting from the whole set of data to cluster.
The main idea of this study is to obtain the map incrementally while the robot
explores the new environment. Carrying out the mapping incrementally while the
robot is still visiting the area is very interesting, since having the map
separated into nodes with similarity relationships between them can be used
subsequently for hierarchical localization tasks, and also to recognize
environments already visited in the model.
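A minimal sketch of the incremental clustering idea: each new descriptor either joins the most similar existing zone (updating its running-mean centroid) or opens a new zone, so the map grows while the robot explores. The threshold rule and class interface are assumptions, not the thesis's exact method.

```python
import numpy as np

class IncrementalClusterer:
    """Online grouping of image descriptors into zones (illustrative
    sketch; the thesis's incremental clustering may differ in detail)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.centroids = []          # running-mean descriptor per zone
        self.counts = []

    def add(self, d):
        d = d / (np.linalg.norm(d) + 1e-12)
        if self.centroids:
            sims = [float(c @ d) / (np.linalg.norm(c) + 1e-12)
                    for c in self.centroids]
            k = int(np.argmax(sims))
            if sims[k] >= self.threshold:          # similar enough: same zone
                self.counts[k] += 1
                self.centroids[k] += (d - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(d.copy())            # otherwise open a new zone
        self.counts.append(1)
        return len(self.centroids) - 1

clus = IncrementalClusterer(threshold=0.95)
zone_a = np.array([1.0, 0.0, 0.0])                 # toy zone descriptors
zone_b = np.array([0.0, 1.0, 0.0])
labels = [clus.add(v) for v in (zone_a, zone_a, zone_b, zone_b)]
```

Unlike offline clustering, no image needs to be revisited: the zone structure is available at every step of the exploration, which is what makes it usable for hierarchical localization while mapping.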
Finally, this PhD thesis includes an analysis of deep learning techniques for
localization tasks. In particular, Siamese networks have been studied. Siamese
networks are based on classic convolutional networks, but they permit evaluating
two images simultaneously. These networks output a similarity value between the
input images, and that information can be used for the localization tasks.
Throughout this work the technique is presented, the possible architectures are
analysed and the results after the experiments are shown and compared. Using
the Siamese networks, localization in real operating conditions and
environments is solved, focusing on improving the performance against
illumination changes in the scene. During the experiments, the room retrieval
problem, hierarchical localization and absolute localization have been
solved.
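The Siamese comparison described above can be sketched in a few lines: both images pass through the same shared branch, and the distance between the two embeddings is converted into a similarity score. The single linear layer below stands in for a real CNN branch; the names and the exp(-d) similarity are illustrative assumptions.

```python
import numpy as np

def embed(img, w):
    """Shared branch of a Siamese network: both images pass through the
    SAME weights (a single linear layer here, standing in for a CNN)."""
    return np.tanh(w @ img.ravel())

def siamese_similarity(img_a, img_b, w):
    """Similarity in (0, 1]; 1.0 means identical embeddings."""
    d = np.linalg.norm(embed(img_a, w) - embed(img_b, w))
    return float(np.exp(-d))

rng = np.random.default_rng(1)
w = rng.standard_normal((16, 64)) / 8.0   # shared weights of both branches
a = rng.random((8, 8))
same = siamese_similarity(a, a, w)                  # identical pair
other = siamese_similarity(a, rng.random((8, 8)), w)  # different pair
```

Weight sharing is the key design choice: because the two branches are the same function, the similarity score depends only on image content, which is what lets the network be trained to ignore nuisance factors such as illumination changes.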
Mobile Robots Navigation
Mobile robot navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation by using the sensory information perceived; (iv) localization, as the strategy to estimate the robot position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location, optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses those activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within 7 categories, described next.
Visual Place Recognition under Severe Viewpoint and Appearance Changes
Over the last decade, the eagerness of the robotics and computer vision research communities has unfolded extensive advancements in long-term robotic vision. Visual localization is a constituent of this active research domain: the ability of an agent to correctly localize itself while simultaneously mapping the environment, technically termed Simultaneous Localization and Mapping (SLAM).
Visual Place Recognition (VPR), a core component of SLAM, is a well-known paradigm. In layman's terms, at a certain place/location within an environment, a robot needs to decide whether it is a place it has experienced before. Visual Place Recognition utilizing Convolutional Neural Networks (CNNs) has made a major contribution in the last few years. However, image retrieval-based VPR becomes more challenging when the same places undergo strong viewpoint and seasonal transitions. This thesis concentrates on improving the retrieval performance of a VPR system, generally targeting place correspondence.
Despite the remarkable performance of state-of-the-art deep CNNs for VPR, their significant computation and memory overheads limit their practical deployment on resource-constrained mobile robots. This thesis investigates the utility of shallow CNNs for power-efficient VPR applications. The proposed VPR frameworks focus on novel image regions that can contribute to recognizing places under dubious environment and viewpoint variations.
Employing challenging place recognition benchmark datasets, this thesis further illustrates and evaluates the robustness of shallow CNN-based regional features against viewpoint and appearance changes coupled with dynamic instances, such as pedestrians, vehicles, etc. Finally, the presented computation-efficient and lightweight VPR methodologies show a boost in matching performance, in terms of the Area under Precision-Recall curve (AUC-PR), over state-of-the-art deep neural network based place recognition and SLAM algorithms.
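The AUC-PR metric used to compare VPR methods can be computed as follows; this is a standard step-wise integration of the precision-recall curve over ranked match scores, not code from the thesis.

```python
import numpy as np

def auc_pr(scores, labels):
    """Area under the precision-recall curve for ranked place matches.

    scores: match confidence per query; labels: 1 if the retrieved place
    is correct, 0 otherwise. Uses simple step-wise integration.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # best score first
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)                                     # true positives so far
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / max(int(y.sum()), 1)
    area, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):                   # sum p * delta(recall)
        area += p * (r - prev_r)
        prev_r = r
    return float(area)

# toy example: both correct matches ranked first -> perfect AUC-PR of 1.0
perfect = auc_pr([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```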
Information-theoretic environment modeling for mobile robot localization
To enhance robotic computational efficiency without degrading accuracy, it is imperative to fit the right and exact amount of information, in its simplest form, to the investigated task. This thesis conforms to this reasoning in environment model building and robot localization. It puts forth an approach towards building maps and localizing a mobile robot efficiently with respect to unknown, unstructured and moderately dynamic environments. For this, the environment is modeled on an information-theoretic basis, more specifically in terms of its transmission property. Subsequently, the presented environment model, which does not specifically adhere to classical geometric modeling, succeeds in solving environment disambiguation effectively.
The proposed solution lays out a two-level hierarchical structure for localization. The structure makes use of extracted features, which are stored in two different resolutions in a single hybrid feature-map. This enables dual coarse-topological and fine-geometric localization modalities.
The first level in the hierarchy describes the environment topologically, where a defined set of places is described by a probabilistic feature representation. A conditional entropy-based criterion is proposed to quantify the transinformation between the feature and the place domains. This criterion provides a double benefit of pruning the large dimensional feature space, and at the same time selecting the best discriminative features that overcome environment aliasing problems. Features with the highest transinformation are filtered and compressed to form a coarse resolution feature-map (codebook). Localization at this level is conducted through place matching.
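The transinformation criterion can be illustrated with a standard mutual-information estimate between a discretized feature and the place labels: a discriminative feature scores high and survives the pruning, while an uninformative one scores near zero. The binning and estimator details below are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def transinformation(feature_vals, places, bins=4):
    """Mutual information I(F; P) between one discretized feature and the
    place labels, in bits (a plain plug-in MI estimate)."""
    edges = np.histogram_bin_edges(feature_vals, bins)
    f = np.digitize(feature_vals, edges[1:-1])      # discretize the feature
    mi = 0.0
    for fv in np.unique(f):
        for pv in np.unique(places):
            p_fp = np.mean((f == fv) & (places == pv))
            p_f, p_p = np.mean(f == fv), np.mean(places == pv)
            if p_fp > 0:
                mi += p_fp * np.log2(p_fp / (p_f * p_p))
    return mi

places = np.array([0, 0, 0, 1, 1, 1])
discriminative = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.9])  # tracks the place
noisy = np.array([0.5, 0.1, 0.9, 0.5, 0.1, 0.9])           # independent of place
hi = transinformation(discriminative, places)
lo = transinformation(noisy, places)
```

Features are then ranked by this score; keeping only the top-scoring ones both prunes the feature space and selects the descriptors that best disambiguate aliased places, as described above.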
In the second level of the hierarchy, the map is viewed in high resolution, as consisting of non-compressed entropy-processed features. These features are additionally tagged with their position information. Given the topological place identified by the first level, fine localization at the second level is executed using feature triangulation. To enhance the triangulation accuracy, redundant features are used and two metric evaluation criteria are employed: one for dynamic-feature and mismatch detection, and another for feature selection.
The proposed approach and methods have been tested in realistic indoor environments using a vision sensor and the Scale Invariant Feature Transform local feature extraction. Through experiments, it is demonstrated that an information-theoretic modeling approach is highly efficient in attaining combined accuracy and computational efficiency performances for localization. It has also been proven that the approach is capable of modeling environments with a high degree of unstructuredness, perceptual aliasing, and dynamic variations (illumination conditions; scene dynamics). The merit of employing this modeling type is that environment features are evaluated quantitatively, while at the same time qualitative conclusions are generated about feature selection and performance in a robot localization task. In this way, the accuracy of localization can be adapted in accordance with the available resources.
The experimental results also show that the hybrid topological-metric map provides sufficient information to localize a mobile robot on two scales, independent of the robot motion model. The codebook exhibits fast and accurate topological localization at significant compression ratios. The hierarchical localization framework demonstrates robustness and optimized space and time complexities. This, in turn, provides scalability to large-environment applications and adequacy for real-time employment.
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
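The layered-graph idea behind a DSG can be sketched as a small data structure: nodes carry an abstraction layer (agent, object, place, room, building) and edges carry relations such as containment. The class below is purely illustrative and is not Kimera's actual API.

```python
class SceneGraph:
    """Toy layered scene graph: nodes at different abstraction levels,
    edges encoding spatial relations (illustrative, not Kimera's API)."""

    LAYERS = ("agent", "object", "place", "room", "building")

    def __init__(self):
        self.nodes = {}        # node id -> abstraction layer
        self.edges = []        # (source, relation, destination)

    def add_node(self, node_id, layer):
        assert layer in self.LAYERS, "unknown abstraction layer"
        self.nodes[node_id] = layer

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def children(self, node_id, relation="contains"):
        """Nodes one abstraction step below, e.g. objects inside a room."""
        return [d for s, r, d in self.edges if s == node_id and r == relation]

dsg = SceneGraph()
dsg.add_node("building_1", "building")
dsg.add_node("kitchen", "room")
dsg.add_node("mug", "object")
dsg.add_edge("building_1", "contains", "kitchen")
dsg.add_edge("kitchen", "contains", "mug")
```

Hierarchical semantic path-planning then becomes graph traversal: plan coarsely over rooms first, then refine within the winning room's contained places and objects.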
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based
on learning a linear mapping between distance matrices computed in the input
and output data spaces, where distances are calculated using a subset of points
called reference points. Its simple formulation has attracted several recent
works on extensions and applications. In this paper, we aim to address some
open questions related to the MLM. First, we detail theoretical aspects that
assure the interpolation and universal approximation capabilities of the MLM,
which were previously only empirically verified. Second, we identify the task
of selecting reference points as having major importance for the MLM's
generalization capability. Several clustering-based methods for reference point
selection in regression scenarios are then proposed and analyzed. Based on an
extensive empirical evaluation, we conclude that the evaluated methods are both
scalable and useful. Specifically, for a small number of reference points, the
clustering-based methods outperformed the standard random selection of the
original MLM formulation.
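The MLM formulation can be sketched directly from the description above: distances from the training points to the reference points are assembled into matrices D_x and D_y, and a linear map B between them is fit by least squares. The sketch below exercises only this distance-regression step on a toy linear target; the final multilateration step that recovers the output point from the predicted distances is omitted for brevity, and the function names are assumptions.

```python
import numpy as np

def mlm_fit(X, Y, ref_idx):
    """Minimal Learning Machine: learn the linear map B between the
    input- and output-space distance matrices to the reference points."""
    Rx, Ry = X[ref_idx], Y[ref_idx]
    Dx = np.linalg.norm(X[:, None, :] - Rx[None, :, :], axis=2)  # input distances
    Dy = np.linalg.norm(Y[:, None, :] - Ry[None, :, :], axis=2)  # output distances
    B = np.linalg.pinv(Dx) @ Dy        # least-squares distance regression
    return B, Rx, Ry

def mlm_output_distances(x, B, Rx):
    """Step 1 of MLM prediction: estimate the distances from the unknown
    output to the output-space reference points."""
    return np.linalg.norm(x[None, :] - Rx, axis=1) @ B

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (60, 2))
Y = X * 2.0 + 1.0                                  # toy linear target
B, Rx, Ry = mlm_fit(X, Y, np.arange(0, 60, 6))     # every 6th point as reference
x_new = np.array([0.25, -0.4])
dy_hat = mlm_output_distances(x_new, B, Rx)
true_dy = np.linalg.norm((x_new * 2.0 + 1.0)[None, :] - Ry, axis=1)
```

The reference-point subset `ref_idx` is exactly the design choice the paper studies: here it is a naive stride, whereas the clustering-based selection methods replace it with centroids of input-space clusters.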
Survey on Leveraging Uncertainty Estimation Towards Trustworthy Deep Neural Networks: The Case of Reject Option and Post-training Processing
Although neural networks (especially deep neural networks) have achieved
better-than-human performance in many fields, their real-world
deployment is still questionable due to the lack of awareness about the
limitation in their knowledge. To incorporate such awareness in the machine
learning model, prediction with reject option (also known as selective
classification or classification with abstention) has been proposed in
literature. In this paper, we present a systematic review of the prediction
with the reject option in the context of various neural networks. To the best
of our knowledge, this is the first study focusing on this aspect of neural
networks. Moreover, we discuss different novel loss functions related to the
reject option and post-training processing (if any) of network output for
generating suitable measurements for knowledge awareness of the model. Finally,
we address the application of the reject option in reducing prediction time
for real-time problems and present a comprehensive summary of the techniques
related to the reject option in the context of an extensive variety of neural
networks. Our code is available on GitHub:
https://github.com/MehediHasanTutul/Reject_option
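The simplest post-training rejection rule surveyed, thresholding the top softmax probability, can be sketched as follows; the threshold value and function names are illustrative, and the learned loss-function variants discussed above replace this fixed rule with a trained one.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_with_reject(logits, threshold=0.7):
    """Selective classification: abstain (return -1) whenever the model's
    top softmax probability falls below the confidence threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    top = probs.max(axis=-1)                  # model's confidence per sample
    labels = probs.argmax(axis=-1)
    return np.where(top >= threshold, labels, -1)

confident = [4.0, 0.0, 0.0]    # clearly class 0 -> accepted
uncertain = [0.2, 0.1, 0.0]    # nearly uniform  -> rejected
preds = predict_with_reject(np.array([confident, uncertain]))
```

Rejected samples can then be routed to a slower model or a human, which is how the reject option reduces average prediction time in real-time settings while keeping accuracy on the accepted subset high.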