An Outlook into the Future of Egocentric Vision
What will the future be? We wonder! In this survey, we explore the gap
between current research in egocentric vision and the ever-anticipated future,
where wearable computing, with outward facing cameras and digital overlays, is
expected to be integrated into our everyday lives. To understand this gap, the
article starts by envisaging the future through character-based stories,
showcasing through examples the limitations of current technology. We then
provide a mapping between this future and previously defined research tasks.
For each task, we survey its seminal works, current state-of-the-art
methodologies and available datasets, then reflect on shortcomings that limit
its applicability to future research. Note that this survey focuses on software
models for egocentric vision, independent of any specific hardware. The paper
concludes with recommendations for areas of immediate exploration so as to
unlock our path to the future of always-on, personalised and life-enhancing
egocentric vision.
Comment: We invite comments, suggestions and corrections here:
https://openreview.net/forum?id=V3974SUk1
Dense Visual Simultaneous Localisation and Mapping in Collaborative and Outdoor Scenarios
Dense visual simultaneous localisation and mapping (SLAM) systems can produce 3D
reconstructions that are digital facsimiles of the physical space they describe. Systems that
can produce dense maps with this level of fidelity in real time provide foundational spatial
reasoning capabilities for many downstream tasks in autonomous robotics. Over the past
15 years, mapping small-scale indoor environments, such as desks and buildings, with a
single slow-moving, hand-held sensor has been one of the central focuses of dense visual
SLAM research.
However, most dense visual SLAM systems exhibit a number of limitations that
mean they cannot be directly applied in collaborative or outdoor settings. The contribution
of this thesis is to address these limitations with the development of new systems and
algorithms for collaborative dense mapping, efficient dense alternation and outdoor
operation with fast camera motion and wide field of view (FOV) cameras. We use
ElasticFusion, a state-of-the-art dense SLAM system, as our starting point where each of
these contributions is implemented as a novel extension to the system.
We first present a collaborative dense SLAM system that allows a number of
cameras starting with unknown initial relative positions to maintain local maps with the
original ElasticFusion algorithm. Visual place recognition across local maps results in
constraints that allow maps to be aligned into a common global reference frame, facilitating
collaborative mapping and tracking of multiple cameras within a shared map.
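Aligning local maps into a common global reference frame rests on estimating a rigid transform from 3D points matched by place recognition. As an illustrative sketch only (not the thesis's implementation), the standard closed-form Kabsch/Umeyama estimate between two sets of matched landmarks looks like this:

```python
import numpy as np

def align_maps(src, dst):
    """Estimate a rigid transform (R, t) with dst ~ R @ src + t.

    src, dst: (N, 3) arrays of matched 3D landmarks found by
    visual place recognition across two local maps.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Toy example: the second map is the first rotated 90 degrees
# about z and shifted; the estimate recovers that transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
t_true = np.array([2., 0., 1.])
dst = src @ R_true.T + t_true
R, t = align_maps(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

In practice the constraint would be embedded in a pose graph with many such matches, rather than applied as a one-shot alignment.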
Within dense alternation based SLAM systems, the standard approach is to fuse
every frame into the dense model without considering whether the information contained
within the frame is already captured by the dense map and therefore redundant. As the
number of cameras or the scale of the map increases, this approach becomes inefficient. In
our second contribution, we address this inefficiency by introducing a novel
information-theoretic approach to keyframe selection that allows the system to avoid processing
redundant information. We implement the procedure within ElasticFusion, demonstrating
a marked reduction in the number of frames required by the system to estimate an accurate,
denoised surface reconstruction.
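The exact selection criterion is not reproduced here, but the general idea of information-theoretic keyframe selection can be sketched with a toy model in which each map cell carries a Gaussian depth variance and a frame is fused only if the entropy reduction it offers exceeds a threshold (all quantities and numbers below are illustrative, not the thesis's):

```python
import numpy as np

def entropy(var):
    # Differential entropy of a 1-D Gaussian with variance `var`.
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def frame_gain(map_var, meas_var):
    """Predicted information gain of fusing one frame.

    map_var:  per-cell depth variance currently stored in the map
    meas_var: per-cell variance of the new frame's depth measurement
    """
    fused_var = 1.0 / (1.0 / map_var + 1.0 / meas_var)  # Gaussian fusion
    return np.sum(entropy(map_var) - entropy(fused_var))

# A frame re-observing surface the map already constrains well
# offers little gain; one observing uncertain cells offers a lot.
redundant_gain = frame_gain(np.full(1000, 1e-4), np.full(1000, 1e-2))
novel_gain = frame_gain(np.full(1000, 1e-1), np.full(1000, 1e-4))

THRESHOLD = 10.0  # fuse only frames above this gain (tuning choice)
print(redundant_gain < THRESHOLD < novel_gain)  # True
```

The payoff is exactly the behaviour described above: redundant frames are skipped, so the cost of fusion stops growing with every camera or revisit.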
Before dense SLAM techniques can be applied in outdoor scenarios, we must
first address their reliance on active depth cameras and their unsuitability for fast
camera motion. In our third contribution, we present an outdoor dense SLAM system. The system overcomes the need for an active sensor by employing neural network-based depth
inference to predict the geometry of the scene as it appears in each image. To address the
issue of camera tracking during fast motion we employ a hybrid architecture, combining
elements of both dense and sparse SLAM systems to perform camera tracking and to
achieve globally consistent dense mapping.
Automotive applications present a particularly important setting for dense visual
SLAM systems. Such applications are characterised by their use of wide FOV cameras and
are therefore not accurately modelled by the standard pinhole camera model. The fourth
contribution of this thesis is to extend the above hybrid sparse-dense monocular SLAM
system to cater for large FOV fisheye imagery. This is achieved by reformulating the
mapping pipeline in terms of the Kannala-Brandt fisheye camera model. To estimate depth,
we introduce a new version of the PackNet depth estimation neural network (Guizilini et
al., 2020) adapted for fisheye inputs.
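The Kannala-Brandt model replaces the pinhole tan(theta) projection with an odd polynomial in the incidence angle theta, which stays well behaved for the large angles a fisheye lens sees. A minimal sketch of the forward projection follows; the coefficients are invented for illustration, not calibrated values:

```python
import numpy as np

def kb_project(X, fx, fy, cx, cy, k):
    """Project 3D points with the Kannala-Brandt fisheye model.

    X: (N, 3) camera-frame points; k: distortion coefficients k1..k4.
    The incidence angle theta is mapped through an odd polynomial
    d(theta) = theta + k1*theta^3 + ... instead of tan(theta).
    """
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    r = np.hypot(x, y)
    theta = np.arctan2(r, z)
    t2 = theta * theta
    d = theta * (1 + k[0]*t2 + k[1]*t2**2 + k[2]*t2**3 + k[3]*t2**4)
    # Direction of the image point; guard the on-axis case r = 0.
    scale = np.where(r > 0, d / np.maximum(r, 1e-12), 0.0)
    u = fx * scale * x + cx
    v = fy * scale * y + cy
    return np.stack([u, v], axis=1)

# Sanity check: a point on the optical axis lands on the principal point.
uv = kb_project(np.array([[0.0, 0.0, 1.0]]), 350, 350, 320, 240,
                [0.05, -0.01, 0.002, -0.0002])
print(uv)  # [[320. 240.]]
```

Reformulating the mapping pipeline then amounts to replacing every pinhole projection and unprojection with this pair of operations.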
To demonstrate the effectiveness of our contributions, we present experimental
results, computed by processing the synthetic ICL-NUIM dataset of Handa et al. (2014) as
well as the real-world TUM-RGBD dataset of Sturm et al. (2012). For outdoor SLAM we
show the results of our system processing the autonomous driving KITTI and KITTI-360
datasets of Geiger et al. (2012a) and Liao et al. (2021), respectively.
Measuring knowledge sharing processes through social network analysis within construction organisations
The construction industry is a knowledge-intensive and information-dependent industry. Organisations risk losing valuable knowledge when employees leave. Therefore, construction organisations need to nurture opportunities to disseminate knowledge by strengthening knowledge-sharing networks. This study aimed to evaluate the formal and informal knowledge-sharing methods in social networks within Australian construction organisations and to identify how knowledge sharing could be improved. Data were collected from two estimating teams in two case studies. The data, gathered through semi-structured interviews, were analysed using UCINET, a Social Network Analysis (SNA) tool, and SNA measures. The findings revealed that one case study consisted of influencers, while the other demonstrated an optimal knowledge-sharing structure in both formal and informal knowledge-sharing methods. Social networks can vary with the organisation as well as with individuals’ behaviour. Identifying networks with specific issues, and taking steps to strengthen them, will enable organisations to achieve optimal knowledge-sharing processes. This research offers good knowledge-sharing practices for construction organisations seeking to optimise their knowledge-sharing processes.
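The study computed its SNA measures with UCINET; purely as an illustration of the idea, the in-degree centrality used to spot "influencers" can be sketched in a few lines on an invented advice network (all names and ties below are hypothetical):

```python
from collections import Counter

# Hypothetical "who do you ask for estimating advice?" ties,
# recorded as (seeker, source) pairs from interview data.
ties = [
    ("Ann", "Raj"), ("Ben", "Raj"), ("Cat", "Raj"),
    ("Dee", "Raj"), ("Raj", "Ann"), ("Ben", "Cat"),
]

people = {p for tie in ties for p in tie}
n = len(people)

# In-degree centrality: the share of colleagues who consult each person.
in_deg = Counter(source for _, source in ties)
centrality = {p: in_deg[p] / (n - 1) for p in people}

# A person consulted by most of the team is a potential influencer:
# valuable, but a single point of failure if they leave the firm.
influencer = max(centrality, key=centrality.get)
print(influencer, centrality[influencer])  # Raj 1.0
```

A network dominated by one such node is exactly the fragile structure the study warns about, whereas evenly distributed centrality indicates the more optimal sharing structure observed in the second case.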
The 45th Australasian Universities Building Education Association Conference: Global Challenges in a Disrupted World: Smart, Sustainable and Resilient Approaches in the Built Environment, Conference Proceedings, 23 - 25 November 2022, Western Sydney University, Kingswood Campus, Sydney, Australia
This is the proceedings of the 45th Australasian Universities Building Education Association (AUBEA) conference, hosted by Western Sydney University in November 2022. The conference is organised by the School of Engineering, Design, and Built Environment in collaboration with the Centre for Smart Modern Construction, Western Sydney University. This year’s conference theme is “Global Challenges in a Disrupted World: Smart, Sustainable and Resilient Approaches in the Built Environment”, and the proceedings are expected to include over a hundred double-blind peer-reviewed papers.
Advances in Intelligent Vehicle Control
This book is a printed edition of the Special Issue Advances in Intelligent Vehicle Control that was published in the journal Sensors. It presents a collection of eleven papers that covers a range of topics, such as the development of intelligent control algorithms for active safety systems, smart sensors, and intelligent and efficient driving. The contributions presented in these papers can serve as useful tools for researchers who are interested in new vehicle technology and in the improvement of vehicle control systems.
Machine Learning for Multi-Robot Semantic Simultaneous Localization and Mapping
Automation and robotics are becoming more and more common in our daily lives, with many possible applications. Deploying robots in the world can extend what humans are capable of doing, and can save us from dangerous and strenuous tasks. For robots to be safely sent out in our real world, and in new unknown environments, one key capability they need is to perceive their environment, and particularly to localize themselves with respect to their surroundings. To truly be able to be deployed anywhere, robots should be able to do so relying only on their sensors, the most commonly used being cameras. One way to generate such an estimate is by using a simultaneous localization and mapping (SLAM) algorithm, in which the robot will concurrently build a map of its environment and estimate its state within it. Single-robot SLAM has been extensively researched and is now considered a mature field. However, using a team of robots can provide several benefits in terms of robustness, efficiency, and performance for many tasks. In this case, multi-robot SLAM algorithms are required to allow each robot to benefit from the whole team’s experience. Multi-robot SLAM can build on top of single-robot SLAM solutions, but requires adaptations and faces computation and communication constraints. One particular challenge that arises in multi-robot SLAM is the need for robots to find inter-robot loop closures: relationships between trajectories of different robots that can be found when they visit the same place. Two categories of approaches are possible to detect inter-robot loop closures. In indirect methods, robots communicate to find if they have mapped the same area, and then attempt to find loop closures using data gathered by each robot in the place that was jointly visited. In direct methods, robots directly rely on data they gather from their sensors to estimate the loop closures.
Each approach has its own benefits and challenges, with indirect methods being more popular in recent works. This thesis builds on recent computer vision advancements to present contributions to each category of approaches for inter-robot loop closure detection. A first approach is presented for indirect loop closure detection in a team of fully connected robots. It relies on constellations, a compact semantic representation of the environment based on the objects that are in it. Descriptors and comparison methods for constellations are designed to robustly recognize places based on their constellation with minimal data exchange. These are used in a decentralized place recognition mechanism that is scalable as the size of the team increases. The proposed method performs comparably to state-of-the-art solutions in terms of performance and the amount of data exchanged, while being more meaningful and interpretable.
Camera Re-Localization with Data Augmentation by Image Rendering and Image-to-Image Translation
The self-localization of automobiles, robots, or unmanned aerial vehicles, as well as the self-localization of pedestrians, is and will remain of high interest for a wide range of applications.
A principal task is the autonomous navigation of such vehicles, in which localization within the surrounding scene is a key component.
Since cameras are established, permanently installed sensors in automobiles, robots, and unmanned aerial vehicles, the additional effort of also using them for localization tasks is small to non-existent.
The same holds for the self-localization of pedestrians, where smartphones serve as mobile camera platforms.
Camera re-localization, in which the pose of a camera is determined with respect to a fixed environment, is a valuable process for providing or supporting localization for vehicles and pedestrians.
Cameras are, moreover, inexpensive sensors that are well established in the everyday lives of people and machines.
The support provided by camera re-localization is not limited to navigation applications; it can also assist image analysis and image processing in general, such as scene reconstruction, detection, classification, and similar applications.
To these ends, this work addresses improving the camera re-localization process.
Since convolutional neural networks (CNNs) and hybrid solutions for estimating camera poses have in recent years become competitive with established hand-crafted methods, this thesis focuses on the former.
The main contributions of this work include the design of a CNN for camera pose estimation, with an emphasis on a shallow architecture that meets the requirements of mobile platforms.
This network achieves accuracies on par with deeper CNNs of substantially larger model size.
Furthermore, the performance of CNNs depends strongly on the quantity and quality of the training data used for optimization.
The remaining contributions of this thesis therefore concern image rendering and image-to-image translation for extending such training data; extending training data in this way is known as data augmentation (DA).
For rendering images that usefully extend the training data, 3D models are used.
Generative adversarial networks (GANs) perform the image-to-image translation. While image rendering increases the quantity of data in an image dataset, image-to-image translation improves the quality of the rendered data.
Experiments are conducted both with datasets extended by rendered images and with translated images.
Both DA approaches contribute to improving localization accuracy.
Thus, this work improves state-of-the-art camera re-localization methods through DA.
CAPRICORN: Communication Aware Place Recognition using Interpretable Constellations of Objects in Robot Networks
Using multiple robots for exploring and mapping environments can provide
improved robustness and performance, but it can be difficult to implement. In
particular, limited communication bandwidth is a considerable constraint when a
robot needs to determine if it has visited a location that was previously
explored by another robot, as it requires for robots to share descriptions of
places they have visited. One way to compress this description is to use
constellations, groups of 3D points that correspond to the estimate of a set of
relative object positions. Constellations maintain the same pattern from
different viewpoints and can be robust to illumination changes or dynamic
elements. We present a method to extract from these constellations compact
spatial and semantic descriptors of the objects in a scene. We use this
representation in a 2-step decentralized loop closure verification: first, we
distribute the compact semantic descriptors to determine which other robots
might have seen scenes with similar objects; then we query matching robots with
the full constellation to validate the match using geometric information. The
proposed method requires less memory, is more interpretable than global image
descriptors, and could be useful for other tasks and interactions with the
environment. We validate our system's performance on a TUM RGB-D SLAM sequence
and show its benefits in terms of bandwidth requirements.
Comment: 8 pages, 6 figures, 1 table. 2020 IEEE International Conference on Robotics and Automation (ICRA).
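The 2-step verification can be sketched as follows. Both the compact descriptor (here an object-class histogram) and the geometric test (here sorted pairwise distances between object positions) are simplified stand-ins chosen for illustration, not the exact formulations from the paper:

```python
import numpy as np

def semantic_descriptor(classes, num_classes=10):
    """Step 1: a compact semantic descriptor -- a normalised histogram
    of the object classes in a constellation, cheap enough to
    broadcast to every robot in the network."""
    h = np.bincount(classes, minlength=num_classes).astype(float)
    return h / np.linalg.norm(h)

def geometric_match(pts_a, pts_b, tol=0.1):
    """Step 2: verify a candidate with the full constellation by
    comparing sorted pairwise distances between object positions,
    which are invariant to viewpoint change (rigid transforms)."""
    def dists(p):
        d = np.linalg.norm(p[:, None] - p[None, :], axis=-1)
        return np.sort(d[np.triu_indices(len(p), 1)])
    return bool(np.allclose(dists(pts_a), dists(pts_b), atol=tol))

# Robot A's constellation: chair (0), table (1), monitor (2).
classes_a = np.array([0, 1, 2])
pts_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

# Robot B saw the same objects from another viewpoint.
classes_b = np.array([2, 0, 1])
rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pts_b = pts_a @ rot.T + np.array([5.0, 2.0, 0.0])

# Cheap semantic check first; full geometric check only on candidates.
similar = float(semantic_descriptor(classes_a) @ semantic_descriptor(classes_b))
print(similar > 0.9, geometric_match(pts_a, pts_b))  # True True
```

Only the small histogram crosses the network in the first step; the full constellation is exchanged solely with the robots whose descriptors already match, which is what keeps bandwidth low.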