
    A Novel Robust Scene Change Detection Algorithm for Autonomous Robots Using Mixtures of Gaussians

    Interest in change detection techniques has increased considerably in recent years in the field of autonomous robotics. This is partly because changes in a robot's working environment are useful for several robotic skills (e.g., spatial cognition, modelling or navigation) and applications (e.g., surveillance or guidance robots). Changes are usually detected by comparing the current data provided by the robot's sensors with a previously known map or model of the environment. When the data consists of a large point cloud, processing it is computationally expensive, mainly due to the number of points and their redundancy. Using Gaussian Mixture Models (GMM) instead of raw point clouds leads to a more compact feature space that can be used to efficiently process the input data. This allows us to successfully segment the set of 3D points acquired by the sensor and reduce the computational load of the change detection algorithm. However, segmenting the environment as a mixture of Gaussians has some problems that need to be properly addressed. In this paper, a novel change detection algorithm is described that improves on the robustness and computational cost of previous approaches. The proposal is based on the classic Expectation Maximization (EM) algorithm, for which different selection criteria are evaluated. As demonstrated in the experimental results section, the proposed change detection algorithm detects changes in the robot's working environment faster and more accurately than similar approaches.
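    As a rough illustration of the idea described above (and not the paper's exact algorithm), the sketch below fits a GMM to a reference point cloud with EM and flags points of a new scan that the reference model explains poorly; the component count and likelihood threshold are hypothetical choices.

# Illustrative sketch: GMM-based change detection on point clouds.
import numpy as np
from sklearn.mixture import GaussianMixture

def detect_changes(reference_points, current_points, n_components=20, log_lik_threshold=-12.0):
    """reference_points, current_points: (N, 3) arrays of 3D points.
    Returns a boolean mask over current_points marking likely changes."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0).fit(reference_points)
    # Per-point log-likelihood under the reference mixture; low values mean
    # the point is not well explained by the known environment model.
    log_lik = gmm.score_samples(current_points)
    return log_lik < log_lik_threshold

# Example usage with synthetic data
rng = np.random.default_rng(0)
ref = rng.normal(size=(5000, 3))
cur = np.vstack([rng.normal(size=(4800, 3)),           # unchanged region
                 rng.normal(loc=8.0, size=(200, 3))])  # new object far from the map
changed = detect_changes(ref, cur)
print(f"{changed.sum()} of {len(cur)} points flagged as changed")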

    Detection-by-Localization: Maintenance-Free Change Object Detector

    Recent research demonstrates that self-localization performance is a very useful measure of the likelihood-of-change (LoC) for change detection. In this paper, this "detection-by-localization" scheme is studied in a novel, generalized task of object-level change detection. In our framework, a given query image is segmented into object-level subimages (termed "scene parts"), which are then converted to subimage-level pixel-wise LoC maps via the detection-by-localization scheme. Our approach models a self-localization system as a ranking function that outputs a ranked list of reference images without requiring relevance scores. Thanks to this new setting, we can generalize our approach to a broad class of self-localization systems. Our ranking-based self-localization model allows us to fuse self-localization results from different modalities via an unsupervised rank fusion derived from the field of multi-modal information retrieval (MMR).
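    The abstract only states that an unsupervised rank fusion from multi-modal information retrieval is used; the sketch below uses reciprocal rank fusion as a stand-in to show how ranked lists of reference images from different modalities could be merged without relevance scores.

# Minimal sketch of unsupervised rank fusion over ranked reference-image lists.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """ranked_lists: list of lists, each an ordering of reference-image IDs
    produced by one self-localization modality (best match first).
    Returns a single fused ranking; no relevance scores are required."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, image_id in enumerate(ranking, start=1):
            scores[image_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse rankings from two hypothetical modalities
rank_visual = ["img_12", "img_07", "img_31"]
rank_depth = ["img_07", "img_31", "img_12"]
print(reciprocal_rank_fusion([rank_visual, rank_depth]))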

    Real-Time Structure and Object Aware Semantic SLAM

    Simultaneous Localization And Mapping (SLAM) is one of the fundamental problems in mobile robotics and addresses the reconstruction of a previously unseen environment while simultaneously localising a mobile robot with respect to it. For visual SLAM, the simplest representation of the map is a collection of 3D points, which is sparse and efficient to compute and update, particularly for large-scale environments; however, it lacks semantic information and is not useful for high-level tasks such as robotic grasping and manipulation. Although methods to compute denser representations have been proposed, these reconstructions remain equivalent to a collection of points and therefore carry no additional semantic information or relationships. Man-made environments contain many structures and objects that carry high-level semantics and can potentially act as landmarks of a SLAM map while encapsulating semantic information, as opposed to a set of points. For instance, planes are good representations for feature-deprived regions, where they provide information complementary to points, and can also model dominant planar layouts of the environment with very few parameters. Furthermore, a generic representation for previously unseen objects can be used as a general landmark that carries semantics in the reconstructed map. Integrating visual semantic understanding and geometric reconstruction has been studied before; however, for various reasons, the inclusion of high-level geometric entities in the SLAM framework has been restricted to slow, offline structure-from-motion contexts, or high-level entities have merely acted as regularizers for points in the map instead of independent landmarks. One critical reason is the lack of a proper mathematical representation for high-level landmarks; the other main challenges are detecting and tracking these landmarks and formulating an observation model, i.e., a mapping between the corresponding image-observable quantities and the estimated parameters of the representations. In this work, we address these challenges to achieve an online, real-time SLAM framework with scalable maps consisting of both sparse points and high-level structural and semantic landmarks such as planes and objects. We explicitly target real-time performance and keep that as a beacon that critically influences the representation choice and all the modules of our SLAM system. In the context of factor graphs, we propose novel representations for structural entities as planes and for general, previously unseen and not-predefined objects as bounded dual quadrics, which decompose to permit a clean, fast and effective real-time implementation that is amenable to the nonlinear least-squares formulation and respects the sparsity pattern of the SLAM problem. In this representation we are not concerned with high-fidelity reconstruction of individual objects, but rather with representing the general layout and orientation of objects in the environment. The minimal representation of planes is also explored, leading to a representation that can be constructed and updated online in a least-squares framework. Another challenge that we address in this work is to marry high-level landmark detections based on deep-learned frameworks with geometric SLAM systems.
    Due to the recent success of CNN-based object detection, as well as depth and surface-normal estimation from a single image, it is now feasible to detect and estimate these semantic landmarks from single RGB images, leading us seamlessly from an RGB-D SLAM system to pure monocular SLAM, thanks to the real-time predictions of the trained CNN and the appropriate representations. Furthermore, to benefit from deep-learned priors, we incorporate high-fidelity single-image reconstructions and hallucinations of objects on top of the coarse quadrics to enrich the sparse map semantically, while constraining the shape of the coarse quadrics even further. Pertinent to our beacon, the proposed landmark representations in the map also provide the potential for imposing additional constraints and priors that carry crucial semantic information about the scene, without incurring great extra computational cost. In this work, we have explored and proposed constraints such as priors on the extent and shape of the objects, a point-plane regularizer, plane-plane (Manhattan assumption) constraints, and plane-object (supporting affordance) constraints. We evaluate our proposed SLAM system extensively, using different input sensor modalities from RGB-D to monocular, on almost all publicly available benchmarks, both indoors and outdoors, to show its applicability as a general-purpose SLAM solution. The extensive experiments show the efficacy of our SLAM through different comparisons and ablation studies, including high-level structures and objects with constraints imposed among them in various scenarios. In particular, the estimated camera trajectories are improved significantly on varied sequences of visual SLAM datasets and also on our own sequences captured with a UR5 robotic arm equipped with a depth camera. In addition to more accurate camera trajectories, our system yields enriched sparse maps with semantically meaningful planar structures and generic objects in the scene, along with their mutual relationships.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
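    As a hedged sketch of how a planar landmark can enter a nonlinear least-squares backend (the thesis proposes its own minimal plane representation, which is not reproduced here), the following keeps a plane in Hessian normal form and computes point-to-plane residuals for points associated with that landmark.

# Hedged sketch: point-to-plane residuals for a planar landmark in a
# least-squares SLAM backend; the parameterisation here is illustrative.
import numpy as np

def plane_residuals(plane, points_world):
    """plane: array [nx, ny, nz, d] with unit normal n and offset d,
    i.e. the plane is {x : n.x + d = 0}.
    points_world: (N, 3) points associated with this planar landmark.
    Returns the signed point-to-plane distances (one residual per point)."""
    n, d = plane[:3], plane[3]
    n = n / np.linalg.norm(n)  # re-normalise so the residuals stay metric
    return points_world @ n + d

# Example: points roughly on the floor plane z = 0
pts = np.array([[0.0, 0.0, 0.01], [1.0, 2.0, -0.02], [3.0, -1.0, 0.0]])
floor = np.array([0.0, 0.0, 1.0, 0.0])
print(plane_residuals(floor, pts))  # small residuals -> consistent landmark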

    Behaviour-driven motion synthesis

    Heightened demand for alternatives to human exposure to strenuous and repetitive labour, as well as to hazardous environments, has led to increased interest in the real-world deployment of robotic agents. Targeted applications require robots to be adept at rapidly synthesising complex motions across a wide range of tasks and environments. To this end, this thesis proposes leveraging abstractions of the problem at hand to ease and speed up the solving. We formalise abstractions that hint at relevant robotic behaviour for a family of planning problems, and integrate them tightly into the motion synthesis process to make real-world deployment in complex environments practical. We investigate three principal challenges of this proposition. Firstly, we argue that behavioural samples in the form of trajectories are of particular interest for guiding robotic motion synthesis. We formalise a framework with behavioural semantic annotation that enables the storage and bootstrapping of sets of problem-relevant trajectories. Secondly, in the core of this thesis, we study strategies to exploit behavioural samples in task instantiations that differ significantly from those stored in the framework. We present two novel strategies to efficiently leverage offline-computed behavioural samples: (i) online modulation based on geometry-tuned potential fields, and (ii) experience-guided exploration based on trajectory segmentation and malleability. Thirdly, we demonstrate that behavioural hints can be extracted on the fly to tackle highly constrained, ever-changing complex problems for which there is no prior knowledge. We propose a multi-layer planner that first solves a simplified version of the problem at hand and then uses the result to inform the search for a solution in the constrained space. Our contributions on efficient motion synthesis via behaviour guidance augment robots' capabilities to deal with more complex planning problems, and do so more effectively than related approaches in the literature by computing better-quality paths in less response time. We demonstrate our contributions, in both laboratory experiments and field trials, on a spectrum of planning problems and robotic platforms, ranging from high-dimensional humanoids and robotic arms with a focus on autonomous manipulation in resembling environments, to high-dimensional kinematic motion planning with a focus on autonomous safe navigation in unknown environments. While this thesis was motivated by challenges in motion synthesis, we have explored the applicability of our findings in disparate robotic fields, such as grasp and task planning. We have made some of our contributions open-source, hoping they will be of use to the robotics community at large.
    Funding: The CDT in Robotics and Autonomous Systems at Heriot-Watt University and The University of Edinburgh; the ORCA Hub EPSRC project (EP/R026173/1); the Scottish Informatics and Computer Science Alliance (SICSA).
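    A minimal sketch of strategy (i), online modulation with a repulsive potential field, is given below; it is illustrative only, and the gain, influence radius and 2D setting are assumptions rather than the thesis' formulation.

# Illustrative sketch: adapt a stored behavioural trajectory around an
# obstacle in a new task instance using a repulsive potential field.
import numpy as np

def modulate_trajectory(waypoints, obstacle, influence_radius=0.5, gain=0.01):
    """waypoints: (N, 2) stored trajectory samples.
    obstacle: (2,) obstacle position in the new scene.
    Returns waypoints pushed away from the obstacle within its influence zone."""
    adapted = waypoints.copy()
    for i, p in enumerate(waypoints):
        diff = p - obstacle
        dist = np.linalg.norm(diff)
        if 1e-6 < dist < influence_radius:
            # Repulsive term grows as the waypoint approaches the obstacle.
            push = gain * (1.0 / dist - 1.0 / influence_radius)
            adapted[i] = p + push * diff / dist
    return adapted

# Example: a straight-line behaviour sample adapted around an obstacle at (0.5, 0.05)
traj = np.stack([np.linspace(0, 1, 11), np.zeros(11)], axis=1)
print(modulate_trajectory(traj, np.array([0.5, 0.05])))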

    Combining shape and color. A bottom-up approach to evaluate object similarities

    The objective of the present work is to develop a bottom-up approach to estimate the similarity between two unknown objects. Given a set of digital images, we want to identify the main objects and determine whether they are similar or not. In recent decades many object recognition and classification strategies, driven by higher-level activities, have been developed successfully. The peculiarity of this work, instead, is the attempt to work without any training phase or a priori knowledge about the objects or their context. Indeed, if we suppose we are in an unstructured and completely unknown environment, we usually have to deal with novel objects never seen before; under these hypotheses, it would be very useful to define some kind of similarity among the instances under analysis (even if we do not know which category they belong to). To obtain this result, we start by observing that human beings use a lot of information and analyze very different aspects to achieve object recognition: shape, position, color and so on. Hence we try to reproduce part of this process, combining different methodologies (each working on a specific characteristic) to obtain a more meaningful idea of similarity. Mainly inspired by the human conception of representation, we identify two main characteristics and call them the implicit and explicit models. The term "explicit" is used to account for the main traits of what, in the human representation, connotes a principal source of information regarding a category, a sort of visual synecdoche (corresponding to the shape); the term "implicit", on the other hand, accounts for the object rendered by shadows and lights, colors and volumetric impression, a sort of visual metonymy (corresponding to the chromatic characteristics). During the work we had to face several problems and tried to define specific solutions. In particular, our contributions concern:
    - defining a bottom-up approach for image segmentation (which does not rely on any a priori knowledge);
    - combining different features to evaluate object similarity (focusing particularly on shape and color);
    - defining a generic distance (similarity) measure between objects (without any attempt to identify the possible category they belong to);
    - analyzing the consequences of using the number of modes as an estimate of the number of mixture components (in the Expectation-Maximization algorithm).
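    The last contribution can be illustrated with a short sketch: estimate the number of modes of a one-dimensional feature distribution (here via a kernel density estimate, which is an assumption of this example) and use it as the number of components for EM.

# Illustrative sketch: mode count of a 1-D feature distribution used as the
# number of mixture components for EM; bandwidth and peak criteria are assumed.
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

def fit_gmm_by_modes(samples):
    """samples: 1-D array of feature values (e.g. a color channel).
    Counts modes of a KDE of the data and fits a GMM with that many components."""
    kde = gaussian_kde(samples)
    grid = np.linspace(samples.min(), samples.max(), 512)
    density = kde(grid)
    n_modes = max(1, len(find_peaks(density)[0]))  # modes -> component count
    return GaussianMixture(n_components=n_modes, random_state=0).fit(samples.reshape(-1, 1))

# Example: a bimodal sample should yield roughly two components
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(6, 1, 500)])
gmm = fit_gmm_by_modes(data)
print("components:", gmm.n_components, "means:", gmm.means_.ravel())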