
    Recursive Inference for Prediction of Objects in Urban Environments

    Future advancements in robotic navigation and mapping rest to a large extent on robust, efficient and more advanced semantic understanding of the surrounding environment. Existing semantic mapping approaches typically consider a small number of semantic categories and require complex inference or a large number of training examples to achieve desirable performance. In this work we present an efficient approach for predicting the locations of generic objects in urban environments by means of semantic segmentation of a video into object and non-object categories. We exploit widely available exemplars of non-object categories (such as road, buildings and vegetation) and use geometric cues indicative of the presence of object boundaries to gather evidence about objects regardless of their category. We formulate the object/non-object semantic segmentation problem in the Conditional Random Field framework, where the structure of the graph is induced by a minimum spanning tree computed over a 3D point cloud, yielding an efficient algorithm for exact inference. The chosen 3D representation naturally lends itself to online recursive belief updates with a simple soft data association mechanism. We carry out extensive experiments on videos of urban environments acquired by a moving vehicle and show quantitatively and qualitatively the benefits of our proposal.
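
    The efficiency claim hinges on a standard fact: a CRF whose graph is a tree (here, a minimum spanning tree over the point cloud) admits exact MAP inference by max-product belief propagation in linear time. The sketch below illustrates that idea under simple assumptions; the function names, the binary {non-object, object} label set and the shared pairwise table are illustrative stand-ins, not the authors' implementation.

```python
# Minimal sketch: exact MAP inference on a tree-structured CRF built from an
# MST over 3D points. Potentials are generic log-scores, not the paper's.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edges(points):
    """Edges of a Euclidean MST over 3D points; these define the CRF graph."""
    mst = minimum_spanning_tree(squareform(pdist(points))).tocoo()
    return list(zip(mst.row.tolist(), mst.col.tolist()))

def map_labels(unary, edges, pairwise):
    """Exact MAP labeling by max-product belief propagation on the tree.

    unary:    (n, 2) log-potentials for {non-object, object}
    pairwise: (2, 2) log-compatibility table shared by all edges
    """
    n, _ = unary.shape
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, order, stack = {0: None}, [], [0]
    while stack:                          # DFS: parents precede children
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    msg, back = {}, {}
    for u in reversed(order):             # leaves-to-root message pass
        p = parent[u]
        if p is None:
            continue
        b = unary[u] + sum(msg[c] for c in adj[u] if parent.get(c) == u)
        scores = pairwise + b[:, None]    # scores[x_u, x_p]
        msg[u] = scores.max(axis=0)
        back[u] = scores.argmax(axis=0)   # best x_u for each parent label x_p
    labels = np.empty(n, dtype=int)
    root = unary[0] + sum(msg[c] for c in adj[0] if parent.get(c) == 0)
    labels[0] = int(root.argmax())
    for u in order:                       # root-to-leaves decode
        for c in adj[u]:
            if parent.get(c) == u:
                labels[c] = back[c][labels[u]]
    return labels
```

    Because messages flow once up and once down the tree, the cost is linear in the number of points, which is what makes repeated, per-frame recursive belief updates affordable.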

    Unsupervised learning for long-term autonomy

    This thesis investigates methods that enable a robot to build and maintain an environment model automatically. Such capabilities are especially important in long-term autonomy, where robots operate for extended periods of time without human intervention. In such scenarios we can no longer assume that the environment and the models will remain static. Rather, changes are expected, and the robot needs to adapt to new, unseen circumstances automatically. The approach described in this thesis is based on clustering the robot's sensing information. This provides a compact representation of the data which can be updated as more information becomes available. The work builds on affinity propagation (Frey and Dueck, 2007), a clustering method which obtains high-quality clusters while requiring only similarities between pairs of points and, importantly, selecting the number of clusters automatically. This is essential for real autonomy, as we typically do not know “a priori” how many clusters best represent the data. The contributions of this thesis are threefold. First, a self-supervised method capable of learning a visual appearance model in long-term autonomy settings is presented. Second, affinity propagation is extended to handle, in a principled way, the multiple sensor modalities that often occur in robotics. Third, a method for joint clustering and outlier selection is proposed which selects a user-defined number of outliers while clustering the data. This is solved using an extension of affinity propagation as well as a Lagrangian duality approach which provides guarantees on the optimality of the solution.
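
    Since affinity propagation is the workhorse here, a compact sketch of its two message updates (responsibilities and availabilities, after Frey and Dueck, 2007) may help. The damping factor and iteration count below are illustrative defaults, not values from the thesis.

```python
# Sketch of vanilla affinity propagation (Frey & Dueck, 2007).
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """S: (n, n) similarities; the diagonal S[k, k] holds each point's
    'preference' to be an exemplar -- it controls how many clusters
    emerge, so no cluster count is ever specified up front."""
    n = S.shape[0]
    R = np.zeros((n, n))   # responsibility: how well k suits i as exemplar
    A = np.zeros((n, n))   # availability: accumulated support for k
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()   # a(k,k) is not clamped by min(0, .)
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels
```

    A common preference choice sets the diagonal of S to the median of the off-diagonal similarities; lower preferences yield fewer exemplars, which is exactly the knob that replaces a hand-picked cluster count.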

    Scene Parsing using Multiple Modalities

    Scene parsing is the task of assigning a semantic class label to the elements of a scene. It has many applications in autonomous systems that need to understand the visual data captured from their environment. Different sensing modalities, such as RGB cameras, multi-spectral cameras and Lidar sensors, can be beneficial when pursuing this goal. Scene analysis using multiple modalities aims at leveraging the complementary information captured by the different sensors: the strength of each modality can combat the weaknesses of the others. Working with multiple modalities therefore gives us powerful tools for scene analysis, but the possible gains come with new challenges, such as dealing with misalignments between modalities. In this thesis, our aim is to take advantage of multiple modalities to improve outdoor scene parsing and to address the associated challenges. We initially investigate the potential of multi-spectral imaging for outdoor scene analysis. Our approach combines the discriminative strength of the multi-spectral signature in each pixel with the nature of the surrounding texture. Many materials that appear similar to a common RGB camera show discriminating properties when viewed by a camera capturing a greater number of separated wavelengths. When using imagery data for scene parsing, a number of challenges stem from, e.g., color saturation, shadow and occlusion. To address such challenges, we focus on scene parsing using multiple modalities, panoramic RGB images and 3D Lidar data in particular, and propose a multi-view approach that selects the best 2D view describing each element of the 3D point cloud. Keeping our focus on multiple modalities, we then introduce a multi-modal graphical model to address the problem of parsing 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where corresponding 2D and 3D regions are forced to receive identical labels. This degrades performance due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. Finally, we introduce latent nodes to explicitly model inconsistencies between the modalities. The latent nodes allow us not only to leverage information from both domains to improve the labeling, but also to cut the edges between inconsistent regions. To eliminate the need for hand-tuning the parameters of our model, we propose to learn the potential functions from training data. In addition to demonstrating the benefits of the proposed approaches on publicly available multi-modality datasets, we introduce a new multi-modal dataset of panoramic images and 3D point cloud data captured from outdoor scenes (the NICTA/2D3D Dataset).
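
    The soft-correspondence idea can be made concrete with a toy energy: each 2D-3D link carries a latent binary switch that either pays for a label disagreement or pays a fixed cost to declare the link inconsistent and cut it. The weights w and cut_cost below are illustrative stand-ins for the potentials that the thesis learns from training data; the function names are hypothetical.

```python
# Toy energy for soft 2D-3D correspondences with latent "cut" switches.
import numpy as np

def cross_term(y2d, y3d, w=1.0, cut_cost=0.6):
    """Latent switch per link: either pay w for a label disagreement or pay
    cut_cost to declare the correspondence inconsistent and sever it.
    Minimizing over the binary switch analytically yields the min() below."""
    return min(w * float(y2d != y3d), cut_cost)

def energy(y2, y3, u2, u3, links, w=1.0, cut_cost=0.6):
    """y2/y3: integer labels per 2D/3D region; u2/u3: (n, k) unary costs;
    links: list of (i2d, j3d) many-to-one correspondence pairs."""
    e = u2[np.arange(len(y2)), y2].sum() + u3[np.arange(len(y3)), y3].sum()
    e += sum(cross_term(y2[i], y3[j], w, cut_cost) for i, j in links)
    return e
```

    With a hard correspondence, the cross term would be infinite for any disagreement; the bounded cut_cost is what lets a misaligned or occluded region keep its own label instead of corrupting its neighbor across modalities.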

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods for taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot, or any intelligent system meant to interact with the world, that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect the pose of very specific objects, we don't yet have a mechanism for pose detection that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3D over-segmentation technique that utilizes the models and ego-motion output of the previous step to generate segmentations that remain temporally consistent under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
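
    For the extrinsic-calibration problem mentioned above, one standard building block is recovering the rigid transform between two sensors from matched 3D observations. The sketch below uses the classic SVD (Kabsch) solution; it is a generic illustration of that building block, not the calibration system described in the thesis.

```python
# Rigid alignment of matched 3D points between two sensor frames (Kabsch).
import numpy as np

def rigid_transform(P, Q):
    """Find R, t minimizing sum ||R @ p_i + t - q_i||^2 over rotations R.
    P, Q: (n, 3) matched points in sensor A's and sensor B's frames."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

    The recovered pair maps sensor A's measurements into sensor B's frame via q ≈ R @ p + t, which is the prerequisite for fusing multiple sensors' data in a single coordinate frame.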

    Parsing outdoor scenes from streamed 3D laser data using online clustering and incremental belief updates

    In this paper, we address the problem of continually parsing a stream of 3D point cloud data acquired from a laser sensor mounted on a road vehicle. We leverage an online star clustering algorithm coupled with incremental belief updates in an evolving undirected graphical model. The fusion of these techniques allows the robot to parse streamed data and to continually improve its understanding of the world. The core competency produced is the ability to infer object classes from similarities based on appearance and shape features, combined concurrently with a spatial smoothing algorithm that incorporates geometric consistency. This formulation of feature-space star clustering modulating the potentials of a spatial graphical model is entirely novel. In our method, two sources of information, feature similarity and geometric consistency, are fed continually into the system, improving the belief over the class distributions as new data arrives. The algorithm obviates the need for hand-labeled training data and makes no a priori assumptions on the number or characteristics of object categories. Rather, these are learnt incrementally over time from streamed input data. In experiments performed on real 3D laser data from an outdoor scene, we show that our approach is capable of obtaining an ever-improving unsupervised scene categorization. Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved.
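
    For readers unfamiliar with star clustering, the offline variant is easy to state: threshold the similarity graph, then repeatedly promote the highest-degree uncovered vertex to a cluster center, with its uncovered neighbors becoming satellites. The sketch below is that offline variant simplified to disjoint clusters; the paper's online algorithm maintains the stars incrementally as points stream in, and sigma here is an illustrative threshold, not a value from the paper.

```python
# Offline star clustering over a thresholded similarity graph (simplified
# to disjoint clusters). The cluster count falls out of the data.
import numpy as np

def star_clusters(sim, sigma=0.7):
    """sim: (n, n) symmetric similarities in [0, 1]; sigma: edge threshold.
    Returns a label per point; each label is the index of its star center."""
    n = sim.shape[0]
    adj = (sim >= sigma) & ~np.eye(n, dtype=bool)  # thresholded graph
    degree = adj.sum(axis=1)
    covered = np.zeros(n, dtype=bool)
    labels = -np.ones(n, dtype=int)
    while not covered.all():
        # Promote the highest-degree uncovered vertex to a star center.
        c = int(np.where(covered, -1, degree).argmax())
        members = adj[c] & ~covered                # its uncovered neighbors
        labels[c] = c
        labels[members] = c
        covered[c] = True
        covered |= members
    return labels
```

    The appeal for long-term autonomy is that nothing fixes the number of clusters in advance; as streamed data densifies the similarity graph, new stars appear and existing ones absorb new satellites.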

    EG-ICE 2021 Workshop on Intelligent Computing in Engineering

    The 28th EG-ICE International Workshop 2021 brings together international experts working at the interface between advanced computing and modern engineering challenges. Many engineering tasks require open-world resolutions to support multi-actor collaboration, coping with approximate models, providing effective engineer-computer interaction, searching multi-dimensional solution spaces, accommodating uncertainty, including specialist domain knowledge, performing sensor-data interpretation and dealing with incomplete knowledge. While results from computer science provide much initial support for resolution, adaptation is unavoidable and, most importantly, feedback from addressing engineering challenges drives fundamental computer-science research. Competence and knowledge transfer go both ways.

    Deep Neural Networks for Visual Bridge Inspections and Defect Visualisation in Civil Engineering

    • …