
    Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

    Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera on real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.
    Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
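
    The DSG described above is essentially a typed, layered graph. The sketch below illustrates one way such a structure could be organized in Python; the class names, layer names, and methods are hypothetical and are not Kimera's actual open-source API.

```python
# Minimal sketch of a layered dynamic scene graph (hypothetical names, not
# Kimera's actual API): nodes live in layers such as objects, agents, places,
# rooms, and buildings; edges encode spatio-temporal relations between nodes.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    node_id: int
    layer: str                                      # e.g. "object", "agent", "room"
    attributes: dict = field(default_factory=dict)  # label, pose, timestamp, ...

@dataclass
class Edge:
    source: int
    target: int
    relation: str                                   # e.g. "is_in", "traversable"

class DynamicSceneGraph:
    LAYERS = ("mesh", "object", "agent", "place", "room", "building")

    def __init__(self):
        self.nodes: Dict[int, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node: Node) -> None:
        assert node.layer in self.LAYERS, f"unknown layer: {node.layer}"
        self.nodes[node.node_id] = node

    def add_edge(self, source: int, target: int, relation: str) -> None:
        self.edges.append(Edge(source, target, relation))

    def neighbors(self, node_id: int) -> List[int]:
        return [e.target for e in self.edges if e.source == node_id]

# Example: a dynamic agent (a person) located in a room at a given time.
dsg = DynamicSceneGraph()
dsg.add_node(Node(1, "room", {"label": "kitchen"}))
dsg.add_node(Node(2, "agent", {"label": "person", "time": 12.3}))
dsg.add_edge(2, 1, "is_in")
print(dsg.neighbors(2))   # -> [1]
```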

    RF-Compass: Robot object manipulation using RFIDs

    Modern robots have to interact with their environment, search for objects, and move them around. Yet, for a robot to pick up an object, it needs to identify the object's orientation and locate it to within centimeter-scale accuracy. Existing systems that provide such information are either very expensive (e.g., the VICON motion capture system, valued at hundreds of thousands of dollars) and/or suffer from occlusion and narrow field of view (e.g., computer vision approaches). This paper presents RF-Compass, an RFID-based system for robot navigation and object manipulation. RFIDs are low-cost and work in non-line-of-sight scenarios, allowing them to address the limitations of existing solutions. Given an RFID-tagged object, RF-Compass accurately navigates a robot equipped with RFIDs toward the object. Further, it locates the center of the object to within a few centimeters and identifies its orientation so that the robot may pick it up. RF-Compass's key innovation is an iterative algorithm formulated as a convex optimization problem. The algorithm uses the RFID signals to partition the space and keeps refining the partitions based on the robot's consecutive moves. We have implemented RF-Compass using USRP software radios and evaluated it with commercial RFIDs and a KUKA youBot robot. For the task of furniture assembly, RF-Compass can locate furniture parts to a median accuracy of 1.28 cm and identify their orientation to a median accuracy of 3.3 degrees.
    National Science Foundation (U.S.
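
    The abstract describes an iterative, partition-refining localization scheme. The sketch below illustrates the general idea with a simple hypothesis-pruning loop over pairwise "tag A is closer than tag B" constraints; it is a conceptual toy, not the paper's actual convex program, and the function and variable names are assumptions made for illustration.

```python
# Conceptual sketch of RF-Compass-style space partitioning (not the paper's
# actual convex optimization): pairwise "tag i is closer than tag j"
# comparisons derived from RFID measurements carve the plane into half-planes,
# and their intersection shrinks around the target as the robot keeps moving.
import numpy as np

def refine_candidates(candidates, robot_tags, comparisons):
    """Keep candidate positions consistent with all pairwise comparisons.

    candidates : (N, 2) array of hypothesized target positions
    robot_tags : dict tag_id -> (2,) known tag position on the robot
    comparisons: list of (closer_tag, farther_tag) pairs from RFID readings
    """
    keep = np.ones(len(candidates), dtype=bool)
    for closer, farther in comparisons:
        d_closer = np.linalg.norm(candidates - robot_tags[closer], axis=1)
        d_farther = np.linalg.norm(candidates - robot_tags[farther], axis=1)
        keep &= d_closer <= d_farther          # half-plane constraint
    return candidates[keep]

# Toy usage: a grid of hypotheses pruned by one set of comparisons; repeating
# this after each robot move keeps shrinking the feasible region.
grid = np.stack(np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 50)), -1).reshape(-1, 2)
tags = {"front": np.array([1.0, 1.0]), "back": np.array([0.0, 1.0])}
remaining = refine_candidates(grid, tags, [("front", "back")])
print(remaining.mean(axis=0))   # centroid of the remaining region as a position estimate
```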

    On the Challenges of Open World Recognition under Shifting Visual Domains

    Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions, while facing a variety of semantic concepts, including unknown ones. To this end, recent works have tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extend their knowledge over time as images of new semantic classes arrive. This setting, called Open World Recognition (OWR), aims to produce systems capable of breaking the semantic limits present in the initial training set. However, this training set imposes on the system not only its own semantic limits, but also environmental ones, due to its bias toward certain acquisition conditions that do not necessarily reflect the high variability of the real world. This discrepancy between training and test distributions is called domain shift. This work investigates whether OWR algorithms are effective under domain shift, presenting the first benchmark setup for fairly assessing the performance of OWR algorithms with and without domain shift. We then use this benchmark to conduct analyses in various scenarios, showing how existing OWR algorithms indeed suffer a severe performance degradation when training and test distributions differ. Our analysis shows that this degradation is only slightly mitigated by coupling OWR with domain generalization techniques, indicating that the mere plug-and-play of existing algorithms is not enough to recognize new and unknown categories in unseen domains. Our results clearly point toward open issues and future research directions that need to be investigated for building robot visual systems able to function reliably under these challenging yet very real conditions. Code available at https://github.com/DarioFontanel/OWR-VisualDomains
    Comment: RAL/ICRA 202
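
    For readers unfamiliar with the OWR setting, the sketch below shows a minimal open-world decision rule: a nearest-class-mean classifier with a rejection threshold, so unknown categories can be flagged and new classes can be added incrementally. This is an illustrative stand-in, not one of the specific algorithms benchmarked in the paper, and the threshold, class names, and feature dimensions are assumptions.

```python
# Minimal open-world recognition sketch (illustrative only): a nearest-class-mean
# classifier that rejects a sample as "unknown" when it is too far from every
# known class mean, and that can extend its knowledge by storing new class means.
import numpy as np

class OpenWorldNCM:
    def __init__(self, reject_threshold: float):
        self.means = {}              # class label -> feature mean
        self.tau = reject_threshold  # distance beyond which a sample is "unknown"

    def add_class(self, label, features):
        """Extend knowledge over time with a newly labeled class."""
        self.means[label] = features.mean(axis=0)

    def predict(self, x):
        if not self.means:
            return "unknown"
        dists = {c: np.linalg.norm(x - m) for c, m in self.means.items()}
        label, d = min(dists.items(), key=lambda kv: kv[1])
        return label if d <= self.tau else "unknown"

# Usage: features from a shifted domain may fall beyond tau and be rejected,
# which is exactly the failure mode the benchmark measures.
clf = OpenWorldNCM(reject_threshold=2.0)
clf.add_class("mug", np.random.randn(20, 8))
print(clf.predict(np.random.randn(8) + 5.0))   # likely "unknown"
```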

    A fruit recognition method for automatic harvesting

    Automation of harvesting has long been one of the most active topics in greenhouse operation, but it first requires a reliable method for identifying mature fruit clusters on plants. This thesis presents a method to detect and recognize mature tomato fruit clusters on complex-structured tomato plants with clutter and occlusion in a tomato greenhouse. A color stereo vision camera is used as the vision sensor. The proposed method performs a 3D reconstruction from the data collected by the stereo camera to create a 3D environment for further processing. The Color Layer Growing (CLG) method is introduced to segment the mature fruits from the leaves, stalks, background, and noise. Target fruit clusters can then be located by depth segmentation. The experimental data were collected from a tomato greenhouse, and the method is validated by the experimental results.
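
    As a rough illustration of the color-then-depth pipeline summarized above, the sketch below segments ripe (red) pixels in HSV space and then bins them by depth. The actual Color Layer Growing method and the stereo camera interface are more involved than this, and the thresholds and function names here are assumed values for illustration.

```python
# Illustrative color-then-depth segmentation sketch (not the thesis's actual
# CLG algorithm): threshold red hues in HSV, then split the ripe pixels into
# clusters of similar depth to separate individual fruit clusters.
import cv2
import numpy as np

def segment_mature_fruit(bgr_image, depth_map, depth_bin=0.15):
    """Return a ripe-pixel mask plus pixel counts per depth layer.

    bgr_image : HxWx3 uint8 color image from the stereo camera
    depth_map : HxW float array of depths in meters (from stereo matching)
    depth_bin : bin width in meters used to split fruit clusters by depth
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Red hue wraps around 0 in OpenCV's 0-179 hue range, so combine two bands.
    red1 = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
    red2 = cv2.inRange(hsv, (170, 80, 60), (179, 255, 255))
    ripe = cv2.bitwise_or(red1, red2) > 0

    # Depth segmentation: group ripe pixels into layers of similar depth.
    depths = depth_map[ripe]
    bins = np.floor(depths / depth_bin).astype(int)
    clusters = {int(b): int(np.count_nonzero(bins == b)) for b in np.unique(bins)}
    return ripe, clusters
```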

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available

    A brain-machine interface for assistive robotic control

    Brain-machine interfaces (BMIs) are the only currently viable means of communication for many individuals suffering from locked-in syndrome (LIS) – profound paralysis that results in severely limited or total loss of voluntary motor control. By inferring user intent from task-modulated neurological signals and then translating those intentions into actions, BMIs can give LIS patients increased autonomy. Significant effort has been devoted to developing BMIs over the last three decades, but only recently have the combined advances in hardware, software, and methodology provided a setting in which this research can be translated from the lab into practical, real-world applications. Non-invasive methods, such as those based on the electroencephalogram (EEG), offer the only feasible solution for practical use at the moment, but suffer from limited communication rates and susceptibility to environmental noise. Maximizing the efficacy of each decoded intention is therefore critical. This thesis addresses the challenge of implementing a BMI intended for practical use, with a focus on an autonomous assistive robot application. First, an adaptive EEG-based BMI strategy is developed that relies upon code-modulated visual evoked potentials (c-VEPs) to infer user intent. As voluntary gaze control is typically not available to LIS patients, c-VEP decoding methods under both gaze-dependent and gaze-independent scenarios are explored. Adaptive decoding strategies in both offline and online task conditions are evaluated, and a novel approach to assess ongoing online BMI performance is introduced. Next, an adaptive neural network-based system for assistive robot control is presented that employs exploratory learning to achieve the coordinated motor planning needed to navigate toward, reach for, and grasp distant objects. Exploratory learning, or "learning by doing," is an unsupervised method in which the robot builds an internal model for motor planning and coordination from real-time sensory inputs received during exploration. Finally, a software platform intended for practical BMI application use is developed and evaluated. Using online c-VEP methods, users control a simple 2D cursor control game, a basic augmentative and alternative communication tool, and an assistive robot, both manually and via high-level goal-oriented commands.
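
    The c-VEP paradigm mentioned above is commonly decoded by correlating the recorded EEG with circularly shifted templates of the stimulation code. The sketch below shows that generic template-matching step on synthetic data; it is not the thesis's adaptive pipeline, and the epoch length, target count, and function names are assumptions.

```python
# Generic c-VEP template-matching sketch (illustrative, not the thesis's exact
# adaptive decoder): each target flickers with a circularly shifted copy of the
# same pseudorandom code, so the attended target is the shift whose learned
# template correlates best with the (spatially filtered) EEG epoch.
import numpy as np

def decode_cvep(eeg_epoch, template, n_targets):
    """Pick the target whose circular shift best matches the EEG epoch.

    eeg_epoch : (n_samples,) single-channel or spatially filtered EEG segment
    template  : (n_samples,) learned response to the unshifted stimulation code
    n_targets : number of targets; target k uses a shift of k * n_samples / n_targets
    """
    n = len(template)
    scores = []
    for k in range(n_targets):
        shifted = np.roll(template, k * n // n_targets)
        scores.append(np.corrcoef(eeg_epoch, shifted)[0, 1])
    return int(np.argmax(scores)), scores

# Usage with synthetic data: the decoder recovers the simulated target shift.
rng = np.random.default_rng(0)
tmpl = rng.standard_normal(512)
epoch = np.roll(tmpl, 2 * 512 // 8) + 0.5 * rng.standard_normal(512)
print(decode_cvep(epoch, tmpl, n_targets=8)[0])   # expected to print 2
```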