1,258 research outputs found

    Interactive Perception for Cluttered Environments

    Get PDF
    Robotics research tends to focus upon either non-contact sensing or machine manipulation, but not both. This paper explores the benefits of combining the two by addressing the problem of extracting and classifying unknown objects within a cluttered environment, such as found in recycling and service robot applications. In the proposed approach, a pile of objects lies on a flat background, and the goal of the robot is to sift through the pile and classify each object so that it can be studied further. One object should be removed at a time with minimal disturbance to the other objects. We propose an algorithm, based upon graph-based segmentation and stereo matching, that automatically computes a desired grasp point that enables the objects to be removed one at a time. The algorithm then isolates each object to be classified by color, shape and flexibility. Experiments on a number of different objects demonstrate the ability of classifying each item through interaction and labeling them for further use and study

    Developmental Bayesian Optimization of Black-Box with Visual Similarity-Based Transfer Learning

    Full text link
    We present a developmental framework based on a long-term memory and reasoning mechanisms (Vision Similarity and Bayesian Optimisation). This architecture allows a robot to optimize autonomously hyper-parameters that need to be tuned from any action and/or vision module, treated as a black-box. The learning can take advantage of past experiences (stored in the episodic and procedural memories) in order to warm-start the exploration using a set of hyper-parameters previously optimized from objects similar to the new unknown one (stored in a semantic memory). As example, the system has been used to optimized 9 continuous hyper-parameters of a professional software (Kamido) both in simulation and with a real robot (industrial robotic arm Fanuc) with a total of 13 different objects. The robot is able to find a good object-specific optimization in 68 (simulation) or 40 (real) trials. In simulation, we demonstrate the benefit of the transfer learning based on visual similarity, as opposed to an amnesic learning (i.e. learning from scratch all the time). Moreover, with the real robot, we show that the method consistently outperforms the manual optimization from an expert with less than 2 hours of training time to achieve more than 88% of success

    Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

    Full text link
    When operating in unstructured environments such as warehouses, homes, and retail centers, robots are frequently required to interactively search for and retrieve specific objects from cluttered bins, shelves, or tables. Mechanical Search describes the class of tasks where the goal is to locate and extract a known target object. In this paper, we formalize Mechanical Search and study a version where distractor objects are heaped over the target object in a bin. The robot uses an RGBD perception system and control policies to iteratively select, parameterize, and perform one of 3 actions -- push, suction, grasp -- until the target object is extracted, or either a time limit is exceeded, or no high confidence push or grasp is available. We present a study of 5 algorithmic policies for mechanical search, with 15,000 simulated trials and 300 physical trials for heaps ranging from 10 to 20 objects. Results suggest that success can be achieved in this long-horizon task with algorithmic policies in over 95% of instances and that the number of actions required scales approximately linearly with the size of the heap. Code and supplementary material can be found at http://ai.stanford.edu/mech-search .Comment: To appear in IEEE International Conference on Robotics and Automation (ICRA), 2019. 9 pages with 4 figure

    Conjunctive Visual and Auditory Development via Real-Time Dialogue

    Get PDF
    Human developmental learning is capable of dealing with the dynamic visual world, speech-based dialogue, and their complex real-time association. However, the architecture that realizes this for robotic cognitive development has not been reported in the past. This paper takes up this challenge. The proposed architecture does not require a strict coupling between visual and auditory stimuli. Two major operations contribute to the “abstraction” process: multiscale temporal priming and high-dimensional numeric abstraction through internal responses with reduced variance. As a basic principle of developmental learning, the programmer does not know the nature of the world events at the time of programming and, thus, hand-designed task-specific representation is not possible. We successfully tested the architecture on the SAIL robot under an unprecedented challenging multimodal interaction mode: use real-time speech dialogue as a teaching source for simultaneous and incremental visual learning and language acquisition, while the robot is viewing a dynamic world that contains a rotating object to which the dialogue is referring

    Interactive singulation of objects from a pile

    Get PDF
    Abstract—Interaction with unstructured groups of objects allows a robot to discover and manipulate novel items in cluttered environments. We present a framework for interactive singulation of individual items from a pile. The proposed framework provides an overall approach for tasks involving operation on multiple objects, such as counting, arranging, or sorting items in a pile. A perception module combined with pushing actions accumulates evidence of singulated items over multiple pile interactions. A decision module scores the likelihood of a single-item pile to a multiple-item pile based on the magnitude of motion and matching determined from the perception module. Three variations of the singulation framework were evaluated on a physical robot for an arrangement task. The proposed interactive singulation method with adaptive pushing reduces the grasp errors on non-singulated piles compared to alternative methods without the perception and decision modules. This work contributes the general pile interaction framework, a specific method for integrating perception and action plans with grasp decisions, and an experimental evaluation of the cost trade-offs for different singulation methods. I

    Exploring Natural User Abstractions For Shared Perceptual Manipulator Task Modeling & Recovery

    Get PDF
    State-of-the-art domestic robot assistants are essentially autonomous mobile manipulators capable of exerting human-scale precision grasps. To maximize utility and economy, non-technical end-users would need to be nearly as efficient as trained roboticists in control and collaboration of manipulation task behaviors. However, it remains a significant challenge given that many WIMP-style tools require superficial proficiency in robotics, 3D graphics, and computer science for rapid task modeling and recovery. But research on robot-centric collaboration has garnered momentum in recent years; robots are now planning in partially observable environments that maintain geometries and semantic maps, presenting opportunities for non-experts to cooperatively control task behavior with autonomous-planning agents exploiting the knowledge. However, as autonomous systems are not immune to errors under perceptual difficulty, a human-in-the-loop is needed to bias autonomous-planning towards recovery conditions that resume the task and avoid similar errors. In this work, we explore interactive techniques allowing non-technical users to model task behaviors and perceive cooperatively with a service robot under robot-centric collaboration. We evaluate stylus and touch modalities that users can intuitively and effectively convey natural abstractions of high-level tasks, semantic revisions, and geometries about the world. Experiments are conducted with \u27pick-and-place\u27 tasks in an ideal \u27Blocks World\u27 environment using a Kinova JACO six degree-of-freedom manipulator. Possibilities for the architecture and interface are demonstrated with the following features; (1) Semantic \u27Object\u27 and \u27Location\u27 grounding that describe function and ambiguous geometries (2) Task specification with an unordered list of goal predicates, and (3) Guiding task recovery with implied scene geometries and trajectory via symmetry cues and configuration space abstraction. Empirical results from four user studies show our interface was much preferred than the control condition, demonstrating high learnability and ease-of-use that enable our non-technical participants to model complex tasks, provide effective recovery assistance, and teleoperative control

    Scene Segmentation and Object Classification for Place Recognition

    Get PDF
    This dissertation tries to solve the place recognition and loop closing problem in a way similar to human visual system. First, a novel image segmentation algorithm is developed. The image segmentation algorithm is based on a Perceptual Organization model, which allows the image segmentation algorithm to ‘perceive’ the special structural relations among the constituent parts of an unknown object and hence to group them together without object-specific knowledge. Then a new object recognition method is developed. Based on the fairly accurate segmentations generated by the image segmentation algorithm, an informative object description that includes not only the appearance (colors and textures), but also the parts layout and shape information is built. Then a novel feature selection algorithm is developed. The feature selection method can select a subset of features that best describes the characteristics of an object class. Classifiers trained with the selected features can classify objects with high accuracy. In next step, a subset of the salient objects in a scene is selected as landmark objects to label the place. The landmark objects are highly distinctive and widely visible. Each landmark object is represented by a list of SIFT descriptors extracted from the object surface. This object representation allows us to reliably recognize an object under certain viewpoint changes. To achieve efficient scene-matching, an indexing structure is developed. Both texture feature and color feature of objects are used as indexing features. The texture feature and the color feature are viewpoint-invariant and hence can be used to effectively find the candidate objects with similar surface characteristics to a query object. Experimental results show that the object-based place recognition and loop detection method can efficiently recognize a place in a large complex outdoor environment

    Active Vision for Scene Understanding

    Get PDF
    Visual perception is one of the most important sources of information for both humans and robots. A particular challenge is the acquisition and interpretation of complex unstructured scenes. This work contributes to active vision for humanoid robots. A semantic model of the scene is created, which is extended by successively changing the robot\u27s view in order to explore interaction possibilities of the scene
    corecore