3,083 research outputs found

    Cumulative object categorization in clutter

    Get PDF
    In this paper we present an approach based on scene- or part-graphs for geometrically categorizing touching and occluded objects. We use additive RGBD feature descriptors and hashing of graph configuration parameters for describing the spatial arrangement of constituent parts. The presented experiments quantify that this method outperforms our earlier part-voting and sliding window classification. We evaluated our approach on cluttered scenes, and by using a 3D dataset containing over 15000 Kinect scans of over 100 objects which were grouped into general geometric categories. Additionally, color, geometric, and combined features were compared for categorization tasks

    Detection and depth estimation for domestic waste in outdoor environments by sensors fusion

    Get PDF
    In this work, we estimate the depth in which domestic waste are located in space from a mobile robot in outdoor scenarios. As we are doing this calculus on a broad range of space (0.3 - 6.0 m), we use RGB-D camera and LiDAR fusion. With this aim and range, we compare several methods such as average, nearest, median and center point, applied to those which are inside a reduced or non-reduced Bounding Box (BB). These BB are obtained from segmentation and detection methods which are representative of these techniques like Yolact, SOLO, You Only Look Once (YOLO)v5, YOLOv6 and YOLOv7. Results shown that, applying a detection method with the average technique and a reduction of BB of 40%, returns the same output as segmenting the object and applying the average method. Indeed, the detection method is faster and lighter in comparison with the segmentation one. The committed median error in the conducted experiments was 0.0298 ± 0.0544 m.Research work was funded by the Valencian Regional Government and FEDER through the PROMETEO/2021/075 project and the Spanish Government through the Formación del Personal Investigador [Research Staff Formation (FPI)] under Grant PRE2019-088069. The computer facilities were provided through the IDIFEFER/2020/003 project

    Efficient 3D Segmentation, Registration and Mapping for Mobile Robots

    Get PDF
    Sometimes simple is better! For certain situations and tasks, simple but robust methods can achieve the same or better results in the same or less time than related sophisticated approaches. In the context of robots operating in real-world environments, key challenges are perceiving objects of interest and obstacles as well as building maps of the environment and localizing therein. The goal of this thesis is to carefully analyze such problem formulations, to deduce valid assumptions and simplifications, and to develop simple solutions that are both robust and fast. All approaches make use of sensors capturing 3D information, such as consumer RGBD cameras. Comparative evaluations show the performance of the developed approaches. For identifying objects and regions of interest in manipulation tasks, a real-time object segmentation pipeline is proposed. It exploits several common assumptions of manipulation tasks such as objects being on horizontal support surfaces (and well separated). It achieves real-time performance by using particularly efficient approximations in the individual processing steps, subsampling the input data where possible, and processing only relevant subsets of the data. The resulting pipeline segments 3D input data with up to 30Hz. In order to obtain complete segmentations of the 3D input data, a second pipeline is proposed that approximates the sampled surface, smooths the underlying data, and segments the smoothed surface into coherent regions belonging to the same geometric primitive. It uses different primitive models and can reliably segment input data into planes, cylinders and spheres. A thorough comparative evaluation shows state-of-the-art performance while computing such segmentations in near real-time. The second part of the thesis addresses the registration of 3D input data, i.e., consistently aligning input captured from different view poses. Several methods are presented for different types of input data. For the particular application of mapping with micro aerial vehicles where the 3D input data is particularly sparse, a pipeline is proposed that uses the same approximate surface reconstruction to exploit the measurement topology and a surface-to-surface registration algorithm that robustly aligns the data. Optimization of the resulting graph of determined view poses then yields globally consistent 3D maps. For sequences of RGBD data this pipeline is extended to include additional subsampling steps and an initial alignment of the data in local windows in the pose graph. In both cases, comparative evaluations show a robust and fast alignment of the input data

    Autonomous Sweet Pepper Harvesting for Protected Cropping Systems

    Full text link
    In this letter, we present a new robotic harvester (Harvey) that can autonomously harvest sweet pepper in protected cropping environments. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. Initial field trials in protected cropping environments, with two cultivar, demonstrate the efficacy of this approach achieving a 46% success rate for unmodified crop, and 58% for modified crop. Furthermore, for the more favourable cultivar we were also able to detach 90% of sweet peppers, indicating that improvements in the grasping success rate would result in greatly improved harvesting performance

    Indexing, browsing and searching of digital video

    Get PDF
    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece ” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver

    Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze

    Full text link
    Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space into a finite set of action segments facilitates a faster and less noisy matching. However, there exist a substantial gap in machine understanding of natural temporal cuts during a continuous human activity. This work reports on a novel gaze-based approach for segmenting action segments in videos captured using an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present several results using combinations (of the two parameters) on a dataset, i.e., BRISGAZE-ACTIONS. The dataset contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.Comment: To appear in 2017 IEEE International Conference On Signal and Image Processing Application
    corecore