151 research outputs found

    Recovering 6D Object Pose: A Review and Multi-modal Analysis

    A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work in the RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take for improving "autonomy" in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D pose. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input. (iv) Depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem.

    Smarter irrigation scheduling in the sugarcane farming system using the Internet of Things

    Better irrigation practices can lead to improved yields through less water stress and reduced water usage, delivering economic benefits for farmers. More and more sugarcane growers are transitioning to automated irrigation in the Burdekin and other regions. Automated irrigation systems can save farmers a significant amount of time by remotely turning pumps and valves on and off. However, such a system could be improved by integrating it with tools that factor in the weather, crop growing conditions, water deficit, and crop stress, to improve irrigation use efficiency. IrrigWeb is a decision-support tool that addresses this problem. IrrigWeb uses CANEGRO to help farmers decide when to irrigate and how much to apply. Farmers can then use this information to plan their irrigation management. However, managing irrigation is a considerable time investment for Burdekin farmers. A tool is needed to integrate the auto-irrigation system (e.g., WiSA) and IrrigWeb to provide a smarter irrigation solution. An uplink program (WiSA to IrrigWeb) has been successfully developed and implemented as part of a pilot study. It saves farmers a significant amount of time by uploading irrigation and rainfall data automatically instead of the farmer having to input them manually. This paper focuses on developing a smarter irrigation-scheduling tool that connects IrrigWeb to WiSA. A downlink program was developed to download, calculate and apply irrigation schedules automatically. As a result, sugarcane irrigators will spend less time manually setting up irrigation schedules. The simulation results demonstrated that the downlink program could improve the scheduling by incorporating practical limitations found on the farm, such as pumping capacity or pumping time constraints.

    Linguistic Structure Guided Context Modeling for Referring Image Segmentation

    Referring image segmentation aims to predict the foreground mask of the object referred to by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either insufficiently or redundantly model the multimodal context. To tackle this problem, we propose a "gather-propagate-distribute" scheme to model multimodal context by cross-modal interaction and implement this scheme as a novel Linguistic Structure guided Context Modeling (LSCM) module. Our LSCM module builds a Dependency Parsing Tree suppressed Word Graph (DPT-WG) which guides all the words to include valid multimodal context of the sentence while excluding disturbing ones through three steps over the multimodal feature, i.e., gathering, constrained propagation and distributing. Extensive experiments on four benchmarks demonstrate that our method outperforms all previous state-of-the-art methods. Comment: Accepted by ECCV 2020. Code is available at https://github.com/spyflying/LSCM-Refse

    Deep Burst Denoising

    Noise is an inherent issue of low-light image capture, one which is exacerbated on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in a low-light situation is to increase the shutter time of the camera, thus allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will result in blurred images. Another way of gathering more light is to capture multiple short (thus noisy) frames in a "burst" and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel multiframe architecture to be a simple addition to any single-frame denoising model, and design it to handle an arbitrary number of noisy input frames. We show that it achieves state-of-the-art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of image enhancement by integrating content from multiple frames and demonstrate that our DNN architecture generalizes well to image super-resolution.
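
    The claim that integrating several short, noisy frames lowers noise variance can be illustrated with a much simpler stand-in than the recurrent network the abstract describes. The sketch below is only a naive per-pixel average, and it assumes a perfectly static scene with independent Gaussian noise; the function names (`simulate_burst`, `average_frames`) and all parameter values are hypothetical and chosen for illustration.

```python
import random
import statistics


def simulate_burst(n_frames, n_pixels=10000, signal=0.5, sigma=0.1, seed=0):
    """Return n_frames noisy copies of a constant-intensity 'image'.

    Assumption: static scene, independent zero-mean Gaussian noise per pixel.
    """
    rng = random.Random(seed)
    return [
        [signal + rng.gauss(0.0, sigma) for _ in range(n_pixels)]
        for _ in range(n_frames)
    ]


def average_frames(frames):
    """Naive burst merge: per-pixel mean across all frames."""
    n = len(frames)
    return [sum(pixel_values) / n for pixel_values in zip(*frames)]


frames = simulate_burst(n_frames=8)
single_var = statistics.pvariance(frames[0])
merged_var = statistics.pvariance(average_frames(frames))

# With independent noise, the merged variance should be roughly
# single_var / 8; the exact numbers depend on the random draw.
print(f"single frame variance: {single_var:.5f}")
print(f"8-frame mean variance: {merged_var:.5f}")
```

    Plain averaging breaks down as soon as the camera or scene moves, which is the motivation the abstract gives for a learned, "intelligent" integration instead.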

    Leaf segmentation in plant phenotyping: a collation study

    Image-based plant phenotyping is a growing application area of computer vision in agriculture. A key task is the segmentation of all individual leaves in images. Here we focus on the most common rosette model plants, Arabidopsis and young tobacco. Although leaves do share appearance and shape characteristics, the presence of occlusions and variability in leaf shape and pose, as well as imaging conditions, render this problem challenging. The aim of this paper is to compare several leaf segmentation solutions on a unique and first-of-its-kind dataset containing images from typical phenotyping experiments. In particular, we report and discuss methods and findings of a collection of submissions for the first Leaf Segmentation Challenge of the Computer Vision Problems in Plant Phenotyping workshop in 2014. Four methods are presented: three segment leaves by processing the distance transform in an unsupervised fashion, and the other via optimal template selection and Chamfer matching. Overall, we find that although separating plant from background can be accomplished with satisfactory accuracy (>90% Dice score), individual leaf segmentation and counting remain challenging when leaves overlap. Additionally, accuracy is lower for younger leaves. We also find that variability in datasets does affect outcomes. Our findings motivate further investigation and development of specialized algorithms for this particular application, and suggest that challenges of this form are ideally suited for advancing the state of the art. Data are publicly available (online at http://www.plant-phenotyping.org/datasets) to support future challenges beyond segmentation within this application domain.
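
    For readers unfamiliar with the Dice score quoted above, here is a minimal sketch of the standard definition for binary masks, 2|A ∩ B| / (|A| + |B|). It is not the challenge's evaluation code; the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as nested lists.

    Any truthy cell counts as foreground. Returns 1.0 when both masks
    are empty (an assumed convention, not taken from the paper).
    """
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")

    overlap = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


# Toy 2x3 masks: the prediction covers 3 pixels, the ground truth 2,
# and they overlap on 2.
predicted = [[1, 1, 0],
             [1, 0, 0]]
ground_truth = [[1, 1, 0],
                [0, 0, 0]]
print(dice_score(predicted, ground_truth))  # 2*2 / (3+2) = 0.8
```

    A plant-versus-background mask scoring above 0.9 on this measure can still hide poorly separated individual leaves, which is consistent with the abstract's distinction between foreground segmentation and per-leaf segmentation.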

    Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's Attention

    In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions. The method can recognize objects from arbitrary viewpoints and generalizes to instances that have never been observed during training, even if they are partially occluded and appear against cluttered backgrounds. Our approach builds on the implicit shape model of Leibe et al. We extend it to couple recognition to the provision of meta-dat