Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection
Selective weeding is one of the key challenges in the field of agriculture
robotics. To accomplish this task, a farm robot should be able to accurately
detect plants and to distinguish crops from weeds. Most of the
promising state-of-the-art approaches make use of appearance-based models
trained on large annotated datasets. Unfortunately, creating large agricultural
datasets with pixel-level annotations is an extremely time-consuming task,
which hinders the use of data-driven techniques. In this paper, we address
this problem by proposing a novel and effective approach that aims to
drastically reduce the human intervention needed to train the detection and
classification algorithms. The idea is to procedurally generate large synthetic
training datasets randomizing the key features of the target environment (i.e.,
crop and weed species, type of soil, light conditions). More specifically, by
tuning these model parameters, and exploiting a few real-world textures, it is
possible to render a large amount of realistic views of an artificial
agricultural scenario with no effort. The generated data can be directly used
to train the model or to supplement real-world images. We validate the proposed
methodology by using as testbed a modern deep learning based image segmentation
architecture. We compare the classification results obtained using both real
and synthetic images as training data. The reported results confirm the
effectiveness and potential of our approach.
Comment: To appear in IEEE/RSJ IROS 201
Semantically Guided Depth Upsampling
We present a novel method for accurate and efficient upsampling of sparse
depth data, guided by high-resolution imagery. Our approach goes beyond the use
of intensity cues only and additionally exploits object boundary cues through
structured edge detection and semantic scene labeling for guidance. Both cues
are combined within a geodesic distance measure that allows for
boundary-preserving depth interpolation while utilizing local context. We
model the observed scene structure by locally planar elements and formulate the
upsampling task as a global energy minimization problem. Our method determines
globally consistent solutions and preserves fine details and sharp depth
boundaries. In our experiments on several public datasets at different levels
of application, we demonstrate superior performance of our approach over the
state-of-the-art, even for very sparse measurements.
Comment: German Conference on Pattern Recognition 2016 (Oral
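To illustrate the geodesic idea in the abstract above, here is a minimal sketch. It computes geodesic distances from sparse depth seeds on a small pixel grid, where stepping across a strong boundary is expensive, and gives each pixel the depth of its geodesically nearest seed. The step cost, the boundary_weight parameter and the nearest-seed assignment are assumptions made for illustration only; the paper itself models locally planar elements within a global energy minimization, which this sketch does not attempt.

```python
import heapq


def geodesic_nearest_seed_depth(boundary, seeds, boundary_weight=10.0):
    """Assign every pixel the depth of its geodesically nearest seed.

    boundary: 2D list of boundary strengths in [0, 1] (hypothetical cue map).
    seeds: dict mapping (row, col) to a measured depth.
    boundary_weight: assumed penalty for stepping onto or off a boundary.
    """
    rows, cols = len(boundary), len(boundary[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    depth = [[None] * cols for _ in range(rows)]
    heap = []
    for (r, c), d in seeds.items():
        dist[r][c] = 0.0
        depth[r][c] = d
        heapq.heappush(heap, (0.0, r, c, d))

    # Multi-source Dijkstra over the 4-connected pixel grid.
    while heap:
        cur, r, c, d = heapq.heappop(heap)
        if cur > dist[r][c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # One unit of spatial distance plus a boundary penalty.
                step = 1.0 + boundary_weight * max(boundary[r][c], boundary[nr][nc])
                new = cur + step
                if new < dist[nr][nc]:
                    dist[nr][nc] = new
                    depth[nr][nc] = d
                    heapq.heappush(heap, (new, nr, nc, d))
    return depth


# A 1x5 strip with a boundary at column 1 and seeds at both ends.
strip = geodesic_nearest_seed_depth(
    [[0.0, 1.0, 0.0, 0.0, 0.0]], {(0, 0): 1.0, (0, 4): 5.0}
)
```

In this strip the middle pixel is spatially equidistant from both seeds, but it takes the depth of the right-hand seed because reaching it from the left would mean crossing the boundary.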
3D Segmentation Method for Natural Environments based on a Geometric-Featured Voxel Map
This work proposes a new segmentation algorithm for dense three-dimensional point clouds, specially designed for
natural environments where the ground is unstructured and may include steep slopes, non-flat areas and
isolated areas. The technique is based on a Geometric-Featured Voxel map (GFV), in which the scene is discretized into
constant-size cubes, or voxels, which are classified as flat surfaces, linear or tubular structures, or scattered or undefined
shapes, the last usually corresponding to vegetation. Since this is not a point-based technique, the computational cost is significantly
reduced, so it may be compatible with real-time applications. The ground is extracted in order to obtain more accurate
results in the subsequent segmentation process. The scene is split into objects, and a second segmentation into regions inside
each object is performed based on each voxel's geometric class. This work evaluates the proposed algorithm in various
versions and with several voxel sizes, and compares the results with other methods from the literature. For the segmentation
evaluation, the algorithms are tested on several hand-labeled data sets of varying difficulty using two metrics, one of which
is novel.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
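The discretization step mentioned in the abstract above can be sketched as follows. This is only a minimal illustration of grouping points into constant-size voxels; the function name voxelize and the integer-index keys are assumptions, and the geometric classification of each voxel (flat, linear or tubular, scattered) is not attempted here.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points by the constant-size voxel that contains them.

    points: iterable of (x, y, z) tuples.
    voxel_size: edge length of each cubic voxel, in the same units as the points.
    Returns a dict mapping integer voxel indices (i, j, k) to lists of points.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct voxel.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key].append((x, y, z))
    return dict(voxels)


cloud = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.1, -0.1)]
grid = voxelize(cloud, voxel_size=0.5)
```

Later stages can then operate on one entry per occupied voxel rather than on every point, which is consistent with the reduced computational cost the abstract attributes to not being point-based.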
Motion Cooperation: Smooth Piece-Wise Rigid Scene Flow from RGB-D Images
We propose a novel joint registration and segmentation approach to estimate scene flow from RGB-D images. Instead of assuming the scene to be composed of a number of independent rigidly-moving parts, we use non-binary labels to capture non-rigid deformations at transitions between
the rigid parts of the scene. Thus, the velocity of any point can be computed as a linear combination (interpolation) of the estimated rigid motions, which provides better results
than traditional sharp piecewise segmentations. Within a variational framework, the smooth segments of the scene and their corresponding rigid velocities are alternately refined
until convergence. A K-means-based segmentation is employed as an initialization, and the number of regions is subsequently adapted during the optimization process to capture any arbitrary number of independently moving objects.
We evaluate our approach with both synthetic and real RGB-D images that contain varied and large motions. The experiments show that our method estimates the scene flow more accurately than the most recent works in the field, and at the same time provides a meaningful segmentation of the scene based on 3D motion.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Spanish Government under the grant programs FPI-MICINN 2012 and DPI2014-55826-R (co-funded by the European Regional Development Fund), as well as by the EU ERC grant Convex Vision (grant agreement no. 240168)
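The statement that a point's velocity is a linear combination of the estimated rigid motions can be illustrated with a small sketch. It assumes that each rigid motion is given as a pair (linear velocity, angular velocity) and that the non-binary labels of a point are non-negative weights summing to one; these representations and all function names are assumptions for illustration, not the paper's implementation.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rigid_velocity(motion, point):
    """Velocity induced at `point` by one rigid motion (linear, angular)."""
    linear, angular = motion
    rotational = cross(angular, point)
    return tuple(l + r for l, r in zip(linear, rotational))


def blended_velocity(point, motions, weights):
    """Interpolate the rigid velocities with the point's soft labels."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to one")
    velocity = [0.0, 0.0, 0.0]
    for motion, weight in zip(motions, weights):
        rigid = rigid_velocity(motion, point)
        for axis in range(3):
            velocity[axis] += weight * rigid[axis]
    return tuple(velocity)


translation = ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))  # moves along x
rotation = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))     # spins about z
point = (1.0, 0.0, 0.0)

sharp = blended_velocity(point, [translation, rotation], [1.0, 0.0])
smooth = blended_velocity(point, [translation, rotation], [0.5, 0.5])
```

With a binary label the point follows a single rigid motion, while an intermediate label yields a velocity between the two, which is how a transition between rigid parts can be represented.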
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demonstrate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, which is significantly better than alternative approaches.
Comment: Video summary: https://youtu.be/Au3GmktK-S
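As a hedged illustration of the plane-equation parameterization mentioned above, the sketch below recovers a 3D point from a viewing ray and the parameters of the plane it lies on, assuming the plane is written as normal · X = offset with the camera at the origin. The function name and the (normal, offset) convention are assumptions; the abstract does not specify the exact form used by the network.

```python
def point_on_plane_along_ray(normal, offset, ray):
    """Intersect a viewing ray from the origin with the plane normal . X = offset.

    normal: plane normal (nx, ny, nz).
    offset: plane offset, so that points X on the plane satisfy normal . X = offset.
    ray: direction of the viewing ray through a pixel.
    """
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    scale = offset / denom
    return tuple(scale * component for component in ray)


# A fronto-parallel wall two metres in front of the camera.
wall_point = point_on_plane_along_ray((0.0, 0.0, 1.0), 2.0, (0.5, 0.0, 1.0))
```

Every pixel on the same wall shares one (normal, offset) pair even though the depth differs from pixel to pixel, which is one plausible reason such parameters are easier to predict than raw depth.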
Volume-based Semantic Labeling with Signed Distance Functions
Research on semantic segmentation and SLAM (Simultaneous Localization and
Mapping) has so far followed separate tracks.
Here, we link them tightly by describing a category label fusion
technique that allows for embedding semantic information into the dense map
created by a volume-based SLAM algorithm such as KinectFusion. Accordingly, our
approach is the first to provide a semantically labeled dense reconstruction of
the environment from a stream of RGB-D images. We validate our proposal using a
publicly available semantically annotated RGB-D dataset and a) employing ground
truth labels, b) corrupting such annotations with synthetic noise, c) deploying
a state-of-the-art semantic segmentation algorithm based on Convolutional
Neural Networks.
Comment: Submitted to PSIVT201
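One simple way to picture per-voxel label fusion is sketched below. Each voxel keeps a single label and a confidence counter that is raised by agreeing observations and lowered by disagreeing ones. This counter scheme, the class LabeledVoxel and the function fuse_frame are assumptions chosen for illustration; the abstract does not state the actual fusion rule, and the signed distance values of the SLAM volume are omitted.

```python
class LabeledVoxel:
    """A voxel holding one category label and a confidence counter."""

    def __init__(self):
        self.label = None
        self.confidence = 0

    def fuse(self, observed_label):
        """Integrate one per-frame label observation into the voxel."""
        if self.label == observed_label:
            self.confidence += 1          # agreement reinforces the label
        elif self.confidence > 0:
            self.confidence -= 1          # disagreement weakens it
        else:
            self.label = observed_label   # no support left, so switch label
            self.confidence = 1


def fuse_frame(volume, observations):
    """Fuse one frame of {voxel_index: label} observations into the volume."""
    for index, label in observations.items():
        volume.setdefault(index, LabeledVoxel()).fuse(label)


volume = {}
for frame_label in ["chair", "chair", "table"]:
    fuse_frame(volume, {(3, 1, 4): frame_label})
after_noise = (volume[(3, 1, 4)].label, volume[(3, 1, 4)].confidence)

for frame_label in ["table", "table"]:
    fuse_frame(volume, {(3, 1, 4): frame_label})
after_switch = (volume[(3, 1, 4)].label, volume[(3, 1, 4)].confidence)
```

A single conflicting frame only lowers the confidence, so occasional segmentation errors do not flip the stored label, while sustained disagreement eventually does.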