PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Few prior works study deep learning on point sets. PointNet by Qi et al. is a
pioneer in this direction. However, by design PointNet does not capture local
structures induced by the metric space in which the points live, limiting its
ability to recognize fine-grained patterns and to generalize to complex scenes. In this
work, we introduce a hierarchical neural network that applies PointNet
recursively on a nested partitioning of the input point set. By exploiting
metric space distances, our network is able to learn local features with
increasing contextual scales. Observing further that point sets are usually
sampled with varying densities, which greatly degrades the performance of
networks trained on uniform densities, we propose novel set
learning layers to adaptively combine features from multiple scales.
Experiments show that our network, called PointNet++, is able to learn deep point
set features efficiently and robustly. In particular, results significantly
better than state-of-the-art have been obtained on challenging benchmarks of 3D
point clouds.
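The nested partitioning described above is built from two grouping primitives, farthest point sampling and ball query, which select well-spread centroids and gather their metric-space neighborhoods for a nested PointNet to consume. A minimal NumPy sketch (function names and parameters are illustrative, not the authors' implementation):

```python
import numpy as np

def farthest_point_sample(points, k):
    """Greedily pick k centroid indices that are maximally spread
    out in the metric space (Euclidean here)."""
    n = points.shape[0]
    chosen = [0]                      # start from an arbitrary seed point
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # distance of every point to its nearest chosen centroid so far
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)

def ball_query(points, centroid_idx, radius, max_neighbors):
    """Group points lying inside a metric ball around each centroid,
    forming the local neighborhoods that a nested PointNet consumes."""
    groups = []
    for c in centroid_idx:
        d = np.linalg.norm(points - points[c], axis=1)
        groups.append(np.where(d < radius)[0][:max_neighbors])
    return groups
```

Stacking several such sampling-and-grouping stages, each followed by a small PointNet, yields local features at increasing contextual scales.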
Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Quantification of the stationary points and the associated basins of
attraction of neural network loss surfaces is an important step towards a
better understanding of neural network loss surfaces at large. This work
proposes a novel method to visualise basins of attraction together with the
associated stationary points via gradient-based random sampling. The proposed
technique is used to perform an empirical study of the loss surfaces generated
by two different error metrics: quadratic loss and entropic loss. The empirical
observations confirm the theoretical hypothesis regarding the nature of neural
network attraction basins. Entropic loss is shown to exhibit stronger gradients
and fewer stationary points than quadratic loss, indicating that entropic loss
has a more searchable landscape. Quadratic loss is shown to be more resilient
to overfitting than entropic loss. Both losses are shown to exhibit local
minima, but the number of local minima is shown to decrease with an increase in
dimensionality. Thus, the proposed visualisation technique successfully
captures the local minima properties exhibited by the neural network loss
surfaces, and can be used for the purpose of fitness landscape analysis of
neural networks. Comment: Preprint submitted to the Neural Networks journal.
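The claim that entropic (cross-entropy) loss exhibits stronger gradients than quadratic loss can be checked analytically for a single sigmoid unit; a self-contained sketch of that comparison (not the paper's gradient-based sampling procedure):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_quadratic(z, y):
    """d/dz of 0.5*(sigmoid(z) - y)**2: carries a sigmoid'(z)
    factor that vanishes wherever the unit saturates."""
    p = sigmoid(z)
    return (p - y) * p * (1.0 - p)

def grad_entropic(z, y):
    """d/dz of the cross-entropy -(y*log p + (1-y)*log(1-p)):
    the sigmoid' factor cancels, leaving the raw error p - y."""
    return sigmoid(z) - y

# At a saturated, badly wrong unit (z = 6, target 0) the entropic
# gradient stays near 1 while the quadratic gradient collapses,
# which is one source of the "more searchable landscape".
```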
Grasp Pose Detection in Point Clouds
Recently, a number of grasp detection methods have been proposed that can be
used to localize robotic grasp configurations directly from sensor data without
estimating object pose. The underlying idea is to treat grasp perception
analogously to object detection in computer vision. These methods take as input
a noisy and partially occluded RGBD image or point cloud and produce as output
pose estimates of viable grasps, without assuming a known CAD model of the
object. Although these methods generalize grasp knowledge to new objects well,
they have not yet been demonstrated to be reliable enough for wide use. Many
grasp detection methods achieve grasp success rates (grasp successes as a
fraction of the total number of grasp attempts) between 75% and 95% for novel
objects presented in isolation or in light clutter. Not only are these success
rates too low for practical grasping applications, but the light clutter
scenarios that are evaluated often do not reflect the realities of real-world
grasping. This paper proposes a number of innovations that together result in a
significant improvement in grasp detection performance. The specific
improvement in performance due to each of our contributions is quantitatively
measured either in simulation or on robotic hardware. Ultimately, we report a
series of robotic experiments that average a 93% end-to-end grasp success rate
for novel objects presented in dense clutter. Comment: arXiv admin note: text overlap with arXiv:1603.0156
TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes
We introduce TextureNet, a neural network architecture designed to extract
features from high-resolution signals associated with 3D surface meshes (e.g.,
color texture maps). The key idea is to utilize a 4-rotational symmetric
(4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy
fields have several properties favorable for convolution on surfaces (low
distortion, few singularities, consistent parameterization, etc.), orientations
are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new
convolutional operator invariant to the 4-RoSy ambiguity and use it in a
network to extract features from high-resolution signals on geodesic
neighborhoods of a surface. In comparison to alternatives, such as PointNet
based methods which lack a notion of orientation, the coherent structure given
by these neighborhoods results in significantly stronger features. As an
example application, we demonstrate the benefits of our architecture for 3D
semantic segmentation of textured 3D meshes. The results show that our method
outperforms all existing methods on the basis of mean IoU by a significant
margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings.
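The 4-fold orientation ambiguity can be removed by pooling a filter response over all four candidate orientations. A planar toy sketch of such an invariant operator (TextureNet's actual operator works on geodesic surface neighborhoods, so this only illustrates the invariance idea):

```python
import numpy as np

def rosy4_invariant_response(patch, kernel):
    """Filter response invariant to the 4-fold rotational ambiguity:
    evaluate under all four 90-degree rotations and keep the max."""
    return max(float(np.sum(np.rot90(patch, k) * kernel)) for k in range(4))
```

Rotating the input by 90 degrees only permutes the four candidate responses, so the pooled value is unchanged.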
Land use/land cover mapping (1:25000) of Taiwan, Republic of China by automated multispectral interpretation of LANDSAT imagery
Because retrospective collection of representative ground control data is difficult, three methods were tested for collecting the training sets needed to establish the spectral signatures of the land uses/land covers sought. Computer preprocessing techniques applied to the digital images to improve the final classification results were geometric corrections, spectral band or image ratioing, and statistical cleaning of the representative training sets. A minimal level of statistical verification was performed, based on comparisons between the airphoto estimates and the classification results. The verification lent further support to the selection of MSS bands 5 and 7, and indicated that the maximum likelihood ratioing technique achieves classification results more consistent with the airphoto estimates than stepwise discriminant analysis.
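Band (image) ratioing, one of the preprocessing steps mentioned above, divides one spectral band by another pixelwise so that multiplicative illumination effects common to both bands largely cancel. A minimal sketch (the `eps` guard is an assumption for numerical safety, not part of the original procedure):

```python
import numpy as np

def band_ratio(band_a, band_b, eps=1e-6):
    """Pixelwise ratio of two spectral bands (e.g. MSS bands 5 and 7).
    Topographic shading multiplies both bands by roughly the same
    factor, so it largely cancels in the ratio image."""
    return band_a / (band_b + eps)
```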
Microbial Similarity between Students in a Common Dormitory Environment Reveals the Forensic Potential of Individual Microbial Signatures.
The microbiota of the built environment is an amalgamation of both human and environmental sources. While human sources have been examined within single-family households or in public environments, it is unclear what effect a large number of cohabitating people have on the microbial communities of their shared environment. We sampled the public and private spaces of a college dormitory, disentangling individual microbial signatures and their impact on the microbiota of common spaces. We compared multiple methods for marker gene sequence clustering and found that minimum entropy decomposition (MED) was best able to distinguish between the microbial signatures of different individuals and was able to uncover more discriminative taxa across all taxonomic groups. Further, weighted UniFrac- and random forest-based graph analyses uncovered two distinct spheres of hand- or shoe-associated samples. Using graph-based clustering, we identified spheres of interaction and found that connection between these clusters was enriched for hands, implicating them as a primary means of transmission. In contrast, shoe-associated samples were found to be freely interacting, with individual shoes more connected to each other than to the floors they interact with. Individual interactions were highly dynamic, with groups of samples originating from individuals clustering freely with samples from other individuals, while all floor and shoe samples consistently clustered together. IMPORTANCE: Humans leave behind a microbial trail, regardless of intention. This may allow for the identification of individuals based on the "microbial signatures" they shed in built environments. In a shared living environment, these trails intersect, and through interaction with common surfaces may become homogenized, potentially confounding our ability to link individuals to their associated microbiota. We sought to understand the factors that influence the mixing of individual signatures and how best to process sequencing data to tease apart these signatures.
A Dataset for Developing and Benchmarking Active Vision
We present a new public dataset with a focus on simulating robotic vision
tasks in everyday indoor environments using real imagery. The dataset includes
20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely
captured in 9 unique scenes. We train a fast object category detector for
instance detection on our data. Using the dataset we show that, although
increasingly accurate and fast, the state of the art for object detection is
still severely impacted by object scale, occlusion, and viewing direction, all
of which matter for robotics applications. We next validate the dataset for
simulating active vision, and use the dataset to develop and evaluate a
deep-network-based system for next best move prediction for object
classification using reinforcement learning. Our dataset is available for
download at cs.unc.edu/~ammirato/active_vision_dataset_website/. Comment: To appear at ICRA 201
Investigation of event-based memory surfaces for high-speed tracking, unsupervised feature extraction and object recognition
In this paper we compare event-based decaying and time-based decaying memory
surfaces for high-speed event-based tracking, feature extraction, and object
classification using an event-based camera. The high-speed recognition task
involves detecting and classifying model airplanes that are dropped free-hand
close to the camera lens so as to generate a challenging dataset exhibiting
significant variance in target velocity. This variance motivated the
investigation of event-based decaying memory surfaces in comparison to
time-based decaying memory surfaces to capture the temporal aspect of the
event-based data. These surfaces are then used to perform unsupervised feature
extraction, tracking and recognition. In order to generate the memory surfaces,
event binning, linearly decaying kernels, and exponentially decaying kernels
were investigated with exponentially decaying kernels found to perform best.
Event-based decaying memory surfaces were found to outperform time-based
decaying memory surfaces in recognition especially when invariance to target
velocity was made a requirement. A range of network and receptive field sizes
were investigated. The system achieves 98.75% recognition accuracy within 156
milliseconds of an airplane entering the field of view, using only twenty-five
event-based feature extracting neurons in series with a linear classifier. By
comparing the linear classifier results to an ELM classifier, we find that a
small number of event-based feature extractors can effectively project the
complex spatio-temporal event patterns of the dataset to an almost linearly
separable representation in feature space. Comment: This is an updated version of a previously submitted manuscript.
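The difference between the two decay schemes can be made concrete: a time-based surface decays with the elapsed time between events, while an event-based surface decays by a fixed factor per incoming event, which makes it insensitive to how fast the target moves. A minimal sketch (the update rules are illustrative, not the paper's exact kernels):

```python
import numpy as np

def update_time_surface(surface, last_t, event, tau):
    """Time-based decay: scale by exp(-dt/tau) for the elapsed time
    since the previous event, then stamp the new event location."""
    x, y, t = event
    surface *= np.exp(-(t - last_t) / tau)
    surface[y, x] = 1.0
    return surface, t

def update_event_surface(surface, event, decay):
    """Event-based decay: every event scales the surface by a constant
    factor, so the surface depends only on event order, not timing."""
    x, y, _ = event
    surface *= decay
    surface[y, x] = 1.0
    return surface
```

Replaying the same spatial trajectory at two different speeds leaves the event-based surface unchanged but reshapes the time-based one, which is the velocity invariance the recognition results above reward.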
SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
We propose an efficient Stereographic Projection Neural Network (SPNet) for
learning representations of 3D objects. We first transform a 3D input volume
into a 2D planar image using stereographic projection. We then present a
shallow 2D convolutional neural network (CNN) to estimate the object category
followed by view ensemble, which combines the responses from multiple views of
the object to further enhance the predictions. Specifically, the proposed
approach consists of four stages: (1) Stereographic projection of a 3D object,
(2) view-specific feature learning, (3) view selection and (4) view ensemble.
The proposed approach performs comparably to the state-of-the-art methods while
having substantially lower GPU memory and network parameter requirements. Despite
its light weight, experiments on 3D object classification and shape retrieval
demonstrate the high performance of the proposed method.
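The projection at the heart of stage (1) maps the sphere (minus the north pole) onto a plane; a minimal sketch of the point map (SPNet projects whole 3D input volumes, so this shows only the underlying transform):

```python
def stereographic_project(x, y, z):
    """Project a point of the unit sphere (excluding the north pole,
    z = 1) onto the plane z = 0, casting a ray through the north pole."""
    return x / (1.0 - z), y / (1.0 - z)
```

The south pole lands at the origin and the equator on the unit circle, so a shallow 2D CNN can then operate on the resulting planar image.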
O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis
We present O-CNN, an Octree-based Convolutional Neural Network (CNN) for 3D
shape analysis. Built upon the octree representation of 3D shapes, our method
takes the average normal vectors of a 3D model sampled in the finest leaf
octants as input and performs 3D CNN operations on the octants occupied by the
3D shape surface. We design a novel octree data structure to efficiently store
the octant information and CNN features into the graphics memory and execute
the entire O-CNN training and evaluation on the GPU. O-CNN supports various CNN
structures and works for 3D shapes in different representations. By restraining
the computations on the octants occupied by 3D surfaces, the memory and
computational costs of the O-CNN grow quadratically as the depth of the octree
increases, which makes the 3D CNN feasible for high-resolution 3D models. We
compare the performance of the O-CNN with other existing 3D CNN solutions and
demonstrate the efficiency and efficacy of O-CNN in three shape analysis tasks,
including object classification, shape retrieval, and shape segmentation.
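The growth claim can be checked numerically: the number of leaf octants that intersect a surface scales with the squared resolution (roughly 4x per extra octree level), unlike the 8x-per-level growth of a dense voxel grid. A minimal sketch with a sphere as the 3D surface (the occupancy test is a voxel-grid proxy, not the paper's octree data structure):

```python
import numpy as np

def occupied_surface_octants(depth):
    """Count leaf octants at the given octree depth whose centers lie
    within half a cell diagonal of a unit-sphere surface in [-1, 1]^3."""
    n = 2 ** depth
    cell = 2.0 / n
    c = (np.arange(n) + 0.5) * cell - 1.0
    X, Y, Z = np.meshgrid(c, c, c, indexing="ij")
    r = np.sqrt(X ** 2 + Y ** 2 + Z ** 2)
    return int(np.count_nonzero(np.abs(r - 1.0) < cell * np.sqrt(3) / 2))
```

Each extra level multiplies the occupied count by roughly 4 rather than 8, which is why restricting CNN operations to occupied octants keeps high-resolution 3D models tractable.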