Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration
Dexterous multi-fingered hands can accomplish fine manipulation behaviors
that are infeasible with simple robotic grippers. However, sophisticated
multi-fingered hands are often expensive and fragile. Low-cost soft hands offer
an appealing alternative to more conventional devices, but present considerable
challenges in sensing and actuation, making them difficult to apply to more
complex manipulation tasks. In this paper, we describe an approach to learning
from demonstration that can be used to train soft robotic hands to perform
dexterous manipulation tasks. Our method uses object-centric demonstrations,
where a human demonstrates the desired motion of manipulated objects with their
own hands, and the robot autonomously learns to imitate these demonstrations
using reinforcement learning. We propose a novel algorithm that allows us to
blend and select a subset of the most feasible demonstrations to learn to
imitate on the hardware, which we use with an extension of the guided policy
search framework to use multiple demonstrations to learn generalizable neural
network policies. We demonstrate our approach on the RBO Hand 2, with learned
motor skills for turning a valve, manipulating an abacus, and grasping.
Comment: Accepted at the International Conference on Intelligent Robots and Systems (IROS) 2016. PDF file updated for stylistic consistency.
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys.
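As a concrete illustration of the linear assignment view that the survey builds on, the sketch below matches tracks to detections with SciPy's Hungarian-algorithm solver. The affinity matrix is a made-up example, not data from any tracker discussed in the survey.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical affinity matrix: affinity[i, j] scores how well
# detection j matches existing track i (higher is better).
affinity = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.3, 0.7],
])

# linear_sum_assignment minimizes total cost, so negate the
# affinities to recover the maximum-affinity matching.
track_idx, det_idx = linear_sum_assignment(-affinity)
matches = list(zip(track_idx, det_idx))
# Here each track i ends up matched to detection i.
```

Learned approaches typically replace the hand-set affinities above with scores produced by a network, while the assignment step itself stays a linear program.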
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
The Fisher information metric is an important foundation of information
geometry, allowing us to approximate the local geometry of a probability
distribution. Recurrent neural networks such as Sequence-to-Sequence
(Seq2Seq) networks, which have lately been used to achieve state-of-the-art
performance on speech translation and image captioning, have so far ignored
the geometry of the latent embedding that they iteratively learn.
We propose the information geometric Seq2Seq (GeoSeq2Seq) network, which
bridges the gap between deep recurrent neural networks and information
geometry. Specifically, the latent embedding offered by a recurrent network is
encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism
common in computer vision. We utilise such a network to predict the shortest
routes between two nodes of a graph by learning the adjacency matrix using the
GeoSeq2Seq formalism; our results show that for this problem the
probabilistic representation of the latent embedding outperforms the
non-probabilistic embedding by 10-15%.
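The Fisher-kernel encoding at the heart of GeoSeq2Seq can be sketched for a diagonal GMM: a set of latent vectors is summarized by the normalized gradient of its log-likelihood with respect to the mixture means. Everything below (the shapes, the random "embeddings", the fixed GMM parameters) is illustrative; the paper learns these quantities rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: T latent vectors of dimension D from a recurrent
# network, and a K-component diagonal GMM fit to such vectors.
T, D, K = 8, 4, 2
X = rng.normal(size=(T, D))       # latent embeddings
means = rng.normal(size=(K, D))   # GMM means
sigmas = np.ones((K, D))          # diagonal standard deviations
weights = np.full(K, 1.0 / K)     # mixture weights

# Posterior responsibilities gamma[t, k] = p(component k | x_t).
diff = X[:, None, :] - means[None, :, :]          # (T, K, D)
log_comp = -0.5 * np.sum((diff / sigmas) ** 2
                         + np.log(2 * np.pi * sigmas ** 2), axis=2)
log_comp += np.log(weights)
gamma = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
gamma /= gamma.sum(axis=1, keepdims=True)

# Fisher vector w.r.t. the means: averaged, normalized gradient of the
# log-likelihood per component, flattened into a (K * D)-dim encoding.
fv_mu = (gamma[:, :, None] * diff / sigmas ** 2).sum(axis=0)
fv_mu /= T * np.sqrt(weights)[:, None]
fisher_vector = fv_mu.ravel()
```

The resulting fixed-length vector is what "encoding the latent embedding as a Fisher kernel" produces, and it can be fed to downstream layers in place of the raw embedding.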
Multi-Person Pose Estimation via Column Generation
We study the problem of multi-person pose estimation in natural images. A
pose estimate describes the spatial position and identity (head, foot, knee,
etc.) of every non-occluded body part of a person. Pose estimation is difficult
due to issues such as deformation and variation in body configurations and
occlusion of parts, while multi-person settings add complications such as an
unknown number of people, with unknown appearance and possible interactions in
their poses and part locations. We give a novel integer program formulation of
the multi-person pose estimation problem, in which variables correspond to
assignments of parts in the image to poses in a two-tier, hierarchical way.
This enables us to develop an efficient custom optimization procedure based on
column generation, where columns are produced by exact optimization of very
small scale integer programs. We demonstrate improved accuracy and speed for
our method on the MPII multi-person pose estimation benchmark.
A novel image tag completion method based on convolutional neural network
In the problems of image retrieval and annotation, complete textual tag lists
of images play critical roles. However, in real-world applications, the image
tags are usually incomplete, thus it is important to learn the complete tags
for images. In this paper, we study the problem of image tag completion and
propose a novel method for this problem based on a popular image
representation method, convolutional neural network (CNN). The method estimates
the complete tags from the convolutional filtering outputs of images based on a
linear predictor. The CNN parameters, linear predictor, and the complete tags
are learned jointly by our method. We build a minimization problem to encourage
the consistency between the complete tags and the available incomplete tags,
reduce the estimation error, and reduce the model complexity. An iterative
algorithm is developed to solve the minimization problem. Experiments over
benchmark image data sets show its effectiveness.
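A minimal sketch of the joint minimization described above, with fixed random features standing in for the learned CNN outputs: the iterative algorithm alternates between fitting the linear predictor and updating the tag estimates to stay consistent with the observed tags. All names and weights (lam, mu) here are hypothetical stand-ins, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: fixed image features replace the CNN outputs,
# since the paper learns features, predictor, and tags jointly.
n, d, m = 20, 10, 5                  # images, feature dim, tags
F = rng.normal(size=(n, d))          # image features
Y = (rng.random(size=(n, m)) < 0.3).astype(float)  # complete tags
mask = rng.random(size=(n, m)) < 0.5               # which tags are observed
Y_obs = Y * mask

lam, mu = 1.0, 0.1                   # consistency / regularization weights
T_est = Y_obs.copy()                 # current estimate of the complete tags
for _ in range(50):
    # Step 1: refit the linear predictor W by ridge regression onto the
    # current tag estimates (controls model complexity via mu).
    W = np.linalg.solve(F.T @ F + mu * np.eye(d), F.T @ T_est)
    pred = F @ W
    # Step 2: update tags -- agree with observations where available
    # (weighted by lam) and with the predictor everywhere else.
    T_est = np.where(mask, (Y_obs + lam * pred) / (1 + lam), pred)
```

Each step decreases one block of a joint objective (observation consistency, prediction error, regularization), which is the general shape of the alternating scheme the abstract describes.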
Tagger: Deep Unsupervised Perceptual Grouping
We present a framework for efficient perceptual inference that explicitly
reasons about the segmentation of its inputs and features. Rather than being
trained for any specific segmentation, our framework learns the grouping
process in an unsupervised manner or alongside any supervised task. By
enriching the representations of a neural network, we enable it to group the
representations of different objects in an iterative manner. By allowing the
system to amortize the iterative inference of the groupings, we achieve very
fast convergence. In contrast to many other recently proposed methods for
addressing multi-object scenes, our system does not assume the inputs to be
images and can therefore directly handle other modalities. For multi-digit
classification of very cluttered images that require texture segmentation, our
method offers improved classification performance over convolutional networks
despite being fully connected. Furthermore, we observe that our system greatly
improves on the semi-supervised result of a baseline Ladder network on our
dataset, indicating that segmentation can also improve sample efficiency.
Comment: 14 pages + 5 pages supplementary, accepted at NIPS 2016.
Scan2Mesh: From Unstructured Range Scans to 3D Meshes
We introduce Scan2Mesh, a novel data-driven generative approach which
transforms an unstructured and potentially incomplete range scan into a
structured 3D mesh representation. The main contribution of this work is a
generative neural network architecture whose input is a range scan of a 3D
object and whose output is an indexed face set conditioned on the input scan.
In order to generate a 3D mesh as a set of vertices and face indices, the
generative model builds on a series of proxy losses for vertices, edges, and
faces. At each stage, we realize a one-to-one discrete mapping between the
predicted and ground truth data points with a combination of convolutional- and
graph neural network architectures. This enables our algorithm to predict a
compact mesh representation similar to those created through manual artist
effort using 3D modeling software. Our generated mesh results thus produce
sharper, cleaner meshes with a fundamentally different structure from those
generated through implicit functions, a first step in bridging the gap towards
artist-created CAD models.
Deep Spectral Clustering using Dual Autoencoder Network
Clustering methods have recently attracted ever-increasing attention in
learning and vision. Deep clustering combines embedding and clustering
to obtain an optimal embedding subspace for clustering, which can be more
effective compared with conventional clustering methods. In this paper, we
propose a joint learning framework for discriminative embedding and spectral
clustering. We first devise a dual autoencoder network, which enforces the
reconstruction constraint for the latent representations and their noisy
versions, to embed the inputs into a latent space for clustering. As such the
learned latent representations can be more robust to noise. Then the mutual
information estimation is utilized to provide more discriminative information
from the inputs. Furthermore, a deep spectral clustering method is applied to
embed the latent representations into the eigenspace and subsequently clusters
them, which can fully exploit the relationship between inputs to achieve
optimal clustering results. Experimental results on benchmark datasets show
that our method can significantly outperform state-of-the-art clustering
approaches.
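The spectral-clustering stage can be illustrated on its own: build a similarity graph over the (here synthetic) latent representations, embed them via the eigenvectors of the normalized Laplacian, and read off clusters. This is a generic sketch of spectral clustering, not the paper's dual-autoencoder pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated synthetic "latent representations".
A = rng.normal(loc=0.0, scale=0.1, size=(10, 2))
B = rng.normal(loc=3.0, scale=0.1, size=(10, 2))
X = np.vstack([A, B])

# Gaussian affinity matrix and symmetric normalized Laplacian
# L = I - D^{-1/2} W D^{-1/2}.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
D = W.sum(axis=1)
L = np.eye(len(X)) - W / np.sqrt(np.outer(D, D))

# The second-smallest eigenvector (Fiedler vector) separates the
# two clusters; thresholding its sign yields the labels.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
```

For more than two clusters one would keep the first k eigenvectors and run k-means in that eigenspace, which is the step the paper applies to its learned latent representations.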
White Matter Microstructure Associations of Cognitive and Visuomotor Control in Children: A Sensory Processing Perspective.
Objective: Recent evidence suggests that co-occurring deficits in cognitive control and visuomotor control are common to many neurodevelopmental disorders. Specifically, children with sensory processing dysfunction (SPD), a condition characterized by sensory hyper-/hypo-sensitivity, show varying degrees of overlapping attention and visuomotor challenges. In this study, we assessed associations between cognitive and visuomotor control abilities among children with and without SPD. In the same context, we also examined the common and unique diffusion tensor imaging (DTI) tracts that may support the overlap of cognitive control and visuomotor control. Method: We collected cognitive control and visuomotor control behavioral measures as well as DTI data in 37 children with SPD and 25 typically developing controls (TDCs). We constructed regressions to assess for associations between behavioral performance and mean fractional anisotropy (FA) in selected regions of interest (ROIs). Results: We observed an association between behavioral performance on cognitive control and visuomotor control. Further, our findings indicated that FA in the anterior limb of the internal capsule (ALIC), the anterior thalamic radiation (ATR), and the superior longitudinal fasciculus (SLF) is associated with both cognitive control and visuomotor control, while FA in the superior corona radiata (SCR) uniquely correlates with cognitive control performance and FA in the posterior limb of the internal capsule (PLIC) and the cerebral peduncle (CP) tract uniquely correlates with visuomotor control performance. Conclusions: These findings suggest that children who demonstrate lower cognitive control are also more likely to demonstrate lower visuomotor control, and vice versa, regardless of clinical cohort assignment. The overlapping neural tracts, which correlate with both cognitive and visuomotor control, suggest a possible common neural mechanism supporting both control-based processes.
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Attention mechanisms have become a popular component in deep neural networks,
yet there has been little examination of how different influencing factors and
methods for computing attention from these factors affect performance. Toward a
better general understanding of attention mechanisms, we present an empirical
study that ablates various spatial attention elements within a generalized
attention formulation, encompassing the dominant Transformer attention as well
as the prevalent deformable convolution and dynamic convolution modules.
Conducted on a variety of applications, the study yields significant findings
about spatial attention in deep networks, some of which run counter to
conventional understanding. For example, we find that the query and key content
comparison in Transformer attention is negligible for self-attention, but vital
for encoder-decoder attention. A proper combination of deformable convolution
with key content only saliency achieves the best accuracy-efficiency tradeoff
in self-attention. Our results suggest that there exists much room for
improvement in the design of attention mechanisms.
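The contrast drawn above, between query-key content comparison and key-content-only saliency, can be made concrete with plain scaled dot-product attention. The projection vector u and all shapes below are hypothetical illustrations, not the study's actual modules.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: 4 query positions, 6 key positions, dim 8.
d = 8
Q = rng.normal(size=(4, d))
K = rng.normal(size=(6, d))
V = rng.normal(size=(6, d))

# Transformer-style attention: weights come from comparing query
# content against key content.
content_logits = Q @ K.T / np.sqrt(d)

# "Key content only" saliency: a query-independent factor, here a
# hypothetical learned projection u applied to the keys alone.
u = rng.normal(size=(d,))
saliency_logits = np.broadcast_to(K @ u / np.sqrt(d),
                                  content_logits.shape)

out_content = softmax(content_logits) @ V
out_saliency = softmax(saliency_logits) @ V
```

Because the saliency logits ignore the query, every query position receives the same output in the second variant; the ablation study in effect asks how much each such factor contributes to final accuracy.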