30,197 research outputs found

Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration

Authors: Abbeel Pieter, Eppner Clemens, Gupta Abhishek, Levine Sergey
Publication date: 20/03/2017

Dexterous multi-fingered hands can accomplish fine manipulation behaviors that are infeasible with simple robotic grippers. However, sophisticated multi-fingered hands are often expensive and fragile. Low-cost soft hands offer an appealing alternative to more conventional devices, but present considerable challenges in sensing and actuation, making them difficult to apply to more complex manipulation tasks. In this paper, we describe an approach to learning from demonstration that can be used to train soft robotic hands to perform dexterous manipulation tasks. Our method uses object-centric demonstrations, where a human demonstrates the desired motion of manipulated objects with their own hands, and the robot autonomously learns to imitate these demonstrations using reinforcement learning. We propose a novel algorithm that allows us to blend and select a subset of the most feasible demonstrations to learn to imitate on the hardware, which we use with an extension of the guided policy search framework to use multiple demonstrations to learn generalizable neural network policies. We demonstrate our approach on the RBO Hand 2, with learned motor skills for turning a valve, manipulating an abacus, and grasping.

Comment: Accepted at International Conference on Intelligent Robots and Systems (IROS) 2016. PDF file updated for stylistic consistency.

arXiv.org e-Print Archive

Machine Learning Methods for Data Association in Multi-Object Tracking

Authors: Elefteriadou Lily, Emami Patrick, Pardalos Panos M., Ranka Sanjay
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/08/2020

Data association is a key step within the multi-object tracking pipeline that is notoriously challenging due to its combinatorial nature. A popular and general way to formulate data association is as the NP-hard multidimensional assignment problem (MDAP). Over the last few years, data-driven approaches to assignment have become increasingly prevalent as these techniques have started to mature. We focus this survey solely on learning algorithms for the assignment step of multi-object tracking, and we attempt to unify various methods by highlighting their connections to linear assignment as well as to the MDAP. First, we review probabilistic and end-to-end optimization approaches to data association, followed by methods that learn association affinities from data. We then compare the performance of the methods presented in this survey, and conclude by discussing future research directions.

Comment: Accepted for publication in ACM Computing Surveys.

arXiv.org e-Print Archive
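
The linear assignment problem mentioned in the abstract above can be illustrated with a minimal sketch. This is not taken from the survey: the function name `best_assignment` and the cost values are hypothetical, and the brute-force search over permutations is only practical for very small matrices. It matches each track (row) to exactly one detection (column) so that the total cost is minimal.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total, perm) minimizing sum(cost[i][perm[i]]) over all permutations.

    Brute force, O(n!) time: an illustration of linear assignment, not a practical solver.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical costs: rows are existing tracks, columns are new detections.
cost = [
    [1, 4, 5],
    [3, 1, 6],
    [4, 5, 2],
]
total, assignment = best_assignment(cost)
print(total, assignment)
```

Real trackers typically replace the exhaustive search with a polynomial-time solver; the sketch only shows what is being optimized.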

GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks

Authors: Bay Alessandro, Sengupta Biswa
Publication date: 05/01/2018

The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the latent embedding that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network, which bridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We utilise such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix using the GeoSeq2Seq formalism; our results show that for such a problem the probabilistic representation of the latent embedding outperforms the non-probabilistic embedding by 10-15%.

arXiv.org e-Print Archive

Multi-Person Pose Estimation via Column Generation

Authors: Gonzalez-Ballester Miguel A., Ihler Alexander, Wang Shaofei, Yarkony Julian, Zhang Chong
Publication date: 18/09/2017

We study the problem of multi-person pose estimation in natural images. A pose estimate describes the spatial position and identity (head, foot, knee, etc.) of every non-occluded body part of a person. Pose estimation is difficult due to issues such as deformation and variation in body configurations and occlusion of parts, while multi-person settings add complications such as an unknown number of people, with unknown appearance and possible interactions in their poses and part locations. We give a novel integer program formulation of the multi-person pose estimation problem, in which variables correspond to assignments of parts in the image to poses in a two-tier, hierarchical way. This enables us to develop an efficient custom optimization procedure based on column generation, where columns are produced by exact optimization of very small scale integer programs. We demonstrate improved accuracy and speed for our method on the MPII multi-person pose estimation benchmark.

arXiv.org e-Print Archive

A novel image tag completion method based on convolutional neural network

Authors: Geng Yanyan, Gu Yi, Li Weizhi, Liang Gaoyuan, Liang Ru-Ze, Patil Nitin, Wang Jing-Yan, Wang Jingbin, Wu Yanbin, Zhang Guohui
Publication date: 03/06/2017

In the problems of image retrieval and annotation, complete textual tag lists of images play critical roles. However, in real-world applications, the image tags are usually incomplete, so it is important to learn the complete tags for images. In this paper, we study the problem of image tag completion and propose a novel method for this problem based on a popular image representation method, the convolutional neural network (CNN). The method estimates the complete tags from the convolutional filtering outputs of images based on a linear predictor. The CNN parameters, linear predictor, and the complete tags are learned jointly by our method. We build a minimization problem to encourage the consistency between the complete tags and the available incomplete tags, reduce the estimation error, and reduce the model complexity. An iterative algorithm is developed to solve the minimization problem. Experiments over benchmark image data sets show its effectiveness.

arXiv.org e-Print Archive

Tagger: Deep Unsupervised Perceptual Grouping

Authors: Berglund Mathias, Greff Klaus, Hao Tele Hotloo, Rasmus Antti, Schmidhuber Jürgen, Valpola Harri
Publication date: 28/11/2016

We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency.

Comment: 14 pages + 5 pages supplementary, accepted at NIPS 201

arXiv.org e-Print Archive

Scan2Mesh: From Unstructured Range Scans to 3D Meshes

Authors: Dai Angela, Nießner Matthias
Publication date: 02/04/2019

We introduce Scan2Mesh, a novel data-driven generative approach which transforms an unstructured and potentially incomplete range scan into a structured 3D mesh representation. The main contribution of this work is a generative neural network architecture whose input is a range scan of a 3D object and whose output is an indexed face set conditioned on the input scan. In order to generate a 3D mesh as a set of vertices and face indices, the generative model builds on a series of proxy losses for vertices, edges, and faces. At each stage, we realize a one-to-one discrete mapping between the predicted and ground truth data points with a combination of convolutional and graph neural network architectures. This enables our algorithm to predict a compact mesh representation similar to those created through manual artist effort using 3D modeling software. Our generated mesh results thus produce sharper, cleaner meshes with a fundamentally different structure from those generated through implicit functions, a first step in bridging the gap towards artist-created CAD models.

arXiv.org e-Print Archive

Deep Spectral Clustering using Dual Autoencoder Network

Authors: Deng Cheng, Liu Wei, Yan Junchi, Yang Xu, Zheng Feng
Publication date: 30/04/2019

Clustering methods have recently attracted ever-increasing attention in learning and vision. Deep clustering combines embedding and clustering to obtain an optimal embedding subspace for clustering, which can be more effective than conventional clustering methods. In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. We first devise a dual autoencoder network, which enforces the reconstruction constraint for the latent representations and their noisy versions, to embed the inputs into a latent space for clustering. As such, the learned latent representations can be more robust to noise. Then mutual information estimation is utilized to provide more discriminative information from the inputs. Furthermore, a deep spectral clustering method is applied to embed the latent representations into the eigenspace and subsequently cluster them, which can fully exploit the relationship between inputs to achieve optimal clustering results. Experimental results on benchmark datasets show that our method can significantly outperform state-of-the-art clustering approaches.

arXiv.org e-Print Archive

White Matter Microstructure Associations of Cognitive and Visuomotor Control in Children: A Sensory Processing Perspective.

Authors: Anguera Joaquin A, Brandes-Aitken Annie, Chang Yi-Shin, Demopoulos Carly, Gazzaley Adam, Marco Elysa J, Mukherjee Pratik, Owen Julia P
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Objective: Recent evidence suggests that co-occurring deficits in cognitive control and visuomotor control are common to many neurodevelopmental disorders. Specifically, children with sensory processing dysfunction (SPD), a condition characterized by sensory hyper/hypo-sensitivity, show varying degrees of overlapping attention and visuomotor challenges. In this study, we assess associations between cognitive and visuomotor control abilities among children with and without SPD. In this same context, we also examine the common and unique diffusion tensor imaging (DTI) tracts that may support the overlap of cognitive control and visuomotor control. Method: We collected cognitive control and visuomotor control behavioral measures as well as DTI data in 37 children with SPD and 25 typically developing controls (TDCs). We constructed regressions to assess associations between behavioral performance and mean fractional anisotropy (FA) in selected regions of interest (ROIs). Results: We observed an association between behavioral performance on cognitive control and visuomotor control. Further, our findings indicated that FA in the anterior limb of the internal capsule (ALIC), the anterior thalamic radiation (ATR), and the superior longitudinal fasciculus (SLF) is associated with both cognitive control and visuomotor control, while FA in the superior corona radiata (SCR) uniquely correlates with cognitive control performance and FA in the posterior limb of the internal capsule (PLIC) and the cerebral peduncle (CP) tract uniquely correlates with visuomotor control performance. Conclusions: These findings suggest that children who demonstrate lower cognitive control are also more likely to demonstrate lower visuomotor control, and vice versa, regardless of clinical cohort assignment. The overlapping neural tracts, which correlate with both cognitive and visuomotor control, suggest a possible common neural mechanism supporting both control-based processes.

eScholarship - University of California

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Authors: Cheng Dazhi, Dai Jifeng, Lin Stephen, Zhang Zheng, Zhu Xizhou
Publication date: 11/04/2019

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the query and key content comparison in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. A proper combination of deformable convolution with key content only saliency achieves the best accuracy-efficiency tradeoff in self-attention. Our results suggest that there exists much room for improvement in the design of attention mechanisms.

arXiv.org e-Print Archive