
    Zero-Shot Object Recognition Based on Haptic Attributes

    Robots operating in household environments need to recognize a variety of objects. Several touch-based object recognition systems have been proposed in recent years [2]–[5]. They map haptic data to object classes using machine learning techniques, and then use the learned mapping to recognize previously encountered objects. The accuracy of these methods depends on the number of training samples available for each object class. However, haptic data collection is often system (robot) specific and labour intensive. One way to cope with this problem is to use a knowledge-transfer-based system that exploits object relationships to share learned models between objects. Yet, while knowledge-transfer systems such as zero-shot learning [6] have regularly been proposed for visual object recognition, no comparable system exists for haptic recognition. Here we developed [1] the first haptic zero-shot learning system that enables a robot to recognize, using haptic exploration alone, objects it encounters for the first time. Our system uses the so-called Direct Attribute Prediction (DAP) model [7] to train on a semantic representation of objects based on a list of haptic attributes, rather than on the objects themselves. The attributes (physical properties such as shape, texture, and material) constitute an intermediate layer relating objects and are used for knowledge transfer. Using this layering, our system can predict the attribute-based representation of a new (previously untrained) object and use it to infer the object's identity.

    A. System Overview. An overview of our system is given in Fig. 1. Given distinct training and test data sets Y and Z described by an attribute basis a, we first associate a binary label a_m^o with each object o, where o ∈ Y ∪ Z and m = 1, ..., M. This results in a binary object-attribute matrix K. During training, haptic data collected from Y are used to train a binary classifier for each attribute a_m. Finally, to classify a test sample x as one of the Z objects, x is passed to each of the learned attribute classifiers, and the resulting attribute posteriors p(a_m | x) are used to predict the corresponding object, provided its ground-truth attribute signature is available in K. This extended abstract is a summary of submission [1].

    B. Experimental Setup. To collect haptic data, we use the Shadow anthropomorphic robotic hand equipped with a BioTac multimodal tactile sensor on each fingertip. We developed a force-based grasp controller that enables the hand to enclose an object. The joint encoder readings provide information on object shape, while the BioTac sensors provide information about object material, texture, and compliance at each fingertip. To find an appropriate list of attributes describing our object set (illustrated in Fig. 2), we used online dictionaries to collect one or more textual definitions of each object. From this data, we extracted 11 haptic adjectives, i.e., descriptions that could be "felt" using our robot hand. These adjectives served as our attributes: made of porcelain, made of plastic, made of glass, made of cardboard, made of stainless steel, cylindrical, round, rectangular, concave, has a handle, has a narrow part. We grouped these into material attributes and shape attributes.
    During the training phase, we use the Shadow hand joint readings x_sh to train an SVM classifier for each shape attribute, and the BioTac readings x_b to train an SVM classifier for each material attribute. SVM training returns, for each sample x, a distance score s_m(x) indicating how far x lies from the discriminating hyperplane. We transform this score into an attribute posterior p(a_m | x) using a sigmoid function.
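    The two inference steps described above (a sigmoid mapping of each attribute SVM's distance score to a posterior, and DAP-style combination of the attribute posteriors against the object-attribute matrix K) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the calibration parameters, the toy objects, and the simplified likelihood (the full DAP rule also normalizes by attribute priors) are assumptions.

```python
import numpy as np

# Sketch of the two inference steps (illustrative only):
# (1) squash each attribute classifier's SVM distance s_m(x) into a posterior
#     p(a_m | x) with a sigmoid, and (2) score every candidate test object by
#     combining the posteriors of the attributes its row in K says it should have.

def attribute_posterior(svm_score, alpha=1.0, beta=0.0):
    """Sigmoid mapping of a signed SVM distance to p(a_m | x).
    alpha/beta are assumed calibration parameters (e.g., fitted Platt-style)."""
    return 1.0 / (1.0 + np.exp(-(alpha * svm_score + beta)))

def predict_object(svm_scores, K_test, object_names):
    """DAP-style inference: pick the test object whose attribute signature in
    K_test best matches the predicted attribute posteriors.

    svm_scores  : (M,) signed distances from the M attribute classifiers
    K_test      : (Z, M) binary object-attribute matrix for the test classes
    object_names: list of Z test-object names
    """
    p = attribute_posterior(np.asarray(svm_scores))          # (M,) posteriors
    # Likelihood of each object: product over attributes of p if the object
    # has the attribute, (1 - p) otherwise (log-domain for numerical stability).
    log_lik = K_test @ np.log(p + 1e-12) + (1 - K_test) @ np.log(1 - p + 1e-12)
    return object_names[int(np.argmax(log_lik))]

# Toy usage with 3 hypothetical test objects and 4 attributes
# (e.g., made of porcelain, made of plastic, cylindrical, has a handle).
K_test = np.array([[1, 0, 0, 1],    # "mug": porcelain, has a handle
                   [0, 1, 1, 0],    # "bottle": plastic, cylindrical
                   [1, 1, 0, 0]])   # "box" (toy row, not from the paper)
print(predict_object([2.1, -1.3, 0.8, -0.5], K_test, ["mug", "bottle", "box"]))
```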

    Multi-View Priors for Learning Detectors from Sparse Viewpoint Data

    While the majority of today's object class models provide only 2D bounding boxes, far richer output hypotheses are desirable, including viewpoint, fine-grained category, and 3D geometry estimates. However, models trained to provide richer output require larger amounts of training data, preferably covering the relevant aspects such as viewpoint and fine-grained categories well. In this paper, we address this issue from the perspective of transfer learning, and design an object class model that explicitly leverages correlations between visual features. Specifically, our model represents prior distributions over permissible multi-view detectors in a parametric way -- the priors are learned once from training data of a source object class, and can later be used to facilitate the learning of a detector for a target class. As we show in our experiments, this transfer is not only beneficial for detectors based on basic-level category representations, but also enables the robust learning of detectors that represent classes at finer levels of granularity, where training data is typically even scarcer and more unbalanced. As a result, we report largely improved performance in simultaneous 2D object localization and viewpoint estimation on a recent dataset of challenging street scenes.
    Comment: 13 pages, 7 figures, 4 tables, International Conference on Learning Representations 201
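    As a rough illustration of the transfer idea (learning a parametric prior over per-viewpoint detector weights from a source class and using it to regularize a data-poor target class), here is a minimal sketch. The Gaussian/ridge formulation, dimensions, and function names are assumptions for illustration, not the paper's actual model.

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulation): a Gaussian prior over
# per-viewpoint detector weights is estimated from a source class and reused as
# a regularizer when fitting a linear detector for a data-poor target class.

def fit_multiview_prior(source_weights):
    """source_weights: (V, D) array, one learned detector per viewpoint.
    Returns prior mean and (diagonal) variance per weight dimension."""
    mu = source_weights.mean(axis=0)
    var = source_weights.var(axis=0) + 1e-6
    return mu, var

def fit_target_detector(X, y, mu, var, noise=1.0):
    """MAP estimate of a linear detector w under the transferred Gaussian prior:
    minimize ||Xw - y||^2 / noise + (w - mu)^T diag(1/var) (w - mu)."""
    A = X.T @ X / noise + np.diag(1.0 / var)
    b = X.T @ y / noise + mu / var
    return np.linalg.solve(A, b)

# Toy usage: 5 source viewpoint detectors of dimension 8, 6 target samples.
rng = np.random.default_rng(0)
mu, var = fit_multiview_prior(rng.normal(size=(5, 8)))
w_target = fit_target_detector(rng.normal(size=(6, 8)), rng.normal(size=6), mu, var)
print(w_target.shape)  # (8,)
```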

    Building a scalable and interpretable Bayesian deep learning framework for quality control of free form surfaces

    Deep learning has demonstrated high accuracy for the 3D object shape error modeling needed to estimate dimensional and geometric quality defects in multi-station assembly systems (MAS). Increasingly, deep-learning-driven Root Cause Analysis (RCA) is used for decision-making when planning corrective action for quality defects. However, given the current absence of scalability-enabling models, training deep learning models for each individual MAS is exceedingly time-consuming, as it requires large amounts of labelled data and multiple computational cycles. Additionally, understanding and interpreting how deep learning produces final predictions while quantifying various uncertainties remains a fundamental challenge. To address these gaps, a novel closed-loop in-process (CLIP) diagnostic framework, underpinned by an algorithm portfolio, is proposed that simultaneously enhances the scalability and interpretability of the current Bayesian deep learning approach, Object Shape Error Response (OSER), to isolate root cause(s) of quality defects in MAS. OSER-MAS leverages a Bayesian 3D U-Net architecture integrated with Computer-Aided Engineering simulations to estimate root causes. The CLIP diagnostic framework shortens OSER-MAS model training time by developing: (i) closed-loop training, which enables faster convergence for a single MAS by leveraging the uncertainty estimates of the Bayesian 3D U-Net model; and (ii) a transfer/continual-learning-based scalability model that transmits meta-knowledge from the trained model to a new MAS, resulting in convergence with comparatively fewer training samples. Additionally, CLIP increases the transparency of quality-related root cause predictions by developing an interpretability model based on 3D Gradient-based Class Activation Maps (3D Grad-CAMs), which entails: (a) linking elements of the MAS model with functional elements of the U-Net architecture; and (b) relating features extracted by the architecture with elements of the MAS model, and further with the object shape error patterns for root cause(s) that occur in MAS. Benchmarking studies are conducted using six automotive MAS with varying complexities. Results highlight a reduction in training samples of up to 56% with a loss in performance of up to 2.1%.
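    To illustrate the 3D Grad-CAM ingredient of the interpretability model, here is a hedged, generic sketch on a tiny 3D convolutional network. The actual OSER-MAS Bayesian 3D U-Net and its MAS-specific linking are far more involved; the network, layer choice, and "root cause" head below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic 3D Grad-CAM sketch: gradients of a chosen root-cause score with respect
# to a convolutional feature map are pooled into channel weights and combined
# into a spatial relevance map over the input volume.

class Tiny3DNet(nn.Module):
    """Placeholder 3D conv net standing in for a (much larger) Bayesian 3D U-Net."""
    def __init__(self, n_causes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(16, n_causes)

    def forward(self, x):
        f = self.features(x)                      # (B, 16, D, H, W)
        pooled = f.mean(dim=(2, 3, 4))            # global average pooling
        return self.head(pooled), f

def grad_cam_3d(model, volume, cause_idx):
    """Return a (D, H, W) relevance map for one predicted root cause."""
    model.eval()
    logits, fmap = model(volume)
    fmap.retain_grad()                            # keep grads of the feature map
    logits[0, cause_idx].backward()
    weights = fmap.grad.mean(dim=(2, 3, 4))       # (B, C) channel importances
    cam = F.relu((weights[:, :, None, None, None] * fmap).sum(dim=1))
    return cam[0].detach()

# Toy usage on a random 16^3 deviation volume.
model = Tiny3DNet()
cam = grad_cam_3d(model, torch.randn(1, 1, 16, 16, 16), cause_idx=2)
print(cam.shape)  # torch.Size([16, 16, 16])
```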

    Pose Induction for Novel Object Categories

    We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes. We present a generalized classifier that can reliably induce pose given a single instance of a novel category. When a large collection of novel instances is available, our approach jointly reasons over all instances to improve the initial estimates. We empirically validate the various components of our algorithm and quantitatively show that our method produces reliable pose estimates. We also show qualitative results on a diverse set of classes and further demonstrate the applicability of our system to learning shape models of novel object classes.
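    One way to picture the "joint reasoning over all instances" step is as smoothing per-instance pose posteriors across similar instances of the novel class. The sketch below is only an illustrative reading of that idea, not the paper's algorithm; the nearest-neighbour blending, the feature space, and the parameters are assumptions.

```python
import numpy as np

# Illustrative sketch only: pose posteriors from a classifier trained on seed
# classes are refined for a novel class by blending each instance's distribution
# with those of its nearest neighbours in feature space.

def refine_pose_posteriors(features, pose_probs, k=5, blend=0.5):
    """features  : (N, D) appearance features of novel-class instances
    pose_probs: (N, B) per-instance posteriors over B viewpoint bins
    Returns smoothed (N, B) posteriors."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # exclude self-matches
    refined = pose_probs.copy()
    for i in range(len(features)):
        neighbours = np.argsort(dists[i])[:k]
        refined[i] = (1 - blend) * pose_probs[i] + blend * pose_probs[neighbours].mean(axis=0)
    return refined / refined.sum(axis=1, keepdims=True)

# Toy usage: 10 novel instances, 8 viewpoint bins.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(8), size=10)
print(refine_pose_posteriors(rng.normal(size=(10, 16)), probs).shape)  # (10, 8)
```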