546 research outputs found

    Classification and Grip of Occluded Objects

    This paper presents a system for detecting, classifying, and gripping occluded objects using machine vision, artificial intelligence, and an anthropomorphic robot, as a solution for grasping elements that are partially occluded. The deep learning algorithms used are based on Convolutional Neural Networks (CNN), specifically Fast R-CNN (Fast Region-Based CNN) and DAG-CNN (Directed Acyclic Graph CNN) for pattern recognition. Three-dimensional information about the environment was collected with a Kinect V1, and tests were simulated with VRML. A sequence of detection, classification, and grip was programmed to determine which elements present occlusions and which type of tool generates the occlusion. According to the user's requirements, the desired elements are delivered (occluded or not), and the unwanted elements are removed. The resulting program achieved 88.89% accuracy in gripping and delivering occluded objects, using Fast R-CNN and DAG-CNN networks with 70.9% and 96.2% accuracy respectively: the first network detects elements without occlusions, and the second classifies the objects into five tools (Scalpel, Scissor, Screwdriver, Spanner, and Pliers). Gripping occluded objects requires accurate detection of the element at the top of the pile so that it can be removed without disturbing the rest of the environment. Additionally, the detection process requires that a part of the occluded tool be visible to determine the existence of occlusions in the stac

    3D Shape Prediction on Convolutional Deep Belief Networks

    The field of image recognition software has grown immensely in recent years with the emergence of new deep learning techniques. Deep belief networks inspired by Hinton [11] were one of the earliest methodologies of deep learning in the late 2000s. More recently, convolutional neural networks have been used in deep learning techniques, architecture, and software to identify patterns in imagery in order to make predictions such as classification, image segmentation, etc. Traditional two-dimensional (2D) images stored as picture files typically contain red, green, and blue color data for each individual pixel in the picture. However, commercial 2.5D or depth cameras have recently become more readily available, such as the Microsoft Kinect, which is capable of capturing both RGB and depth (RGB-D) data. With the depth dimension that these cameras capture, objects are no longer limited to a flat dimension and the volumetric shape of the object can now be used to aid in recognizing that particular object. In this project, I will utilize a convolutional deep belief network in order to observe the effects of rotation and sliding window stride when conducting classification on 3D models. An earlier study, 3D ShapeNets, experimented with this idea utilizing 3D computer aided design (CAD) model data in order to classify 3D models [2]. Extending this research, the results from my experiment showed an inverse correlation between angle granularity and recognition accuracy. Moreover, with regard to sliding window stride length, the training time increased substantially but there was little effect on overall 3D model classification
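As a rough illustration of why sliding window stride can affect training cost, the sketch below counts how many cubic windows fit in a voxel grid for a given stride. The grid size of 30 and window size of 6 are assumed values chosen for the example, not figures taken from the abstract, and the function names are hypothetical.

```python
def window_positions(grid_size: int, window: int, stride: int) -> int:
    """Number of window placements along one axis (no padding)."""
    if window > grid_size:
        return 0
    return (grid_size - window) // stride + 1


def windows_3d(grid_size: int, window: int, stride: int) -> int:
    """Number of cubic window placements in a cubic voxel grid."""
    return window_positions(grid_size, window, stride) ** 3


# Assumed example: a 30x30x30 voxel grid scanned by a 6x6x6 window.
counts = {stride: windows_3d(30, 6, stride) for stride in (1, 2, 3)}
print(counts)
```

Under these assumptions, halving the stride multiplies the number of windows by roughly eight, which is one plausible reason for a large change in training time.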

    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc., and makes an in-depth analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    Using Support Vector Machines, Convolutional Neural Networks and Deep Belief Networks for Partially Occluded Object Recognition

    Artificial neural networks have been widely used for machine learning tasks such as object recognition. Recent developments have made use of biologically inspired architectures, such as the Convolutional Neural Network and the Deep Belief Network. A theoretical method for estimating the optimal number of feature maps for a Convolutional Neural Network using the dimensions of the receptive field or convolutional kernel is proposed. Empirical experiments show that the method works to an extent for extremely small receptive fields, but doesn't generalize as clearly to all receptive field sizes. We then test the hypothesis that generative models such as the Deep Belief Network should perform better on occluded object recognition tasks than purely discriminative models such as Convolutional Neural Networks. We find that the data does not support this hypothesis when the generative models are run in a partially discriminative manner. We also find that the use of Gaussian visible units in a Deep Belief Network trained on occluded image data allows it to also learn to classify non-occluded images

    Augmenting Deep Learning Performance in an Evidential Multiple Classifier System

    The main objective of this work is to study the applicability of ensemble methods in the context of deep learning with limited amounts of labeled data. We exploit an ensemble of neural networks derived using Monte Carlo dropout, along with an ensemble of SVM classifiers which owes its effectiveness to the hand-crafted features used as inputs and to an active learning procedure. In order to leverage each classifier's respective strengths, we combine them in an evidential framework, which models specifically their imprecision and uncertainty. The application we consider in order to illustrate the interest of our Multiple Classifier System is pedestrian detection in high-density crowds, which is ideally suited for its difficulty, cost of labeling and intrinsic imprecision of annotation data. We show that the fusion resulting from the effective modeling of uncertainty allows for performance improvement, and at the same time, for a deeper interpretation of the result in terms of commitment of the decision
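To make the idea of evidential fusion concrete, here is a minimal sketch that combines two classifier outputs with Dempster's rule of combination. It assumes the "evidential framework" refers to Dempster-Shafer belief functions; the abstract does not name the combination rule actually used, and the mass values and the names `m_cnn` and `m_svm` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


PED = frozenset({"pedestrian"})
BG = frozenset({"background"})
EITHER = PED | BG  # mass here expresses ignorance, not a decision

# Hypothetical masses from a neural-network ensemble and an SVM ensemble.
m_cnn = {PED: 0.6, BG: 0.1, EITHER: 0.3}
m_svm = {PED: 0.5, BG: 0.2, EITHER: 0.3}

fused = dempster_combine(m_cnn, m_svm)
print(fused)
```

The mass left on `EITHER` after fusion is what allows a decision to be read in terms of commitment: a small residual means both sources committed to a specific class.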

    On the 3D point cloud for human-pose estimation

    This thesis aims at investigating methodologies for estimating a human pose from a 3D point cloud that is captured by a static depth sensor. Human-pose estimation (HPE) is important for a range of applications, such as human-robot interaction, healthcare, surveillance, and so forth. Yet, HPE is challenging because of the uncertainty in sensor measurements and the complexity of human poses. In this research, we focus on addressing challenges related to two crucial components in the estimation process, namely, human-pose feature extraction and human-pose modeling. In feature extraction, the main challenge involves reducing feature ambiguity. We propose a 3D-point-cloud feature called viewpoint and shape feature histogram (VISH) to reduce feature ambiguity by capturing geometric properties of the 3D point cloud of a human. The feature extraction consists of three steps: 3D-point-cloud pre-processing, hierarchical structuring, and feature extraction. In the pre-processing step, 3D points corresponding to a human are extracted and outliers from the environment are removed to retain the 3D points of interest. This step is important because it allows us to reduce the number of 3D points by keeping only those points that correspond to the human body for further processing. In the hierarchical structuring, the pre-processed 3D point cloud is partitioned and replicated into a tree structure as nodes. Viewpoint feature histogram (VFH) and shape features are extracted from each node in the tree to provide a descriptor to represent each node. As the features are obtained based on histograms, coarse-level details are highlighted in large regions and fine-level details are highlighted in small regions. Therefore, the features from the point cloud in the tree can capture coarse level to fine level information to reduce feature ambiguity. 
    In human-pose modeling, the main challenges involve reducing the dimensionality of human-pose space and designing appropriate factors that represent the underlying probability distributions for estimating human poses. To reduce the dimensionality, we propose a non-parametric action-mixture model (AMM). It represents high-dimensional human-pose space using low-dimensional manifolds in searching human poses. In each manifold, a probability distribution is estimated based on feature similarity. The distributions in the manifolds are then redistributed according to the stationary distribution of a Markov chain that models the frequency of human actions. After the redistribution, the manifolds are combined according to a probability distribution determined by action classification. Experiments were conducted using VISH features as input to the AMM. The results showed that the overall error and standard deviation of the AMM were reduced by about 7.9% and 7.1%, respectively, compared with a model without action classification. To design appropriate factors, we consider the AMM as a Bayesian network and propose a mapping that converts the Bayesian network to a neural network called NN-AMM. The proposed mapping consists of two steps: structure identification and parameter learning. In structure identification, we have developed a bottom-up approach to build a neural network while preserving the Bayesian-network structure. In parameter learning, we have created a part-based approach to learn synaptic weights by decomposing a neural network into parts. Based on the concept of distributed representation, the NN-AMM is further modified into a scalable neural network called NND-AMM. A neural-network-based system is then built by using VISH features to represent 3D-point-cloud input and the NND-AMM to estimate 3D human poses. The results showed that the proposed mapping can be utilized to design AMM factors automatically. The NND-AMM can provide more accurate human-pose estimates with fewer hidden neurons than both the AMM and NN-AMM can. Both the NN-AMM and NND-AMM can adapt to different types of input, showing the advantage of using neural networks to design factors
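The stationary distribution of a Markov chain, mentioned above as the basis for redistributing the manifold distributions, can be sketched with simple power iteration. The three-action transition matrix below is a hypothetical example, not data from the thesis, and the sketch shows only the stationary-distribution step rather than the full AMM.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi P by power iteration on a row-stochastic P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical transition probabilities between three actions
# (for example: standing, walking, sitting); each row sums to 1.
P = [
    [0.80, 0.15, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.30, 0.60],
]

pi = stationary_distribution(P)
print(pi)
```

Each entry of `pi` can be read as the long-run frequency of the corresponding action, which is the kind of weight the abstract describes applying to the per-manifold distributions.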