    Learning Less is More - 6D Camera Localization via 3D Surface Regression

    Popular research areas like autonomous driving and augmented reality have renewed the interest in image-based camera localization. In this work, we address the task of predicting the 6D camera pose from a single RGB image in a given 3D environment. With the advent of neural networks, previous works have either learned the entire camera localization process, or multiple components of a camera localization pipeline. Our key contribution is to demonstrate and explain that learning a single component of this pipeline is sufficient. This component is a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space. The neural network is prepended to a new end-to-end trainable pipeline. Our system is efficient, highly accurate, robust in training, and exhibits outstanding generalization capabilities. It exceeds state-of-the-art consistently on indoor and outdoor datasets. Interestingly, our approach surpasses existing techniques even without utilizing a 3D model of the scene during training, since the network is able to discover 3D scene geometry automatically, solely from single-view constraints.
    Comment: CVPR 201

    Segmentation-Based Deep-Learning Approach for Surface-Defect Detection

    Automated surface-anomaly detection using machine learning has become an interesting and promising area of research, with a very high and direct impact on the application domain of visual inspection. Deep-learning methods have become the most suitable approaches for this task. They allow the inspection system to learn to detect the surface anomaly by simply showing it a number of exemplar images. This paper presents a segmentation-based deep-learning architecture that is designed for the detection and segmentation of surface anomalies and is demonstrated on a specific domain of surface-crack detection. The design of the architecture enables the model to be trained using a small number of samples, which is an important requirement for practical applications. The proposed model is compared with related deep-learning methods, including state-of-the-art commercial software, showing that the proposed approach outperforms the related methods on the specific domain of surface-crack detection. The large number of experiments also sheds light on the required precision of the annotation, the number of required training samples, and the required computational cost. Experiments are performed on a newly created dataset based on a real-world quality control case and demonstrate that the proposed approach is able to learn from a small number of defective surfaces, using only approximately 25-30 defective training samples, instead of the hundreds or thousands that are usual in deep-learning applications. This makes the deep-learning method practical for use in industry, where the number of available defective samples is limited. The dataset is also made publicly available to encourage the development and evaluation of new methods for surface-defect detection.
    Comment: Journal of Intelligent Manufacturing 201

    Automatic Detection of Knee Joints and Quantification of Knee Osteoarthritis Severity using Convolutional Neural Networks

    This paper introduces a new approach to automatically quantify the severity of knee osteoarthritis (OA) using X-ray images. Automatically quantifying knee OA severity involves two steps: first, automatically localizing the knee joints; next, classifying the localized knee joint images. We introduce a new approach to automatically detect the knee joints using a fully convolutional neural network (FCN). We train convolutional neural networks (CNNs) from scratch to automatically quantify the knee OA severity, optimizing a weighted ratio of two loss functions: categorical cross-entropy and mean-squared loss. This joint training further improves the overall quantification of knee OA severity, with the added benefit of naturally producing simultaneous multi-class classification and regression outputs. Two public datasets are used to evaluate our approach, the Osteoarthritis Initiative (OAI) and the Multicenter Osteoarthritis Study (MOST), with extremely promising results that outperform existing approaches.
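
    The joint objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the convex combination below, the `weight` parameter, and all function names are assumptions made for illustration only, not the authors' implementation.

```python
import math


def categorical_cross_entropy(probs, true_class):
    """Cross-entropy for one sample: -log of the probability of the true class."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[true_class], eps))


def squared_error(predicted_grade, true_grade):
    """Squared error between a regressed severity grade and the true grade."""
    return (predicted_grade - true_grade) ** 2


def joint_loss(probs, predicted_grade, true_grade, weight=0.5):
    """Weighted sum of the classification and regression losses.

    `weight` is a hypothetical mixing parameter (assumption): 1.0 uses only
    cross-entropy, 0.0 uses only the squared error.
    """
    ce = categorical_cross_entropy(probs, true_grade)
    se = squared_error(predicted_grade, float(true_grade))
    return weight * ce + (1.0 - weight) * se


# One sample with true grade 1: class probabilities plus a regressed grade.
loss = joint_loss([0.1, 0.7, 0.2], predicted_grade=1.5, true_grade=1)
```

    In a real training loop the per-sample losses would be averaged over a batch, which turns the squared error into a mean-squared loss.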

    Weakly supervised training of pixel resolution segmentation models on whole slide images

    We present a novel approach to train pixel resolution segmentation models on whole slide images in a weakly supervised setup. The model is trained to classify patches extracted from slides, which means that training is carried out on noisily labeled data. We address this problem with two complementary strategies. First, the patches are sampled online using the model's knowledge, by focusing on regions where the model's confidence is higher. Second, we propose an extension of the KL divergence that is robust to noisy labels. Our preliminary experiments on the CAMELYON 16 data set show promising results. The model can successfully segment tumor areas with strong morphological consistency.
    Comment: Performance updat

    Automatic Renal Segmentation in DCE-MRI using Convolutional Neural Networks

    Kidney function evaluation using dynamic contrast-enhanced MRI (DCE-MRI) images could help in the diagnosis and treatment of kidney diseases in children. Automatic segmentation of renal parenchyma is an important step in this process. In this paper, we propose a time- and memory-efficient fully automated segmentation method which achieves high segmentation accuracy, with running time on the order of seconds, in both normal kidneys and kidneys with hydronephrosis. The proposed method is based on a cascaded application of two 3D convolutional neural networks that employ spatial and temporal information at the same time in order to learn the tasks of localization and segmentation of kidneys, respectively. Segmentation performance is evaluated on both normal and abnormal kidneys with varying levels of hydronephrosis. We achieved a mean Dice coefficient of 91.4 and 83.6 for normal and abnormal kidneys of pediatric patients, respectively.
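
    For readers unfamiliar with the metric reported above, here is a minimal sketch of the standard Dice coefficient on binary masks. The flattened-list representation and the function name are illustrative assumptions, and the convention for two empty masks is a choice made here. The score lies in [0, 1]; the figures 91.4 and 83.6 in the abstract appear to be on a percentage scale.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values of equal length (a flattened
    2D or 3D segmentation). Two empty masks are treated as a perfect match.
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Two foreground voxels each, one of them shared.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```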

    A spatiotemporal model with visual attention for video classification

    High level understanding of sequential visual input is important for safe and stable autonomy, especially in localization and object detection. While traditional object classification and tracking approaches are specifically designed to handle variations in rotation and scale, current state-of-the-art approaches based on deep learning achieve better performance. This paper focuses on developing a spatiotemporal model to handle videos containing moving objects with rotation and scale changes. Built on models that combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to classify sequential data, this work investigates the effectiveness of incorporating attention modules in the CNN stage for video classification. The superiority of the proposed spatiotemporal model is demonstrated on the Moving MNIST dataset augmented with rotation and scaling.
    Comment: Accepted by Robotics: Science and Systems 2017 Workshop on Articulated Model Trackin

    Adversarial Learning for Image Forensics Deep Matching with Atrous Convolution

    Constrained image splicing detection and localization (CISDL) is a newly proposed challenging task for image forensics, which investigates two input suspected images and identifies whether one image has suspected regions pasted from the other. In this paper, we propose a novel adversarial learning framework to train the deep matching network for CISDL. Our framework mainly consists of three building blocks: 1) the deep matching network based on atrous convolution (DMAC) aims to generate two high-quality candidate masks which indicate the suspected regions of the two input images, 2) the detection network is designed to rectify inconsistencies between the two corresponding candidate masks, 3) the discriminative network drives the DMAC network to produce masks that are hard to distinguish from ground-truth ones. In DMAC, atrous convolution is adopted to extract features with rich spatial information, the correlation layer based on the skip architecture is proposed to capture hierarchical features, and atrous spatial pyramid pooling is constructed to localize tampered regions at multiple scales. The detection network and the discriminative network act as the losses with auxiliary parameters to supervise the training of DMAC in an adversarial way. Extensive experiments, conducted on 21 generated testing sets and two public datasets, demonstrate the effectiveness of the proposed framework and the superior performance of DMAC.
    Comment: 13 pages, 8 figure

    Graph-based Proprioceptive Localization Using a Discrete Heading-Length Feature Sequence Matching Approach

    Proprioceptive localization refers to a new class of robot egocentric localization methods that do not rely on the perception and recognition of external landmarks. These methods are naturally immune to bad weather, poor lighting conditions, or other extreme environmental conditions that may hinder exteroceptive sensors such as a camera or a laser range finder. These methods depend on proprioceptive sensors such as inertial measurement units (IMUs) and/or wheel encoders. Assisted by magnetoreception, the sensors can provide a rudimentary estimation of vehicle trajectory, which is used to query a prior known map to obtain location. We provide a low-cost fallback solution, named graph-based proprioceptive localization (GBPL), for localization under challenging environmental conditions. As a robot/vehicle travels, we extract a sequence of heading-length values for straight segments from the trajectory and match the sequence with a pre-processed heading-length graph (HLG) abstracted from the prior known map to localize the robot under a graph-matching approach. Using the information from the HLG, our location alignment and verification module compensates for trajectory drift, wheel slip, or tire inflation level. We have implemented our algorithm and tested it in both simulated and physical experiments. The algorithm succeeds in finding the robot location continuously and achieves localization accuracy at the level that the prior map allows (less than 10m).
    Comment: 13 pages, 32 figure
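
    The heading-length extraction step mentioned in this abstract can be sketched roughly as follows. This is not the GBPL algorithm itself: the 2D point trajectory, the tolerance and minimum-length parameters, the heading convention, and the function name are all assumptions chosen for illustration.

```python
import math


def heading_length_sequence(points, heading_tol_deg=10.0, min_length=1.0):
    """Collapse a 2D trajectory into (heading_deg, length) straight segments.

    Consecutive steps whose heading stays within `heading_tol_deg` of the
    segment's first step are merged. Headings are measured counterclockwise
    from the +x axis, in [0, 360). All parameters are hypothetical.
    """
    segments = []
    seg_heading = None
    seg_length = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        if step == 0.0:
            continue  # skip repeated points
        heading = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
        if seg_heading is None:
            seg_heading, seg_length = heading, step
            continue
        # Smallest absolute angular difference, handling 359 -> 1 wraparound.
        diff = abs((heading - seg_heading + 180.0) % 360.0 - 180.0)
        if diff <= heading_tol_deg:
            seg_length += step
        else:
            if seg_length >= min_length:
                segments.append((seg_heading, seg_length))
            seg_heading, seg_length = heading, step
    if seg_heading is not None and seg_length >= min_length:
        segments.append((seg_heading, seg_length))
    return segments


# An L-shaped path: 10 units along +x, then 8 units along +y.
path = [(0, 0), (5, 0), (10, 0), (10, 4), (10, 8)]
sequence = heading_length_sequence(path)
```

    The resulting sequence is the kind of query that would then be matched against a heading-length graph built from the map; that matching step is not shown here.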

    ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks

    This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations. It is challenging as objects in the wild could appear at arbitrary locations and in different scales. In this paper, we propose a novel classification architecture ProNet based on convolutional neural networks. It uses computationally efficient neural networks to propose image regions that are likely to contain objects, and applies more powerful but slower networks on the proposed regions. The basic building block is a multi-scale fully-convolutional network which assigns object confidence scores to boxes at different locations and scales. We show that such networks can be trained effectively using image-level annotations, and can be connected into cascades or trees for efficient object classification. ProNet outperforms previous state-of-the-art significantly on PASCAL VOC 2012 and MS COCO datasets for object classification and point-based localization.
    Comment: CVPR 2016 (fixed reference issue

    BAOD: Budget-Aware Object Detection

    We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques including random sampling, uncertainty sampling and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points.