    Self-paced Convolutional Neural Network for Computer Aided Detection in Medical Imaging Analysis

    Tissue characterization has long been an important component of Computer Aided Diagnosis (CAD) systems for automatic lesion detection and further clinical planning. Motivated by the superior performance of deep learning methods on various computer vision problems, there has been increasing work applying deep learning to medical image analysis. However, developing a robust and reliable deep learning model for computer-aided diagnosis remains highly challenging because medical images are highly heterogeneous and training samples are relatively scarce. Specifically, annotating and labeling medical images is much more expensive and time-consuming than in other applications and often involves manual labor from multiple domain experts. In this work, we propose a multi-stage, self-paced learning framework that uses a convolutional neural network (CNN) to classify Computed Tomography (CT) image patches. The key contribution of this approach is that we augment the set of training samples by refining the unlabeled instances with a self-paced learning CNN. Implementing the framework on high-performance computing servers, including the NVIDIA DGX1 machine, we obtained experimental results showing that the self-pace-boosted network consistently outperformed the original network, even with very scarce manual labels. The performance gain indicates that applications with limited training samples, such as medical image analysis, can benefit from the proposed framework. Comment: accepted by the 8th International Workshop on Machine Learning in Medical Imaging (MLMI 2017)
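The multi-stage idea in this abstract (train on the few labeled samples, then gradually absorb unlabeled ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: a nearest-centroid classifier stands in for the CNN, and the names `fit_centroids`, `predict`, `self_paced_training`, and the confidence schedule `thresholds` are hypothetical. Each stage pseudo-labels only the unlabeled samples the current model is confident about ("easy" samples first), retrains, and then relaxes the threshold.

```python
import math

def fit_centroids(X, y, n_classes):
    """Stand-in for CNN training: one mean vector per class.
    Assumes every class has at least one training sample."""
    centroids = []
    for c in range(n_classes):
        members = [x for x, label in zip(X, y) if label == c]
        centroids.append([sum(col) / len(members) for col in zip(*members)])
    return centroids

def predict(centroids, x):
    """Return (label, confidence) from a softmax over negative squared distances."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    label = weights.index(max(weights))
    return label, weights[label] / sum(weights)

def self_paced_training(X_lab, y_lab, X_unlab, n_classes,
                        thresholds=(0.95, 0.85, 0.75)):
    """Grow the training set stage by stage (hypothetical schedule)."""
    X_train, y_train = list(X_lab), list(y_lab)
    remaining = list(X_unlab)
    model = fit_centroids(X_train, y_train, n_classes)
    for t in thresholds:
        still_hard = []
        for x in remaining:
            label, conf = predict(model, x)
            if conf >= t:
                # Confident prediction: adopt it as a pseudo-label.
                X_train.append(x)
                y_train.append(label)
            else:
                still_hard.append(x)
        remaining = still_hard
        # Retrain on manual labels plus the pseudo-labels gathered so far.
        model = fit_centroids(X_train, y_train, n_classes)
    return model, X_train, y_train, remaining
```

Samples that never reach the lowest threshold stay unlabeled and are returned in `remaining`, which keeps ambiguous instances from polluting the training set.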

    Web-Scale Training for Face Identification

    Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web. We study face recognition and show that three distinct properties have surprising effects on the transferability of deep convolutional networks (CNNs): (1) the bottleneck of the network serves as an important transfer learning regularizer, and (2) contrary to common wisdom, performance saturation may exist in CNNs as the number of training samples grows; we propose to alleviate this by replacing the naive random subsampling of the training set with a bootstrapping process. Moreover, (3) we find a link between the representation norm and the ability to discriminate in a target domain, which sheds light on how such networks represent faces. Based on these discoveries, we are able to improve face recognition accuracy on the widely used LFW benchmark, in both the verification (1:1) and identification (1:N) protocols, and to directly compare, for the first time, with the state-of-the-art Commercially-Off-The-Shelf system, showing a sizable leap in performance

    Bootstrapped CNNs for Building Segmentation on RGB-D Aerial Imagery

    Detection of buildings and other objects from aerial images has various applications in urban planning and map making. Automated building detection from aerial imagery is a challenging task, as it is affected by varying lighting conditions, shadows and occlusions. Convolutional Neural Networks (CNNs) are robust against some of these variations, although they fail to distinguish between easy and difficult examples. We train a detection algorithm on RGB-D images to obtain a segmented mask using the CNN architecture DenseNet. First, we improve the performance of the model by applying a statistical re-sampling technique called Bootstrapping and demonstrate that more informative examples are retained. Second, the proposed method outperforms the non-bootstrapped version while using only one-sixth of the original training data, and it obtains a precision-recall break-even of 95.10% on our aerial imagery dataset. Comment: Published at ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Science

    OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

    We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat
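To make "accumulated rather than suppressed" concrete, the sketch below contrasts with non-maximum suppression: instead of discarding overlapping boxes, it greedily merges the most-overlapping pair and sums their confidences. This is only an illustration under stated assumptions. The abstract does not give the matching criterion, so intersection-over-union (IoU), the threshold `iou_thresh`, coordinate averaging, and the function names `iou` and `accumulate_boxes` are all hypothetical choices.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def accumulate_boxes(boxes, scores, iou_thresh=0.5):
    """Greedily merge the best-overlapping pair until none exceeds iou_thresh.

    Returns a list of (box, accumulated_score) pairs.
    """
    items = [(tuple(b), float(s)) for b, s in zip(boxes, scores)]
    while True:
        best, best_pair = iou_thresh, None
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                overlap = iou(items[i][0], items[j][0])
                if overlap >= best:
                    best, best_pair = overlap, (i, j)
        if best_pair is None:
            return items
        i, j = best_pair
        (box_i, score_i), (box_j, score_j) = items[i], items[j]
        # Merge: average the coordinates, add the confidences.
        merged = tuple((p + q) / 2.0 for p, q in zip(box_i, box_j))
        items = [it for k, it in enumerate(items) if k not in (i, j)]
        items.append((merged, score_i + score_j))
```

Because scores add up, a region supported by many overlapping predictions ends with a higher confidence than any single box, which is the effect the abstract describes.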

    Hand Keypoint Detection in Single Images using Multiview Bootstrapping

    We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We analytically derive the minimum number of views needed to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in real time on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions. Comment: CVPR 201
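One iteration of the triangulate-then-reproject step described in this abstract can be sketched for a single keypoint as follows. This is a simplified illustration, not the paper's procedure: it assumes known 3x4 projection matrices, uses a plain linear least-squares triangulation, and rejects the whole keypoint if any view's reprojection error exceeds a hypothetical pixel threshold `max_err`. The helper names `solve3`, `triangulate`, `reproject`, and `bootstrap_labels` are made up for this example.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def triangulate(cams, pts):
    """Linear least-squares 3D point from 2D detections in two or more views."""
    rows, rhs = [], []
    for P, (u, v) in zip(cams, pts):
        for coord, r in ((u, 0), (v, 1)):
            # (coord * P[2] - P[r]) . [X, Y, Z, 1] = 0
            full = [coord * P[2][k] - P[r][k] for k in range(4)]
            rows.append(full[:3])
            rhs.append(-full[3])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)

def reproject(P, X):
    """Project a 3D point X into the image of camera P."""
    Xh = list(X) + [1.0]
    h = [sum(P[r][k] * Xh[k] for k in range(4)) for r in range(3)]
    return (h[0] / h[2], h[1] / h[2])

def bootstrap_labels(cams, pts, max_err=2.0):
    """Return reprojected 2D labels per view, or None if marked as an outlier."""
    X = triangulate(cams, pts)
    labels = [reproject(P, X) for P in cams]
    for (u, v), (ru, rv) in zip(pts, labels):
        if math.hypot(u - ru, v - rv) > max_err:
            return None
    return labels
```

The returned labels would then be added to the detector's training set before the next iteration; a fuller version would likely select a consistent subset of views rather than rejecting the keypoint outright.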