
    DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks

    In this paper, we propose DeepCut, a method to obtain pixelwise object segmentations given an image dataset labelled with bounding box annotations. It extends the approach of the well-known GrabCut method to include machine learning by training a neural network classifier from bounding box annotations. We formulate the problem as an energy minimisation problem over a densely-connected conditional random field and iteratively update the training targets to obtain pixelwise object segmentations. Additionally, we propose variants of the DeepCut method and compare them to a naive approach to CNN training under weak supervision. We test its applicability to brain and lung segmentation problems on a challenging fetal magnetic resonance dataset and obtain encouraging results in terms of accuracy.
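    The iterative scheme the abstract describes (initialise targets from the box, fit a classifier, regularise, refresh the targets) can be sketched as follows. This is a minimal stand-in, not the paper's method: the trained CNN is replaced by a mean-intensity threshold and the dense-CRF step by clipping predictions to the box.

    ```python
    import numpy as np

    def deepcut_iterate(image, bbox, n_iters=3):
        """DeepCut-style loop (hedged sketch): targets start as the whole
        bounding box and are refreshed from the regularised prediction."""
        h, w = image.shape
        box = np.zeros((h, w), dtype=bool)
        y0, y1, x0, x1 = bbox
        box[y0:y1, x0:x1] = True
        targets = box.copy()                 # initial targets: everything in the box
        for _ in range(n_iters):
            thr = image[targets].mean()      # stand-in classifier: mean threshold
            pred = image >= thr              # "CNN" foreground prediction
            targets = pred & box             # stand-in CRF: object stays inside the box
        return targets
    ```

    In the actual method, the thresholding step is a CNN retrained on the current targets, and the box constraint is a densely-connected CRF energy minimisation.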

    WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

    Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can generally be crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we emphasize extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to get reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces qualitative results comparable to or better than those of state-of-the-art strongly supervised methods.
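    The key data requirement above is that the two datasets are never paired: each training step draws a batch from the source camera and an independent batch from the high-quality set. A hedged sketch of that weak-supervision data feed (function name and structure are illustrative, not from the paper):

    ```python
    import random

    def unpaired_batches(source_photos, quality_photos, batch_size, n_batches, seed=0):
        """WESPE-style weak supervision: batches from the source camera and the
        high-quality reference set are sampled independently, so no aligned
        original/enhanced pairs are ever required (illustrative sketch)."""
        rng = random.Random(seed)
        for _ in range(n_batches):
            src = rng.sample(source_photos, batch_size)   # generator inputs
            ref = rng.sample(quality_photos, batch_size)  # discriminator "real" examples
            yield src, ref
    ```

    Because the two sets are decoupled, the reference set can be crawled from the Internet once and reused for any new source camera, which is what makes the pipeline repeatable in a couple of hours.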

    Advances in deep learning methods for pavement surface crack detection and identification with visible light visual images

    Compared to NDT and health monitoring methods for cracks in engineering structures, surface crack detection or identification based on visible light images is non-contact, with the advantages of fast speed, low cost and high precision. Firstly, typical public data sets of pavement (and concrete) cracks were collected, and the characteristics of the sample images as well as the random variable factors, including environment, noise and interference, were summarized. Subsequently, the advantages and disadvantages of the three main crack identification approaches (i.e., hand-crafted feature engineering, machine learning, deep learning) were compared. Finally, from the aspects of model architecture, testing performance and predicting effectiveness, the development and progress of typical deep learning models that can be easily deployed on embedded platforms, including self-built CNNs, transfer learning (TL) and encoder-decoder (ED) models, were reviewed. The benchmark test shows that: 1) Real-time pixel-level crack identification is now achievable on an embedded platform: the average crack detection time for an image sample is less than 100 ms, using either the ED method (i.e., FPCNet) or the TL method based on InceptionV3, and can be reduced to less than 10 ms with the TL method based on MobileNet (a lightweight backbone network). 2) In terms of accuracy, it can reach over 99.8% on CCIC, whose samples are easily identified by the human eye. On SDNET2018, some of whose samples are difficult to identify, FPCNet can reach 97.5%, while the TL method is close to 96.1%. To the best of our knowledge, this paper is the first to comprehensively summarize the public pavement crack data sets and to review and evaluate the performance and effectiveness of deep learning methods for surface crack detection and identification on embedded platforms. Comment: 15 pages, 14 figures, 11 tables
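    The TL recipe the review benchmarks (freeze a MobileNet/InceptionV3 backbone, fit only a small binary crack/no-crack head on its features) can be sketched with a logistic head trained by gradient descent. Everything here is an illustrative stand-in: the `features` are assumed to come from a frozen pretrained extractor, and the optimiser is plain batch gradient descent.

    ```python
    import numpy as np

    def train_tl_head(features, labels, lr=0.5, epochs=200):
        """Transfer-learning sketch: the backbone is frozen, so only a
        logistic crack/no-crack head is fit on its output features."""
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.01, size=features.shape[1])
        b = 0.0
        for _ in range(epochs):
            z = features @ w + b
            p = 1.0 / (1.0 + np.exp(-z))            # sigmoid crack probability
            grad = p - labels                        # gradient of the log-loss w.r.t. z
            w -= lr * features.T @ grad / len(labels)
            b -= lr * grad.mean()
        return w, b
    ```

    Training only the head is what makes the MobileNet variant cheap enough for the sub-10 ms embedded inference figure quoted above: the heavy feature extractor runs once per image and is never updated.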

    A novel infrared video surveillance system using deep learning based techniques

    This is the author accepted manuscript; the final version is available from Springer via the DOI in this record. This paper presents a new, practical infrared video based surveillance system, consisting of a resolution-enhanced, automatic target detection/recognition (ATD/R) system that is widely applicable in civilian and military applications. To deal with the small number of pixels on target in the developed ATD/R system, as encountered in long range imagery, a super-resolution method is employed to increase target signature resolution and optimise the baseline quality of inputs for object recognition. To tackle the challenge of detecting extremely low-resolution targets, we train a sophisticated and powerful convolutional neural network (CNN)-based Faster R-CNN using long wave infrared imagery datasets that were prepared and marked in-house. The system was tested under different weather conditions, using two datasets featuring target types comprising pedestrians and six different types of ground vehicles. The developed ATD/R system can detect extremely low-resolution targets with superior performance by effectively addressing the small number of pixels on target encountered in long range applications. A comparison with traditional methods confirms this superiority both qualitatively and quantitatively.
    This work was funded by Thales UK, the Centre of Excellence for Sensor and Imaging System (CENSIS), and the Scottish Funding Council under the project "AALART. Thales-Challenge Low-pixel Automatic Target Detection and Recognition (ATD/ATR)", ref. CAF-0036. Thanks are also given to the Digital Health and Care Institute (DHI, project Smartcough-MacMasters), which partially supported Mr. Monge-Alvarez's contribution, and to the Royal Society of Edinburgh and the National Science Foundation of China for the funding associated with the project "Flood Detection and Monitoring using Hyperspectral Remote Sensing from Unmanned Aerial Vehicles", which partially covered Dr. Casaseca-de-la-Higuera's, Dr. Luo's, and Prof. Wang's contributions. Dr. Casaseca-de-la-Higuera would also like to acknowledge the Royal Society of Edinburgh for the funding associated with project "HIVE".
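    The two-stage pipeline described above (super-resolve the low-pixel frame, then run the detector) can be sketched as below. Both stages are toy stand-ins: the learned super-resolution method is replaced by nearest-neighbour upsampling, and the Faster R-CNN detector by a brightness threshold that exploits the fact that warm targets appear bright in LWIR imagery.

    ```python
    import numpy as np

    def upscale(frame, factor=2):
        """Stand-in for the super-resolution stage: plain nearest-neighbour
        upsampling (the real system uses a learned SR method)."""
        return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

    def detect_hot_targets(frame, thr):
        """Toy detector stub in place of Faster R-CNN: return the bounding
        box of above-threshold (i.e., warm) pixels, or an empty list."""
        ys, xs = np.nonzero(frame > thr)
        if ys.size == 0:
            return []
        return [(ys.min(), xs.min(), ys.max(), xs.max())]
    ```

    The point of the first stage is that a target covering only a handful of pixels gains enough signature after upscaling for the second stage to localise it reliably.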

    3D Reconstruction of Optical Building Images Based on Improved 3D-R2N2 Algorithm

    Three-dimensional reconstruction technology is a key element in the construction of urban geospatial models. Addressing the current shortcomings of 3D reconstruction algorithms in reconstruction accuracy, convergence of registration results, reconstruction effectiveness, and convergence time, we propose an optical building object 3D reconstruction method based on an improved 3D-R2N2 algorithm. The method inputs preprocessed optical remote sensing images into a Convolutional Neural Network (CNN) with dense connections for encoding, converting them into a low-dimensional feature matrix and adding a residual connection between every two convolutional layers to increase network depth. Subsequently, 3D Long Short-Term Memory (3D-LSTM) units are used for transitional connections and cyclic learning. Each unit selectively adjusts or maintains its state, accepting feature vectors computed by the encoder. These data are further passed into a Deep Convolutional Neural Network (DCNN), where each 3D-LSTM hidden unit partially reconstructs output voxels. The DCNN convolutional layers employ equally sized 3 × 3 × 3 convolutional kernels to process and decode these feature data, thereby accomplishing the 3D reconstruction of buildings. Simultaneously, a pyramid pooling layer is introduced between the feature extraction module and the fully connected layer to enhance the performance of the algorithm. Experimental results indicate that, compared to the 3D-R2N2 algorithm, the SFM-enhanced AKAZE algorithm, the AISI-BIM algorithm, and the improved PMVS algorithm, the proposed algorithm improves the reconstruction effect by 5.3%, 7.8%, 7.4%, and 1.0% respectively. Furthermore, compared to other algorithms, the proposed algorithm exhibits higher efficiency in terms of registration result convergence and reconstruction time, with faster computational speed.
    This research contributes to the enhancement of building 3D reconstruction technology, laying a foundation for future research in deep learning applications in the architectural field.
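    The decoder step described above applies 3 × 3 × 3 convolutions over a voxel grid. A minimal sketch of one such "same"-padded 3D convolution (a naive numpy loop with a single fixed kernel, whereas the actual DCNN stacks many learned kernels):

    ```python
    import numpy as np

    def conv3d_same(vox, kernel):
        """One 3x3x3 'same' 3D convolution over a voxel grid, as used by the
        decoder stage described above (naive loop, illustration only)."""
        assert kernel.shape == (3, 3, 3)
        padded = np.pad(vox, 1)                       # zero-pad so output size == input size
        out = np.zeros(vox.shape, dtype=float)
        for z in range(vox.shape[0]):
            for y in range(vox.shape[1]):
                for x in range(vox.shape[2]):
                    out[z, y, x] = (padded[z:z+3, y:y+3, x:x+3] * kernel).sum()
        return out
    ```

    With a kernel that is 1 at its centre and 0 elsewhere, the operation is the identity, which is a convenient sanity check before swapping in learned weights.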