DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks
In this paper, we propose DeepCut, a method to obtain pixelwise object
segmentations given an image dataset labelled with bounding box annotations. It
extends the approach of the well-known GrabCut method to include machine
learning by training a neural network classifier from bounding box annotations.
We formulate the problem as an energy minimisation problem over a
densely-connected conditional random field and iteratively update the training
targets to obtain pixelwise object segmentations. Additionally, we propose
variants of the DeepCut method and compare those to a naive approach to CNN
training under weak supervision. We test its applicability to solve brain and
lung segmentation problems on a challenging fetal magnetic resonance dataset
and obtain encouraging results in terms of accuracy.
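The iterative target-update scheme can be pictured with a toy numeric sketch (a hypothetical simplification: a two-mean intensity model stands in for the CNN classifier and densely-connected CRF of the actual method):

```python
import numpy as np

# Toy illustration of the DeepCut-style iterative scheme: pixels inside
# the bounding box start as "foreground", a simple per-pixel classifier
# is fit, and the training targets are re-estimated each round.
# Assumption: a two-mean intensity model replaces the CNN + CRF.

rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, (32, 32))              # background intensities
img[8:24, 8:24] = rng.normal(0.8, 0.05, (16, 16))  # bright object

box = np.zeros((32, 32), bool)
box[6:26, 6:26] = True                             # loose bounding box

labels = box.copy()                                # init: box interior = fg
for _ in range(5):
    mu_fg = img[labels].mean()                     # "train" the classifier
    mu_bg = img[~labels].mean()
    # re-estimate training targets: nearest mean, but never outside the box
    labels = (np.abs(img - mu_fg) < np.abs(img - mu_bg)) & box

print(labels[10:22, 10:22].mean())  # fraction of object core labelled fg
```

The loop mirrors the paper's alternation between classifier training and target refinement; the box constraint is what makes the weak annotation usable.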
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Low-end and compact mobile cameras demonstrate limited photo quality mainly
due to space, hardware and budget constraints. In this work, we propose a deep
learning solution that translates photos taken by cameras with limited
capabilities into DSLR-quality photos automatically. We tackle this problem by
introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image
Generative Adversarial Network-based architecture. The proposed model is
trained under weak supervision: unlike previous works, there is no need for
strong supervision in the form of a large annotated dataset of aligned
original/enhanced photo pairs. The sole requirement is two distinct datasets:
one from the source camera, and one composed of arbitrary high-quality images
that can be generally crawled from the Internet - the visual content they
exhibit may be unrelated. Hence, our solution is repeatable for any camera:
collecting the data and training can be achieved in a couple of hours. In this
work, we place particular emphasis on extensive evaluation of the obtained results. Besides
standard objective metrics and subjective user study, we train a virtual rater
in the form of a separate CNN that mimics human raters on Flickr data and use
this network to get reference scores for both original and enhanced photos. Our
experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from
several generations of smartphones demonstrate that WESPE produces qualitative
results comparable to or better than those of state-of-the-art strongly
supervised methods.
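The weak-supervision idea (no aligned pairs, only two unpaired datasets) can be illustrated with a minimal numeric sketch; toy linear mappings stand in for the generator, backward mapping, and discriminator, and are not the authors' architecture:

```python
import numpy as np

# Why no aligned pairs are needed: the losses only compare
# (a) generator output against an *unpaired* pool of high-quality
#     images via a discriminator-style score, and
# (b) the input against a backward-mapped reconstruction
#     (content / cycle consistency).
# All mappings below are toy stand-ins, not the WESPE networks.

rng = np.random.default_rng(1)
src = rng.uniform(0.0, 0.6, (4, 64))      # low-quality source batch
hq  = rng.uniform(0.4, 1.0, (4, 64))      # unpaired high-quality pool

def G(x): return np.clip(x * 1.5, 0, 1)   # toy forward enhancer
def F(y): return y / 1.5                  # toy backward mapping
def D(y): return y.mean(axis=1)           # toy "discriminator" score

adv_loss     = np.mean((D(G(src)) - D(hq)) ** 2)  # match HQ statistics
content_loss = np.mean((F(G(src)) - src) ** 2)    # cycle consistency
total = adv_loss + 10.0 * content_loss            # weighted combination
print(adv_loss, content_loss, total)
```

Because the toy forward and backward mappings invert each other exactly here, the content term vanishes, illustrating that this loss constrains content without ever seeing an aligned pair.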
Advances in deep learning methods for pavement surface crack detection and identification with visible light visual images
Compared to NDT and health monitoring methods for cracks in engineering
structures, surface crack detection or identification based on visible-light
images is non-contact and offers the advantages of high speed, low cost and
high precision. Firstly, typical public pavement (and concrete) crack data sets
were collected, and the characteristics of the sample images as well as random
variable factors, including environment, noise and interference, were
summarized. Subsequently, the advantages and disadvantages of three main crack
identification methods (i.e., hand-crafted feature engineering, machine
learning, deep learning) were compared. Finally, from the aspects of model
architecture, testing performance and predicting effectiveness, the development
and progress of typical deep learning models, including self-built CNN,
transfer learning (TL) and encoder-decoder (ED) models, which can be easily
deployed on embedded platforms, were reviewed. The benchmark tests show that:
1) Real-time pixel-level crack identification on an embedded platform is
already achievable: the average crack detection time for an image sample is
less than 100ms, either using the ED method (i.e., FPCNet) or the TL method
based on InceptionV3. This can be reduced to less than 10 ms with the TL method
based on MobileNet (a lightweight backbone network). 2) In terms of accuracy, it
can reach over 99.8% on CCIC, whose samples are easily identified by the human eye. On
SDNET2018, some samples of which are difficult to identify, FPCNet can
reach 97.5%, while the TL method is close to 96.1%.
To the best of our knowledge, this paper is the first to comprehensively
summarize the public pavement crack data sets and to review and evaluate the
performance and effectiveness of deep learning methods for surface crack
detection and identification on embedded platforms.
Comment: 15 pages, 14 figures, 11 tables
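The transfer-learning recipe the survey highlights (freeze a pretrained backbone, train only a small crack / no-crack head) can be sketched as follows; the fixed random projection is a hypothetical stand-in for a MobileNet feature extractor, and the data are synthetic:

```python
import numpy as np

# Transfer-learning sketch: the "pretrained" backbone weights stay
# frozen; only a binary logistic-regression head is trained on top of
# the extracted features. The random projection and synthetic patches
# are assumptions for illustration, not the survey's benchmark setup.

rng = np.random.default_rng(2)
W_backbone = rng.normal(size=(256, 32))   # frozen "pretrained" weights

def features(x):                          # frozen backbone forward pass
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

# synthetic patches: "crack" patches carry much stronger signal
x_crack = rng.normal(0, 1.0, (100, 256))
x_clean = rng.normal(0, 0.2, (100, 256))
X = features(np.vstack([x_crack, x_clean]))
y = np.r_[np.ones(100), np.zeros(100)]

w, b = np.zeros(32), 0.0                  # only the head is trainable
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))    # head forward pass
    g = p - y                             # logistic-loss gradient
    w -= 0.01 * X.T @ g / len(y)
    b -= 0.01 * g.mean()

acc = np.mean((p > 0.5) == y)
print(f"training accuracy of the TL head: {acc:.2f}")
```

Training only the head is what makes the sub-10 ms embedded deployment plausible: the expensive backbone runs once per image and the decision layer is tiny.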
A novel infrared video surveillance system using deep learning based techniques
This is the author accepted manuscript. The final version is available from Springer via the DOI in this record.
This paper presents a new, practical infrared video based surveillance
system, consisting of a resolution-enhanced, automatic target detection/recognition
(ATD/R) system with broad applicability in civilian and military settings. To
deal with the issue of small numbers of pixels on target in the developed ATD/R
system, as are encountered in long range imagery, a super-resolution method is
employed to increase target signature resolution and optimise the baseline quality
of inputs for object recognition. To tackle the challenge of detecting extremely
low-resolution targets, we train a powerful convolutional neural
network (CNN) based Faster R-CNN detector using long wave infrared imagery datasets
that were prepared and marked in-house. The system was tested under different
weather conditions, using two datasets featuring target types comprising pedestrians
and 6 different types of ground vehicles. The developed ATD/R system can
detect extremely low-resolution targets with superior performance by effectively
addressing the small number of pixels on target encountered in long range applications.
A comparison with traditional methods confirms this superiority both
qualitatively and quantitatively.
This work was funded by Thales UK, the Centre of Excellence for
Sensor and Imaging System (CENSIS), and the Scottish Funding Council under the project
“AALART. Thales-Challenge Low-pixel Automatic Target Detection and Recognition (ATD/ATR)”,
ref. CAF-0036. Thanks are also given to the Digital Health and Care Institute (DHI, project
Smartcough-MacMasters), which partially supported Mr. Monge-Alvarez’s contribution, and
to the Royal Society of Edinburgh and National Science Foundation of China for the funding
associated to the project “Flood Detection and Monitoring using Hyperspectral Remote Sensing
from Unmanned Aerial Vehicles”, which partially covered Dr. Casaseca-de-la-Higuera’s,
Dr. Luo’s, and Prof. Wang’s contribution. Dr. Casaseca-de-la-Higuera would also like to acknowledge
the Royal Society of Edinburgh for the funding associated with the project “HIVE”.
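The pipeline's first stage, increasing the pixels on target before detection, can be sketched with a naive bilinear upsampler; this is an illustrative assumption, not the super-resolution method used in the paper:

```python
import numpy as np

# Minimal stand-in for the resolution-enhancement stage: a small
# target chip is upsampled 4x before being handed to the detector,
# so the network sees more pixels on target. Bilinear interpolation
# here is an assumption; the actual system uses a dedicated
# super-resolution method.

def upsample_bilinear(img, scale):
    """Naive bilinear upsampling of a 2-D array by an integer scale."""
    h, w = img.shape
    ys = (np.arange(h * scale) + 0.5) / scale - 0.5
    xs = (np.arange(w * scale) + 0.5) / scale - 0.5
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x1)]
            + wy * (1 - wx) * img[np.ix_(y1, x0)]
            + wy * wx * img[np.ix_(y1, x1)])

chip = np.eye(8)                  # 8x8 low-resolution target signature
big = upsample_bilinear(chip, 4)  # 32x32 input for the detector
print(big.shape)
```

The point of the stage is unchanged whatever interpolator is used: the detector's minimum-object-size constraint is met by enlarging the target signature, not by retraining on tiny inputs alone.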
3D Reconstruction of Optical Building Images Based on Improved 3D-R2N2 Algorithm
Three-dimensional reconstruction technology is a key element in the construction of urban geospatial models. To address the current shortcomings of 3D reconstruction algorithms in reconstruction accuracy, registration-result convergence, reconstruction effectiveness, and convergence time, we propose an optical building object 3D reconstruction method based on an improved 3D-R2N2 algorithm. The method feeds preprocessed optical remote sensing images into a densely connected Convolutional Neural Network (CNN) for encoding, converting them into a low-dimensional feature matrix and adding a residual connection between every two convolutional layers to increase network depth. Subsequently, 3D Long Short-Term Memory (3D-LSTM) units are used for transitional connections and cyclic learning. Each unit selectively updates or maintains its state, accepting the feature vectors computed by the encoder. These data are then passed into a Deep Convolutional Neural Network (DCNN), where each 3D-LSTM hidden unit partially reconstructs the output voxels. The DCNN convolutional layers employ equally sized 3 × 3 × 3 convolutional kernels to process and decode these feature data, thereby accomplishing the 3D reconstruction of buildings. In addition, a pyramid pooling layer is introduced between the feature extraction module and the fully connected layer to enhance the performance of the algorithm. Experimental results indicate that, compared to the 3D-R2N2 algorithm, the SFM-enhanced AKAZE algorithm, the AISI-BIM algorithm, and the improved PMVS algorithm, the proposed algorithm improves the reconstruction effect by 5.3%, 7.8%, 7.4%, and 1.0%, respectively. Furthermore, compared to the other algorithms, the proposed algorithm converges faster in registration and requires less reconstruction time.
This research contributes to the enhancement of building 3D reconstruction technology, laying a foundation for future research on deep learning applications in the architectural field.
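The encode, recurrently fuse, decode flow of the improved 3D-R2N2 can be sketched at shape level; all dimensions and the gating rule below are illustrative assumptions, not the paper's:

```python
import numpy as np

# Shape-level sketch of the 3D-R2N2-style flow: each view is encoded
# to a low-dimensional feature vector, a recurrent 3-D grid of hidden
# states accumulates evidence across views (the 3D-LSTM role), and a
# decoder maps the grid to occupancy voxels (the DCNN role). The
# encoder, gate, and decoder here are toy stand-ins.

rng = np.random.default_rng(3)
views = rng.normal(size=(5, 127, 127))   # 5 input images of a building

def encode(img):                         # CNN encoder stand-in
    return img.reshape(-1)[:1024].copy() # 1024-d feature vector

grid = np.zeros((4, 4, 4, 1024))         # 3D-LSTM-like state grid
for v in views:
    f = encode(v)
    # each cell selectively blends its state with the new view's feature
    gate = 1 / (1 + np.exp(-grid.mean(axis=-1, keepdims=True)))
    grid = gate * grid + (1 - gate) * f

def decode(grid):                        # DCNN decoder stand-in
    return 1 / (1 + np.exp(-grid.mean(axis=-1)))  # occupancy probs

voxels = decode(grid)
print(voxels.shape)  # (4, 4, 4)
```

The key structural point survives the simplification: the recurrent grid lets the model ingest an arbitrary number of views, with each cell deciding per view whether to update or keep its state before decoding to voxels.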