3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks
Standard 3D convolution operations require much larger amounts of memory and
computation cost than 2D convolution operations. This fact has hindered the
development of deep neural nets in many 3D vision tasks. In this paper, we
investigate the possibility of applying depthwise separable convolutions in 3D
scenario and introduce the use of 3D depthwise convolution. A 3D depthwise
convolution splits a single standard 3D convolution into two separate steps,
which drastically reduces the number of parameters in 3D convolutions by
more than one order of magnitude. We experiment with 3D depthwise convolution
on popular CNN architectures and also compare it with a similar structure
called pseudo-3D convolution. The results demonstrate that, with 3D depthwise
convolutions, 3D vision tasks like classification and reconstruction can be
carried out with more lightweight neural networks while still delivering
comparable performance. (Comment: work in progress.)
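The parameter saving the abstract claims can be checked with simple arithmetic: a standard 3D convolution needs one k x k x k kernel per (input channel, output channel) pair, while the depthwise-separable factorization needs one k x k x k kernel per input channel plus a 1x1x1 pointwise mixing step. The channel and kernel sizes below (64 -> 128 channels, 3x3x3 kernels) are illustrative assumptions, not taken from the paper.

```python
def params_standard_3d(c_in, c_out, k):
    # One k^3 kernel per (input channel, output channel) pair.
    return c_in * c_out * k ** 3

def params_depthwise_separable_3d(c_in, c_out, k):
    # Step 1: depthwise 3D conv, one k^3 kernel per input channel.
    # Step 2: pointwise 1x1x1 conv mixing channels.
    return c_in * k ** 3 + c_in * c_out

c_in, c_out, k = 64, 128, 3
std = params_standard_3d(c_in, c_out, k)              # 221184
sep = params_depthwise_separable_3d(c_in, c_out, k)   # 9920
print(std, sep, std / sep)
```

For these sizes the factorization cuts parameters by a factor of about 22, consistent with the "more than one order of magnitude" reduction claimed above.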
Application of Convolutional Neural Network in the Segmentation and Classification of High-Resolution Remote Sensing Images
Numerous convolutional neural networks increase classification accuracy for remote sensing scene images at the expense of the model's space and time complexity. This causes the model to run slowly and prevents a trade-off between model accuracy and running time. The loss of deep features as the network gets deeper makes it impossible to retrieve the key features with a simple double-branching structure, which is detrimental to classifying remote sensing scene photos.
DPSA: Dense pixelwise spatial attention network for hatching egg fertility detection
© 2020 SPIE and IS & T. Deep convolutional neural networks show good prospects for the fertility detection and classification of specific-pathogen-free hatching egg embryos in the production of avian influenza vaccine, and our previous work mainly investigated three factors of networks to push performance: depth, width, and cardinality. However, an important problem remains: feeble embryos with weak blood vessels interfere with the classification of resilient fertile ones. Inspired by fine-grained classification, we introduce an attention mechanism into our model by proposing a dense pixelwise spatial attention module, combined with the existing channel attention through depthwise separable convolutions, to further enhance the network's class-discriminative ability. In our fused attention module, depthwise convolutions are used for channel-specific feature learning, and dilated convolutions with different sampling rates are adopted to capture spatial multiscale context and preserve rich detail, which maintains high resolution and increases receptive fields simultaneously. The attention mask with strong semantic information, generated by aggregating the outputs of the spatial-pyramid dilated convolutions, is broadcast to low-level features via elementwise multiplication, serving as a feature selector that emphasizes informative features and suppresses less useful ones. A series of experiments conducted on our hatching egg dataset shows that our attention network achieves a lower misjudgment rate on weak embryos and a more stable accuracy, up to 98.3% and 99.1% on 5-day-old and 9-day-old eggs, respectively.
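The final step the abstract describes, broadcasting a single-channel spatial attention mask over all channels of a low-level feature map via elementwise multiplication, can be sketched in a few lines. The shapes and values below are toy assumptions; in the actual module the mask is produced by the spatial-pyramid dilated convolutions.

```python
def apply_spatial_attention(features, mask):
    """features: [C][H][W] nested lists; mask: [H][W] values in [0, 1].
    The mask is shared (broadcast) across all C channels."""
    return [
        [[f * m for f, m in zip(feat_row, mask_row)]
         for feat_row, mask_row in zip(channel, mask)]
        for channel in features
    ]

features = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
            [[5.0, 6.0], [7.0, 8.0]]]   # channel 1
mask = [[1.0, 0.5], [0.0, 1.0]]          # emphasize / suppress positions
out = apply_spatial_attention(features, mask)
print(out)  # [[[1.0, 1.0], [0.0, 4.0]], [[5.0, 3.0], [0.0, 8.0]]]
```

The mask thus acts as a feature selector: positions with values near 1 pass through, positions near 0 are suppressed in every channel.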
IRX-1D: A Simple Deep Learning Architecture for Remote Sensing Classifications
We propose a simple deep learning architecture combining elements of
Inception, ResNet, and Xception networks. Four new datasets were used for
classification with both small and large training samples. Results in terms of
classification accuracy suggest improved performance by the proposed
architecture in comparison to a Bayesian-optimised 2D-CNN with small training
samples. Comparison of results using small training samples with the Indian
Pines hyperspectral dataset suggests comparable or better performance by the
proposed architecture than nine reported works using different deep learning
architectures. In spite of achieving high classification accuracy with limited
training samples, comparison of the classified images suggests that different
land cover classes are assigned to the same areas when compared with the
classified images provided by models trained using large training samples,
for all datasets. (Comment: 22 pages, 6 tables, 9 figures.)
Dynamic Convolution Self-Attention Network for Land-Cover Classification in VHR Remote-Sensing Images
The current deep convolutional neural networks for very-high-resolution (VHR) remote-sensing image land-cover classification often suffer from two challenges. First, the feature maps extracted by network encoders based on vanilla convolution usually contain a lot of redundant information, which easily causes misclassification of land cover. Moreover, these encoders usually require a large number of parameters and high computational costs. Second, as remote-sensing images are complex and contain many objects with large scale variations, it is difficult to use the popular feature fusion modules to improve the representation ability of networks. To address the above issues, we propose a dynamic convolution self-attention network (DCSA-Net) for VHR remote-sensing image land-cover classification. The proposed network has two advantages. On the one hand, we designed a lightweight dynamic convolution module (LDCM) by using dynamic convolution and a self-attention mechanism. This module can extract more useful image features than vanilla convolution, avoiding the negative effect of useless feature maps on land-cover classification. On the other hand, we designed a context information aggregation module (CIAM) with a ladder structure to enlarge the receptive field. This module can aggregate multi-scale contextual information from feature maps with different resolutions using a dense connection. Experimental results show that the proposed DCSA-Net is superior to state-of-the-art networks due to higher accuracy of land-cover classification, fewer parameters, and lower computational cost. The source code is made publicly available.
Supported in part by the National Natural Science Foundation of China (Program No. 61871259, 62271296, 61861024), in part by the Natural Science Basic Research Program of Shaanxi (Program No. 2021JC-47), in part by the Key Research and Development Program of Shaanxi (Program No. 2022GY-436, 2021ZDLGY08-07), in part by the Natural Science Basic Research Program of Shaanxi (Program No. 2022JQ-634, 2022JQ-018), and in part by the Shaanxi Joint Laboratory of Artificial Intelligence (No. 2020SS-03).
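Why a ladder of dilated convolutions enlarges the receptive field without extra parameters can be seen with a back-of-envelope calculation: with stride 1 throughout, each layer adds (kernel - 1) * dilation to the receptive field. The kernel size and dilation rates below are illustrative assumptions, not the paper's actual CIAM configuration.

```python
def receptive_field(kernel, dilations):
    # With stride 1 everywhere, each layer grows the receptive
    # field by (kernel - 1) * dilation.
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Three stacked 3x3 convs: plain vs dilated with rates 1, 2, 4.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

Dilation more than doubles the receptive field here while the parameter count of each layer is unchanged, which is the trade-off the CIAM design exploits.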
DGCNet: An Efficient 3D-Densenet based on Dynamic Group Convolution for Hyperspectral Remote Sensing Image Classification
Deep neural networks face many problems in the field of hyperspectral image
classification: ineffective utilization of spatial-spectral information, and
vanishing gradients and overfitting as model depth increases. In order to
accelerate the deployment of the model on edge devices with strict latency
requirements and limited computing power, we introduce a lightweight model
based on the improved 3D-Densenet model and design DGCNet. It remedies the
disadvantages of group convolution. Following the idea of dynamic networks,
dynamic group convolution (DGC) is designed on 3D convolution kernels. DGC
introduces a small feature selector for each group to dynamically decide
which part of the input channels to connect, based on the activations of all
input channels. Multiple groups can capture different and complementary visual
and semantic features of input images, allowing the convolutional neural
network (CNN) to learn rich features. 3D convolution extracts high-dimensional
and redundant hyperspectral data, and there is also a lot of redundant
information between convolution kernels. The DGC module allows 3D-Densenet to
select channel information with richer semantic features and discard inactive
regions. A 3D-CNN passed through the DGC module can be regarded as a pruned
network. DGC not only allows the 3D-CNN to perform sufficient feature
extraction, but also takes into account the requirements of speed and
computational cost. Both inference speed and accuracy are improved, with
outstanding performance on the IN, Pavia, and KSC datasets, ahead of
mainstream hyperspectral image classification methods.
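The gating idea behind dynamic group convolution, a small selector per group that looks at input-channel activations and keeps only the most active channels, can be sketched as follows. The group sizes, top-k count, and the use of a plain saliency score as the selection signal are toy assumptions, not the paper's learned selector.

```python
def select_channels(activations, num_groups, keep_per_group):
    """activations: per-channel saliency scores (e.g. mean activation).
    Returns the input-channel indices each group connects to, so
    low-activation ("inactive") channels are discarded."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    # Each group dynamically connects to its own top-k channel subset.
    return [sorted(ranked[g * keep_per_group:(g + 1) * keep_per_group])
            for g in range(num_groups)]

acts = [0.9, 0.1, 0.7, 0.05, 0.8, 0.3]   # 6 input channels
groups = select_channels(acts, num_groups=2, keep_per_group=2)
print(groups)  # [[0, 4], [2, 5]]
```

Channels 1 and 3, with near-zero activations, are connected to no group at all, which is why the abstract describes the resulting 3D-CNN as a pruned network.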
Slum image detection and localization using transfer learning: a case study in Northern Morocco
Developing countries face social and economic challenges, including the emergence and proliferation of slums. Slum detection and localization methods typically rely on regular topographic surveys or on visual identification of high-resolution spatial satellite images, as well as on socio-environmental surveys from land surveys and general population censuses. However, these methods consume considerable time and effort. To overcome these problems, this paper applies transfer learning to seven well-known pretrained models, MobileNets, InceptionV3, NASNetMobile, Xception, VGG16, EfficientNet, and ResNet50, on a smaller dataset of medium-resolution satellite imagery. The experiments show that the top three pretrained models achieve accuracies of 98.78%, 97.9%, and 97.56%, respectively. Besides, MobileNets has the smallest memory size of 9.1 MB and the shortest latency of 17.01 s, so it can be deployed as needed. The results show the good performance of the top three pretrained models for detecting and localizing slum housing in northern Morocco.
Neural Architecture Search for Image Segmentation and Classification
Deep learning (DL) is a class of machine learning algorithms that relies on deep neural networks (DNNs) for computations. Unlike traditional machine learning algorithms, DL can learn from raw data directly and effectively. Hence, DL has been successfully applied to tackle many real-world problems. When applying DL to a given problem, the primary task is designing the optimum DNN. This task relies heavily on human expertise, is time-consuming, and requires many trial-and-error experiments.
This thesis aims to automate the laborious task of designing the optimum DNN by exploring the neural architecture search (NAS) approach. Here, we propose two new NAS algorithms for two real-world problems: pedestrian lane detection for assistive navigation and hyperspectral image segmentation for biosecurity scanning. Additionally, we introduce a new dataset-agnostic predictor of neural network performance, which can be used to speed up NAS algorithms that require the evaluation of candidate DNNs.
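The basic NAS loop the thesis automates, sampling candidate architectures from a search space and keeping the best-scoring one, can be sketched with random search. The search space and the stand-in score function below are entirely hypothetical; a real NAS algorithm would train each candidate or, as the thesis proposes, use a performance predictor in place of full evaluation.

```python
import random

# Hypothetical toy search space (not from the thesis).
SEARCH_SPACE = {
    "depth":  [4, 8, 12],
    "width":  [16, 32, 64],
    "kernel": [3, 5],
}

def sample_architecture(rng):
    # One random choice per architectural dimension.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def score(arch):
    # Stand-in for validation accuracy (or a predictor's estimate).
    return arch["depth"] * arch["width"] / (10 * arch["kernel"])

def random_search(trials, seed=0):
    rng = random.Random(seed)
    return max((sample_architecture(rng) for _ in range(trials)),
               key=score)

best = random_search(trials=20)
print(best)
```

Swapping the `score` stand-in for a trained, dataset-agnostic predictor is precisely what lets such search loops avoid evaluating every candidate DNN from scratch.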