Cross-dataset domain adaptation for the classification of COVID-19 using chest computed tomography images
Detecting COVID-19 patients using Computed Tomography (CT) images of the lungs is an active area of research. Datasets of CT images from COVID-19 patients are becoming available. Deep learning (DL) solutions, and in particular Convolutional Neural Networks (CNN), have achieved impressive results for the classification of COVID-19 CT images, but only when training and testing take place within the same dataset. Work on the cross-dataset problem is still limited and the achieved results are low. Our work tackles the cross-dataset problem through a Domain Adaptation (DA) technique with deep learning. Our proposed solution, COVID19-DANet, is based on a pre-trained CNN backbone for feature extraction. For this task, we select the pre-trained EfficientNet-B3 CNN because it has achieved impressive classification accuracy in previous work. The backbone CNN is followed by a prototypical layer, a concept borrowed from prototypical networks in few-shot learning (FSL). It computes a cosine distance between given samples and the class prototypes and then converts the distances to class probabilities using the Softmax function. To train the COVID19-DANet model, we propose a combined loss function composed of the standard cross-entropy loss for class discrimination and another entropy loss computed over the unlabelled target set only. This so-called unlabelled target entropy loss is minimized and maximized in an alternating fashion to reach the two objectives of class discrimination and domain invariance. COVID19-DANet is tested under four cross-dataset scenarios using the SARS-CoV-2-CT and COVID19-CT datasets and has achieved encouraging results compared to recent work in the literature.
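To make the prototypical layer and the combined loss concrete, the following minimal PyTorch sketch shows one way such a cosine-distance head and the two loss terms could be implemented; the class names, the temperature value, and the overall structure are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypicalHead(nn.Module):
    """Cosine-similarity layer with one learnable prototype per class."""
    def __init__(self, feat_dim, num_classes, temperature=0.05):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.temperature = temperature  # assumed value, controls Softmax sharpness

    def forward(self, features):
        # Cosine similarity between normalised features and class prototypes.
        f = F.normalize(features, dim=1)
        p = F.normalize(self.prototypes, dim=1)
        return f @ p.t() / self.temperature  # logits passed to the Softmax

def combined_losses(logits_labelled, labels, logits_unlabelled_target):
    # Standard cross-entropy on labelled samples (class discrimination).
    ce = F.cross_entropy(logits_labelled, labels)
    # Entropy of predictions on the unlabelled target set; alternately
    # minimised and maximised during training to reach domain invariance.
    probs = F.softmax(logits_unlabelled_target, dim=1)
    ent = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    return ce, ent
```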
Light Weight Residual Convolutional Neural Network for Atrial Fibrillation Detection in Single-lead ECG Recordings
Electrocardiogram (ECG) analysis is the most important approach for classifying heart infarction anomalies, which can be identified from changes in various features of the ECG signal. In this paper, we propose a new feature-based classification method for short-time single-lead ECG signals. The goal of this method is to classify each ECG signal into one of the following classes: normal, atrial fibrillation, other abnormalities, and too noisy, as defined by the dataset. This is a challenging problem because of the severe imbalance between the classes, where the normal class makes up the majority of the samples in the dataset. A second challenge is that the ECG signals have variable length (from 3 to 60 seconds). The proposed method consists of three main processes. The first process detects inverted ECG records by analyzing the signal range and mean in a sliding window. The second process extracts many features that are effective in characterizing ECG signals and detecting abnormalities; these features include morphological, heart rate variability, statistical, time/frequency amplitude, and special atrial fibrillation (AF) features. The third process, which represents the main contribution, is the design of a lightweight residual Convolutional Neural Network (CNN) model for the classification of short-time single-lead ECG signals. This model is composed of five layers with two residual connections and uses advanced CNN concepts such as Batch Normalization, DropOut, and Leaky-ReLU. Compared to state-of-the-art solutions, the proposed method achieved the best performance, with an F1-score of 95.11% when using inversion correction.
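As a rough illustration of the kind of building block described above, here is a hedged PyTorch sketch of a light-weight 1-D residual block combining Batch Normalization, DropOut, and Leaky-ReLU; the kernel size, channel count, and dropout rate are assumptions, not the values used in the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """One residual block for single-lead ECG sequences (batch, channels, length)."""
    def __init__(self, channels, kernel_size=7, dropout=0.3):
        super().__init__()
        padding = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn2 = nn.BatchNorm1d(channels)
        self.act = nn.LeakyReLU(0.1)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Two conv/BN stages with a skip connection from the block input.
        out = self.act(self.bn1(self.conv1(x)))
        out = self.drop(out)
        out = self.bn2(self.conv2(out))
        return self.act(out + x)

# Example: a batch of 8 ECG windows with 16 channels after a stem convolution.
x = torch.randn(8, 16, 3000)
print(ResidualBlock1D(16)(x).shape)  # torch.Size([8, 16, 3000])
```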
SSDAN: Multi-Source Semi-Supervised Domain Adaptation Network for Remote Sensing Scene Classification
We present a new method for multi-source semi-supervised domain adaptation in remote sensing scene classification. The method consists of a pre-trained convolutional neural network (CNN) model, namely EfficientNet-B3, for the extraction of highly discriminative features, followed by a classification module that learns feature prototypes for each class. The classification module then computes a cosine distance between the feature vectors of target data samples and the feature prototypes. Finally, the proposed method ends with a Softmax activation function that converts the distances into class probabilities. The feature prototypes are also divided by a temperature parameter to normalize and control the classification module. The whole model is trained on both the unlabeled and labeled target samples. It is trained to predict the correct classes using the standard cross-entropy loss computed over the labeled source and target samples. At the same time, the model is trained to learn domain-invariant features using another loss function based on the entropy computed over the unlabeled target samples. Unlike the standard cross-entropy loss, this new entropy loss is computed on the model's predicted probabilities and does not need the true labels. This entropy loss, called the minimax loss, needs to be maximized with respect to the classification module to learn features that are domain-invariant (hence removing the data shift), and, at the same time, it should be minimized with respect to the CNN feature extractor to learn discriminative features that are clustered around the class prototypes (in other words, reducing the intra-class variance). To accomplish these maximization and minimization processes at the same time, we use an adversarial training approach in which we alternate between the two processes. The model combines the standard cross-entropy loss and the new minimax entropy loss and optimizes them jointly. The proposed method is tested on four RS scene datasets, namely UC Merced, AID, RESISC45, and PatternNet, using two-source and three-source domain adaptation scenarios. The experimental results demonstrate the strong capability of the proposed method to achieve impressive performance despite using only a few (six in our case) labeled target samples per class. Its performance is already better than several state-of-the-art methods, including RevGrad, ADDA, Siamese-GAN, and MSCN.
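The alternating adversarial optimization can be sketched as follows; this is a minimal illustration under assumed module, optimizer, and weight (lam) names, not the released implementation. The classification module is updated to maximize the target entropy while the feature extractor is updated to minimize it, with both minimizing the cross-entropy on labeled samples.

```python
import torch
import torch.nn.functional as F

def entropy(logits):
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def adversarial_step(feat_net, clf_head, opt_feat, opt_clf,
                     x_labelled, y_labelled, x_unlabelled_target, lam=0.1):
    # 1) Classifier update: minimise cross-entropy, maximise target entropy.
    opt_clf.zero_grad()
    loss_clf = (F.cross_entropy(clf_head(feat_net(x_labelled)), y_labelled)
                - lam * entropy(clf_head(feat_net(x_unlabelled_target))))
    loss_clf.backward()
    opt_clf.step()

    # 2) Feature-extractor update: minimise cross-entropy and target entropy.
    opt_feat.zero_grad()
    loss_feat = (F.cross_entropy(clf_head(feat_net(x_labelled)), y_labelled)
                 + lam * entropy(clf_head(feat_net(x_unlabelled_target))))
    loss_feat.backward()
    opt_feat.step()
```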
Scene Description for Visually Impaired People with Multi-Label Convolutional SVM Networks
In this paper, we present a portable camera-based method for helping visually impaired (VI) people to recognize multiple objects in images. The method relies on a novel multi-label convolutional support vector machine (CSVM) network for the coarse description of images. The core idea of CSVM is to use a set of linear SVMs as filter banks for feature map generation. During the training phase, the weights of the SVM filters are obtained using a forward-supervised learning strategy, unlike the backpropagation algorithm used in standard convolutional neural networks (CNNs). To handle multi-label detection, we introduce a multi-branch CSVM architecture, where each branch is used to detect one object in the image. This architecture exploits the correlation between the objects present in the image by means of a suitable fusion mechanism applied to the intermediate outputs provided by the convolution layers of each branch. The high-level reasoning of the network is done through binary classification SVMs that predict the presence or absence of objects in the image. Experimental results obtained on two indoor datasets and one outdoor dataset, acquired from a portable camera mounted on a lightweight shield worn by the user and connected via a USB cable to a laptop processing unit, are reported and discussed.
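To illustrate the filter-bank idea, the following hedged NumPy/scikit-learn sketch trains linear SVMs on labeled image patches and slides their weight vectors over an image to produce feature maps; the 7x7 patch size, number of filters, and training subsets are illustrative assumptions rather than the paper's actual configuration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_filter_bank(patches, labels, n_filters=8, patch_size=7, seed=0):
    """Train one linear SVM per filter (forward-supervised, no backprop).
    patches: (N, patch_size * patch_size) flattened patches; labels: (N,)."""
    rng = np.random.default_rng(seed)
    filters = []
    for _ in range(n_filters):
        idx = rng.choice(len(patches), size=min(500, len(patches)), replace=False)
        svm = LinearSVC(dual=False).fit(patches[idx], labels[idx])
        filters.append(svm.coef_.reshape(patch_size, patch_size))
    return np.stack(filters)

def feature_maps(image, filters):
    """Slide each SVM filter over the image to build one response map each."""
    k = filters.shape[-1]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    flat = filters.reshape(len(filters), -1)
    maps = np.zeros((len(filters), h, w))
    for i in range(h):
        for j in range(w):
            maps[:, i, j] = flat @ image[i:i + k, j:j + k].ravel()
    return np.maximum(maps, 0)  # simple rectification of the responses
```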
An automatic approach for palm tree counting in UAV images
In this paper, we develop an automatic method for counting palm trees in UAV images. First, we extract a set of keypoints using the Scale-Invariant Feature Transform (SIFT). Then, we analyze these keypoints with an Extreme Learning Machine (ELM) classifier trained beforehand on a set of palm and non-palm keypoints. As output, the ELM classifier marks each detected palm tree with several keypoints. Then, in order to capture the shape of each tree, we merge these keypoints with an active contour method based on level sets (LS). Finally, we further analyze the texture of the regions obtained by LS with local binary patterns (LBPs) to distinguish palm trees from other vegetation. Experimental results obtained on a UAV image acquired over a palm farm are reported and discussed.
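A hedged sketch of the first two steps (SIFT keypoints classified by a basic ELM) is given below; it assumes OpenCV 4.4+ for SIFT, and the hidden-layer size, activation, and file name are illustrative, not the settings used in the paper.

```python
import cv2
import numpy as np

def sift_descriptors(image_path):
    """Detect SIFT keypoints and return them with their 128-D descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.SIFT_create().detectAndCompute(img, None)

class ELM:
    """Minimal Extreme Learning Machine: random hidden layer, least-squares output."""
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y  # closed-form output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0.5).astype(int)

# Usage sketch: labels are 1 for palm keypoints and 0 for non-palm keypoints.
# elm = ELM().fit(train_descriptors, train_labels)
# keypoints, descriptors = sift_descriptors("uav_palm_farm.tif")
# palm_keypoints = [k for k, p in zip(keypoints, elm.predict(descriptors)) if p]
```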
Unified Generative Adversarial Networks for Multidomain Fingerprint Presentation Attack Detection
With the rapid growth of fingerprint-based biometric systems, it is essential to ensure the security and reliability of the deployed algorithms. Indeed, the security vulnerability of these systems has been widely recognized. Thus, it is critical to enhance the generalization ability of fingerprint presentation attack detection (PAD) under cross-sensor and cross-material settings. In this work, we propose a novel solution for the case of a single source domain (sensor) with many labeled real/fake fingerprint images and multiple target domains (sensors) with only a few real images obtained from different sensors. Our aim is to build a model that addresses the limited-sample issue in all target domains by transferring knowledge from the source domain. To this end, we train a unified generative adversarial network (UGAN) for multidomain conversion to learn several mappings between all domains. This allows us to generate additional synthetic images for the target domains from the source domain and thus reduce the distribution shift between fingerprint representations. Then, we train a scale compound network (EfficientNetV2) coupled with multiple head classifiers (one classifier for each domain) using the source domain and the translated images. The outputs of these classifiers are then aggregated using an additional fusion layer with learnable weights. In the experiments, we validate the proposed methodology on the public LivDet2015 dataset. The experimental results show that the proposed method improves the average classification accuracy over twelve classification scenarios from 67.80% to 80.44% after adaptation.
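A minimal PyTorch sketch of the multi-head classification stage with a learnable fusion layer is shown below; the backbone is left abstract, and the head and fusion definitions are assumptions made for illustration, not the released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadFusion(nn.Module):
    """One classification head per sensor domain, fused by learnable weights."""
    def __init__(self, backbone, feat_dim, num_domains, num_classes=2):
        super().__init__()
        self.backbone = backbone  # e.g. an EfficientNetV2 feature extractor
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_domains)]
        )
        self.fusion = nn.Parameter(torch.ones(num_domains))  # learnable fusion weights

    def forward(self, x):
        feats = self.backbone(x)
        logits = torch.stack([head(feats) for head in self.heads])  # (D, B, C)
        weights = F.softmax(self.fusion, dim=0).view(-1, 1, 1)
        return (weights * logits).sum(dim=0)  # aggregated live/fake logits
```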
Tile-Based Semisupervised Classification of Large-Scale VHR Remote Sensing Images
This paper deals with the problem of the classification of large-scale very high-resolution (VHR) remote sensing (RS) images in a semisupervised scenario, where we have a limited training set (fewer than ten training samples per class). Typical pixel-based classification methods are unfeasible for large-scale VHR images. Thus, as a practical and efficient solution, we propose to subdivide the large image into a grid of tiles and then classify the tiles instead of classifying pixels. Our proposed method uses the power of a pretrained convolutional neural network (CNN) to first extract descriptive features from each tile. Next, a neural network classifier (composed of two fully connected layers) is trained in a semisupervised fashion and used to classify all the remaining tiles in the image. This yields a coarse classification of the image, which is sufficient for many RS applications. The second contribution deals with the use of semisupervised learning to improve the classification accuracy. We present a novel semisupervised approach that exploits both the spectral and spatial relationships embedded in the remaining unlabelled tiles. In particular, we embed a spectral graph Laplacian in the hidden layer of the neural network. In addition, we regularize the output labels using a spatial graph Laplacian and the random walker algorithm. Experimental results obtained by testing the method on two large-scale images acquired by the IKONOS2 sensor reveal the promising capabilities of this method in terms of classification accuracy, even with fewer than ten training samples per class.
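The spectral graph-Laplacian term embedded in the hidden layer can be written as a simple regularizer added to the supervised loss, as in the hedged PyTorch sketch below; the similarity-graph construction and the weight lam are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def laplacian_regulariser(hidden, adjacency):
    """trace(H^T L H) / n: penalises hidden-layer differences between spectrally
    similar tiles. hidden: (n_tiles, d) activations; adjacency: (n_tiles, n_tiles)
    symmetric similarity graph over all tiles (labelled and unlabelled)."""
    laplacian = torch.diag(adjacency.sum(dim=1)) - adjacency
    return torch.trace(hidden.t() @ laplacian @ hidden) / hidden.shape[0]

def semisupervised_loss(logits_labelled, y_labelled, hidden_all, adjacency, lam=0.1):
    # Cross-entropy on the few labelled tiles + Laplacian smoothness on all tiles.
    return (F.cross_entropy(logits_labelled, y_labelled)
            + lam * laplacian_regulariser(hidden_all, adjacency))
```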
Efficient Framework for Palm Tree Detection in UAV Images
The latest developments in unmanned aerial vehicles (UAVs) and associated sensing systems make these platforms increasingly attractive to the remote sensing community. The large amount of spatial detail contained in these images opens the door for advanced monitoring applications. In this paper, we use this cost-effective and attractive technology for the automatic detection of palm trees. Given a UAV image acquired over a palm farm, we first extract a set of keypoints using the Scale-Invariant Feature Transform (SIFT). Then, we analyze these keypoints with an extreme learning machine (ELM) classifier trained beforehand on a set of palm and non-palm keypoints. As output, the ELM classifier marks each detected palm tree with several keypoints. Then, in order to capture the shape of each tree, we merge these keypoints with an active contour method based on level sets (LSs). Finally, we further analyze the texture of the regions obtained by LS with local binary patterns (LBPs) to distinguish palm trees from other vegetation. Experimental results obtained on UAV images with a spatial resolution of 3.5 cm, acquired over two different farms, confirm the promising capabilities of the proposed framework.
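For the final texture-analysis step, a hedged scikit-image sketch is given below: a uniform LBP histogram summarizes each candidate region from the level-set stage, and a simple classifier (here a linear SVM, chosen only for illustration) separates palm crowns from other vegetation.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_histogram(gray_region, points=8, radius=1):
    """Uniform LBP histogram of a grayscale region (points + 2 bins)."""
    lbp = local_binary_pattern(gray_region, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist

# Usage sketch: describe each segmented region, then classify its texture.
# X = np.stack([lbp_histogram(region) for region in training_regions])
# clf = LinearSVC(dual=False).fit(X, training_labels)   # 1 = palm, 0 = other
# is_palm = clf.predict(lbp_histogram(candidate_region)[None, :])
```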
Self-supervised learning for remote sensing scene classification under the few shot scenario
Scene classification is a crucial research problem in remote sensing (RS) that has attracted many researchers recently. It presents several challenges due to multiple issues, such as the complexity of remote sensing scenes, class overlap (as a scene may contain objects that belong to foreign classes), and the difficulty of obtaining sufficient labeled scenes. Deep learning (DL) solutions, and in particular convolutional neural networks (CNN), are now the state-of-the-art solution in RS scene classification; however, CNN models need huge amounts of annotated data, which can be costly and time-consuming to obtain. On the other hand, it is relatively easy to acquire large amounts of unlabeled images. Recently, Self-Supervised Learning (SSL) has been proposed as a method that can learn from unlabeled images, potentially reducing the need for labeling. In this work, we propose a deep SSL method, called RS-FewShotSSL, for RS scene classification under the few-shot scenario, where we only have a few (fewer than 20) labeled scenes per class. Under this scenario, typical DL solutions that fine-tune CNN models pre-trained on the ImageNet dataset fail dramatically. In the SSL paradigm, a DL model is first pre-trained from scratch during the pretext task using the large amounts of unlabeled scenes. Then, during the main or so-called downstream task, the model is fine-tuned on the labeled scenes. Our proposed RS-FewShotSSL solution is composed of an online network and a target network, both using the EfficientNet-B3 CNN model as a feature-encoder backbone. During the pretext task, RS-FewShotSSL learns discriminative features from the unlabeled images using cross-view contrastive learning. Different views are generated from each image using geometric transformations and passed to the online and target networks. Then, the whole model is optimized by minimizing the cross-view distance between the online and target networks. To address the problem of the limited computational resources available to us, the proposed method uses a novel DL architecture that can be trained using both high-resolution and low-resolution images. During the pretext task, RS-FewShotSSL is trained using low-resolution images, thereby allowing for larger batch sizes, which significantly boosts the performance of the proposed pipeline on the task of RS classification. In the downstream task, the target network is discarded, and the online network is fine-tuned using the few labeled shots or scenes. Here, we use smaller batches of both high-resolution and low-resolution images. This architecture allows RS-FewShotSSL to benefit from both large batch sizes and full image sizes, thereby learning from the large amounts of unlabeled data in an effective way. We tested RS-FewShotSSL on three public RS datasets, and it demonstrated a significant improvement compared to other state-of-the-art methods such as SimCLR, MoCo, BYOL, and IDSSL.
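The cross-view optimization between the online and target networks can be sketched as below; this follows the generic online/target (BYOL-style) recipe under assumed names and momentum value, and is not the authors' released code.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(online, target, momentum=0.99):
    """Target weights follow the online weights by an exponential moving average."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.data.mul_(momentum).add_(p_online.data, alpha=1.0 - momentum)

def cross_view_loss(online_net, target_net, view1, view2):
    """Distance between normalised projections of two augmented views."""
    z1 = F.normalize(online_net(view1), dim=1)
    with torch.no_grad():                      # the target branch is not trained
        z2 = F.normalize(target_net(view2), dim=1)
    return (2 - 2 * (z1 * z2).sum(dim=1)).mean()

# Typical setup: target_net = copy.deepcopy(online_net); after every optimiser
# step on cross_view_loss, call ema_update(online_net, target_net).
```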
UAV Image Multi-Labeling with Data-Efficient Transformers
In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generate a second view of each image from the training set using data augmentation. Then, both the image and its augmented version are reshaped into a sequence of flattened patches and fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mount two classifiers, a token classifier and a distiller classifier. During training, we minimize a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we consider the average of the two classifiers' outputs as the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano, with a ground resolution of two centimeters, demonstrated the effectiveness of the proposed model.
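A minimal PyTorch sketch of the two-classifier arrangement is given below: a token head and a distiller head each produce multi-label logits, the training loss sums one binary cross-entropy term per head, and the two heads' probabilities are averaged at test time. The encoder interface and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadMultiLabel(nn.Module):
    def __init__(self, encoder, embed_dim, num_labels):
        super().__init__()
        self.encoder = encoder                    # assumed to return (cls_token, dist_token)
        self.token_head = nn.Linear(embed_dim, num_labels)
        self.dist_head = nn.Linear(embed_dim, num_labels)

    def forward(self, x):
        cls_tok, dist_tok = self.encoder(x)
        return self.token_head(cls_tok), self.dist_head(dist_tok)

def global_loss(logits_token, logits_dist, targets):
    # One multi-label (binary cross-entropy) term per classifier.
    return (F.binary_cross_entropy_with_logits(logits_token, targets)
            + F.binary_cross_entropy_with_logits(logits_dist, targets))

def predict(model, x, threshold=0.5):
    # Test phase: average the two classifiers' per-label probabilities.
    logits_token, logits_dist = model(x)
    probs = 0.5 * (torch.sigmoid(logits_token) + torch.sigmoid(logits_dist))
    return (probs > threshold).int()
```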