Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on the combination of synthetic aperture radar and deep learning technology, aiming to further promote the development of intelligent SAR image interpretation. Synthetic aperture radar (SAR) is an important active microwave imaging sensor whose day-and-night, all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in remote sensing, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in computer vision, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to address these significant challenges and present innovative, cutting-edge results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.
3D GANs and Latent Space: A comprehensive survey
Generative Adversarial Networks (GANs) have emerged as a significant player
in generative modeling by mapping lower-dimensional random noise to
higher-dimensional spaces. These networks have been used to generate
high-resolution images and 3D objects. The efficient modeling of 3D objects and
human faces is crucial in the development process of 3D graphical environments
such as games or simulations. 3D GANs are a new type of generative model used
for 3D reconstruction, point cloud reconstruction, and 3D semantic scene
completion. The choice of distribution for noise is critical as it represents
the latent space. Understanding a GAN's latent space is essential for
fine-tuning the generated samples, as demonstrated by the morphing of
semantically meaningful parts of images. In this work, we explore the latent
space and 3D GANs, examine several GAN variants and training methods to gain
insights into improving 3D GAN training, and suggest potential future
directions for further research.
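The latent-space morphing mentioned above amounts to walking between latent codes and decoding each step. A minimal NumPy sketch of the two standard interpolation schemes (illustrative and generator-agnostic, not tied to any particular 3D GAN) is:

```python
import numpy as np

def lerp(z1, z2, t):
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z1 + t * z2

def slerp(z1, z2, t):
    """Spherical interpolation, often preferred for Gaussian latent spaces."""
    cos_omega = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z1, z2, t)
    return (np.sin((1.0 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(128), rng.standard_normal(128)

# Walk the latent space; feeding each step to a trained generator would
# yield a smooth morph between the two generated samples.
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 8)]
```

Spherical interpolation keeps intermediate codes at a norm typical of the Gaussian prior, which is why it is often chosen over straight linear blending.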
Survey on Unsupervised Domain Adaptation for Semantic Segmentation for Visual Perception in Automated Driving
Deep neural networks (DNNs) have proven their capabilities in the past years and play a significant role in environment perception for the challenging application of automated driving. They are employed for tasks such as detection, semantic segmentation, and sensor fusion. Despite tremendous research efforts, several issues still need to be addressed that limit the applicability of DNNs in automated driving. The poor generalization of DNNs to unseen domains is a major problem on the way to safe, large-scale application, because manual annotation of new domains is costly, particularly for semantic segmentation. For this reason, methods are required to adapt DNNs to new domains without labeling effort. This task is termed unsupervised domain adaptation (UDA). While several different domain shifts challenge DNNs, the shift between synthetic and real data is of particular importance for automated driving, as it allows the use of simulation environments for DNN training. We present an overview of the current state of the art in this research field. We categorize and explain the different approaches for UDA. The number of considered
publications is larger than in any other survey on this topic. We also go far beyond a description of the UDA state of the art, as we present a quantitative comparison of approaches and point out the latest trends in this field. We conduct a critical analysis of the state of the art and highlight promising future research directions. With this survey, we aim to facilitate further UDA research and encourage scientists to exploit novel research directions.
Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)
This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.
The influence of graphical user interface on motion onset brain-computer interface performance and the effect of data augmentation on motor imagery brain-computer interface
Motor Imagery Brain-Computer Interface (MI BCI) is one of the most frequently used BCI modalities, due to the versatility of its applications. However, it still has unresolved issues such as time-consuming calibration, a low information transfer rate, and inconsistent performance across individuals. Combining MI BCI with Motion Onset Visual Evoked Potential (mVEP) BCI in a hybrid structure may solve some of these problems. Combining MI BCI with the more robust mVEP BCI would increase the degrees of freedom, thereby increasing the information transfer rate, and would also indirectly improve intra-subject consistency in performance by replacing some MI-based tasks with mVEP. Unfortunately, experimental research on the hybrid BCI was not possible due to the COVID-19 pandemic; this thesis therefore focuses on the two BCI modalities separately.
Chapter 1 provides an overview of different BCI modalities and the underlying neurophysiological principles, followed by the objectives of the thesis. The research contributions are also highlighted, and the thesis outline is presented at the end of the chapter. Chapter 2 presents a comprehensive state-of-the-art review, drawing on a wide range of literature in relevant fields. Specifically, it delves into MI BCI, mVEP BCI, deep learning, Transfer Learning (TL), Data Augmentation (DA), and Generative Adversarial Networks (GANs). Chapter 3 investigates the effect of graphical elements in online and offline experiments. In the offline experiment, graphical elements such as color, size, position, and layout were explored. Replacing the default red moving bar with a green or blue bar, changing the background color from white to gray, and using smaller visual angles did not lead to statistically significant improvements in accuracy. However, the effect size of η2 (0.085) indicated a moderate effect for these changes to graphical factors. Similarly, no statistically significant difference was found between the two layouts in the online experiments. Overall, the mVEP BCI achieved a classification accuracy of approximately 80% and is relatively impervious to changes in graphical interface parameters. This suggests that mVEP is a promising candidate for a hybrid BCI system combined with MI that requires dynamic, versatile graphical design features. In Chapter 4, various DA methods are explored, including Segmentation and Recombination in the Time Domain, Segmentation and Recombination in the Time-Frequency Domain, and Spatial Analogy. These methods are evaluated with three feature extraction approaches: Common Spatial Patterns, Time Domain Parameters (TDP), and Band Power. The evaluation was conducted using a validated dataset, namely BCI Competition IV dataset 2a, as well as a dataset obtained from our research group.
The methods are effective when only a small dataset from a single subject is available, and all three DA methods significantly affect the performance of the TDP feature extraction method. Chapter 5 explores the use of GANs for DA in combination with TL and cropped training strategies using the ShallowFBCSP classifier, on the same validated dataset (BCI Competition IV dataset 2a) as in Chapter 4. In contrast to the DA methods explored in Chapter 4, this approach is suitable for larger datasets and for generalizing training based on other people's data. Applying GAN-based DA to the dataset improved the average accuracy from 68.2% to 70.7%. This study provides a novel method to enable MI GAN training with only 40 trials per participant, using the remaining eight participants' data for TL, addressing the data insufficiency issue for GANs. The evaluation of the generated artificial trials revealed the importance of inter-class differences in MI patterns, which can be easily identified by GANs.
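As an illustration, the time-domain segmentation-and-recombination idea from Chapter 4 can be sketched as follows: same-class trials are cut into time segments, and artificial trials are assembled from segments drawn from random donor trials. Random arrays stand in for EEG trials here; this is a simplified sketch, not the thesis implementation.

```python
import numpy as np

def segment_recombine(trials, n_segments, rng):
    """Build artificial trials by recombining time segments of same-class trials.

    trials: array of shape (n_trials, n_channels, n_samples), all one class.
    Each artificial trial takes every time segment from a randomly chosen
    donor trial, so class-specific temporal structure is preserved.
    """
    n_trials, n_channels, n_samples = trials.shape
    seg_len = n_samples // n_segments
    augmented = trials.copy()
    for i in range(n_trials):
        for s in range(n_segments):
            donor = rng.integers(n_trials)          # random source trial
            start, stop = s * seg_len, (s + 1) * seg_len
            augmented[i, :, start:stop] = trials[donor, :, start:stop]
    return augmented

rng = np.random.default_rng(42)
trials = rng.standard_normal((20, 22, 500))         # e.g. 22 EEG channels, 500 samples
artificial = segment_recombine(trials, n_segments=5, rng=rng)
```

Because every segment is copied verbatim from a real trial of the same class, the augmented set enlarges the training data without introducing signal content the class never produced.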
Overall, the thesis addresses the main practical issues of both mVEP and MI BCI, paving the way for their successful combination in future experiments.
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias terms and newly added scaling factors in specific layers, yet results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2× training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves performance superior or competitive to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512×512 benchmark by fine-tuning for only 25 epochs from a publicly available pre-trained ImageNet 256×256 checkpoint, while being 30× more training-efficient than the closest competitor.
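The selection rule at the heart of DiffFit (train only biases and newly added scale factors, freeze everything else) can be sketched in a few lines. The parameter names and tiny shapes below are illustrative, not taken from any real diffusion codebase:

```python
import numpy as np

# Build a toy parameter dictionary: four "attention blocks", each with a
# weight matrix, a bias vector, and a newly added scalar scale factor.
rng = np.random.default_rng(0)
params = {}
for i in range(4):
    name = f"block{i}.attn"
    params[f"{name}.weight"] = rng.standard_normal((8, 8))
    params[f"{name}.bias"] = np.zeros(8)
    params[f"{name}.gamma"] = np.ones(1)   # newly added scale factor

def is_trainable(name):
    """DiffFit-style rule: update only biases and the added scale factors."""
    return name.endswith(".bias") or name.endswith(".gamma")

trainable = {k: v for k, v in params.items() if is_trainable(k)}
frozen = {k: v for k, v in params.items() if not is_trainable(k)}

n_train = sum(v.size for v in trainable.values())
n_total = sum(v.size for v in params.values())
# The ratio is small even here; with diffusion-scale weight matrices it
# shrinks toward the ~0.12% the paper reports.
ratio = n_train / n_total
```

In a deep-learning framework the same rule would be applied by disabling gradients on the frozen parameters, so only the bias and scale tensors need to be stored per adapted domain.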
Deep Neural Networks and Tabular Data: Inference, Generation, and Explainability
Over the last decade, deep neural networks have enabled remarkable technological advancements, potentially transforming a wide range of aspects of our lives in the future. It is becoming increasingly common for deep-learning models to be used in a variety of situations in modern life, ranging from search and recommendations to financial and healthcare solutions, and the number of applications utilizing deep neural networks is still on the rise.
However, a lot of recent research efforts in deep learning have focused primarily on neural networks and domains in which they excel. This includes computer vision, audio processing, and natural language processing. It is a general tendency for data in these areas to be homogeneous, whereas heterogeneous tabular datasets have received relatively scant attention despite the fact that they are extremely prevalent. In fact, more than half of the datasets on the Google dataset platform are structured and can be represented in a tabular form.
The first aim of this study is to provide a thoughtful and comprehensive analysis of deep neural networks' application to modeling and generating tabular data. In addition, an open-source performance benchmark on tabular data is presented, in which we thoroughly compare over twenty machine- and deep-learning models on heterogeneous tabular datasets.
The second contribution relates to synthetic tabular data generation. Inspired by their success in other homogeneous data modalities, deep generative models such as variational autoencoders and generative adversarial networks are also commonly applied to tabular data generation. However, the use of Transformer-based large language models (which are also generative) for tabular data generation has received scant research attention. Our contribution to this literature is the development of a novel method for generating tabular data based on this family of autoregressive generative models which, on multiple challenging benchmarks, outperforms the current state-of-the-art methods for tabular data generation.
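A common recipe for applying autoregressive language models to tabular data is to serialize each row as text, fine-tune on the resulting strings, and parse sampled strings back into rows. The sketch below shows one generic encoding (not necessarily the exact scheme developed in the thesis; the column names are hypothetical):

```python
def row_to_text(row, column_order=None):
    """Serialize a tabular row as 'column is value' clauses for an LM."""
    cols = column_order or list(row)
    return ", ".join(f"{c} is {row[c]}" for c in cols)

def text_to_row(text):
    """Parse a generated string back into a row (values come back as strings)."""
    row = {}
    for clause in text.split(", "):
        col, _, val = clause.partition(" is ")
        row[col] = val
    return row

# Hypothetical example row in the style of a census-type dataset.
sample = {"age": 37, "workclass": "Private", "income": "<=50K"}
encoded = row_to_text(sample)
decoded = text_to_row(encoded)
```

One design point worth noting: shuffling the column order across training strings removes any artificial dependence on a fixed column sequence, which autoregressive models would otherwise learn.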
Another crucial aspect for a deep-learning data system is that it needs to be reliable and trustworthy to gain broader acceptance in practice, especially in life-critical fields. One of the possible ways to bring trust into a data-driven system is to use explainable machine-learning methods.
In spite of this, the current explanation methods often fail to provide robust explanations due to their high sensitivity to the hyperparameter selection or even changes of the random seed. Furthermore, most of these methods are based on feature-wise importance, ignoring the crucial relationship between variables in a sample. The third aim of this work is to address both of these issues by offering more robust and stable explanations, as well as taking into account the relationships between variables using a graph structure.
In summary, this thesis makes significant contributions touching many areas related to deep neural networks and heterogeneous tabular data, as well as the usage of explainable machine-learning methods.
Generalizable deep learning based medical image segmentation
Deep learning is revolutionizing medical image analysis and interpretation. However, its real-world deployment is often hindered by poor generalization to unseen domains (new imaging modalities and protocols). This lack of generalization ability is further exacerbated by the scarcity of labeled datasets for training: data collection and annotation can be prohibitively expensive in terms of labor and costs, because label quality depends heavily on the expertise of radiologists. Additionally, unreliable predictions caused by poor model generalization pose safety risks to clinical downstream applications.
To mitigate labeling requirements, we investigate and develop a series of techniques to strengthen the generalization ability and the data efficiency of deep medical image computing models. We further improve model accountability and identify unreliable predictions made on out-of-domain data, by designing probability calibration techniques.
In the first and second parts of the thesis, we discuss two types of problems for handling unexpected domains: unsupervised domain adaptation and single-source domain generalization. For domain adaptation, we present a data-efficient technique that adapts a segmentation model trained on a labeled source domain (e.g., MRI) to an unlabeled target domain (e.g., CT), using a small number of unlabeled training images from the target domain.
For domain generalization, we focus on both image reconstruction and segmentation. For image reconstruction, we design a simple and effective domain generalization technique for cross-domain MRI reconstruction, by reusing image representations learned from natural image datasets. For image segmentation, we perform causal analysis of the challenging cross-domain image segmentation problem. Guided by this causal analysis we propose an effective data-augmentation-based generalization technique for single-source domains. The proposed method outperforms existing approaches on a large variety of cross-domain image segmentation scenarios.
In the third part of the thesis, we present a novel self-supervised method for learning generic image representations that can be used to analyze unexpected objects of interest. The proposed method is designed together with a novel few-shot image segmentation framework that can segment unseen objects of interest by taking only a few labeled examples as references. Superior flexibility over conventional fully-supervised models is demonstrated by our few-shot framework: it does not require any fine-tuning on novel objects of interest. We further build a publicly available comprehensive evaluation environment for few-shot medical image segmentation.
In the fourth part of the thesis, we present a novel probability calibration model. To ensure safety in clinical settings, a deep model is expected to be able to alert human radiologists when it has low confidence, especially when confronted with out-of-domain data. To this end, we present a plug-and-play model to calibrate prediction probabilities on out-of-domain data, bringing the predicted probability in line with the actual accuracy on the test data. We evaluate our method on both artifact-corrupted images and images from an unforeseen MRI scanning protocol. Our method demonstrates improved calibration accuracy compared with the state-of-the-art method.
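Temperature scaling is a standard baseline for this kind of probability calibration: a single scalar rescales the logits so that predicted confidence better tracks accuracy on held-out data. The NumPy sketch below is illustrative only, not the thesis's plug-and-play model:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of labels under temperature-scaled probabilities."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the temperature minimizing NLL on a held-out set."""
    return min(grid, key=lambda T: nll(logits, labels, T))

# Simulate an overconfident classifier: large margins, but noisy labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.standard_normal((200, 3))
logits[np.arange(200), labels] += 4.0          # confident predictions
flip = rng.random(200) < 0.3                   # label noise lowers true accuracy
labels = np.where(flip, rng.integers(0, 3, size=200), labels)

T = fit_temperature(logits, labels)            # T > 1 softens overconfidence
```

A fitted temperature above 1 flattens the softmax, lowering reported confidence toward the model's actual accuracy, which is exactly the behavior wanted when out-of-domain inputs make a model overconfident.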
Finally, we summarize the major contributions and limitations of our work, and suggest future research directions that will benefit from the work presented in this thesis.