Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching
The performance of machine learning algorithms is known to be negatively
affected by possible mismatches between training (source) and test (target)
data distributions. In fact, this problem emerges whenever an acoustic scene
classification system which has been trained on data recorded by a given device
is applied to samples acquired under different acoustic conditions or captured
by mismatched recording devices. To address this issue, we propose an
unsupervised domain adaptation method that consists of aligning the first- and
second-order sample statistics of each frequency band of target-domain acoustic
scenes to the ones of the source-domain training dataset. This model-agnostic
approach is devised to adapt audio samples from unseen devices before they are
fed to a pre-trained classifier, thus avoiding any further learning phase.
Using the DCASE 2018 Task 1-B development dataset, we show that the proposed
method outperforms the state-of-the-art unsupervised methods found in the
literature in terms of both source- and target-domain classification accuracy.
Comment: 5 pages, 1 figure, 3 tables, submitted to EUSIPCO 202
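As a toy illustration of the band-wise statistics matching described above, per-band standardization followed by rescaling to the source-domain statistics fits in a few lines (a minimal numpy sketch; the function and variable names are illustrative, not the authors' implementation):

```python
import numpy as np

def bandwise_stats_match(target_spec, source_mean, source_std, eps=1e-8):
    """Align the first- and second-order statistics of each frequency
    band of a target-domain spectrogram (bands x frames) to precomputed
    source-domain statistics, before feeding it to a pretrained model."""
    t_mean = target_spec.mean(axis=1, keepdims=True)
    t_std = target_spec.std(axis=1, keepdims=True)
    z = (target_spec - t_mean) / (t_std + eps)   # standardize per band
    return z * source_std[:, None] + source_mean[:, None]

# Toy usage: a simulated device mismatch (gain and offset) is undone.
rng = np.random.default_rng(0)
source = rng.normal(size=(4, 1000))              # 4 bands, 1000 frames
src_mean, src_std = source.mean(axis=1), source.std(axis=1)
target = 3.0 * source + 5.0
adapted = bandwise_stats_match(target, src_mean, src_std)
```

Because the mapping is applied to the inputs only, any pretrained classifier can be used unchanged, which is the model-agnostic property the abstract emphasizes.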
Privacy Preserving Domain Adaptation for Semantic Segmentation of Medical Images
Convolutional neural networks (CNNs) have led to significant improvements in
tasks involving semantic segmentation of images. In biomedical image
segmentation, however, CNNs are vulnerable to the distributional gap between
source and target domains with different data modalities, which leads to domain
shift. Domain shift makes data annotation in new modalities necessary because
models must be retrained from scratch. Unsupervised domain adaptation (UDA) is
proposed to adapt a model to new modalities using solely unlabeled target
domain data. Common UDA algorithms require access to data points in the source
domain which may not be feasible in medical imaging due to privacy concerns. In
this work, we develop an algorithm for UDA in a privacy-constrained setting,
where the source domain data is inaccessible. Our idea is based on encoding the
information from the source samples into a prototypical distribution that is
used as an intermediate distribution for aligning the target domain
distribution with the source domain distribution. We demonstrate the
effectiveness of our algorithm by comparing it to state-of-the-art medical
image semantic segmentation approaches on two medical image semantic
segmentation datasets.
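The prototypical-distribution idea can be caricatured as follows: source knowledge is condensed into per-class latent prototypes, and target features are pulled toward them without touching the source samples (a deliberately simplified numpy sketch with illustrative names; the paper's actual alignment procedure is more sophisticated):

```python
import numpy as np

def fit_prototypes(feats, labels, n_classes):
    """Condense source information into per-class Gaussian prototypes
    (mean, diagonal variance); the raw source samples can then be
    discarded, preserving privacy."""
    protos = []
    for c in range(n_classes):
        f = feats[labels == c]
        protos.append((f.mean(axis=0), f.var(axis=0) + 1e-6))
    return protos

def adapt_step(target_feats, protos, lr=0.5):
    """One toy alignment step: pseudo-label each target feature with
    its nearest prototype mean, then move it toward that mean."""
    means = np.stack([m for m, _ in protos])
    d = ((target_feats[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    assign = d.argmin(axis=1)
    return target_feats + lr * (means[assign] - target_feats), assign

# Toy data: two source classes, target shifted by a constant offset.
rng = np.random.default_rng(1)
src_feats = np.vstack([rng.normal(0.0, 0.2, (50, 2)),
                       rng.normal(5.0, 0.2, (50, 2))])
src_labels = np.array([0] * 50 + [1] * 50)
protos = fit_prototypes(src_feats, src_labels, 2)
tgt = src_feats + 1.0
adapted, assign = adapt_step(tgt, protos)
```

Only the prototype statistics cross the domain boundary, which is what makes the setting privacy-constrained.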
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification
In this paper, we propose a domain adaptation framework to address the device
mismatch issue in acoustic scene classification leveraging upon neural label
embedding (NLE) and relational teacher student learning (RTSL). Taking into
account the structural relationships between acoustic scene classes, our
proposed framework captures such relationships, which are intrinsically
device-independent. In the training stage, transferable knowledge is condensed
in NLE from the source domain. Next in the adaptation stage, a novel RTSL
strategy is adopted to learn adapted target models without using paired
source-target data often required in conventional teacher student learning. The
proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental
results based on AlexNet-L deep classification models confirm the effectiveness
of our proposed approach for mismatch situations. NLE-alone adaptation compares
favourably with the conventional device adaptation and teacher student based
adaptation techniques. NLE with RTSL further improves the classification
accuracy.
Comment: Accepted by Interspeech 202
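The relational part of teacher-student learning, which avoids the paired source-target data required by conventional distillation, can be sketched as a loss between pairwise-distance matrices of the two embedding spaces (an illustrative numpy sketch, not the authors' exact objective):

```python
import numpy as np

def pairwise_dists(x):
    # Euclidean distance matrix between row embeddings.
    sq = (x ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
    return np.sqrt(np.maximum(d2, 0.0))

def relational_loss(student_emb, teacher_emb):
    """Toy relational teacher-student objective: match the *structure*
    (pairwise distances) of the two embedding spaces rather than
    individual paired outputs."""
    ds, dt = pairwise_dists(student_emb), pairwise_dists(teacher_emb)
    # Normalize by the mean distance so the loss is scale-invariant.
    ds = ds / (ds.mean() + 1e-8)
    dt = dt / (dt.mean() + 1e-8)
    return ((ds - dt) ** 2).mean()
```

Because only relations between samples are compared, the student never needs a sample-by-sample correspondence to the teacher's inputs; a scaled and shifted copy of an embedding, for instance, incurs zero loss.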
Unsupervised Model Adaptation for Source-free Segmentation of Medical Images
The recent prevalence of deep neural networks has led semantic segmentation
networks to achieve human-level performance in the medical field when
sufficient training data is provided. Such networks however fail to generalize
when tasked with predicting semantic maps for out-of-distribution images,
requiring model re-training on the new distributions. This expensive process
necessitates expert knowledge in order to generate training labels.
Distribution shifts can arise naturally in the medical field via the choice of
imaging device, e.g., MRI or CT scanners. To combat the need for labeling images
in a target domain after a model is successfully trained in a fully annotated
source domain with a different data distribution, unsupervised domain
adaptation (UDA) can be used. Most UDA approaches ensure target generalization
by creating a shared source/target latent feature space. This allows a source
trained classifier to maintain performance on the target domain. However, most
UDA approaches require joint source and target data access, which may create
privacy leaks with respect to patient information. We propose a UDA algorithm
for medical image segmentation that does not require access to source data
during adaptation, and is thus capable of maintaining patient data privacy. We
rely on an approximation of the source latent features at adaptation time, and
create a joint source/target embedding space by minimizing a distributional
distance metric based on optimal transport. We demonstrate that our approach is
competitive with recent UDA medical segmentation works even with the added
privacy requirement.
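A standard distributional distance in the optimal-transport family is the sliced Wasserstein distance, which could serve as the alignment metric between the approximated source latent features and the target features (a hedged numpy sketch; the paper's exact metric and estimator may differ):

```python
import numpy as np

def sliced_wasserstein(a, b, n_proj=64, seed=0):
    """Monte-Carlo sliced 1-Wasserstein distance between two equally
    sized feature clouds a, b of shape (n, d): project both onto random
    unit directions and average the 1-D Wasserstein distances, which
    reduce to mean absolute differences of the sorted projections."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=a.shape[1])
        v /= np.linalg.norm(v)                     # random unit direction
        pa, pb = np.sort(a @ v), np.sort(b @ v)
        total += np.abs(pa - pb).mean()
    return total / n_proj
```

The slicing trick keeps the cost near-linear in the number of samples, which is why it is a popular surrogate for full optimal transport in adaptation objectives.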
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figure
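As a concrete example of the log-mel front end named above as a dominant feature representation, the whole pipeline (framing, STFT power, triangular mel filterbank, log compression) fits in a short numpy function (a minimal sketch with illustrative parameter values, not a drop-in replacement for a tuned feature extractor):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    """STFT power -> triangular mel filterbank -> log compression."""
    # Frame the signal and apply a Hann window.
    n_frames = 1 + (len(wave) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)
    # Triangular filters spaced uniformly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fbank.T + 1e-10)                # (frames, mels)
```

In practice a library front end (e.g. librosa) would be used instead; the sketch just makes the representation concrete.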
ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation
The performance of a semantic segmentation model for remote sensing (RS) images
pretrained on an annotated dataset decreases greatly when tested on another,
unannotated dataset because of the domain gap. Adversarial
generative methods, e.g., DualGAN, are utilized for unpaired image-to-image
translation to minimize the pixel-level domain gap, which is one of the common
approaches for unsupervised domain adaptation (UDA). However, existing image
translation methods face two problems when translating RS images: 1) they
ignore the scale discrepancy between RS datasets, which greatly affects the
accuracy on scale-invariant objects, and 2) they ignore the real-to-real nature
of RS image translation, which destabilizes model training. In this paper,
ResiDualGAN is proposed for RS image translation, where an in-network resizer
module addresses the scale discrepancy between RS datasets, and a residual
connection strengthens the stability of real-to-real image translation and
improves performance on cross-domain semantic segmentation tasks. Combined with
an output space adaptation method, the proposed approach greatly improves
accuracy on common benchmarks, demonstrating the superiority and reliability of
ResiDualGAN. At the end of the paper, a thorough discussion gives a reasonable
explanation for the improvement. Our source code is available at
https://github.com/miemieyanga/ResiDualGAN-DRDG
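The two ingredients in the title, an in-network resizer and a residual connection, can be sketched together: the generator predicts only a residual on top of a resized input (an illustrative numpy sketch; `residual_net` is a hypothetical stand-in for the actual trained generator, and the real model operates on multi-channel tensors):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Stand-in for the in-network resizer module: bilinear resize that
    handles the scale discrepancy between the two RS datasets."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def resi_translate(img, residual_net, out_h, out_w):
    """Residual translation: the generator outputs only a residual on
    top of the resized input, so real-to-real content is preserved and
    training is stabilized."""
    base = bilinear_resize(img, out_h, out_w)
    return base + residual_net(base)
```

With an identity-like residual (near-zero output), the translation degenerates to a plain resize, which is the stabilizing inductive bias the residual connection provides.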