Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CT
Stroke lesion volume is a key radiologic measurement for assessing the
prognosis of Acute Ischemic Stroke (AIS) patients, but it is challenging to
measure automatically on Non-Contrast CT (NCCT) scans. Recent diffusion
probabilistic models have shown potential for image segmentation. In this
paper, a novel Synchronous image-label Diffusion Probability Model (SDPM) is
proposed for stroke lesion segmentation on NCCT using a Markov diffusion
process. The proposed SDPM is fully based on a Latent
Variable Model (LVM), offering a complete probabilistic elaboration. An
additional net-stream, parallel with a noise prediction stream, is introduced
to obtain initial noisy label estimates for efficiently inferring the final
labels. By optimizing the specified variational bounds, the trained model
can infer multiple label estimates for reference, given the input images and
noise. The proposed model was evaluated on three stroke lesion datasets, one
public and two private. Compared with several U-Net and transformer-based
segmentation methods, the proposed SDPM achieves state-of-the-art
performance. The code is publicly available.
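The abstract does not give SDPM's exact formulation. As a hedged illustration of the kind of Markov diffusion it refers to, the sketch below shows only the standard DDPM forward (noising) process applied to a binary lesion mask, i.e. sampling x_t from q(x_t | x_0). The linear noise schedule (1e-4 to 0.02 over 1000 steps), the mapping of mask values from {0, 1} to {-1, 1}, and all function names are assumptions rather than the authors' code; the additional initial-label stream is not modelled.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(label, t, abar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) for a flattened label map.

    Mask values are mapped from {0, 1} to {-1, 1} first (an assumed convention).
    Returns the noisy label and the Gaussian noise that was added, which is the
    usual regression target of a noise-prediction network.
    """
    a = abar[t]
    x0 = [2.0 * y - 1.0 for y in label]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
lesion_mask = [0, 0, 1, 1, 0, 1, 0, 0]  # toy flattened binary lesion mask
x_t, eps = q_sample(lesion_mask, t=500, abar=abar, rng=rng)
```

Because alpha_bar_t shrinks toward 0 as t grows, late steps are almost pure noise, which is why a good initial label estimate can shorten the reverse (denoising) chain.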
Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization
Many-to-many voice conversion with non-parallel training data has seen
significant progress in recent years, and StarGAN-based models have attracted
particular interest. However, most StarGAN-based methods have only been
evaluated in settings where the number of speakers was small and the amount
of training data was large. In this work, we aim to improve the data
efficiency of the model and achieve many-to-many non-parallel StarGAN-based
voice conversion for a relatively large number of speakers with limited
training samples. To improve data efficiency,
the proposed model uses a speaker encoder for extracting speaker embeddings and
conducts adaptive instance normalization (AdaIN) on convolutional weights.
Experiments are conducted with 109 speakers under two low-resource settings,
with 20 and 5 training samples per speaker. An objective evaluation shows that
the proposed model outperforms the baseline methods. Furthermore, a subjective
evaluation shows that the proposed model outperforms the baseline method in
both naturalness and similarity.
Comment: Accepted by ICASSP202
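The abstract does not spell out how AdaIN is applied to the convolutional weights. One plausible reading, sketched below under that assumption, is to standardize each output channel's kernel weights to zero mean and unit variance and then rescale and shift them with per-channel parameters predicted from the speaker embedding. The functions `adain_weights` and `affine_from_embedding`, the linear projection, and the toy numbers are hypothetical, not the paper's implementation.

```python
import math

def adain_weights(weight, gamma, beta, eps=1e-5):
    """AdaIN-style normalization of convolutional weights (assumed formulation).

    weight: list of output channels, each a flat list of that channel's kernel
            values (in_channels * kernel_size entries).
    gamma, beta: per-output-channel scale and shift, e.g. predicted from a
                 speaker embedding.
    """
    out = []
    for w_c, g, b in zip(weight, gamma, beta):
        mean = sum(w_c) / len(w_c)
        var = sum((w - mean) ** 2 for w in w_c) / len(w_c)
        std = math.sqrt(var + eps)
        # Standardize this channel's weights, then apply the speaker-dependent affine map.
        out.append([g * (w - mean) / std + b for w in w_c])
    return out

def affine_from_embedding(embedding, proj):
    """Hypothetical linear projection: one scalar per output channel."""
    return [sum(p * e for p, e in zip(row, embedding)) for row in proj]

weight = [[0.2, -0.1, 0.4, 0.0], [1.0, 2.0, 3.0, 4.0]]  # 2 output channels, 4 weights each (toy)
speaker_embedding = [0.8, -0.3]  # would come from a speaker encoder
gamma = affine_from_embedding(speaker_embedding, [[1.0, 0.5], [0.2, 1.0]])
beta = affine_from_embedding(speaker_embedding, [[0.0, 1.0], [1.0, 0.0]])
adapted = adain_weights(weight, gamma, beta)
```

After adaptation each channel's weights have mean `beta[c]` and standard deviation close to `abs(gamma[c])`, so a single shared kernel can be re-styled per speaker without storing separate weights for each of the 109 speakers.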
Quantifying and Learning Disentangled Representations with Limited Supervision
Learning low-dimensional representations that disentangle the underlying
factors of variation in data has been posited as an important step towards
interpretable machine learning with good generalization. To address the fact
that there is no consensus on what disentanglement entails, Higgins et al.
(2018) propose a formal definition for Linear Symmetry-Based Disentanglement,
or LSBD, arguing that underlying real-world transformations give exploitable
structure to data.
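To make the definition concrete, consider a hypothetical world with two independent cyclic factors (for example, object orientation and hue), each modelled by the circle group SO(2). In a linear disentangled representation, each factor acts on its own 2-D latent subspace through a rotation matrix. The sketch below (the 4-D latent, the factor choice, and the function names are illustrative assumptions) checks the two properties this implies: composing group elements matches composing their matrices, and changing one factor leaves the other subspace untouched.

```python
import math

def rot2(theta):
    """2x2 rotation matrix: a linear representation of the circle group SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def act(g, z):
    """Apply g = (theta_1, theta_2) in SO(2) x SO(2) to a 4-D latent code z.

    The representation is block-diagonal: theta_1 rotates z[0:2] only and
    theta_2 rotates z[2:4] only, so each factor acts on its own subspace.
    """
    out = []
    for k, theta in enumerate(g):
        r = rot2(theta)
        x, y = z[2 * k], z[2 * k + 1]
        out += [r[0][0] * x + r[0][1] * y, r[1][0] * x + r[1][1] * y]
    return out

z = [1.0, 0.0, 0.0, 1.0]
g1, g2 = (0.3, 0.0), (1.1, -0.4)
composed = act(g1, act(g2, z))                      # act with g2, then with g1
direct = act((g1[0] + g2[0], g1[1] + g2[1]), z)     # act once with the product g1 * g2
```

`composed` and `direct` agree up to floating-point error, which is the homomorphism property that makes the action "linear" in the LSBD sense.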
Although several works focus on learning LSBD representations, such methods
require supervision on the underlying transformations for the entire dataset
and cannot handle unlabeled data. Moreover, none of these works provides a
metric to quantify LSBD.
We propose a metric to quantify LSBD representations that is easy to compute
under certain well-defined assumptions. Furthermore, we present a method that
can leverage unlabeled data, such that LSBD representations can be learned with
limited supervision on transformations. Using our LSBD metric, our results show
that limited supervision is indeed sufficient to learn LSBD representations.
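The paper's metric is not reproduced here. As a simplified, hypothetical illustration of the underlying idea, the sketch below assumes a single SO(2) factor whose rotation representation and transformation angles are known. If the encoder is linearly equivariant, rotating each code back by its angle collapses the whole orbit to one point, so the dispersion of the back-rotated codes indicates how far the representation is from LSBD. `orbit_dispersion` and the two toy encoders are assumptions.

```python
import math

def rotate(z, theta):
    """Rotate a 2-D point by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])

def orbit_dispersion(codes, angles):
    """Mean squared distance of back-rotated codes from their centroid.

    codes[i] is the 2-D encoding of an object after applying rotation angles[i].
    For a perfectly equivariant encoder, every back-rotated code coincides,
    giving a dispersion of 0.
    """
    back = [rotate(z, -a) for z, a in zip(codes, angles)]
    cx = sum(p[0] for p in back) / len(back)
    cy = sum(p[1] for p in back) / len(back)
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in back) / len(back)

angles = [2 * math.pi * k / 8 for k in range(8)]
base = (1.0, 0.0)
equivariant = [rotate(base, a) for a in angles]  # ideal LSBD-like encoder
collapsed = [base for _ in angles]               # ignores the transformation entirely

print(orbit_dispersion(equivariant, angles))  # approximately 0.0
print(orbit_dispersion(collapsed, angles))    # approximately 1.0
```

Note that only the angles, not full labels for every data point, are needed for the codes on one orbit, which loosely mirrors why limited supervision on transformations can suffice.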