Three-dimensional Bone Image Synthesis with Generative Adversarial Networks
Medical image processing has been highlighted as an area where deep
learning-based models have the greatest potential. However, in the medical
field in particular, problems of data availability and privacy are hampering
research progress and thus rapid implementation in clinical routine. The
generation of synthetic data not only ensures privacy, but also allows to
\textit{draw} new patients with specific characteristics, enabling the
development of data-driven models on a much larger scale. This work
demonstrates that three-dimensional generative adversarial networks (GANs) can
be efficiently trained to generate high-resolution medical volumes with finely
detailed voxel-based architectures. In addition, GAN inversion is successfully
implemented for the three-dimensional setting and used for extensive research
on model interpretability and applications such as image morphing, attribute
editing and style mixing. The results are comprehensively validated on a
database of three-dimensional HR-pQCT instances representing the bone
micro-architecture of the distal radius.
Comment: Submitted to the journal Artificial Intelligence in Medicine.
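GAN inversion, as used above, searches for the latent code whose generated output reproduces a target image or volume. A minimal illustrative sketch, using a toy linear "generator" and plain gradient descent rather than the paper's actual three-dimensional model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in "generator": a fixed linear map from an 8-d latent code
# to a 64-d flattened "volume". Real GAN inversion applies the same idea
# to a trained deep generator, usually with perceptual losses on top.
G = rng.normal(size=(64, 8))
z_true = rng.normal(size=8)
x_target = G @ z_true                   # observation we want to invert

# Invert by gradient descent on the squared reconstruction error.
lr = 0.9 / np.linalg.norm(G, 2) ** 2    # safe step size for this G
z = np.zeros(8)
for _ in range(2000):
    z -= lr * 2 * G.T @ (G @ z - x_target)

print(np.linalg.norm(G @ z - x_target))  # close to 0: code recovered
```

Once the code is recovered, operations such as morphing or attribute editing amount to interpolating or shifting `z` before regenerating.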
Image Synthesis under Limited Data: A Survey and Taxonomy
Deep generative models, which target reproducing the given data distribution
to produce novel samples, have made unprecedented advancements in recent years.
Their technical breakthroughs have enabled unparalleled quality in the
synthesis of visual content. However, one critical prerequisite for their
tremendous success is the availability of a sufficient number of training
samples, which requires massive computation resources. When trained on limited
data, generative models tend to suffer from severe performance deterioration
due to overfitting and memorization. Accordingly, researchers have recently
devoted considerable attention to developing novel models that are capable of
generating plausible and diverse images from limited training data. Despite
numerous efforts to enhance training stability and synthesis quality in the
limited data scenarios, there is a lack of a systematic survey that provides 1)
a clear problem definition, critical challenges, and taxonomy of various tasks;
2) an in-depth analysis of the pros, cons, and remaining limitations of existing
literature; as well as 3) a thorough discussion on the potential applications
and future directions in the field of image synthesis under limited data. In
order to fill this gap and provide an informative introduction to researchers
who are new to this topic, this survey offers a comprehensive review and a
novel taxonomy on the development of image synthesis under limited data. In
particular, it covers the problem definition, requirements, main solutions,
popular benchmarks, and remaining challenges in a comprehensive and all-around
manner.
Comment: 230 references, 25 pages. GitHub:
https://github.com/kobeshegu/awesome-few-shot-generatio
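One widely used family of remedies covered by such surveys is differentiable augmentation, where the same random augmentation is applied to both real and generated images before they reach the discriminator, so the discriminator never sees un-augmented data and cannot simply memorize the small training set. A minimal sketch of the idea (the function name and batch layout here are illustrative, not a specific library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_batch(batch, shift=2, rng=rng):
    """Randomly translate each image in an NHWC batch. A stand-in for the
    richer policies (color jitter, cutout, translation) used in practice;
    the key point is that real and fake batches get the same policy."""
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        dy, dx = rng.integers(-shift, shift + 1, size=2)
        out[i] = np.roll(img, (dy, dx), axis=(0, 1))
    return out

real = rng.normal(size=(4, 8, 8, 1))
fake = rng.normal(size=(4, 8, 8, 1))
# Both batches are augmented before being scored by the discriminator.
real_aug, fake_aug = augment_batch(real), augment_batch(fake)
```

In a full framework the augmentation must be differentiable so gradients can flow back to the generator through the augmented fakes.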
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML: multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur on the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. MMML models, as the next generation of
intelligent design tools, hold a promising future to impact how products are
designed.
Disentanglement Learning via Topology
We propose TopDis (Topological Disentanglement), a method for learning
disentangled representations by adding a multi-scale topological loss term.
Disentanglement is a crucial property of data representations, essential for
the explainability and robustness of deep learning models and a step towards
high-level cognition. The state-of-the-art method based on VAE minimizes the
total correlation of the joint distribution of latent variables. We take a
different perspective on disentanglement by analyzing topological properties of
data manifolds. In particular, we optimize the topological similarity for data
manifold traversals. To the best of our knowledge, our paper is the first one
to propose a differentiable topological loss for disentanglement. Our
experiments have shown that the proposed topological loss improves
disentanglement scores such as MIG, FactorVAE score, SAP score and DCI
disentanglement score with respect to state-of-the-art results. Our method
works in an unsupervised manner, permitting its application to problems without
labeled factors of variation. Additionally, we show how to use the proposed
topological loss to find disentangled directions in a trained GAN.
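For context, the total correlation minimized by the VAE-based state of the art measures the dependence among latent dimensions, TC(z) = KL(q(z) ‖ ∏ᵢ q(zᵢ)). For a multivariate Gaussian it has the closed form ½·log(det diag(Σ) / det Σ), which the following sketch evaluates (this illustrates the baseline objective, not the TopDis loss itself):

```python
import numpy as np

def gaussian_total_correlation(cov):
    """Total correlation of N(mu, cov): the KL divergence between the
    joint and the product of its marginals. Zero iff the covariance is
    diagonal, i.e. the latent dimensions are independent."""
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

print(gaussian_total_correlation(np.eye(3)))                 # 0.0
print(gaussian_total_correlation([[1.0, 0.8], [0.8, 1.0]]))  # ~0.51
```

A disentanglement regularizer drives this dependence toward zero; TopDis replaces the distributional criterion with a topological one computed on manifold traversals.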
Human-controllable and structured deep generative models
Deep generative models are a class of probabilistic models that attempt to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress in recent years. These models have several benefits: eased sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and analysis of the learned representation, we focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence admits a closed-form solution for Mixture-of-Gaussians distributions, allowing the autoencoding objective to be optimized efficiently.
We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations for appearance and shape. The goal of incorporating facial landmarks is controlled generation that can separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Earlier work on learning disentangled representations was done in an unsupervised fashion. However, recent works have shown that learning disentangled representations is not identifiable without any inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
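The closed form mentioned for the Cauchy-Schwarz divergence, D_CS(p, q) = −log(∫pq / √(∫p² ∫q²)), follows from the fact that products of Gaussians integrate analytically. A minimal univariate sketch (illustrative only, not the thesis code, which handles full Mixture-of-Gaussians):

```python
import numpy as np

def gauss_overlap(m1, s1, m2, s2):
    """Closed-form integral of N(x; m1, s1^2) * N(x; m2, s2^2) over x:
    a Gaussian density with summed variances, evaluated at m1 - m2."""
    var = s1 ** 2 + s2 ** 2
    return np.exp(-0.5 * (m1 - m2) ** 2 / var) / np.sqrt(2 * np.pi * var)

def cs_divergence(m1, s1, m2, s2):
    """Cauchy-Schwarz divergence between two univariate Gaussians."""
    cross = gauss_overlap(m1, s1, m2, s2)
    return -np.log(cross / np.sqrt(gauss_overlap(m1, s1, m1, s1)
                                   * gauss_overlap(m2, s2, m2, s2)))

print(cs_divergence(0.0, 1.0, 0.0, 1.0))  # 0.0: identical distributions
print(cs_divergence(0.0, 1.0, 3.0, 1.0))  # 2.25, i.e. (m1-m2)^2/(4 s^2)
```

For mixtures, each of the three integrals becomes a double sum of such pairwise overlap terms, which is why the objective stays closed-form where the Kullback-Leibler divergence between mixtures does not.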
MddGAN : Multilinear Analysis of the GAN Latent Space
Generative Adversarial Networks (GANs) are currently an indispensable tool for semantic image editing, being widely used in a plethora of computer vision applications. Although these models are proven to encode rich semantic knowledge in their internal representations, they still lack an intuitive way to provide direct control to users, so that they can consistently influence the output image content. Once this knowledge is extracted, however, it can be converted to human-interpretable controls for altering synthesized images in a predictable way.
In this thesis, we present MddGAN, an unsupervised technique for analyzing the GAN latent space and extracting vector directions corresponding to meaningful image transformations. In contrast to existing works, we perform a multilinear decomposition on the weights of a pretrained generator, and we argue that such an exploration scheme can be more suitable for capturing the learnt factors of variation with less entanglement. Furthermore, the proposed approach can mathematically divide the discovered semantics into groups, according to their semantic content. This separation happens in a completely unsupervised way, and essentially each dimension of the produced multilinear basis represents one such group.
By conducting several experiments on GANs trained on various datasets, we show how varying the number of explanatory factors discovered in the generative representations affects the discovered semantic manipulations. Moreover, we showcase several non-trivial directions highlighting the editing potential of our method. Furthermore, we compare MddGAN to the current supervised and unsupervised baselines, both qualitatively and quantitatively. The results indicate that our approach is at least on par with these methods.
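Closed-form weight-decomposition methods of this kind can be illustrated with the related unsupervised idea of taking the top right-singular vectors of a pretrained generator's first projection weight as candidate edit directions (MddGAN's multilinear decomposition generalizes such matrix factorizations to a tensor setting). A minimal sketch, with a random matrix standing in for the real weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a generator's first projection weight: latent dim 16 -> 512.
W = rng.normal(size=(512, 16))

# Right-singular vectors of W: latent directions sorted by how strongly
# the layer amplifies them; the top ones are candidate edit directions.
_, singular_values, Vt = np.linalg.svd(W, full_matrices=False)
directions = Vt                       # each row is a unit-norm direction

# Editing: shift a latent code along the strongest direction and resample.
z = rng.normal(size=16)
z_edited = z + 3.0 * directions[0]
```

Grouping the discovered semantics, as MddGAN does, has no analogue in this flat SVD picture: it emerges from the extra modes of the multilinear basis.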
Human-Centric Deep Generative Models: The Blessing and The Curse
Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research takes a focused look at deep generative models, a neural network solution that proves effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is a firm no. In this thesis, I present the two sides of deep generative models: their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate the improvement in performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns of DeepFakes and improve the minority inclusion of deep generative models.
For the performance of deep generative models, I explore applying attention modules and a dual contrastive loss to generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For steerability, I introduce Texture Mixer, a simple yet effective approach to achieving steerable texture synthesis and blending. For security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of GAN-generated image misuse. For inclusion, I investigate the biased misbehavior of generative models and present my solution for enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I propose actionable insights for the applications of deep generative models, and finally contribute to human-generator interaction.
Pushing the limits of Visual Grounding: Pre-training on large synthetic datasets
Visual Grounding is a crucial computer vision task requiring a deep understanding of data semantics. Leveraging the transformative trend of training controllable generative models, this research aims to demonstrate the substantial improvement of state-of-the-art visual grounding models through the use of massive, synthetically generated data. The study introduces a synthetic dataset crafted with controllable generative models, offering a scalable solution to the challenges of traditional data collection. Evaluating the visual grounding model TransVG on this synthetic dataset shows promising results, with attribute annotations contributing to a diverse dataset of 250,000 samples. The resulting dataset demonstrates the impact of synthetic data on the evolution of visual grounding, contributing to advancements in this dynamic field.