Three-dimensional Bone Image Synthesis with Generative Adversarial Networks
Medical image processing has been highlighted as an area where deep
learning-based models have the greatest potential. However, in the medical
field in particular, problems of data availability and privacy are hampering
research progress and thus rapid implementation in clinical routine. The
generation of synthetic data not only ensures privacy, but also allows to
\textit{draw} new patients with specific characteristics, enabling the
development of data-driven models on a much larger scale. This work
demonstrates that three-dimensional generative adversarial networks (GANs) can
be efficiently trained to generate high-resolution medical volumes with finely
detailed voxel-based architectures. In addition, GAN inversion is successfully
implemented for the three-dimensional setting and used for extensive research
on model interpretability and applications such as image morphing, attribute
editing and style mixing. The results are comprehensively validated on a
database of three-dimensional HR-pQCT instances representing the bone
micro-architecture of the distal radius.
Comment: Submitted to the journal Artificial Intelligence in Medicine.
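GAN inversion, as used above, searches for the latent code whose generated output reproduces a target image or volume. A minimal illustrative sketch, using a toy linear "generator" and plain gradient descent rather than the paper's actual three-dimensional model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in "generator": a fixed linear map from an 8-d latent code
# to a 64-d flattened "volume". Real GAN inversion applies the same idea
# to a trained deep generator, usually with perceptual losses on top.
G = rng.normal(size=(64, 8))
z_true = rng.normal(size=8)
x_target = G @ z_true                   # observation we want to invert

# Invert by gradient descent on the squared reconstruction error.
lr = 0.9 / np.linalg.norm(G, 2) ** 2    # safe step size for this G
z = np.zeros(8)
for _ in range(2000):
    z -= lr * 2 * G.T @ (G @ z - x_target)

print(np.linalg.norm(G @ z - x_target))  # close to 0: code recovered
```

Once the code is recovered, operations such as morphing or attribute editing amount to interpolating or shifting `z` before regenerating.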
Image Synthesis under Limited Data: A Survey and Taxonomy
Deep generative models, which target reproducing the given data distribution
to produce novel samples, have made unprecedented advancements in recent years.
Their technical breakthroughs have enabled unparalleled quality in the
synthesis of visual content. However, one critical prerequisite for their
tremendous success is the availability of a sufficient number of training
samples, which requires massive computation resources. When trained on limited
data, generative models tend to suffer from severe performance deterioration
due to overfitting and memorization. Accordingly, researchers have recently
devoted considerable attention to developing novel models that are capable of
generating plausible and diverse images from limited training data. Despite
numerous efforts to enhance training stability and synthesis quality in the
limited data scenarios, there is a lack of a systematic survey that provides 1)
a clear problem definition, critical challenges, and taxonomy of various tasks;
2) an in-depth analysis of the pros, cons, and remaining limitations of existing
literature; as well as 3) a thorough discussion on the potential applications
and future directions in the field of image synthesis under limited data. In
order to fill this gap and provide an informative introduction to researchers
who are new to this topic, this survey offers a comprehensive review and a
novel taxonomy on the development of image synthesis under limited data. In
particular, it covers the problem definition, requirements, main solutions,
popular benchmarks, and remaining challenges in a comprehensive and all-around
manner.
Comment: 230 references, 25 pages. GitHub:
https://github.com/kobeshegu/awesome-few-shot-generatio
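One widely used family of remedies covered by such surveys is differentiable augmentation, where the same random augmentation is applied to both real and generated images before they reach the discriminator, so the discriminator never sees un-augmented data and cannot simply memorize the small training set. A minimal sketch of the idea (the function name and batch layout here are illustrative, not a specific library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_batch(batch, shift=2, rng=rng):
    """Randomly translate each image in an NHWC batch. A stand-in for the
    richer policies (color jitter, cutout, translation) used in practice;
    the key point is that real and fake batches get the same policy."""
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        dy, dx = rng.integers(-shift, shift + 1, size=2)
        out[i] = np.roll(img, (dy, dx), axis=(0, 1))
    return out

real = rng.normal(size=(4, 8, 8, 1))
fake = rng.normal(size=(4, 8, 8, 1))
# Both batches are augmented before being scored by the discriminator.
real_aug, fake_aug = augment_batch(real), augment_batch(fake)
```

In a full framework the augmentation must be differentiable so gradients can flow back to the generator through the augmented fakes.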
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML: multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur on the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. MMML models, as the next generation of
intelligent design tools, hold a promising future to impact how products are
designed.
Disentanglement Learning via Topology
We propose TopDis (Topological Disentanglement), a method for learning
disentangled representations by adding a multi-scale topological loss term.
Disentanglement is a crucial property of data representations, essential for
the explainability and robustness of deep learning models and a step towards
high-level cognition. The state-of-the-art method based on VAE minimizes the
total correlation of the joint distribution of latent variables. We take a
different perspective on disentanglement by analyzing topological properties of
data manifolds. In particular, we optimize the topological similarity for data
manifold traversals. To the best of our knowledge, our paper is the first one
to propose a differentiable topological loss for disentanglement. Our
experiments have shown that the proposed topological loss improves
disentanglement scores such as MIG, FactorVAE score, SAP score and DCI
disentanglement score with respect to state-of-the-art results. Our method
works in an unsupervised manner, permitting its application to problems without
labeled factors of variation. Additionally, we show how to use the proposed
topological loss to find disentangled directions in a trained GAN.
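For context, the total correlation minimized by the VAE-based state of the art measures the dependence among latent dimensions, TC(z) = KL(q(z) ‖ ∏ᵢ q(zᵢ)). For a multivariate Gaussian it has the closed form ½·log(det diag(Σ) / det Σ), which the following sketch evaluates (this illustrates the baseline objective, not the TopDis loss itself):

```python
import numpy as np

def gaussian_total_correlation(cov):
    """Total correlation of N(mu, cov): the KL divergence between the
    joint and the product of its marginals. Zero iff the covariance is
    diagonal, i.e. the latent dimensions are independent."""
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

print(gaussian_total_correlation(np.eye(3)))                 # 0.0
print(gaussian_total_correlation([[1.0, 0.8], [0.8, 1.0]]))  # ~0.51
```

A disentanglement regularizer drives this dependence toward zero; TopDis replaces the distributional criterion with a topological one computed on manifold traversals.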
Human-controllable and structured deep generative models
Deep generative models are a class of probabilistic models that attempt to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress in recent years. These models have several benefits: eased sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and analysis of the learned representation, we focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence admits a closed-form solution for Mixture-of-Gaussians distributions, allowing the autoencoding objective to be optimized efficiently.
We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations for appearance and shape. The goal of incorporating facial landmarks is controlled generation that can separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Earlier work on learning disentangled representations was done in an unsupervised fashion. However, recent works have shown that learning disentangled representations is not identifiable without any inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
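The closed form mentioned for the Cauchy-Schwarz divergence, D_CS(p, q) = −log(∫pq / √(∫p² ∫q²)), follows from the fact that products of Gaussians integrate analytically. A minimal univariate sketch (illustrative only, not the thesis code, which handles full Mixture-of-Gaussians):

```python
import numpy as np

def gauss_overlap(m1, s1, m2, s2):
    """Closed-form integral of N(x; m1, s1^2) * N(x; m2, s2^2) over x:
    a Gaussian density with summed variances, evaluated at m1 - m2."""
    var = s1 ** 2 + s2 ** 2
    return np.exp(-0.5 * (m1 - m2) ** 2 / var) / np.sqrt(2 * np.pi * var)

def cs_divergence(m1, s1, m2, s2):
    """Cauchy-Schwarz divergence between two univariate Gaussians."""
    cross = gauss_overlap(m1, s1, m2, s2)
    return -np.log(cross / np.sqrt(gauss_overlap(m1, s1, m1, s1)
                                   * gauss_overlap(m2, s2, m2, s2)))

print(cs_divergence(0.0, 1.0, 0.0, 1.0))  # 0.0: identical distributions
print(cs_divergence(0.0, 1.0, 3.0, 1.0))  # 2.25, i.e. (m1-m2)^2/(4 s^2)
```

For mixtures, each of the three integrals becomes a double sum of such pairwise overlap terms, which is why the objective stays closed-form where the Kullback-Leibler divergence between mixtures does not.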
MddGAN : Multilinear Analysis of the GAN Latent Space
Generative Adversarial Networks (GANs) are currently an indispensable tool for semantic image editing, being widely used in a plethora of computer vision applications. Although these models are proven to encode rich semantic knowledge in their internal representations, they still lack an intuitive way to provide direct control to users, so that they can consistently influence the output image content. Once this knowledge is extracted, however, it can be converted to human-interpretable controls for altering synthesized images in a predictable way.
In this thesis, we present MddGAN, an unsupervised technique for analyzing the GAN latent space and extracting vector directions corresponding to meaningful image transformations. In contrast to existing works, we perform a multilinear decomposition on the weights of a pretrained generator, and we argue that such an exploration scheme can be more suitable for capturing the learnt factors of variation with less entanglement. Furthermore, the proposed approach can mathematically divide the discovered semantics into groups, according to their semantic content. This separation happens in a completely unsupervised way, and essentially each dimension of the produced multilinear basis represents one such group.
By conducting several experiments on GANs trained on various datasets, we show how varying the number of explanatory factors discovered in the generative representations affects the discovered semantic manipulations. Moreover, we showcase several non-trivial directions highlighting the editing potential of our method. Furthermore, we compare MddGAN to the current supervised and unsupervised baselines, both qualitatively and quantitatively. The results indicate that our approach is at least on par with these methods.
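Closed-form weight-decomposition methods of this kind can be illustrated with the related unsupervised idea of taking the top right-singular vectors of a pretrained generator's first projection weight as candidate edit directions (MddGAN's multilinear decomposition generalizes such matrix factorizations to a tensor setting). A minimal sketch, with a random matrix standing in for the real weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a generator's first projection weight: latent dim 16 -> 512.
W = rng.normal(size=(512, 16))

# Right-singular vectors of W: latent directions sorted by how strongly
# the layer amplifies them; the top ones are candidate edit directions.
_, singular_values, Vt = np.linalg.svd(W, full_matrices=False)
directions = Vt                       # each row is a unit-norm direction

# Editing: shift a latent code along the strongest direction and resample.
z = rng.normal(size=16)
z_edited = z + 3.0 * directions[0]
```

Grouping the discovered semantics, as MddGAN does, has no analogue in this flat SVD picture: it emerges from the extra modes of the multilinear basis.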
Human-Centric Deep Generative Models: The Blessing and The Curse
Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research takes a focused look at deep generative models, a neural network solution that proves effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is a firm no. In this thesis, I present the two sides of deep generative models: their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate the improvement in performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns of DeepFakes and improve the minority inclusion of deep generative models.
For the performance of deep generative models, I explore applying attention modules and a dual contrastive loss to generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For steerability, I introduce Texture Mixer, a simple yet effective approach to achieving steerable texture synthesis and blending. For security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of GAN-generated image misuse. For inclusion, I investigate the biased misbehavior of generative models and present my solution for enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I propose actionable insights for the applications of deep generative models, and finally contribute to human-generator interaction.
Pushing the limits of Visual Grounding: Pre-training on large synthetic datasets
Visual Grounding is a crucial computer vision task requiring a deep understanding of data semantics. Leveraging the transformative trend of training controllable generative models, this research aims to demonstrate the substantial improvement of state-of-the-art visual grounding models through the use of massive, synthetically generated data. The study introduces a synthetic dataset crafted with controllable generative models, offering a scalable solution to the challenges of traditional data collection. Evaluating the visual grounding model TransVG on this synthetic dataset shows promising results, with attribute annotations contributing to a diverse dataset of 250,000 samples. The resulting dataset demonstrates the impact of synthetic data on the evolution of visual grounding, contributing to advancements in this dynamic field.