6 research outputs found
Latent Neural Differential Equations for Video Generation
Generative Adversarial Networks have recently shown promise for video
generation, building on the success of image generation while also
addressing a new challenge: time. Although time was analyzed in some early
work, the literature has not kept pace with developments in temporal
modeling. We propose studying the effects of Neural Differential Equations
on modeling the temporal dynamics of video generation. The Neural
Differential Equation paradigm offers several theoretical strengths,
including the first continuous representation of time within video
generation. To assess these effects, we investigate how changes in the
temporal model affect generated video quality.
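The continuous-time idea in the abstract above can be sketched as a latent dynamics model: a vector field governs how a latent state evolves, the state is integrated through time, and each state is decoded into a frame. The following is a minimal illustrative sketch with an explicit Euler integrator and random weights; all names, shapes, and the Euler scheme are assumptions, not the paper's method.

```python
import numpy as np

# Illustrative sketch (not the paper's architecture): latent dynamics
# z'(t) = f(z) integrated with explicit Euler steps, each state decoded
# into a flat frame. Weights are random stand-ins for learned parameters.
rng = np.random.default_rng(0)

latent_dim, frame_dim, n_frames, dt = 8, 16, 5, 0.1
W_f = rng.standard_normal((latent_dim, latent_dim)) * 0.1   # dynamics weights
W_dec = rng.standard_normal((frame_dim, latent_dim)) * 0.1  # decoder weights

def f(z):
    """Vector field governing the continuous latent dynamics."""
    return np.tanh(W_f @ z)

def decode(z):
    """Map a latent state to a flat frame."""
    return W_dec @ z

z = rng.standard_normal(latent_dim)  # initial latent sampled from a prior
frames = []
for _ in range(n_frames):
    frames.append(decode(z))
    z = z + dt * f(z)                # one Euler step through continuous time

video = np.stack(frames)             # shape: (n_frames, frame_dim)
print(video.shape)
```

In practice a neural ODE would replace the Euler loop with an adaptive solver, which is what makes the time representation genuinely continuous rather than tied to a fixed frame rate.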
Lower Dimensional Kernels for Video Discriminators
This work presents an analysis of the discriminators used in Generative
Adversarial Networks (GANs) for video. We show that unconstrained video
discriminator architectures induce a loss surface with high curvature, which
makes optimisation difficult. We also show that this curvature becomes more
extreme as the maximal kernel dimension of video discriminators increases. With
these observations in hand, we propose a family of efficient Lower-Dimensional
Video Discriminators for GANs (LDVD GANs). The proposed family of
discriminators improves the performance of the video GAN models it is applied
to and demonstrates good performance on complex and diverse datasets such as
UCF-101. In particular, we show that it can double the performance of
Temporal-GANs and provides state-of-the-art performance on a single GPU.
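One way to see why lower-dimensional kernels are cheaper is to count parameters: a full 3D video kernel grows cubically in kernel size, while a factorization into a 2D spatial kernel plus a 1D temporal kernel (as in (2+1)D-style convolutions) grows quadratically. The sketch below is an illustrative comparison under assumed channel counts, not the paper's exact architecture.

```python
# Illustrative parameter count (assumed channel sizes, not the paper's model):
# a full k x k x k 3D discriminator kernel versus a lower-dimensional
# factorization into a 1 x k x k spatial kernel and a k x 1 x 1 temporal one.
c_in, c_out, k = 64, 128, 3

full_3d = c_in * c_out * k * k * k        # full 3D convolution
spatial_2d = c_in * c_out * 1 * k * k     # spatial-only kernel
temporal_1d = c_out * c_out * k * 1 * 1   # temporal-only kernel
factored = spatial_2d + temporal_1d

print(full_3d, factored)  # the factorized pair uses fewer parameters here
```

Beyond parameter count, the abstract's observation is that constraining the maximal kernel dimension also flattens the loss surface, which is an optimisation benefit the raw count does not capture.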
Temporal development GAN (TD-GAN): crafting more accurate image sequences of biological development
In this study, we propose a novel Temporal Development Generative Adversarial Network (TD-GAN) for the generation and analysis of videos, with a particular focus on biological and medical applications. Inspired by Progressive Growing GAN (PG-GAN) and Temporal GAN (T-GAN), our approach employs multiple discriminators that analyze generated videos at different resolutions and with different approaches. A new Temporal Discriminator (TD) that evaluates the developmental coherence of video content is introduced, ensuring that the generated image sequences follow a realistic order of stages. The proposed TD-GAN is evaluated on three datasets: Mold, Yeast, and Embryo, each with unique characteristics. Multiple evaluation metrics are used to comprehensively assess the generated videos, including the Fréchet Inception Distance (FID), Fréchet Video Distance (FVD), class accuracy, order accuracy, and Mean Squared Error (MSE). Results indicate that TD-GAN significantly improves FVD scores, demonstrating its effectiveness in generating more coherent videos. It achieves competitive FID scores, particularly when selecting the appropriate number of classes for each dataset and resolution. Additionally, TD-GAN improves class accuracy and order accuracy and reduces MSE compared to the default model, demonstrating its ability to generate more realistic and coherent video sequences. Furthermore, our analysis of stage distribution in the generated videos shows that TD-GAN produces videos that closely match the real datasets, offering promising potential for generating and analyzing videos in different domains, including biology and medicine.
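The abstract's notion of developmental coherence suggests a simple reading of "order accuracy": the fraction of generated videos whose per-frame stage predictions appear in non-decreasing developmental order. The sketch below is a hypothetical version of such a metric; the function name and the example stage sequences are illustrative assumptions, not the paper's definition.

```python
# Hypothetical "order accuracy" sketch: fraction of videos whose predicted
# per-frame developmental stages never go backwards. The stage sequences
# below stand in for the outputs of a stage classifier.
def order_accuracy(videos_stages):
    ordered = sum(
        all(a <= b for a, b in zip(stages, stages[1:]))
        for stages in videos_stages
    )
    return ordered / len(videos_stages)

# Three generated videos, each a sequence of predicted stage labels per frame.
stages = [[0, 0, 1, 2], [0, 2, 1, 2], [1, 1, 2, 3]]
print(order_accuracy(stages))  # two of the three sequences are monotone
```

A metric like this complements FID/FVD: distribution distances can be low even when individual videos jump between stages implausibly, which is exactly what a temporal discriminator is meant to penalize.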
Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy
Generative adversarial networks (GANs) have been extensively studied in the
past few years. Arguably their most significant impact has been in the area of
computer vision where great advances have been made in challenges such as
plausible image generation, image-to-image translation, facial attribute
manipulation and similar domains. Despite the significant successes achieved to
date, applying GANs to real-world problems still poses significant challenges,
three of which we focus on here. These are: (1) the generation of high quality
images, (2) diversity of image generation, and (3) stable training. Focusing on
the degree to which popular GAN technologies have made progress against these
challenges, we provide a detailed review of the state of the art in GAN-related
research in the published scientific literature. We further structure this
review through a convenient taxonomy we have adopted based on variations in GAN
architectures and loss functions. While several reviews for GANs have been
presented to date, none have considered the status of this field based on their
progress towards addressing practical challenges relevant to computer vision.
Accordingly, we review and critically discuss the most popular
architecture-variant and loss-variant GANs for tackling these challenges. Our
objective is to provide an overview as well as a critical analysis of the
status of GAN research in terms of relevant progress towards important computer
vision application requirements. As we do this we also discuss the most
compelling applications in computer vision in which GANs have demonstrated
considerable success along with some suggestions for future research
directions. Code related to the GAN variants studied in this work is summarized
at https://github.com/sheqi/GAN_Review. Comment: Accepted by ACM Computing Surveys, 23 November 202