Guiding InfoGAN with Semi-Supervision
In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN)
for image synthesis that leverages information from a few labels (as little as
0.22%, at most 10% of the dataset) to learn semantically meaningful and
controllable data representations where latent variables correspond to label
categories. The architecture builds on Information Maximizing Generative
Adversarial Networks (InfoGAN) and is shown to learn both continuous and
categorical codes and achieves higher quality of synthetic samples compared to
fully unsupervised settings. Furthermore, we show that using small amounts of
labeled data speeds up training convergence. The architecture maintains the
ability to disentangle latent variables for which no labels are available.
Finally, we contribute an information-theoretic argument for how introducing
semi-supervision increases the mutual information between synthetic and real data.
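To make the objective concrete, here is a minimal PyTorch sketch of the combined ss-InfoGAN losses: the usual GAN terms, the InfoGAN variational lower bound that lets a Q-network recover the latent code from synthetic samples, and a supervised cross-entropy term on the small labeled subset that ties the code to label categories. All names (generator, discriminator, q_net, the lambda weights) are illustrative assumptions, not the authors' code.

import torch
import torch.nn.functional as F

def ss_infogan_losses(generator, discriminator, q_net,
                      z, c_cat, real_x, real_y=None,
                      lambda_info=1.0, lambda_sup=1.0):
    # Generate from noise z concatenated with a one-hot categorical code c_cat.
    code = F.one_hot(c_cat, num_classes=q_net.n_classes).float()
    fake_x = generator(torch.cat([z, code], dim=1))

    # Standard non-saturating GAN terms (fake detached for the discriminator).
    d_real = discriminator(real_x)
    d_fake_d = discriminator(fake_x.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake_d, torch.zeros_like(d_fake_d))
    d_fake_g = discriminator(fake_x)
    g_loss = F.binary_cross_entropy_with_logits(d_fake_g, torch.ones_like(d_fake_g))

    # InfoGAN term: a variational lower bound on I(c; G(z, c)), asking Q to
    # recover the sampled code from the synthetic image.
    info_loss = F.cross_entropy(q_net(fake_x), c_cat)

    # Semi-supervision: on the few labeled real images, Q must also predict
    # the true label, anchoring the code c to the label categories.
    sup_loss = F.cross_entropy(q_net(real_x), real_y) if real_y is not None else 0.0

    return d_loss, g_loss + lambda_info * info_loss + lambda_sup * sup_loss

On batches without labels the supervised term simply drops out, which is consistent with the architecture keeping its ability to disentangle codes for which no labels exist.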
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
In the domain of computer vision, the restoration of missing information in
video frames is a critical challenge, particularly in applications such as
autonomous driving and surveillance systems. This paper introduces the Siamese
Masked Conditional Variational Autoencoder (SiamMCVAE), leveraging a siamese
architecture with twin encoders based on vision transformers. This innovative
design enhances the model's ability to comprehend lost content by capturing
intrinsic similarities between paired frames. SiamMCVAE proficiently
reconstructs missing elements in masked frames, effectively addressing issues
arising from camera malfunctions through variational inference. Experimental
results robustly demonstrate the model's effectiveness in restoring missing
information, thus enhancing the resilience of computer vision systems. The
incorporation of Siamese Vision Transformer (SiamViT) encoders in SiamMCVAE
exemplifies promising potential for addressing real-world challenges in
computer vision, reinforcing the adaptability of autonomous systems in dynamic
environments.
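The siamese design can be sketched in a few lines of PyTorch: one transformer encoder applied to both frames (shared weights are what make the twins siamese), a variational latent summarizing the lost content, and a decoder conditioned on that latent and the masked frame. Layer sizes and names below are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class SiamMCVAESketch(nn.Module):
    def __init__(self, patch_dim=768, latent_dim=256):
        super().__init__()
        # One transformer encoder used for both frames: shared twin encoders.
        layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_mu = nn.Linear(2 * patch_dim, latent_dim)
        self.to_logvar = nn.Linear(2 * patch_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + patch_dim, patch_dim), nn.GELU(),
            nn.Linear(patch_dim, patch_dim),
        )

    def forward(self, masked_tokens, reference_tokens):
        # Encode the masked frame and its intact paired frame with shared weights.
        h_masked = self.encoder(masked_tokens).mean(dim=1)
        h_ref = self.encoder(reference_tokens).mean(dim=1)
        h = torch.cat([h_masked, h_ref], dim=-1)

        # Variational posterior over a latent that summarizes the lost content.
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()

        # Condition the decoder on the latent and each masked-frame token.
        z_tokens = z.unsqueeze(1).expand(-1, masked_tokens.size(1), -1)
        recon = self.decoder(torch.cat([z_tokens, masked_tokens], dim=-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return recon, kl

The KL term is what makes the reconstruction a variational inference over the missing content rather than a deterministic fill.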
Adaptation of the super-resolution SOTA for Art Restoration in camera-captured images
Preserving cultural heritage is of paramount importance. In the domain of art
restoration, developing a computer vision model capable of effectively
restoring deteriorated images of art pieces has long been difficult, but the
current computer vision state of the art makes it feasible. Traditional
restoration methods are often
time-consuming and require extensive expertise. The aim of this work is to
design an automated solution based on computer vision models that can enhance
and reconstruct degraded artworks, improving their visual quality while
preserving their original characteristics and artifacts. The model should
handle a diverse range of deterioration types, including but not limited to
noise, blur, scratches, fading, and other common forms of degradation. We adapt
the current state of the art in image super-resolution, based on the Diffusion
Model (DM), and fine-tune it for art image restoration. Our results show that
instead of fine-tuning multiple models for different kinds of degradation, it
suffices to fine-tune a single super-resolution model, which we train on
multiple datasets to make it robust. Code link: https://github.com/Naagar/art_restoration_DM
Comment: COMPETITIONS @ ICETCI 202
A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar
Identifying a reduced set of collective variables is critical for
understanding atomistic simulations and accelerating them through enhanced
sampling techniques. Recently, several methods have been proposed to learn
these variables directly from atomistic data. Depending on the type of data
available, the learning process can be framed as dimensionality reduction,
classification of metastable states, or identification of slow modes. Here we
present mlcolvar, a Python library that simplifies the construction
of these variables and their use in the context of enhanced sampling through a
contributed interface to the PLUMED software. The library is organized
modularly to facilitate the extension and cross-contamination of these
methodologies. In this spirit, we developed a general multi-task learning
framework in which multiple objective functions and data from different
simulations can be combined to improve the collective variables. The library's
versatility is demonstrated through simple examples that are prototypical of
realistic scenarios.
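The multi-task idea can be sketched in generic PyTorch: one network maps atomistic descriptors to a collective variable (CV), and several objectives (here reconstruction on unlabeled frames and state classification on labeled ones) are summed with tunable weights. This is an illustration of the framework, not mlcolvar's actual API; all layer sizes, heads, and weights are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskCV(nn.Module):
    def __init__(self, n_descriptors, n_cv=1, n_states=2):
        super().__init__()
        # Shared encoder: descriptors -> low-dimensional CV.
        self.encoder = nn.Sequential(
            nn.Linear(n_descriptors, 64), nn.Tanh(), nn.Linear(64, n_cv))
        self.classifier = nn.Linear(n_cv, n_states)    # metastable-state head
        self.decoder = nn.Linear(n_cv, n_descriptors)  # reconstruction head

    def forward(self, x):
        return self.encoder(x)

def multitask_loss(model, x_unlabeled, x_labeled, y_labels,
                   w_recon=1.0, w_class=1.0):
    # Dimensionality-reduction objective on unlabeled simulation frames.
    cv_u = model(x_unlabeled)
    recon = F.mse_loss(model.decoder(cv_u), x_unlabeled)

    # Classification objective on frames labeled by metastable state.
    cv_l = model(x_labeled)
    cls = F.cross_entropy(model.classifier(cv_l), y_labels)

    return w_recon * recon + w_class * cls

Because both heads share the encoder, data from different simulations and different objective functions shape the same CV, which is the point of the multi-task framework.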
Information Flow in Self-Supervised Learning
In this paper, we provide a comprehensive toolbox for understanding and
enhancing self-supervised learning (SSL) methods through the lens of matrix
information theory. Specifically, by leveraging the principles of matrix mutual
information and joint entropy, we offer a unified analysis for both contrastive
and feature decorrelation based methods. Furthermore, we propose the matrix
variational masked auto-encoder (M-MAE) method, grounded in matrix information
theory, as an enhancement to masked image modeling. The empirical evaluations
underscore the effectiveness of M-MAE compared with the state-of-the-art
methods, including a 3.9% improvement in linear probing with ViT-Base and a 1%
improvement in fine-tuning with ViT-Large, both on ImageNet.
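The matrix-information quantities the analysis relies on can be sketched briefly, assuming von Neumann-style definitions (H(K) = -tr(rho log rho) with rho = K / tr(K), and a joint entropy built from the Hadamard product of two Gram matrices). This is illustrative, not the paper's exact code.

import torch

def matrix_entropy(K, eps=1e-8):
    """Von Neumann entropy of a PSD kernel matrix, via its eigenvalues."""
    rho = K / K.trace()
    evals = torch.linalg.eigvalsh(rho).clamp_min(eps)
    return -(evals * evals.log()).sum()

def matrix_mutual_information(Z1, Z2):
    """I(K1; K2) = H(K1) + H(K2) - H(K1, K2) on batch features Z1, Z2 (n x d)."""
    K1, K2 = Z1 @ Z1.T, Z2 @ Z2.T  # Gram matrices over the batch
    joint = K1 * K2                # Hadamard product as the joint kernel
    return matrix_entropy(K1) + matrix_entropy(K2) - matrix_entropy(joint)

Contrastive and feature-decorrelation losses can then be compared through what they do to these same quantities, which is what enables the unified analysis.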
On Masked Pre-training and the Marginal Likelihood
Masked pre-training removes random input dimensions and learns a model that
can predict the missing values. Empirical results indicate that this intuitive
form of self-supervised learning yields models that generalize very well to new
domains. A theoretical understanding is, however, lacking. This paper shows
that masked pre-training with a suitable cumulative scoring function
corresponds to maximizing the model's marginal likelihood, which is de facto
the Bayesian model selection measure of generalization. Beyond shedding light
on the success of masked pre-training, this insight also suggests that Bayesian
models can be trained with appropriately designed self-supervision.
Empirically, we confirm the developed theory and explore the main learning
principles of masked pre-training in large language models.
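The core correspondence can be sketched as follows (the notation is illustrative, not the paper's exact statement). By the probability chain rule, for any ordering \sigma of the D input dimensions,

\log p(x) \;=\; \sum_{d=1}^{D} \log p\bigl(x_{\sigma(d)} \mid x_{\sigma(<d)}\bigr)
\;=\; \mathbb{E}_{\sigma}\!\left[\sum_{d=1}^{D} \log p\bigl(x_{\sigma(d)} \mid x_{\sigma(<d)}\bigr)\right],

where the second equality holds because the identity is exact for every ordering. A masked model trained to score the missing dimensions given each observed subset, with scores accumulated suitably over mask sizes, therefore estimates this expectation, i.e., the log marginal likelihood that Bayesian model selection maximizes.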
- …