97 research outputs found
Semisupervised Autoencoder for Sentiment Analysis
In this paper, we investigate the usage of autoencoders in modeling textual
data. Traditional autoencoders suffer from at least two aspects: scalability
with the high dimensionality of vocabulary size and dealing with
task-irrelevant words. We address this problem by introducing supervision via
the loss function of autoencoders. In particular, we first train a linear
classifier on the labeled data, then define a loss for the autoencoder with the
weights learned from the linear classifier. To reduce the bias brought by one
single classifier, we define a posterior probability distribution on the
weights of the classifier, and derive the marginalized loss of the autoencoder
with Laplace approximation. We show that our choice of loss function can be
rationalized from the perspective of Bregman Divergence, which justifies the
soundness of our model. We evaluate the effectiveness of our model on six
sentiment analysis datasets, and show that our model significantly outperforms
all the competing methods with respect to classification accuracy. We also show
that our model is able to take advantage of unlabeled dataset and get improved
performance. We further show that our model successfully learns highly
discriminative feature maps, which explains its superior performance.Comment: To appear in AAAI 201
Matryoshka Diffusion Models
Diffusion models are the de facto approach for generating high-quality images
and videos, but learning high-dimensional models remains a formidable task due
to computational and optimization challenges. Existing methods often resort to
training cascaded models in pixel space or using a downsampled latent space of
a separately trained auto-encoder. In this paper, we introduce Matryoshka
Diffusion Models(MDM), an end-to-end framework for high-resolution image and
video synthesis. We propose a diffusion process that denoises inputs at
multiple resolutions jointly and uses a NestedUNet architecture where features
and parameters for small-scale inputs are nested within those of large scales.
In addition, MDM enables a progressive training schedule from lower to higher
resolutions, which leads to significant improvements in optimization for
high-resolution generation. We demonstrate the effectiveness of our approach on
various benchmarks, including class-conditioned image generation,
high-resolution text-to-image, and text-to-video applications. Remarkably, we
can train a single pixel-space model at resolutions of up to 1024x1024 pixels,
demonstrating strong zero-shot generalization using the CC12M dataset, which
contains only 12 million images.Comment: 28 pages, 18 figure
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
Diffusion models have demonstrated excellent potential for generating diverse
images. However, their performance often suffers from slow generation due to
iterative denoising. Knowledge distillation has been recently proposed as a
remedy that can reduce the number of inference steps to one or a few without
significant quality degradation. However, existing distillation methods either
require significant amounts of offline computation for generating synthetic
training data from the teacher model or need to perform expensive online
learning with the help of real data. In this work, we present a novel technique
called BOOT, that overcomes these limitations with an efficient data-free
distillation algorithm. The core idea is to learn a time-conditioned model that
predicts the output of a pre-trained diffusion model teacher given any time
step. Such a model can be efficiently trained based on bootstrapping from two
consecutive sampled steps. Furthermore, our method can be easily adapted to
large-scale text-to-image diffusion models, which are challenging for
conventional methods given the fact that the training sets are often large and
difficult to access. We demonstrate the effectiveness of our approach on
several benchmark datasets in the DDIM setting, achieving comparable generation
quality while being orders of magnitude faster than the diffusion teacher. The
text-to-image results show that the proposed approach is able to handle highly
complex distributions, shedding light on more efficient generative modeling.Comment: In progres
Enriching the endophytic bacterial microbiota of Ginkgo roots
Bacterial endophytes of Ginkgo roots take part in the secondary metabolic processes of the fossil tree and contribute to plant growth, nutrient uptake, and systemic resistance. However, the diversity of bacterial endophytes in Ginkgo roots is highly underestimated due to the lack of successful isolates and enrichment collections. The resulting culture collection contains 455 unique bacterial isolates representing 8 classes, 20 orders, 42 families, and 67 genera from five phyla: Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria, and Deinococcus-Thermus, using simply modified media (a mixed medium without any additional carbon sources [MM)] and two other mixed media with separately added starch [GM] and supplemented glucose [MSM]). A series of plant growth-promoting endophytes had multiple representatives within the culture collection. Moreover, we investigated the impact of refilling carbon sources on enrichment outcomes. Approximately 77% of the natural community of root-associated endophytes were predicted to have successfully cultivated the possibility based on a comparison of the 16S rRNA gene sequences between the enrichment collections and the Ginkgo root endophyte community. The rare or recalcitrant taxa in the root endosphere were mainly associated with Actinobacteria, Alphaproteobacteria, Blastocatellia, and Ktedonobacteria. By contrast, more operational taxonomic units (OTUs) (0.6% in the root endosphere) became significantly enriched in MM than in GM and MSM. We further found that the bacterial taxa of the root endosphere had strong metabolisms with the representative of aerobic chemoheterotrophy, while the functions of the enrichment collections were represented by the sulfur metabolism. In addition, the co-occurrence network analysis suggested that the substrate supplement could significantly impact bacterial interactions within the enrichment collections. Our results support the fact that it is better to use the enrichment to assess the cultivable potential and the interspecies interaction as well as to increase the detection/isolation of certain bacterial taxa. Taken together, this study will deepen our knowledge of the indoor endophytic culture and provide important insights into the substrate-driven enrichment
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
Autoregressive models for text sometimes generate repetitive and low-quality
output because errors accumulate during the steps of generation. This issue is
often attributed to exposure bias - the difference between how a model is
trained, and how it is used during inference. Denoising diffusion models
provide an alternative approach in which a model can revisit and revise its
output. However, they can be computationally expensive and prior efforts on
text have led to models that produce less fluent output compared to
autoregressive models, especially for longer text and paragraphs. In this
paper, we propose PLANNER, a model that combines latent semantic diffusion with
autoregressive generation, to generate fluent text while exercising global
control over paragraphs. The model achieves this by combining an autoregressive
"decoding" module with a "planning" module that uses latent diffusion to
generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed
method is evaluated on various conditional generation tasks, and results on
semantic generation, text completion and summarization show its effectiveness
in generating high-quality long-form text in an efficient manner.Comment: Accepted by NeurIPS 202
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Training stability is of great importance to Transformers. In this work, we
investigate the training dynamics of Transformers by examining the evolution of
the attention layers. In particular, we track the attention entropy for each
attention head during the course of training, which is a proxy for model
sharpness. We identify a common pattern across different architectures and
tasks, where low attention entropy is accompanied by high training instability,
which can take the form of oscillating loss or divergence. We denote the
pathologically low attention entropy, corresponding to highly concentrated
attention scores, as . As a remedy, we propose
Reparam, a simple and efficient solution where we reparametrize all
linear layers with spectral normalization and an additional learned scalar. We
demonstrate that the proposed reparameterization successfully prevents entropy
collapse in the attention layers, promoting more stable training. Additionally,
we prove a tight lower bound of the attention entropy, which decreases
exponentially fast with the spectral norm of the attention logits, providing
additional motivation for our approach. We conduct experiments with
Reparam on image classification, image self-supervised learning,
machine translation, automatic speech recognition, and language modeling tasks,
across Transformer architectures. We show that Reparam provides
stability and robustness with respect to the choice of hyperparameters, going
so far as enabling training (a) a Vision Transformer to competitive performance
without warmup, weight decay, layer normalization or adaptive optimizers; (b)
deep architectures in machine translation and (c) speech recognition to
competitive performance without warmup and adaptive optimizers
- …