97 research outputs found

    Semisupervised Autoencoder for Sentiment Analysis

    In this paper, we investigate the use of autoencoders for modeling textual data. Traditional autoencoders suffer in at least two respects: scalability with the high dimensionality of the vocabulary, and handling task-irrelevant words. We address these problems by introducing supervision via the loss function of the autoencoder. In particular, we first train a linear classifier on the labeled data, then define a loss for the autoencoder using the weights learned by the linear classifier. To reduce the bias introduced by a single classifier, we define a posterior probability distribution on the weights of the classifier and derive the marginalized loss of the autoencoder with a Laplace approximation. We show that our choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of our model. We evaluate the effectiveness of our model on six sentiment analysis datasets and show that it significantly outperforms all competing methods in classification accuracy. We also show that our model is able to take advantage of unlabeled data for improved performance, and that it learns highly discriminative feature maps, which explains its superior performance. Comment: To appear in AAAI 201
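    The idea of steering reconstruction with classifier weights can be sketched as follows. This is a minimal, assumed form of the supervised loss (not the paper's exact marginalized Laplace-approximated version): reconstruction error projected onto a pre-trained linear classifier's weight vector `w` is penalized in addition to the plain squared error, so task-relevant dimensions matter more.

```python
import numpy as np

def supervised_recon_loss(x, x_hat, w, alpha=1.0):
    # Plain autoencoder reconstruction term.
    plain = np.mean((x - x_hat) ** 2)
    # Reconstruction error projected onto the classifier weights w:
    # errors along task-relevant directions are penalized extra.
    task = float(w @ (x - x_hat)) ** 2
    return plain + alpha * task

x = np.array([1.0, 0.0, 2.0])
w = np.array([0.5, -0.2, 0.1])
loss_perfect = supervised_recon_loss(x, x, w)   # perfect reconstruction
```

    A perfect reconstruction drives both terms to zero; `alpha` (a hypothetical knob, not from the paper) trades off the unsupervised and supervised terms.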

    Matryoshka Diffusion Models

    Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained autoencoder. In this paper, we introduce Matryoshka Diffusion Models (MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly, using a NestedUNet architecture in which features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions, which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024x1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images. Comment: 28 pages, 18 figures
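    The "nested resolutions" setup can be illustrated with a toy pyramid builder. This is an assumed, heavily simplified sketch of the multi-resolution inputs an MDM-style model would denoise jointly, not the actual NestedUNet; the function names are hypothetical.

```python
import numpy as np

def downsample(x):
    # 2x average pooling over a square array: one step coarser.
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def resolution_pyramid(x, levels=3):
    # Nested set of resolutions denoised jointly in MDM-style training;
    # the small-scale inputs sit "inside" the large-scale ones.
    pyramid = [x]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid
```

    During training, each level would receive its own noise and be denoised by weights nested inside the full-resolution model.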

    BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

    Diffusion models have demonstrated excellent potential for generating diverse images, but their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant offline computation to generate synthetic training data from the teacher model, or need to perform expensive online learning with the help of real data. In this work, we present BOOT, a novel technique that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion-model teacher at any given time step. Such a model can be trained efficiently by bootstrapping from two consecutively sampled steps. Furthermore, our method is easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods because their training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach can handle highly complex distributions, shedding light on more efficient generative modeling. Comment: In progress
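    The data-free bootstrap can be pictured with a toy data-flow sketch. All names here are hypothetical stand-ins, and the update is heavily simplified: the student's regression target at step t is built from its own prediction at step t+1, refined by a single teacher step, so no real or synthetic dataset is ever touched.

```python
def boot_target(student, teacher_step, z, t):
    # Student's estimate one step later in the trajectory...
    x_next = student(z, t + 1)
    # ...refined by one teacher denoising step toward step t.
    return teacher_step(x_next, t)

# Toy stand-ins on scalars, just to show the data flow.
toy_student = lambda z, t: z * (1.0 - 0.1 * t)
toy_teacher_step = lambda x, t: x * 0.9

target = boot_target(toy_student, toy_teacher_step, 1.0, 2)
```

    In actual training, the student would then be regressed onto `target` at step t, with gradients stopped through the bootstrap branch.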

    Enriching the endophytic bacterial microbiota of Ginkgo roots

    Bacterial endophytes of Ginkgo roots take part in the secondary metabolic processes of this fossil tree and contribute to plant growth, nutrient uptake, and systemic resistance. However, the diversity of bacterial endophytes in Ginkgo roots is highly underestimated because of the scarcity of successful isolates and enrichment collections. Using simply modified media (a mixed medium without any additional carbon sources [MM] and two other mixed media with separately added starch [GM] and supplemented glucose [MSM]), we obtained a culture collection of 455 unique bacterial isolates representing 8 classes, 20 orders, 42 families, and 67 genera from five phyla: Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria, and Deinococcus-Thermus. Plant growth-promoting endophytes had multiple representatives within the culture collection. Moreover, we investigated the impact of refilling carbon sources on enrichment outcomes. Based on a comparison of 16S rRNA gene sequences between the enrichment collections and the Ginkgo root endophyte community, approximately 77% of the natural community of root-associated endophytes were predicted to be cultivable. The rare or recalcitrant taxa in the root endosphere were mainly associated with Actinobacteria, Alphaproteobacteria, Blastocatellia, and Ktedonobacteria. By contrast, more operational taxonomic units (OTUs) (0.6% in the root endosphere) became significantly enriched in MM than in GM and MSM. We further found that the bacterial taxa of the root endosphere showed strong metabolic activity, represented by aerobic chemoheterotrophy, while the functions of the enrichment collections were represented by sulfur metabolism. In addition, co-occurrence network analysis suggested that substrate supplementation could significantly affect bacterial interactions within the enrichment collections.
    Our results support using enrichment both to assess cultivable potential and interspecies interactions and to increase the detection and isolation of certain bacterial taxa. Taken together, this study deepens our knowledge of the culturing of endophytes and provides important insights into substrate-driven enrichment.

    PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

    Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during generation. This issue is often attributed to exposure bias: the difference between how a model is trained and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they can be computationally expensive, and prior efforts on text have produced models that are less fluent than autoregressive models, especially for longer text and paragraphs. In this paper, we propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to produce fluent text while exercising global control over paragraphs. The model achieves this by combining an autoregressive "decoding" module with a "planning" module that uses latent diffusion to generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion, and summarization show its effectiveness in generating high-quality long-form text efficiently. Comment: Accepted by NeurIPS 202
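    The planning/decoding split described above can be sketched structurally. All class and method names here are hypothetical; the toy modules only illustrate the control flow, where a latent is refined coarse-to-fine by diffusion-style denoising and then handed to an autoregressive decoder.

```python
def generate(planner, decoder, prompt, n_steps=4):
    z = planner.init_latent(prompt)
    for t in reversed(range(n_steps)):   # coarse-to-fine refinement
        z = planner.denoise(z, t)
    return decoder.decode(prompt, z)     # AR decoding conditioned on the plan

class ToyPlanner:
    def init_latent(self, prompt):
        return [0.0, 0.0]                # stand-in semantic embedding
    def denoise(self, z, t):
        return [v + 1.0 / (t + 1) for v in z]

class ToyDecoder:
    def decode(self, prompt, z):
        return f"{prompt} [plan strength {sum(z):.2f}]"

out = generate(ToyPlanner(), ToyDecoder(), "Once upon a time")
```

    The appeal of the split is that the expensive iterative loop runs over a small semantic latent rather than over tokens, while fluency comes from the autoregressive decoder.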

    Stabilizing Transformer Training by Preventing Attention Entropy Collapse

    Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy of each attention head during training, which serves as a proxy for model sharpness. We identify a common pattern across different architectures and tasks: low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We refer to pathologically low attention entropy, corresponding to highly concentrated attention scores, as entropy collapse. As a remedy, we propose σReparam, a simple and efficient solution in which we reparametrize all linear layers with spectral normalization and an additional learned scalar. We demonstrate that the proposed reparameterization successfully prevents entropy collapse in the attention layers, promoting more stable training. Additionally, we prove a tight lower bound on the attention entropy, which decreases exponentially fast with the spectral norm of the attention logits, providing additional motivation for our approach. We conduct experiments with σReparam on image classification, image self-supervised learning, machine translation, automatic speech recognition, and language modeling tasks, across Transformer architectures. We show that σReparam provides stability and robustness with respect to the choice of hyperparameters, going so far as to enable training (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization, or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition to competitive performance without warmup and adaptive optimizers.
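    The reparameterization itself is simple enough to sketch directly. This is a minimal NumPy version assuming the stated form, a learned scalar γ divided by the weight's spectral norm, with the spectral norm estimated by power iteration (the estimator details here are an illustration, not necessarily the paper's implementation):

```python
import numpy as np

def spectral_norm(W, n_iter=100):
    # Power-iteration estimate of the largest singular value of W.
    v = np.random.default_rng(0).normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    sigma = 0.0
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
        sigma = float(u @ W @ v)
    return sigma

def sigma_reparam(W, gamma=1.0):
    # Rescale W by (gamma / spectral_norm(W)), so the layer's spectral
    # norm is pinned to the learned scalar gamma.
    return (gamma / spectral_norm(W)) * W

W = np.random.default_rng(1).normal(size=(16, 16))
W_hat = sigma_reparam(W, gamma=1.0)
```

    In a trainable setting, γ would be a learned parameter per linear layer; bounding each layer's spectral norm this way bounds the attention logits and, through the entropy lower bound above, keeps attention entropy from collapsing.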