200 research outputs found

    FloWaveNet : A Generative Flow for Raw Audio

    Full text link
    Most modern text-to-speech architectures use a WaveNet vocoder for synthesizing high-fidelity waveform audio, but its ancestral sampling scheme causes high inference time, limiting practical application. The recently suggested Parallel WaveNet and ClariNet have achieved real-time audio synthesis capability by incorporating inverse autoregressive flow for parallel sampling. However, these approaches require a two-stage training pipeline with a well-trained teacher network and can only produce natural sound by using probability distillation along with auxiliary loss terms. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real-time, with clarity comparable to previous two-stage parallel models. The code and samples for all models, including our FloWaveNet, are publicly available. Comment: 9 pages, ICML'2019
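The flow-based modeling that makes FloWaveNet's likelihood tractable and sampling parallel rests on invertible transforms with cheap log-determinants. Below is a minimal sketch of a single affine coupling step, the standard building block of such flows; the toy `scale_net`/`shift_net` callables stand in for the paper's WaveNet-style conditioning networks and are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """One affine coupling step: split the signal, transform one half
    conditioned on the other. Invertible, with a log-determinant that
    is just the sum of the log-scales."""
    xa, xb = np.split(x, 2)
    s, t = scale_net(xa), shift_net(xa)
    yb = xb * np.exp(s) + t
    logdet = np.sum(s)  # contribution to the maximum-likelihood loss
    return np.concatenate([xa, yb]), logdet

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse of coupling_forward; all samples invert in parallel."""
    ya, yb = np.split(y, 2)
    s, t = scale_net(ya), shift_net(ya)
    xb = (yb - t) * np.exp(-s)
    return np.concatenate([ya, xb])
```

Because every operation here is elementwise given the conditioning half, inversion needs no ancestral loop over timesteps, which is the source of the real-time sampling claim.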

    Languages and earnings management

    Get PDF
    We predict that managers of firms in countries where languages do not require speakers to grammatically mark future events perceive future consequences of earnings management to be more imminent, and therefore they are less likely to engage in earnings management. Using data from 38 countries, we find that accrual-based earnings management and real earnings management are less prevalent where there is weaker time disassociation in the language. Our study is the first to examine the relation between the grammatical structure of languages and financial reporting characteristics, and it extends the literature on the effect of informal institutions on corporate actions.

    Recasting Continual Learning as Sequence Modeling

    Full text link
    In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a sequence model. By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level, on multiple continual learning episodes. As a specific example of our new formulation, we demonstrate the application of Transformers and their efficient variants as MCL methods. Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL. Comment: NeurIPS 2023
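The key idea, that the continual learning process becomes the forward pass of a sequence model, can be illustrated with a toy attention-style predictor: past (x, y) pairs form the context, and predicting on a query is a single forward pass with no gradient updates. This is a deliberately simplified stand-in for the Transformers used in the paper; the squared-distance attention rule is an assumption for illustration.

```python
import numpy as np

def continual_forward(stream, query, temp=1.0):
    """Continual learning as one forward pass: the stream of observed
    (x, y) pairs is the sequence context, and the prediction for a query
    is a softmax-weighted average of past targets (no weight updates)."""
    xs = np.array([x for x, _ in stream])
    ys = np.array([y for _, y in stream])
    scores = -np.sum((xs - query) ** 2, axis=1) / temp  # similarity logits
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ ys
```

Under the MCL framing, it is the parameters of this forward pass (here, implicitly, the similarity function) that would be meta-trained across many continual learning episodes.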

    Hamilton transversals in tournaments

    Full text link
    It is well-known that every tournament contains a Hamilton path, and every strongly connected tournament contains a Hamilton cycle. This paper establishes transversal generalizations of these classical results. For a collection $\mathbf{T}=\{T_1,\dots,T_m\}$ of not-necessarily distinct tournaments on a common vertex set $V$, an $m$-edge directed graph $\mathcal{D}$ with vertices in $V$ is called a $\mathbf{T}$-transversal if there exists a bijection $\phi\colon E(\mathcal{D})\to [m]$ such that $e\in E(T_{\phi(e)})$ for all $e\in E(\mathcal{D})$. We prove that for sufficiently large $m$ with $m=|V|-1$, there exists a $\mathbf{T}$-transversal Hamilton path. Moreover, if $m=|V|$ and at least $m-1$ of the tournaments $T_1,\ldots,T_m$ are assumed to be strongly connected, then there is a $\mathbf{T}$-transversal Hamilton cycle. In our proof, we utilize a novel way of partitioning tournaments which we dub an $\mathbf{H}$-partition.
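The transversal definition above is concrete enough to check by brute force for small instances: a directed graph is a T-transversal exactly when its edges can be matched bijectively to tournaments, each edge lying in its assigned tournament. A small verifier, with tournaments represented as sets of ordered vertex pairs (a representation chosen here for illustration):

```python
from itertools import permutations

def is_transversal(D_edges, tournaments):
    """Check whether the directed graph D (a list of m edges) is a
    T-transversal of the m tournaments: some bijection phi assigns each
    edge e to a distinct tournament containing e. Brute force over
    all bijections, so only suitable for small m."""
    m = len(D_edges)
    if len(tournaments) != m:
        return False
    for phi in permutations(range(m)):
        if all(D_edges[i] in tournaments[phi[i]] for i in range(m)):
            return True
    return False
```

For example, the Hamilton path 0→1→2 on three vertices is a transversal of two tournaments whenever one contains (0, 1) and the other contains (1, 2), matching the paper's m = |V| − 1 setting.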

    Instance-Aware Group Quantization for Vision Transformers

    Full text link
    Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples without retraining. PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts. Directly applying them to vision transformers (ViTs), however, incurs severe performance degradation, mainly due to the differences in architectures between CNNs and ViTs. In particular, the distribution of activations for each channel varies drastically according to input instances, making PTQ methods for CNNs inappropriate for ViTs. To address this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). To this end, we propose to split the channels of activation maps into multiple groups dynamically for each input instance, such that activations within each group share similar statistical properties. We also extend our scheme to quantize softmax attentions across tokens. In addition, the number of groups for each layer is adjusted to minimize the discrepancies between predictions from quantized and full-precision models, under a bit-operation (BOP) constraint. We show extensive experimental results on image classification, object detection, and instance segmentation, with various transformer architectures, demonstrating the effectiveness of our approach. Comment: CVPR 2024
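The core mechanism, grouping channels per input so that each group shares a quantization scale, can be sketched in a few lines. This is a simplified illustration, not the paper's method: grouping by sorted per-channel dynamic range and plain uniform quantize-dequantize are assumptions made here for clarity.

```python
import numpy as np

def group_quantize(act, num_groups=4, bits=8):
    """Instance-aware group quantization sketch: for THIS activation map
    (shape: channels x tokens), sort channels by dynamic range, split them
    into groups of similar statistics, and give each group its own
    uniform quantization scale."""
    ranges = np.abs(act).max(axis=1)        # per-channel dynamic range
    order = np.argsort(ranges)              # cluster similar channels
    groups = np.array_split(order, num_groups)
    qmax = 2 ** (bits - 1) - 1
    out = np.empty_like(act)
    for g in groups:
        scale = max(ranges[g].max(), 1e-8) / qmax
        out[g] = np.round(act[g] / scale) * scale   # quantize-dequantize
    return out
```

Because the grouping is recomputed per instance, a channel with an unusually large activation on one input no longer inflates the scale (and hence the rounding error) of every other channel, which is the failure mode of per-tensor PTQ on ViTs described above.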

    An Empirical Examination of Consumer Behavior for Search and Experience Goods in Sentiment Analysis

    Get PDF
    With the explosive increase of user-generated content such as product reviews and social media, sentiment analysis has emerged as an area of interest. Sentiment analysis is a useful method to analyze product reviews, and product feature extraction is an important task in sentiment analysis, during which one identifies features of products from reviews. Product features are categorized by product type, such as search goods or experience goods, and their characteristics differ substantially. Thus, we examine whether the classification performance differs by product type. The findings show that the optimal threshold varies by product type, and simply decreasing the threshold to cover many features does not guarantee improvement of the classification performance.
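The threshold in question is the one used during feature extraction: candidate product features are typically terms whose frequency across reviews clears a cutoff. A minimal sketch of that common baseline (this is a generic frequency-threshold extractor, not the paper's specific pipeline):

```python
from collections import Counter

def extract_features(review_tokens, threshold):
    """Frequency-threshold feature extraction: count candidate terms over
    all reviews and keep those mentioned at least `threshold` times.
    Lowering the threshold covers more features but admits more noise."""
    counts = Counter(t for review in review_tokens for t in review)
    return {term for term, c in counts.items() if c >= threshold}
```

The finding above corresponds to tuning `threshold` separately for search goods and experience goods rather than using one global value.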

    Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis

    Full text link
    Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs an image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on LSUN bedroom and church datasets. Code is available at https://github.com/sangyun884/blur-diffusion
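The idea of diffusing each frequency component at its own speed can be sketched as a forward process in Fourier coordinates, where high frequencies decay faster than low ones so that coarse structure survives longest. The quadratic decay schedule and the noise scaling below are illustrative assumptions, not the paper's exact schedule.

```python
import numpy as np

def blur_forward(x0, t, noise_scale=0.1, rng=None):
    """Forward process of a coarse-to-fine (blur) diffusion sketch on a
    1-D signal: attenuate each frequency component at its own rate
    (high frequencies fastest), then add noise. The reverse process
    would deblur and denoise progressively."""
    rng = np.random.default_rng(0) if rng is None else rng
    X = np.fft.rfft(x0)
    freqs = np.arange(X.size)
    lam = (freqs / max(X.size - 1, 1)) ** 2     # per-frequency decay rate
    Xt = X * np.exp(-lam * t)                   # frequency-dependent blur
    xt = np.fft.irfft(Xt, n=x0.size)
    return xt + noise_scale * np.sqrt(t) * rng.standard_normal(x0.size)
```

Standard diffusion is the special case where `lam` is constant across frequencies; making it grow with frequency encodes the inductive bias that low frequencies, to which human perception is more sensitive, should be resolved first.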