200 research outputs found
FloWaveNet : A Generative Flow for Raw Audio
Most modern text-to-speech architectures use a WaveNet vocoder to synthesize
high-fidelity waveform audio, but its ancestral sampling scheme leads to high
inference time, limiting its practical application. The recently proposed
Parallel WaveNet and ClariNet have
achieved real-time audio synthesis capability by incorporating inverse
autoregressive flow for parallel sampling. However, these approaches require a
two-stage training pipeline with a well-trained teacher network and can only
produce natural sound by using probability distillation along with auxiliary
loss terms. We propose FloWaveNet, a flow-based generative model for raw audio
synthesis. FloWaveNet requires only a single-stage training procedure and a
single maximum likelihood loss, without any additional auxiliary terms, and it
is inherently parallel due to the characteristics of generative flow. The model
can efficiently sample raw audio in real-time, with clarity comparable to
previous two-stage parallel models. The code and samples for all models,
including our FloWaveNet, are publicly available. Comment: 9 pages, ICML'201
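The parallelism and single maximum-likelihood loss the abstract attributes to generative flows come from invertible transforms with closed-form inverses and tractable log-determinants. Below is a minimal sketch of a generic affine coupling layer in the RealNVP/Glow family, not FloWaveNet's actual WaveNet-style architecture; the class name, the toy linear "network", and the dimensions are all illustrative. It shows the two relevant properties: an exact log-determinant for the likelihood, and an inverse that needs no ancestral (sequential) loop.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One affine coupling layer: split the signal in half, then scale/shift
    one half conditioned on the other. Both directions are closed-form, so
    sampling needs no sequential (ancestral) loop."""
    def __init__(self, dim):
        # toy "network": a fixed random linear map producing log-scale and shift
        self.w = rng.normal(scale=0.1, size=(dim // 2, dim))

    def _params(self, xa):
        h = xa @ self.w
        log_s, t = h[..., : h.shape[-1] // 2], h[..., h.shape[-1] // 2 :]
        return np.tanh(log_s), t          # bound the log-scale for stability

    def forward(self, x):    # data -> latent; log|det J| feeds the MLE loss
        xa, xb = np.split(x, 2, axis=-1)
        log_s, t = self._params(xa)
        return np.concatenate([xa, xb * np.exp(log_s) + t], -1), log_s.sum(-1)

    def inverse(self, z):    # latent -> data, computed fully in parallel
        za, zb = np.split(z, 2, axis=-1)
        log_s, t = self._params(za)
        return np.concatenate([za, (zb - t) * np.exp(-log_s)], -1)

layer = AffineCoupling(dim=8)
x = rng.normal(size=(4, 8))        # batch of 4 toy "waveform" frames
z, log_det = layer.forward(x)
x_rec = layer.inverse(z)
print(np.allclose(x, x_rec))       # exact invertibility -> prints True
```

A real flow vocoder stacks many such layers (with permutations or squeezes between them) and conditions the coupling network on mel-spectrogram features, but the invertibility argument is the same.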
Languages and earnings management
We predict that managers of firms in countries where languages do not require speakers to grammatically mark future events perceive future consequences of earnings management to be more imminent, and therefore they are less likely to engage in earnings management. Using data from 38 countries, we find that accrual-based earnings management and real earnings management are less prevalent where there is weaker time disassociation in the language. Our study is the first to examine the relation between the grammatical structure of languages and financial reporting characteristics, and it extends the literature on the effect of informal institutions on corporate actions.
Recasting Continual Learning as Sequence Modeling
In this work, we aim to establish a strong connection between two significant
bodies of machine learning research: continual learning and sequence modeling.
That is, we propose to formulate continual learning as a sequence modeling
problem, allowing advanced sequence models to be utilized for continual
learning. Under this formulation, the continual learning process becomes the
forward pass of a sequence model. By adopting the meta-continual learning (MCL)
framework, we can train the sequence model at the meta-level, on multiple
continual learning episodes. As a specific example of our new formulation, we
demonstrate the application of Transformers and their efficient variants as MCL
methods. Our experiments on seven benchmarks, covering both classification and
regression, show that sequence models can be an attractive solution for general
MCL. Comment: NeurIPS 202
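The claim that "the continual learning process becomes the forward pass of a sequence model" can be pictured with a toy attention-based learner. This is not the paper's Transformer variants, and the data and function names are made up: the point is only that past (x, y) pairs sit in the sequence context, so a query is answered by attending over them with no gradient updates during the episode.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_pass_learner(context_x, context_y, query_x, tau=0.1):
    """Treat a continual-learning episode as one sequence: the model 'learns'
    the stream (x_t, y_t) purely inside its forward pass via attention over
    past pairs -- no test-time gradient updates."""
    # attention scores: similarity between the query and every past input
    scores = query_x @ context_x.T / tau               # (n_query, n_context)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ context_y                               # weighted recall of labels

# an "episode": task 1 arrives first, task 2 later, all seen as one sequence
task1_x = rng.normal(loc=+2.0, size=(20, 4)); task1_y = np.zeros((20, 1))
task2_x = rng.normal(loc=-2.0, size=(20, 4)); task2_y = np.ones((20, 1))
ctx_x = np.vstack([task1_x, task2_x]); ctx_y = np.vstack([task1_y, task2_y])

# querying task 1 after task 2 arrived: the earlier pairs are still in the
# sequence context, so nothing is overwritten by the later task
pred = forward_pass_learner(ctx_x, ctx_y, rng.normal(loc=+2.0, size=(5, 4)))
print(pred.round(2))
```

In the MCL setting described in the abstract, the parameters of such a sequence model would themselves be meta-trained across many episodes; here the "model" is fixed attention, which is enough to show where the learning lives.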
Hamilton transversals in tournaments
It is well-known that every tournament contains a Hamilton path, and every
strongly connected tournament contains a Hamilton cycle. This paper establishes
transversal generalizations of these classical results. For a collection
$\mathbf{T} = \{T_1, \dots, T_m\}$ of not-necessarily distinct tournaments on a
common vertex set $V$, an $m$-edge directed graph $\mathcal{D}$ with vertices
in $V$ is called a $\mathbf{T}$-transversal if there exists a bijection
$\phi \colon E(\mathcal{D}) \to [m]$ such that $e \in E(T_{\phi(e)})$ for all
$e \in E(\mathcal{D})$. We prove that for sufficiently large $m$ with
$m = |V| - 1$,
there exists a $\mathbf{T}$-transversal Hamilton path. Moreover, if
$m = |V|$ and at least $m - 1$ of the tournaments are assumed to be strongly
connected, then there is a $\mathbf{T}$-transversal Hamilton cycle. In our
proof, we utilize a novel way of partitioning tournaments which we dub an
$\mathbf{H}$-partition.
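For small instances, the transversal condition itself is easy to check by brute force. The sketch below is purely illustrative and unrelated to the paper's proof technique: given $m = n - 1$ tournaments on $n$ vertices, it tries every vertex ordering and every edge-to-tournament bijection.

```python
from itertools import permutations
import random

def tournament(n, rng):
    """A random orientation of the complete graph on range(n)."""
    return {(u, v) if rng.random() < 0.5 else (v, u)
            for u in range(n) for v in range(u + 1, n)}

def has_transversal_hamilton_path(Ts, n):
    """Brute force: a Hamilton path v0..v_{n-1} together with a bijection
    phi from its n-1 edges to the n-1 tournaments, edge i lying in T_phi(i)."""
    for order in permutations(range(n)):
        edges = list(zip(order, order[1:]))
        for phi in permutations(range(len(Ts))):
            if all(e in Ts[phi[i]] for i, e in enumerate(edges)):
                return True
    return False

rng = random.Random(7)
n = 5
Ts = [tournament(n, rng) for _ in range(n - 1)]   # m = n - 1 tournaments
found = has_transversal_hamilton_path(Ts, n)
print(found)
```

The theorem is asymptotic (sufficiently large $m$), so small random instances like this one are not guaranteed to succeed; the transitive case, where every tournament orients all edges from smaller to larger vertex, always does.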
Instance-Aware Group Quantization for Vision Transformers
Post-training quantization (PTQ) is an efficient model compression technique
that quantizes a pretrained full-precision model using only a small calibration
set of unlabeled samples without retraining. PTQ methods for convolutional
neural networks (CNNs) provide quantization results comparable to
full-precision counterparts. Directly applying them to vision transformers
(ViTs), however, incurs severe performance degradation, mainly due to the
differences in architectures between CNNs and ViTs. In particular, the
distributions of activations for each channel vary drastically according to
input instances, making PTQ methods for CNNs inappropriate for ViTs. To address
this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). To
this end, we propose to split the channels of activation maps into multiple
groups dynamically for each input instance, such that activations within each
group share similar statistical properties. We also extend our scheme to
quantize softmax attentions across tokens. In addition, the number of groups
for each layer is adjusted to minimize the discrepancies between predictions
from quantized and full-precision models, under a bit-operation (BOP)
constraint. We show extensive experimental results on image classification,
object detection, and instance segmentation, with various transformer
architectures, demonstrating the effectiveness of our approach. Comment: CVPR 202
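A minimal sketch of the grouping idea, assuming uniform min-max quantization and a simple dynamic-range statistic for assigning channels to groups. The actual IGQ-ViT grouping rule, the BOP-constrained choice of group counts, and the softmax-attention quantization are not reproduced here; the sketch only shows why per-instance groups of statistically similar channels can share one scale with less error than a single layer-wide scale.

```python
import numpy as np

rng = np.random.default_rng(2)

def group_quantize(act, n_groups=4, n_bits=4):
    """Instance-aware group quantization, sketched: for ONE input instance,
    channels are grouped by a similar statistic (here: dynamic range), and
    each group gets its own uniform-quantization scale. The channel->group
    assignment is recomputed per instance, since activation ranges shift
    with the input."""
    span = act.max(0) - act.min(0)            # per-channel dynamic range
    order = np.argsort(span)                  # similar ranges become adjacent
    groups = np.array_split(order, n_groups)
    q = np.empty_like(act)
    levels = 2 ** n_bits - 1
    for g in groups:                          # one scale/zero-point per group
        lo, hi = act[:, g].min(), act[:, g].max()
        scale = (hi - lo) / levels if hi > lo else 1.0
        q[:, g] = np.round((act[:, g] - lo) / scale) * scale + lo
    return q

# toy activation map (tokens x channels) with very uneven channel magnitudes
act = rng.normal(size=(16, 32)) * rng.uniform(0.1, 10.0, size=32)
err_grouped = np.abs(group_quantize(act, n_groups=4) - act).mean()
err_single = np.abs(group_quantize(act, n_groups=1) - act).mean()
print(err_grouped < err_single)   # tighter per-group scales -> lower error
```

With a single group, the few large-magnitude channels force a coarse step size on every channel; grouping lets the small-range channels keep a fine step.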
An Empirical Examination of Consumer Behavior for Search and Experience Goods in Sentiment Analysis
With the explosive increase of user-generated content such as product reviews and social media, sentiment analysis has emerged as an area of interest. Sentiment analysis is a useful method to analyze product reviews, and product feature extraction is an important task in sentiment analysis, during which one identifies features of products from reviews. Product features are categorized by product type, such as search goods or experience goods, and their characteristics differ substantially. Thus, we examine whether the classification performance differs by product type. The findings show that the optimal threshold varies by product type, and simply decreasing the threshold to cover many features does not guarantee improvement of the classification performance.
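The finding that the optimal threshold varies by product type can be pictured with a toy threshold sweep. Everything below is hypothetical: synthetic scores, made-up label generation, and F1 as the tuning criterion; the study's actual feature-extraction pipeline is not shown.

```python
import numpy as np

def best_threshold(scores, labels, grid=np.linspace(0.1, 0.9, 17)):
    """Sweep candidate cutoffs for 'is this a product feature?' and keep the
    one maximizing F1 -- done separately per product type, since a single
    global threshold need not fit both."""
    def f1(th):
        pred = scores >= th
        tp = int((pred & labels).sum())
        if tp == 0:
            return 0.0
        p, r = tp / pred.sum(), tp / labels.sum()
        return 2 * p * r / (p + r)
    return max(grid, key=f1)

rng = np.random.default_rng(3)
# hypothetical candidate-feature scores: one product type skews toward high
# scores, the other toward low ones
search_scores = rng.beta(5, 2, 300); search_labels = rng.random(300) < search_scores
exp_scores = rng.beta(2, 5, 300);    exp_labels = rng.random(300) < exp_scores
th_search = best_threshold(search_scores, search_labels)
th_exp = best_threshold(exp_scores, exp_labels)
print(th_search, th_exp)
```

Lowering the threshold recovers more candidate features but admits more false positives, which is why "cover many features" alone does not improve F1.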
Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis
Recently, diffusion models have shown remarkable results in image synthesis
by gradually removing noise and amplifying signals. Although the simple
generative process surprisingly works well, is this the best way to generate
image data? For instance, despite the fact that human perception is more
sensitive to the low frequencies of an image, diffusion models themselves do
not consider any relative importance of each frequency component. Therefore, to
incorporate the inductive bias for image data, we propose a novel generative
process that synthesizes images in a coarse-to-fine manner. First, we
generalize the standard diffusion models by enabling diffusion in a rotated
coordinate system with different velocities for each component of the vector.
We further propose a blur diffusion as a special case, where each frequency
component of an image is diffused at different speeds. Specifically, the
proposed blur diffusion consists of a forward process that blurs an image and
adds noise gradually, after which a corresponding reverse process deblurs an
image and removes noise progressively. Experiments show that the proposed model
outperforms the previous method in FID on LSUN bedroom and church datasets.
Code is available at https://github.com/sangyun884/blur-diffusion
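A minimal sketch of a frequency-dependent forward process, assuming a heat-equation-style quadratic dissipation rate and a simple noise schedule; both are illustrative choices, not the paper's exact parameterization. Each Fourier component of the image decays at its own speed (high frequencies fastest, i.e. blur) while isotropic noise accumulates.

```python
import numpy as np

def blur_diffusion_forward(x0, t, sigma=0.3, seed=0):
    """One draw from a sketched frequency-aware forward process: per-frequency
    exponential decay (blur) plus isotropic Gaussian noise."""
    h, w = x0.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    decay = np.exp(-(fy ** 2 + fx ** 2) * t)   # dissipation grows with frequency
    blurred = np.fft.ifft2(np.fft.fft2(x0) * decay).real
    noise = np.random.default_rng(seed).normal(size=x0.shape)
    return blurred + sigma * np.sqrt(t) * noise

x0 = np.zeros((32, 32)); x0[8:24, 8:24] = 1.0   # toy "image": a bright square
x_t = blur_diffusion_forward(x0, t=50.0)        # a blurred *and* noisy sample
# sanity check on the blur alone: high-frequency energy decays over time
hf = lambda x: np.abs(np.fft.fft2(x))[8:24, :].sum()
print(hf(blur_diffusion_forward(x0, 50.0, sigma=0.0))
      < hf(blur_diffusion_forward(x0, 1.0, sigma=0.0)))   # True
```

A learned reverse process would then undo both effects step by step, deblurring and denoising, which yields the coarse-to-fine synthesis order described in the abstract.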