DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score
We propose a training method for deep neural network (DNN)-based source enhancement that increases objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores are widely used for sound-quality evaluation, training DNNs to increase OSQA scores should yield higher-quality output signals than minimizing the MSE. However, most OSQA scores are not analytically tractable, i.e., they are black boxes, so the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of an OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, a framework also used for training a computer that plays a game. As the black-box-optimization scheme, we adopted the policy gradient method, which calculates the gradient on the basis of a sampling algorithm. To simulate output signals with the sampling algorithm, DNNs are used to estimate the probability density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating simulated output signals that achieve high OSQA scores. Through several experiments, we found that the proposed method significantly increased OSQA scores, even though it did not minimize the MSE.
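As a rough illustration of the sampling-based training loop described above, the following PyTorch sketch draws a time-frequency mask from a DNN-parameterized Gaussian, scores the resulting signal with a black-box metric, and applies a policy-gradient (REINFORCE) update; the network shape, the Gaussian mask distribution, and the osqa_score placeholder are assumptions for illustration, not the paper's exact configuration.

import torch

# DNN that maps a noisy magnitude spectrum (257 bins assumed) to the mean
# of a Gaussian distribution over time-frequency masks.
net = torch.nn.Sequential(torch.nn.Linear(257, 512), torch.nn.ReLU(),
                          torch.nn.Linear(512, 257))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

def osqa_score(enhanced, reference):
    # Black box (e.g., a PESQ implementation): returns a scalar, no gradient.
    raise NotImplementedError  # hypothetical placeholder

def train_step(noisy, reference, sigma=0.1):
    mu = torch.sigmoid(net(noisy))                # predicted mask mean in [0, 1]
    dist = torch.distributions.Normal(mu, sigma)
    mask = dist.sample()                          # simulate an output signal
    enhanced = mask * noisy                       # apply the sampled T-F mask
    with torch.no_grad():
        reward = osqa_score(enhanced, reference)  # score the simulated output
    log_prob = dist.log_prob(mask).sum()
    # REINFORCE: raise the probability of high-scoring samples (a reward
    # baseline, omitted here, would reduce the gradient variance).
    loss = -reward * log_prob
    opt.zero_grad(); loss.backward(); opt.step()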
Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization
In recent years, decentralized learning has emerged as a powerful tool not
only for large-scale machine learning, but also for preserving privacy. One of
the key challenges in decentralized learning is that the data distribution held
by each node is statistically heterogeneous. To address this challenge, the
primal-dual algorithm called Edge-Consensus Learning (ECL) was proposed and
was experimentally shown to be robust to the heterogeneity of data
distributions. However, the convergence rate of the ECL has been provided only
when the objective function is convex and has not been shown in a standard machine
learning setting where the objective function is non-convex. Furthermore, the
intuitive reason why the ECL is robust to the heterogeneity of data
distributions has not been investigated. In this work, we first investigate the
relationship between the ECL and Gossip algorithm and show that the update
formulas of the ECL can be regarded as correcting the local stochastic gradient
in the Gossip algorithm. Then, we propose the Generalized ECL (G-ECL), which
contains the ECL as a special case, and provide the convergence rates of the
G-ECL in both (strongly) convex and non-convex settings, which do not depend on
the heterogeneity of data distributions. Through synthetic experiments, we
demonstrate that the numerical results of both the G-ECL and the ECL coincide
with the theoretical convergence rate of the G-ECL.
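The interpretation above can be made concrete with a small sketch. Gradient tracking is a standard way of correcting the local stochastic gradient in gossip-style updates, and the abstract's point is that the ECL update admits a similar reading; the mixing matrix W and the gradient oracle grad(i, x) below are assumptions of this sketch, not objects from the paper.

import numpy as np

def tracking_step(x, y, g_prev, W, grad, lr=0.01):
    # x, y, g_prev: arrays of shape (n_nodes, dim); W: doubly stochastic
    # mixing matrix. Initialize y = 0 and g_prev = 0 before the first call.
    g = np.stack([grad(i, x[i]) for i in range(x.shape[0])])
    y_new = W @ y + g - g_prev   # track the network-wide average gradient
    x_new = W @ x - lr * y_new   # gossip mixing + corrected gradient step
    return x_new, y_new, g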
SSFG: Stochastically Scaling Features and Gradients for Regularizing Graph Convolutional Networks
Graph convolutional networks have been successfully applied in various
graph-based tasks. In a typical graph convolutional layer, node features are
updated by aggregating neighborhood information. Repeatedly applying graph
convolutions can cause the oversmoothing issue, i.e., node features at deep
layers converge to similar values. Previous studies have suggested that
oversmoothing is one of the major issues that restrict the performance of graph
convolutional networks. In this paper, we propose a stochastic regularization
method to tackle the oversmoothing problem. In the proposed method, we
stochastically scale features and gradients (SSFG) by a factor sampled from a
probability distribution in the training procedure. By explicitly applying a
scaling factor to break feature convergence, the oversmoothing issue is
alleviated. We show that applying stochastic scaling at the gradient level is
complementary to applying it at the feature level in improving the overall
performance. Our method does not increase the number of trainable parameters.
When used together with ReLU, our SSFG can be seen as a stochastic ReLU
activation function. We experimentally validate our SSFG regularization method
on three commonly used types of graph networks. Extensive experimental results
on seven benchmark datasets for four graph-based tasks demonstrate that our
SSFG regularization is effective in improving the overall performance of the
baseline graph networks.
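A minimal PyTorch sketch of the idea, scaling features in the forward pass and gradients in the backward pass by independently sampled factors; the Gaussian sampling distribution and its standard deviation are illustrative assumptions, not necessarily the paper's exact choices.

import torch

class StochasticScale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h, std):
        ctx.std = std
        s = 1.0 + std * torch.randn(1, device=h.device)
        return h * s                    # scale features to break convergence

    @staticmethod
    def backward(ctx, grad_out):
        s = 1.0 + ctx.std * torch.randn(1, device=grad_out.device)
        return grad_out * s, None       # complementary gradient-level scaling

# Usage inside a graph convolutional layer, during training only:
#   h = StochasticScale.apply(h, 0.1)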
Optimal Transport with Cyclic Symmetry
We propose novel fast algorithms for optimal transport (OT) utilizing a
cyclic symmetry structure of input data. Such OT with cyclic symmetry appears
universally in various real-world examples: image processing, urban planning,
and graph processing. Our main idea is to reduce OT to a small optimization
problem that has significantly fewer variables by utilizing cyclic symmetry and
various optimization techniques. On the basis of this reduction, our algorithms
solve the small optimization problem instead of the original OT. As a result,
our algorithms obtain the optimal solution and the objective function value of
the original OT faster than solving the original OT directly. In this paper,
our focus is on two crucial OT formulations: the linear programming OT (LOT)
and the strongly convex-regularized OT, which includes the well-known
entropy-regularized OT (EROT). Experiments show the effectiveness of our
algorithms for LOT and EROT on synthetic/real-world data that has a strict or
approximate cyclic symmetry structure. Through theoretical and experimental
results, this paper introduces the concept of symmetry into the OT research
field for the first time.
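The core symmetry argument can be illustrated independently of the paper's algorithms: when the cost matrix and both marginals are invariant under a simultaneous cyclic shift, averaging any optimal plan over all shifts yields another feasible plan with the same cost, so the search can be restricted to shift-invariant plans with far fewer free variables. The sketch below shows the averaging step only; it is an illustration of the principle, not the paper's reduction.

import numpy as np

def cyclic_average(P):
    # Average a transport plan over all simultaneous cyclic shifts.
    # For a shift-invariant cost C (C[i-s, j-s] == C[i, j]) and
    # shift-invariant (here uniform) marginals, the averaged plan is
    # feasible and has the same objective value <C, P>.
    n = P.shape[0]
    return sum(np.roll(np.roll(P, s, axis=0), s, axis=1)
               for s in range(n)) / n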
Embarrassingly Simple Text Watermarks
We propose Easymark, a family of embarrassingly simple yet effective
watermarks. Text watermarking is becoming increasingly important with the
advent of Large Language Models (LLM). LLMs can generate texts that cannot be
distinguished from human-written texts. This is a serious problem for the
credibility of the text. Easymark is a simple yet effective solution to this
problem. Easymark injects a watermark without changing the meaning of the text
at all, while a validator can reliably detect whether a text was generated by a
system that adopted Easymark. Easymark is extremely easy to implement,
requiring only a few lines of code. Easymark does not
require access to LLMs, so it can be implemented on the user-side when the LLM
providers do not offer watermarked LLMs. In spite of its simplicity, it
achieves higher detection accuracy and BLEU scores than the state-of-the-art
text watermarking methods. We also prove an impossibility theorem for perfect
watermarking, which is valuable in its own right. This theorem shows that no
matter how sophisticated a watermark is, a malicious user could remove it from
the text, which motivates us to use a simple watermark such as Easymark. We
carry out experiments with LLM-generated texts and confirm that Easymark can be
detected reliably without any degradation of BLEU or perplexity, and that it
outperforms state-of-the-art watermarks in terms of both quality and
reliability.
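To make the "few lines of code" claim tangible, here is a hedged sketch of a whitespace-style watermark in the spirit of Easymark: ordinary spaces are replaced with a visually similar Unicode space, leaving the text's meaning untouched. The specific code point U+2004 and the detection threshold are assumptions for illustration, not necessarily the paper's exact scheme.

MARK = "\u2004"  # THREE-PER-EM SPACE; renders much like a normal space

def embed(text: str) -> str:
    return text.replace(" ", MARK)     # meaning-preserving injection

def detect(text: str, threshold: float = 0.5) -> bool:
    spaces = [c for c in text if c in (" ", MARK)]
    ratio = sum(c == MARK for c in spaces) / max(len(spaces), 1)
    return ratio > threshold           # high ratio => likely watermarked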
Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data
SGD with momentum acceleration is one of the key components for improving the
performance of neural networks. For decentralized learning, a straightforward
approach using momentum acceleration is Distributed SGD (DSGD) with momentum
acceleration (DSGDm). However, DSGDm performs worse than DSGD when the data
distributions are statistically heterogeneous. Recently, several studies have
addressed this issue and proposed methods with momentum acceleration that are
more robust to data heterogeneity than DSGDm, although their convergence rates
remain dependent on data heterogeneity and decrease when the data distributions
are heterogeneous. In this study, we propose Momentum Tracking, which is a
method with momentum acceleration whose convergence rate is proven to be
independent of data heterogeneity. More specifically, we analyze the
convergence rate of Momentum Tracking in the standard deep learning setting,
where the objective function is non-convex and the stochastic gradient is used.
Then, we identify that it is independent of data heterogeneity for any momentum
coefficient β ∈ [0, 1). Through image classification tasks, we
demonstrate that Momentum Tracking is more robust to data heterogeneity than
the existing decentralized learning methods with momentum acceleration and can
consistently outperform these existing methods when the data distributions are
heterogeneous.
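A schematic step combining gradient tracking with a heavy-ball momentum buffer, following the shape the abstract describes (momentum applied to a heterogeneity-corrected gradient estimate); the exact update order in the paper may differ, and W and grad(i, x) are assumptions of this sketch.

import numpy as np

def momentum_tracking_step(x, y, m, g_prev, W, grad, lr=0.01, beta=0.9):
    # x, y, m, g_prev: arrays of shape (n_nodes, dim); W: doubly stochastic
    # mixing matrix. Initialize y, m, g_prev to zeros before the first call.
    g = np.stack([grad(i, x[i]) for i in range(x.shape[0])])
    y_new = W @ y + g - g_prev   # tracked estimate of the average gradient
    m_new = beta * m + y_new     # momentum on the corrected gradient, so the
                                 # update does not depend on local data skew
    x_new = W @ x - lr * m_new
    return x_new, y_new, m_new, g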