173 research outputs found
Natural Model Reduction for Kinetic Equations
A promising approach to investigating high-dimensional problems is to
identify their intrinsically low-dimensional features, which can be achieved
through recently developed techniques, such as machine learning, for the
effective low-dimensional representation of functions. Building on available
finite-dimensional approximate solution manifolds, this paper proposes a novel
model reduction framework for kinetic equations. The method employs projections
onto tangent bundles of approximate manifolds, naturally resulting in
first-order hyperbolic systems. Under certain conditions on the approximate
manifolds, the reduced models preserve several crucial properties, including
hyperbolicity, conservation laws, entropy dissipation, finite propagation
speed, and linear stability. For the first time, this paper rigorously
discusses the relation between the H-theorem of kinetic equations and the
linear stability conditions of reduced systems, determining the choice of
Riemannian metrics involved in the model reduction. The framework is widely
applicable to the model reduction of many models in kinetic theory.
Comment: 46 pages
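The tangent-bundle projection the abstract describes can be sketched as follows; the notation here is an illustrative reconstruction, not taken from the paper. Writing the approximate manifold as a parametrized ansatz f = F(·; α), the reduced dynamics follow from a Galerkin condition in a chosen Riemannian metric g:

```latex
% Kinetic equation (schematic): \partial_t f + v\cdot\nabla_x f = Q(f).
% Ansatz on the approximate manifold: f = F(x, v; \alpha(x, t)).
% Project the residual onto the tangent space, w.r.t. a metric g:
\left\langle \frac{\partial F}{\partial \alpha_i},\;
  \partial_t F + v \cdot \nabla_x F - Q(F) \right\rangle_{g} = 0,
\qquad i = 1, \dots, m,
```

which closes a first-order system for the parameters α(x, t); the abstract's point is that the metric g must be chosen compatibly with the H-theorem for the reduced system to be linearly stable.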
Lax Equivalence for Hyperbolic Relaxation Approximations
This paper investigates the zero relaxation limit for general linear
hyperbolic relaxation systems and establishes the asymptotic convergence of
slow variables under the unimprovable weakest stability condition, akin to the
Lax equivalence theorem for hyperbolic relaxation approximations. Despite
potential high oscillations, the convergence of macroscopic variables is
established in the strong sense rather than the sense of
weak convergence, time averaging, or ensemble averaging.
Comment: 32 pages
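As a hedged illustration of the setting (the specific form below is assumed, not quoted from the paper), a general linear hyperbolic relaxation system reads

```latex
\partial_t U + A\,\partial_x U = \frac{1}{\varepsilon}\, S\, U,
```

where ε > 0 is the relaxation parameter; the slow (macroscopic) variables are the components of U not damped by S, and the result concerns their convergence as ε → 0 under a stability condition on A and S.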
High Order Numerical Homogenization for Dissipative Ordinary Differential Equations
We propose a high order numerical homogenization method for dissipative
ordinary differential equations (ODEs) containing two time scales. Essentially,
only a first-order homogenized model can be derived globally in time. To achieve
a high-order method, we have to adopt a numerical approach in the framework of
the heterogeneous multiscale method (HMM). With a successively refined
microscopic solver, accuracy improvements up to arbitrary order are attained,
provided the input data are sufficiently smooth. Based on the formulation of
the high-order microscopic solver we derive, we then propose an iterative
formula for calculating the microscopic solver. Using this iterative formula,
we develop an efficient implementation of the method for practical
applications. Several numerical examples are presented to validate the new
models and numerical methods.
Comment: 29 pages, 8 figures
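As a toy illustration of the HMM idea in this abstract (the equations, step sizes, and first-order macro scheme below are assumptions, not the paper's high-order construction), a macro solver can advance the slow variable using an effective force estimated by a relaxed micro solver:

```python
import numpy as np

# Two-scale dissipative toy problem (illustrative, not from the paper):
#   x' = f(x, y)            (slow variable)
#   eps * y' = phi(x) - y   (fast variable, relaxes to y = phi(x))
eps = 1e-4
f = lambda x, y: -x + y      # slow dynamics (illustrative choice)
phi = lambda x: np.cos(x)    # fast equilibrium manifold

def micro_solver(x, y0, n_steps=50):
    """Relax the fast variable toward phi(x) with small explicit steps."""
    y, dt = y0, eps / 10.0   # micro step resolves the eps scale
    for _ in range(n_steps):
        y += dt * (phi(x) - y) / eps
    return y

def hmm_step(x, y, H):
    """One macro step: evaluate the effective force at the relaxed y."""
    y_star = micro_solver(x, y)
    return x + H * f(x, y_star), y_star

x, y, H = 1.0, 0.0, 0.05
for _ in range(200):
    x, y = hmm_step(x, y, H)
# x approaches the fixed point of x' = -x + cos(x), i.e. x = cos(x)
```

The micro solver's cost is independent of the stiffness here; refining it (more micro steps, better initial guesses) is the knob the paper's high-order construction turns systematically.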
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Real-world audio recordings are often degraded by factors such as noise,
reverberation, and equalization distortion. This paper introduces HiFi-GAN, a
deep learning method to transform recorded speech to sound as though it had
been recorded in a studio. We use an end-to-end feed-forward WaveNet
architecture, trained with multi-scale adversarial discriminators in both the
time domain and the time-frequency domain. It relies on the deep feature
matching losses of the discriminators to improve the perceptual quality of
enhanced speech. The proposed model generalizes well to new speakers, new
speech content, and new environments. It significantly outperforms
state-of-the-art baseline methods in both objective and subjective experiments.
Comment: Accepted by INTERSPEECH 202
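The deep feature-matching idea can be illustrated with a toy stand-in (the "discriminator" below is a placeholder, not the paper's multi-scale architecture): the generator is penalized for mismatches between the discriminator's intermediate activations on clean versus enhanced audio:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_discriminator(x, weights):
    """Return intermediate feature maps of a 1-D convnet stand-in."""
    feats = []
    for w in weights:
        x = np.maximum(0.0, np.convolve(x, w, mode="same"))  # conv + ReLU
        feats.append(x)
    return feats

def feature_matching_loss(real, fake, weights):
    """Mean L1 distance between discriminator features, layer by layer."""
    fr = toy_discriminator(real, weights)
    ff = toy_discriminator(fake, weights)
    return sum(np.mean(np.abs(a - b)) for a, b in zip(fr, ff)) / len(weights)

weights = [rng.standard_normal(9) for _ in range(3)]
clean = rng.standard_normal(256)   # stand-in for studio-quality speech
noisy = clean + 0.3 * rng.standard_normal(256)

loss_same = feature_matching_loss(clean, clean, weights)  # perfect match -> 0
loss_diff = feature_matching_loss(clean, noisy, weights)  # grows with mismatch
```

Unlike a plain adversarial loss, this term gives the generator a dense, per-layer target, which is why it helps perceptual quality.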
Efficient Spoken Language Recognition via Multilabel Classification
Spoken language recognition (SLR) is the task of automatically identifying
the language present in a speech signal. Existing SLR models are either too
computationally expensive or too large to run effectively on devices with
limited resources. For real-world deployment, a model should also gracefully
handle unseen languages outside of the target language set, yet prior work has
focused on closed-set classification where all input languages are known
a priori. In this paper, we address these two limitations: we explore efficient
model architectures for SLR based on convolutional networks, and propose a
multilabel training strategy to handle non-target languages at inference time.
Using the VoxLingua107 dataset, we show that our models obtain competitive
results while being orders of magnitude smaller and faster than current
state-of-the-art methods, and that our multilabel strategy is more robust to
unseen non-target languages compared to multiclass classification.
Comment: Accepted to InterSpeech 202
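The multilabel inference strategy can be sketched as follows (the language set, scores, and threshold are illustrative assumptions): independent per-language sigmoid scores let an utterance from an unseen language fall below every threshold and be rejected, which a closed-set softmax cannot do:

```python
import numpy as np

LANGS = ["en", "es", "de", "fr"]   # illustrative target set
THRESHOLD = 0.5                    # illustrative acceptance threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(logits, threshold=THRESHOLD):
    """Return the best-scoring target language, or None for non-target."""
    scores = sigmoid(np.asarray(logits, dtype=float))
    best = int(np.argmax(scores))
    return LANGS[best] if scores[best] >= threshold else None

# A confident in-set utterance vs. an out-of-set one (all scores low):
lang_in = predict([3.0, -2.0, -1.5, -2.5])    # -> "en"
lang_out = predict([-1.0, -0.8, -1.2, -0.9])  # -> None
```

With softmax, the second utterance would be forced onto whichever target language scored least badly; the sigmoid scores make "none of the above" expressible.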
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Non-parallel many-to-many voice conversion remains an interesting but
challenging speech processing task. Many style-transfer-inspired methods such
as generative adversarial networks (GANs) and variational autoencoders (VAEs)
have been proposed. Recently, AutoVC, a conditional autoencoder (CAE)-based
method, achieved state-of-the-art results by disentangling speaker identity
and speech content using information-constraining bottlenecks; it achieves
zero-shot conversion by swapping in a different speaker's identity embedding to
synthesize a new voice. However, we found that while speaker identity is
disentangled from speech content, a significant amount of prosodic information,
such as source F0, leaks through the bottleneck, causing target F0 to fluctuate
unnaturally. Furthermore, AutoVC offers no control over the converted F0 and
is thus unsuitable for many applications. In this paper, we modify and improve
autoencoder-based voice conversion to disentangle content, F0, and speaker
identity at the same time. Therefore, we can control the F0 contour, generate
speech with F0 consistent with the target speaker, and significantly improve
quality and similarity. We support our improvements with quantitative and
qualitative analysis.
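One way to realize the F0 conditioning described above can be sketched as follows (the bin count and per-utterance normalization are assumptions, not taken verbatim from the paper): normalize log-F0 to remove the source speaker's pitch range, then quantize it into one-hot bins the decoder can consume:

```python
import numpy as np

N_BINS = 32  # assumed quantization resolution

def f0_to_onehot(f0_hz, n_bins=N_BINS):
    """Speaker-normalized, quantized one-hot log-F0; unvoiced frames = zeros."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    logf0 = np.zeros_like(f0)
    logf0[voiced] = np.log(f0[voiced])
    mu = logf0[voiced].mean()
    sigma = logf0[voiced].std() + 1e-8
    z = (logf0 - mu) / sigma                    # strip speaker pitch range
    bins = np.clip(((z + 3) / 6 * n_bins).astype(int), 0, n_bins - 1)
    onehot = np.zeros((len(f0), n_bins))
    onehot[voiced, bins[voiced]] = 1.0          # one active bin per voiced frame
    return onehot

f0 = np.array([0.0, 110.0, 120.0, 130.0, 0.0])  # Hz; 0 marks unvoiced frames
code = f0_to_onehot(f0)
```

Because the code carries only the normalized F0 contour, not absolute pitch, the decoder can render it in the target speaker's range, which is the disentanglement the abstract argues for.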
- …