A Flexible Online Framework for Projection-Based STFT Phase Retrieval
Several recent contributions in the field of iterative STFT phase retrieval
have demonstrated that the performance of the classical Griffin-Lim method can
be considerably improved upon. By using the same projection operators as
Griffin-Lim, but combining them in innovative ways, these approaches achieve
better results in terms of both reconstruction quality and required number of
iterations, while retaining a similar computational complexity per iteration.
However, like Griffin-Lim, these algorithms operate in an offline manner and
thus require an entire spectrogram as input, which is an unrealistic
requirement for many real-world speech communication applications. We propose
to extend RTISI -- an existing online (frame-by-frame) variant of the
Griffin-Lim algorithm -- into a flexible framework that enables straightforward
online implementation of any algorithm based on iterative projections. We
further employ this framework to implement online variants of the fast
Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two
algorithms from the optics domain. Evaluation results on speech signals show
that, similarly to the offline case, these algorithms can achieve a
considerable performance gain compared to RTISI.
Comment: Submitted to ICASSP 2
Accelerated Griffin-Lim algorithm: A fast and provably converging numerical method for phase retrieval
The recovery of a signal from the magnitudes of its transformation, such as the
Fourier transform, is known as the phase retrieval problem and is of great
relevance in various fields of engineering and applied physics. In this paper,
we present a fast inertial/momentum-based algorithm for the phase retrieval
problem and prove a convergence guarantee for both the new algorithm and the
Fast Griffin-Lim algorithm (FGLA), whose convergence had remained unproven for
the past decade. In the final chapter, we compare the new algorithm for
short-time Fourier transform phase retrieval with the Griffin-Lim algorithm,
FGLA, and other iterative algorithms typically used for this type of problem.
A fast Griffin Lim Algorithm
In this paper, we present a new algorithm to estimate a signal from its short-time Fourier transform modulus (STFTM). This algorithm is computationally simple and is obtained by an acceleration of the well-known Griffin-Lim algorithm (GLA). Before deriving the algorithm, we give a new interpretation of the GLA and formulate the phase recovery problem as an optimization problem. We then present experimental results in which the new algorithm is tested on various signals. It not only converges significantly faster but also recovers the signals with a smaller error than the traditional GLA.
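The abstract above does not state the update rule, so the following is only a minimal NumPy sketch under assumptions: it uses the commonly cited form of the fast Griffin-Lim iteration, in which the two Griffin-Lim projections are followed by a momentum step c_n = t_n + alpha * (t_n - t_{n-1}). The helper names (`stft`, `istft`, `project_magnitude`, `project_consistent`, `spectral_error`), the Hann window, and the frame and hop sizes are illustrative choices, not taken from the paper. Setting `alpha=0` reduces the loop to the classical GLA.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT of a real signal; shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=256, hop=64):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    win = np.hanning(n_fft)
    length = n_fft + hop * (X.shape[0] - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    for i in range(X.shape[0]):
        x[i * hop:i * hop + n_fft] += frames[i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)  # floor avoids division by zero at the edges

def project_magnitude(c, mag):
    """Keep the phase of c and impose the target magnitudes."""
    return mag * np.exp(1j * np.angle(c))

def project_consistent(c, n_fft=256, hop=64):
    """Map c to the STFT of an actual time signal (synthesis followed by analysis)."""
    return stft(istft(c, n_fft, hop), n_fft, hop)

def griffin_lim(mag, n_iter=50, alpha=0.0, n_fft=256, hop=64, seed=0):
    """Estimate a signal from STFT magnitudes; alpha > 0 adds the assumed momentum step."""
    rng = np.random.default_rng(seed)
    c = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    t_prev = project_consistent(project_magnitude(c, mag), n_fft, hop)
    for _ in range(n_iter):
        t = project_consistent(project_magnitude(c, mag), n_fft, hop)
        c = t + alpha * (t - t_prev)  # alpha = 0 gives the classical GLA update
        t_prev = t
    return istft(t_prev, n_fft, hop)

def spectral_error(x, mag, n_fft=256, hop=64):
    """Relative distance between the STFT magnitudes of x and the target magnitudes."""
    return np.linalg.norm(np.abs(stft(x, n_fft, hop)) - mag) / np.linalg.norm(mag)

if __name__ == "__main__":
    time = np.arange(4096) / 8000.0
    signal = np.sin(2 * np.pi * 440 * time) + 0.5 * np.sin(2 * np.pi * 1200 * time)
    target = np.abs(stft(signal))
    for a in (0.0, 0.99):
        estimate = griffin_lim(target, n_iter=50, alpha=a)
        print(f"alpha={a}: spectral error {spectral_error(estimate, target):.4f}")
```

Because this sketch applies no padding, the first and last few samples of the reconstructed waveform are poorly conditioned; the spectral error is the more meaningful quantity to compare across values of `alpha`.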
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given <text, audio> pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it's
substantially faster than sample-level autoregressive methods.
Comment: Submitted to Interspeech 2017. v2 changed paper title to be
consistent with our conference submission (no content change other than typo
fixes)
Time-scale and pitch modifications of speech signals and resynthesis from the discrete short-time Fourier transform
The modification methods described in this paper combine characteristics of PSOLA-based methods and algorithms that resynthesize speech from its short-time Fourier magnitude only. The starting point is a short-time Fourier representation of the signal. In the case of duration modification, portions, in voiced speech corresponding to pitch periods, are removed from or inserted in this representation. In the case of pitch modification, pitch periods are shortened or extended in this representation, and a number of pitch periods is inserted or removed, respectively. Since it is an important tool for both duration and pitch modification, the resynthesis-from-short-time-Fourier-magnitude-only method of Griffin and Lim (1984) and Griffin et al. (1984) is reviewed and adapted. Duration and pitch modification methods and their results are presented.