Perceptually smooth timbral guides by state-space analysis of phase-vocoder parameters
Sculptor is a phase-vocoder-based package of programs
that allows users to explore timbral manipulation
of sound in real time. It is the product
of a research program seeking ultimately to perform
gestural capture by analysis of the sound a
performer makes using a conventional instrument.
Since the phase-vocoder output is of high dimensionality
(typically more than 1,000 channels per
analysis frame), mapping phase-vocoder output to
appropriate input parameters for a synthesizer is
only feasible in theory.
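A minimal sketch of the dimensionality problem described above, assuming a 2048-point FFT (which gives 1,025 magnitude channels per frame) and using principal component analysis (PCA) to project each frame onto three coordinates. PCA is an assumed stand-in chosen for illustration; it is not claimed to be the state-space analysis used in Sculptor, and the names `stft_magnitudes` and `timbral_coordinates` are hypothetical.

```python
import numpy as np

def stft_magnitudes(x, n_fft=2048, hop=512):
    """Hann-windowed magnitude spectra, one row per analysis frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft // 2 + 1)

def timbral_coordinates(mags, n_components=3):
    """Project log-magnitude frames onto their leading principal components (PCA via SVD)."""
    log_mags = np.log(mags + 1e-9)
    centered = log_mags - log_mags.mean(axis=0)
    # Rows of vt are principal directions in channel space, ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return coords, explained

# Hypothetical test tone: 10 harmonics of 220 Hz whose brightness rises over 2 s.
sr = 16000
t = np.arange(2 * sr) / sr
brightness = np.linspace(0.2, 0.9, t.size)
x = sum(brightness ** k * np.sin(2 * np.pi * 220 * k * t) for k in range(1, 11))

mags = stft_magnitudes(x)
coords, explained = timbral_coordinates(mags)
print(mags.shape, coords.shape)  # (59, 1025) (59, 3)
```

Each 1,025-channel frame becomes a three-number point, so a slowly changing timbre traces a path in a low-dimensional space that a synthesizer control could follow.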
A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation
We introduce a simple and linear SNR (strictly speaking, periodic-to-random
power ratio) estimator (0 dB to 80 dB without additional
calibration/linearization) for providing reliable descriptions of aperiodicity
in speech corpora. The main idea of this method is to estimate the background
random noise level without directly extracting the background noise. The
proposed method is applicable to a wide variety of time windowing functions
with very low sidelobe levels. The estimate combines the frequency derivative
and the time-frequency derivative of the mapping from filter center frequency
to the output instantaneous frequency. This procedure can replace the
periodicity detection and aperiodicity estimation subsystems of the recently
introduced open-source YANG vocoder. Source code of the MATLAB
implementation of this method will also be open-sourced. Comment: 8 pages, 9 figures. Submitted and accepted in Interspeech201
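The abstract's estimator is built on the mapping from filter center frequency to the instantaneous frequency of the filter output. The sketch below computes only that mapping and its derivative with respect to center frequency for a noise-free harmonic signal; it does not reproduce the paper's SNR estimator. The Gaussian-windowed complex filter, the width `sigma`, and all function names are assumptions made for illustration.

```python
import numpy as np

def filter_output(x, sr, fc, n, sigma=0.02):
    """Complex output at sample n of a Gaussian-windowed complex exponential filter centered at fc Hz."""
    half = int(round(6 * sigma * sr))            # truncate the Gaussian at +/- 6 sigma
    tau = np.arange(-half, half + 1) / sr        # lag in seconds
    kernel = np.exp(-0.5 * (tau / sigma) ** 2) * np.exp(2j * np.pi * fc * tau)
    segment = x[n - half:n + half + 1]
    return np.sum(segment[::-1] * kernel)        # y[n] = sum over lags of x[n - lag] * kernel[lag]

def instantaneous_frequency(x, sr, fc, n):
    """Instantaneous frequency (Hz) of the filter output, from its phase advance over one sample."""
    y0 = filter_output(x, sr, fc, n)
    y1 = filter_output(x, sr, fc, n + 1)
    return np.angle(y1 * np.conj(y0)) * sr / (2 * np.pi)

def if_slope(x, sr, fc, n, dfc=1.0):
    """Central-difference estimate of d(instantaneous frequency) / d(center frequency)."""
    upper = instantaneous_frequency(x, sr, fc + dfc, n)
    lower = instantaneous_frequency(x, sr, fc - dfc, n)
    return (upper - lower) / (2 * dfc)

sr, f0 = 8000, 100.0
t = np.arange(sr) / sr
x = sum(np.cos(2 * np.pi * f0 * k * t) for k in range(1, 6))  # noise-free harmonic signal
n = sr // 2
for fc in (195.0, 200.0, 205.0):
    print(fc, instantaneous_frequency(x, sr, fc, n), if_slope(x, sr, fc, n))
```

Near the 200 Hz harmonic the output instantaneous frequency stays locked at about 200 Hz regardless of the exact center frequency, so the slope is close to zero. A random component would disturb this flatness; how the paper turns such deviations into a calibrated power ratio is not shown here.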
A Phase Vocoder based on Nonstationary Gabor Frames
We propose a new algorithm for time stretching music signals based on the
theory of nonstationary Gabor frames (NSGFs). The algorithm extends the
techniques of the classical phase vocoder (PV) by incorporating adaptive
time-frequency (TF) representations and adaptive phase locking. The adaptive TF
representations provide good time resolution for the onsets of attack transients
and good frequency resolution for the sinusoidal components. We estimate the
phase values only at peak channels and the remaining phases are then locked to
the values of the peaks in an adaptive manner. During attack transients we keep
the stretch factor equal to one and we propose a new strategy for determining
which channels are relevant for reinitializing the corresponding phase values.
In contrast to previously published algorithms, we use a non-uniform NSGF to
obtain a low redundancy of the corresponding TF representation. We show that
with just three times as many TF coefficients as signal samples, artifacts such
as phasiness and transient smearing can be greatly reduced compared to the
classical PV. The proposed algorithm is tested on both synthetic and real-world
signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures
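The peak-based phase estimation and phase locking that the abstract extends can be illustrated with a classical uniform-STFT phase vocoder using identity phase locking: phases are propagated only at spectral peaks, and every other channel keeps its analysis phase offset relative to the nearest peak. This is a simplified stand-in, not the NSGF algorithm: it uses a fixed Hann-window STFT and omits transient detection and phase reinitialization. Function names and default parameters are hypothetical.

```python
import numpy as np

def princarg(phase):
    """Wrap phase values to the interval [-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def find_peaks(mag):
    """Indices of local maxima of a magnitude spectrum (all bins if there are none)."""
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peaks = np.flatnonzero(is_peak) + 1
    return peaks if peaks.size else np.arange(mag.size)

def nearest_peak_index(peaks, n_bins):
    """For every bin, the position in `peaks` of the closest peak."""
    bins = np.arange(n_bins)
    pos = np.searchsorted(peaks, bins)
    left = np.clip(pos - 1, 0, peaks.size - 1)
    right = np.clip(pos, 0, peaks.size - 1)
    use_right = np.abs(peaks[right] - bins) < np.abs(bins - peaks[left])
    return np.where(use_right, right, left)

def time_stretch(x, factor, n_fft=2048, hop_a=256):
    """Stretch x by `factor` with a phase-locked phase vocoder (uniform STFT)."""
    hop_s = int(round(hop_a * factor))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_a
    frames = np.stack([x[m * hop_a:m * hop_a + n_fft] * window for m in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    omega = 2 * np.pi * np.arange(mag.shape[1]) / n_fft  # bin center frequencies, rad/sample

    out = np.zeros(hop_s * (n_frames - 1) + n_fft)
    norm = np.zeros_like(out)
    syn_phase = phase[0].copy()
    for m in range(n_frames):
        if m > 0:
            peaks = find_peaks(mag[m])
            # Instantaneous frequency of each peak from the heterodyned phase increment.
            dev = princarg(phase[m, peaks] - phase[m - 1, peaks] - omega[peaks] * hop_a)
            inst_freq = omega[peaks] + dev / hop_a
            peak_phase = syn_phase[peaks] + inst_freq * hop_s
            # Identity phase locking: keep each bin's analysis phase offset to its peak.
            idx = nearest_peak_index(peaks, mag.shape[1])
            syn_phase = peak_phase[idx] + phase[m] - phase[m, peaks[idx]]
        frame = np.fft.irfft(mag[m] * np.exp(1j * syn_phase), n=n_fft) * window
        start = m * hop_s
        out[start:start + n_fft] += frame
        norm[start:start + n_fft] += window ** 2
    safe = norm > 1e-3  # leave the barely covered edge samples unnormalized
    out[safe] /= norm[safe]
    return out

sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = time_stretch(x, 1.5)
seg = y[2048:8832] * np.hanning(8832 - 2048)
peak_hz = np.argmax(np.abs(np.fft.rfft(seg))) * sr / seg.size
print(len(y), peak_hz)
```

The stretched tone is longer but keeps its 440 Hz pitch. Assigning channels to the nearest peak is one choice; another common one places region boundaries at the lowest-magnitude bin between two peaks.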
Expediting TTS Synthesis with Adversarial Vocoding
Recent approaches in text-to-speech (TTS) synthesis employ neural network
strategies to vocode perceptually-informed spectrogram representations directly
into listenable waveforms. Such vocoding procedures create a computational
bottleneck in modern TTS pipelines. We propose an alternative approach which
utilizes generative adversarial networks (GANs) to learn mappings from
perceptually-informed spectrograms to simple magnitude spectrograms which can
be heuristically vocoded. Through a user study, we show that our approach
significantly outperforms naïve vocoding strategies while being hundreds of
times faster than neural network vocoders used in state-of-the-art TTS systems.
We also show that our method can be used to achieve state-of-the-art results in
unsupervised synthesis of individual words of speech. Comment: Published as a conference paper at INTERSPEECH 201
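The abstract does not name the heuristic used to vocode the simple magnitude spectrograms. The sketch below uses the Griffin-Lim algorithm as an assumed stand-in: it alternates between an inverse STFT and an STFT, keeping the estimated phase and re-imposing the target magnitude. All function names and parameters are hypothetical.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT, one row per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    """Least-squares inverse STFT (weighted overlap-add, as in Griffin and Lim)."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    covered = norm > 1e-8  # skip samples the window barely touches
    out[covered] /= norm[covered]
    return out

def griffin_lim(mag, n_iter=50, n_fft=1024, hop=256, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))  # keep phase, discard magnitude
    return istft(mag * phase, n_fft, hop)

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
mag = np.abs(stft(x))

def spectral_convergence(y):
    """Relative error between |STFT(y)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(y)) - mag) / np.linalg.norm(mag)

y1 = griffin_lim(mag, n_iter=1)
y30 = griffin_lim(mag, n_iter=30)
print(spectral_convergence(y1), spectral_convergence(y30))
```

Each iteration costs only a few FFTs per frame, which illustrates why heuristic vocoding can be far cheaper than sample-by-sample neural vocoders.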
Antenna Beam Coverage Concepts
The strawman Personal Access Satellite System (PASS) design calls for the use of a CONUS beam for transmission between the supplier and the satellite and for fixed beams for transmission between the basic personal terminal and the satellite. The satellite uses a 3 m main reflector for transmission at 20 GHz and a 2 m main reflector for reception at 30 GHz. Besides fixed beams, several types of spot beams are under consideration for the PASS system. The beam pattern of a CONUS-coverage switched beam is shown along with that of a scanning beam. A switched beam is one in which the signal from the satellite is connected alternately to various feed horns. Scanning beams are taken to mean beams whose footprints are moved between contiguous regions in the beam's coverage area. The advantages and disadvantages of switched and/or scanning beams relative to fixed beams are listed. The consequences of using switched/scanning beams in lieu of fixed beams in the PASS design are considered, and attempts are made to evaluate the listed advantages and disadvantages. Two uses of switched/scanning beams are examined. To illustrate the implications of switched-beam use for PASS system design, operation at two beam scan rates is explored.
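The reflector sizes above can be related to beam size with the common rule of thumb HPBW ≈ k·λ/D degrees, where λ is the wavelength, D the reflector diameter, and k ≈ 70 for a typical tapered feed illumination. The constant k is an assumption, not a figure from the source, and `half_power_beamwidth_deg` is a hypothetical helper.

```python
C = 299_792_458.0  # speed of light, m/s

def half_power_beamwidth_deg(freq_hz, diameter_m, k=70.0):
    """Approximate half-power beamwidth (degrees) of a reflector: k * wavelength / diameter."""
    wavelength = C / freq_hz
    return k * wavelength / diameter_m

tx = half_power_beamwidth_deg(20e9, 3.0)  # 3 m reflector, 20 GHz transmit
rx = half_power_beamwidth_deg(30e9, 2.0)  # 2 m reflector, 30 GHz receive
print(f"{tx:.3f} {rx:.3f}")  # prints "0.350 0.350"
```

Because the product f·D is 60 GHz·m for both reflectors, the transmit and receive beams come out about equal (roughly 0.35°) under this approximation.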
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
This paper introduces a deep neural network model for subband-based speech
synthesizer. The model benefits from the narrow bandwidth of the subband signals
to reduce the complexity of the time-domain speech generator. We employed the
multi-level wavelet analysis/synthesis to decompose/reconstruct the signal into
subbands in the time domain. Inspired by WaveNet, a convolutional neural
network (CNN) model predicts subband speech signals fully in the time domain. Due
to the narrow bandwidth of the subbands, a simple network architecture is enough
to learn the simple patterns of the subbands accurately. In ground-truth
experiments with teacher forcing, the subband synthesizer outperforms the
fullband model significantly in terms of both subjective and objective
measures. In addition, by conditioning the model on the phoneme sequence using
a pronunciation dictionary, we have achieved a fully time-domain neural model
for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end.
The speech generated by the subband TTS shows quality comparable to the
fullband one, with a lighter network architecture for each subband. Comment: 5 pages, 3 figures
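A minimal sketch of multi-level wavelet analysis/synthesis into subbands, assuming the orthonormal Haar wavelet (the abstract does not name the wavelet family). Each level halves the sample rate of the remaining low band, which is what lets each subband generator work on a narrower, simpler signal. The function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_analysis(x, levels):
    """Split x into `levels` detail subbands plus one approximation subband (orthonormal Haar)."""
    bands = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / SQRT2)  # detail: upper half of the current band
        approx = (even + odd) / SQRT2       # approximation: lower half, at half the rate
    bands.append(approx)
    return bands  # [d1 (highest band), d2, ..., dL, aL (lowest band)]

def haar_synthesis(bands):
    """Exactly invert haar_analysis, rebuilding from the lowest band upward."""
    approx = bands[-1]
    for detail in reversed(bands[:-1]):
        merged = np.empty(2 * approx.size)
        merged[0::2] = (approx + detail) / SQRT2
        merged[1::2] = (approx - detail) / SQRT2
        approx = merged
    return approx

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # stand-in for one second of 16 kHz audio
bands = haar_analysis(x, levels=3)
print([b.size for b in bands])  # [8000, 4000, 2000, 2000]
y = haar_synthesis(bands)
print(np.allclose(x, y))  # True
```

At 16 kHz the three detail bands cover roughly 4 to 8, 2 to 4, and 1 to 2 kHz, and the approximation band covers 0 to 1 kHz; Haar filters overlap considerably, so these edges are only approximate.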
Accurate wearable heart rate monitoring during physical exercises using PPG
Objective: The challenging task of heart rate (HR) estimation from the photoplethysmographic (PPG) signal during intensive physical exercise is tackled in this paper. Methods: The study presents a detailed analysis of a novel algorithm (WFPV) that exploits a Wiener filter to attenuate motion artifacts, a phase vocoder to refine the HR estimate, and user-adaptive postprocessing to track the subject's physiology. Additionally, an offline version of the HR estimation algorithm that uses Viterbi decoding (WFPV+VD) is designed for scenarios that do not require online HR monitoring. The performance of the HR estimation systems is rigorously compared with existing algorithms on the publicly available database of 23 PPG recordings. Results: On the whole dataset of 23 PPG recordings, the algorithms result in average absolute errors of 1.97 and 1.37 BPM in the online and offline modes, respectively. On the test dataset of 10 PPG recordings that were most corrupted by motion artifacts, WFPV has an error of 2.95 BPM on its own and 2.32 BPM in an ensemble with 2 existing algorithms. Conclusion: The error rate is significantly reduced compared with state-of-the-art PPG-based HR estimation methods. Significance: The proposed system is shown to be accurate in the presence of strong motion artifacts and, in contrast to existing alternatives, has very few free parameters to tune. The algorithm has a low computational cost and can be used for fitness tracking and health monitoring in wearable devices. The Matlab implementation of the algorithm is provided online.
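A minimal sketch of how a phase vocoder can refine a coarse spectral peak: the phase advance of the peak bin between two overlapping frames gives a frequency estimate finer than the bin spacing. This shows only the generic phase-vocoder refinement, not the WFPV pipeline (no Wiener filtering or tracking). The sampling rate, 8 s window, 1 s hop, and the function name are hypothetical choices.

```python
import numpy as np

def refine_peak_frequency(x, fs, n_fft, hop, start=0):
    """Return (coarse, refined) peak frequency in Hz from two frames `hop` samples apart."""
    window = np.hanning(n_fft)
    spec1 = np.fft.rfft(x[start:start + n_fft] * window)
    spec2 = np.fft.rfft(x[start + hop:start + hop + n_fft] * window)
    k = int(np.argmax(np.abs(spec1[1:]))) + 1  # strongest non-DC bin
    expected = 2 * np.pi * k * hop / n_fft     # phase advance at the bin center frequency
    dev = np.angle(np.exp(1j * (np.angle(spec2[k]) - np.angle(spec1[k]) - expected)))
    coarse = k * fs / n_fft
    refined = (k + dev * n_fft / (2 * np.pi * hop)) * fs / n_fft
    return coarse, refined

fs = 25                                # hypothetical PPG sampling rate, Hz
t = np.arange(10 * fs) / fs
ppg = np.sin(2 * np.pi * 1.53 * t)     # synthetic pulse at 1.53 Hz (91.8 BPM)
coarse, refined = refine_peak_frequency(ppg, fs, n_fft=8 * fs, hop=fs)
print(coarse * 60, refined * 60)
```

With these settings the bin spacing is 0.125 Hz (7.5 BPM), so the coarse estimate is 90 BPM, while the refined estimate is close to 91.8 BPM. The refinement is unambiguous only while the true frequency lies within fs / (2·hop), here 0.5 Hz, of the peak bin's center.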
- …