818 research outputs found

    Perceptually smooth timbral guides by state-space analysis of phase-vocoder parameters

    Get PDF
    Sculptor is a phase-vocoder-based package of programs that allows users to explore timbral manipulation of sound in real time. It is the product of a research program seeking ultimately to perform gestural capture by analysis of the sound a performer makes using a conventional instrument. Since the phase-vocoder output is of high dimensionality — typically more than 1,000 channels per analysis frame—mapping phase-vocoder output to appropriate input parameters for a synthesizer is only feasible in theory

    A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation

    Full text link
    We introduce a simple and linear SNR (strictly speaking, periodic to random power ratio) estimator (0dB to 80dB without additional calibration/linearization) for providing reliable descriptions of aperiodicity in speech corpus. The main idea of this method is to estimate the background random noise level without directly extracting the background noise. The proposed method is applicable to a wide variety of time windowing functions with very low sidelobe levels. The estimate combines the frequency derivative and the time-frequency derivative of the mapping from filter center frequency to the output instantaneous frequency. This procedure can replace the periodicity detection and aperiodicity estimation subsystems of recently introduced open source vocoder, YANG vocoder. Source code of MATLAB implementation of this method will also be open sourced.Comment: 8 pages 9 figures, Submitted and accepted in Interspeech201

    An introduction to statistical parametric speech synthesis

    Get PDF

    A Phase Vocoder based on Nonstationary Gabor Frames

    Full text link
    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations imply good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real world signals and compared with state of the art algorithms in a reproducible manner.Comment: 10 pages, 6 figure

    Expediting TTS Synthesis with Adversarial Vocoding

    Get PDF
    Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms na\"ive vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.Comment: Published as a conference paper at INTERSPEECH 201

    Antenna Beam Coverage Concepts

    Get PDF
    The strawman Personal Access Satellite System (PASS) design calls for the use of a CONUS beam for transmission between the supplier and the satellite and for fixed beams for transmission between the basic personal terminal and the satellite. The satellite uses a 3 m main reflector for transmission at 20 GHz and a 2 m main reflector for reception at 30 GHz. There are several types of spot beams under consideration for the PASS system besides fixed beams. The beam pattern of a CONUS coverage switched beam is shown along with that of a scanning beam. A switched beam refers to one in which the signal from the satellite is connected alternatively to various feed horns. Scanning beams are taken to mean beams whose footprints are moved between contiguous regions in the beam's coverage area. The advantages and disadvantages of switched and/or scanning beams relative to fixed beams. The consequences of using switched/scanning in lieu of fixed beams in the PASS design and attempts are made to evaluate the listed advantages and disadvantages. Two uses of switched/scanning beams are examined. To illustrate the implications of switched beams use on PASS system design, operation at two beam scan rates is explored

    A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

    Full text link
    This paper introduces a deep neural network model for subband-based speech synthesizer. The model benefits from the short bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employed the multi-level wavelet analysis/synthesis to decompose/reconstruct the signal into subbands in time domain. Inspired from the WaveNet, a convolutional neural network (CNN) model predicts subband speech signals fully in time domain. Due to the short bandwidth of the subbands, a simple network architecture is enough to train the simple patterns of the subbands accurately. In the ground truth experiments with teacher-forcing, the subband synthesizer outperforms the fullband model significantly in terms of both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence using a pronunciation dictionary, we have achieved the fully time-domain neural model for subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end. The generated speech of the subband TTS shows comparable quality as the fullband one with a slighter network architecture for each subband.Comment: 5 pages, 3 figur

    Accurate wearable heart rate monitoring during physical exercises using PPG

    Get PDF
    Objective: The challenging task of heart rate (HR) estimation from the photoplethysmographic (PPG) signal, during intensive physical exercises is tackled in this paper. Methods: The study presents a detailed analysis of a novel algorithm (WFPV) that exploits a Wiener filter to attenuate the motion artifacts, a phase vocoder to refine the HR estimate and user-adaptive postprocessing to track the subject physiology. Additionally, an offline version of the HR estimation algorithm that uses Viterbi decoding is designed for scenarios that do not require online HR monitoring (WFPV+VD). The performance of the HR estimation systems is rigorously compared with existing algorithms on the publically available database of 23 PPG recordings. Results: On the whole dataset of 23 PPG recordings, the algorithms result in average absolute errors of 1.97 and 1.37 BPM in the online and offline modes, respectively. On the test dataset of 10 PPG recordings which were most corrupted with motion artifacts, WFPV has an error of 2.95 BPM on its own and 2.32 BPM in an ensemble with 2 existing algorithms. Conclusion: The error rate is significantly reduced when compared with the state-of-the art PPG-based HR estimation methods. Significance: The proposed system is shown to be accurate in the presence of strong motion artifacts and in contrast to existing alternatives has very few free parameters to tune. The algorithm has a low computational cost and can be used for fitness tracking and health monitoring in wearable devices. The Matlab implementation of the algorithm is provided online
    • …
    corecore