High-Quality Time Stretch and Pitch Shift Effects for Speech and Audio Using the Instantaneous Harmonic Analysis
Relationships between Protein Intake and Renal Function in a Japanese General Population: NIPPON DATA90
A New Method to Represent Speech Signals Via Predefined Signature and Envelope Sequences
A novel systematic procedure, referred to as “SYMPES”, is introduced to model speech signals. SYMPES is built on predefined “signature” and “envelope” sets, S = {S_R(n)} and E = {E_K(n)}, which are speaker and language independent. Once the speech signal is divided into frames of a selected length, each frame sequence X_i(n) is reconstructed in the form X_i(n) = C_i E_K(n) S_R(n). In this representation, C_i is called the gain factor, and S_R(n) and E_K(n) are assigned from the predefined signature and envelope sets, respectively. Examples are given to demonstrate the implementation of SYMPES. It is shown that, for the same or a better compression ratio, SYMPES yields considerably better speech quality than commercially available coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E (FS1015) at 2.4 kbps.
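The frame model X_i(n) = C_i E_K(n) S_R(n) can be illustrated with a minimal sketch. The abstract does not say how the indices K and R or the gain C_i are chosen, so the exhaustive search and least-squares gain below are assumptions, as are the toy signature and envelope sets and the helper names `best_fit` and `reconstruct`. This is not the authors' procedure.

```python
import math

def best_fit(frame, signatures, envelopes):
    """Return (gain, K, R, error) minimising sum((frame - gain * E_K * S_R)^2).

    Assumption: exhaustive search over all (K, R) pairs with a
    least-squares gain; the abstract does not specify the selection rule.
    """
    best = None
    for k, env in enumerate(envelopes):
        for r, sig in enumerate(signatures):
            basis = [e * s for e, s in zip(env, sig)]
            energy = sum(b * b for b in basis)
            if energy == 0.0:
                continue  # an all-zero basis cannot represent anything
            gain = sum(x * b for x, b in zip(frame, basis)) / energy
            err = sum((x - gain * b) ** 2 for x, b in zip(frame, basis))
            if best is None or err < best[3]:
                best = (gain, k, r, err)
    return best

def reconstruct(gain, env, sig):
    """Rebuild a frame as gain * E_K(n) * S_R(n)."""
    return [gain * e * s for e, s in zip(env, sig)]

# Toy predefined sets (hypothetical, for illustration only).
N = 8
signatures = [
    [math.sin(2 * math.pi * n / N) for n in range(N)],
    [math.cos(2 * math.pi * n / N) for n in range(N)],
]
envelopes = [
    [1.0] * N,
    [1.0 - n / N for n in range(N)],
]

# A frame that the model can represent exactly: C = 0.5, K = 1, R = 0.
frame = reconstruct(0.5, envelopes[1], signatures[0])
gain, k, r, err = best_fit(frame, signatures, envelopes)
approx = reconstruct(gain, envelopes[k], signatures[r])
```

Under these assumptions, only the triple (C_i, K, R) would need to be stored or transmitted per frame, which is where the compression in such a scheme would come from.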
Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants
A near-end listening enhancement system by RNN-based noise cancellation and speech modification
A uniform phase representation for the harmonic model in speech synthesis applications
Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used, which distinguishes voiced and unvoiced segments. However, voice production allows smooth transitions between voiced and unvoiced states, which sometimes makes the voicing segmentation tricky to estimate. In this article, two phase features are suggested to represent the phase of the harmonic model in a uniform way, without a voicing decision. The synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies. G. Degottex has been funded by the Swiss National Science Foundation (SNSF) (grants PBSKP2_134325, PBSKP2_140021), Switzerland, and the Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece. D. Erro has been funded by the Basque Government (BER2TEK, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All, TEC2012-38939-C03-03).
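As a small illustration of the phase-wrapping difficulty noted in the abstract above, the sketch below shows that ordinary arithmetic on wrapped phase values can give misleading results. It is a generic example of circular statistics, not the phase features proposed in the article; the helper names `wrap` and `circular_mean` are hypothetical.

```python
import cmath
import math

def wrap(phase):
    """Wrap a phase in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

def circular_mean(phases):
    """Mean direction of a set of phases, computed on the unit circle."""
    return cmath.phase(sum(cmath.exp(1j * p) for p in phases))

# Two phases that sit close together on either side of pi.
a = wrap(math.pi - 0.1)   # stays at pi - 0.1
b = wrap(math.pi + 0.1)   # wraps around to -pi + 0.1

naive = (a + b) / 2            # close to 0: points the wrong way
circ = circular_mean([a, b])   # close to +pi or -pi: the intended direction
```

The naive average lands near 0 even though both phases lie near pi, which is one reason features built directly on wrapped phase values are harder to smooth, interpolate, or model statistically than amplitude features.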