FPGA-based Implementation of Concatenative Speech Synthesis Algorithm
The main aim of a text-to-speech synthesis system is to convert ordinary text into an acoustic signal that is indistinguishable from human speech. This thesis presents an architecture for implementing a concatenative speech synthesis algorithm on FPGAs. Many current text-to-speech systems are based on the concatenation of acoustic units of recorded speech, and current concatenative synthesizers are capable of producing highly intelligible speech. However, speech quality often suffers from discontinuities between the acoustic units caused by contextual differences. Concatenation is the easiest method of producing synthetic speech: it joins prerecorded acoustic elements to form a continuous speech signal. The software implementation of the algorithm is written in C, whereas the hardware implementation is written in structural VHDL. A database of acoustic elements is first built by recording sounds for different phones. The architecture concatenates the acoustic elements corresponding to the phones that form the target word, i.e., the word to be synthesized. The architecture does not address discontinuities between the acoustic elements, since its goal is simply to synthesize speech. The hardware implementation is verified on a Virtex (v800hq240-4) FPGA device.
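The concatenation step described above can be sketched in a few lines. This is a minimal software illustration, not the thesis's C or VHDL code: the function name `synthesize_word`, the phone labels and the tiny placeholder sample values are all hypothetical, and, as in the thesis, no smoothing is applied at unit boundaries.

```python
def synthesize_word(phones, unit_db):
    """Concatenate the recorded unit of each phone in the target word.

    unit_db maps a phone label to its recorded samples. Units are joined
    end to end with no smoothing at the boundaries.
    """
    samples = []
    for phone in phones:
        if phone not in unit_db:
            raise KeyError(f"no recorded unit for phone {phone!r}")
        samples.extend(unit_db[phone])
    return samples


# Hypothetical database with tiny placeholder "recordings".
unit_db = {"k": [0.1, -0.2], "ae": [0.5, 0.4, -0.3], "t": [0.05]}
word = synthesize_word(["k", "ae", "t"], unit_db)
# word == [0.1, -0.2, 0.5, 0.4, -0.3, 0.05]
```

In hardware the same idea amounts to reading consecutive address ranges of a unit memory, one range per phone of the target word.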
FPGA Implementation of an Adaptive Noise Canceller for Robust Speech Enhancement Interfaces
This paper describes the design and implementation results of an adaptive noise canceller for building robust speech enhancement interfaces. The algorithm used performs very well in real-time applications. Its main disadvantage is that it requires several division operations, which have a high computational cost. In addition, the accuracy of the algorithm is critical in fixed-point representation because the variables involved span a wide range between their upper and lower bounds. To address this, the accuracy is studied and, based on the results, a specific word length is adopted for each variable. The algorithm has been implemented for Altera and Xilinx FPGAs using high-level synthesis tools. The results for a fixed 40-bit format for all variables and for a per-variable word length are analyzed and discussed.
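The abstract does not name the adaptive algorithm. As an assumption, the sketch below uses a normalized LMS (NLMS) noise canceller, a common choice whose weight update divides by the reference power on every sample, to illustrate the kind of per-sample division the abstract identifies as costly. The signals, tap count and step size `mu` are hypothetical, and the code runs in floating point rather than in the fixed-point formats studied in the paper.

```python
import math
import random


def nlms_cancel(primary, reference, taps=4, mu=0.1, eps=1e-6):
    """Normalized-LMS adaptive noise canceller (assumed algorithm).

    primary:   speech plus noise from the main input.
    reference: noise-only input correlated with the noise in `primary`.
    Returns the error signal (the enhanced speech estimate) and final weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference sample first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                   # enhanced output sample
        # Per-sample division by the reference power: expensive in hardware,
        # and its operands span a wide range in fixed point.
        step = mu * e / (eps + sum(b * b for b in buf))
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


# Hypothetical signals: a sine as "speech" and a two-tap filtered copy of
# the reference as the interfering noise.
random.seed(0)
n = 4000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
ref = [random.uniform(-1.0, 1.0) for _ in range(n)]
noise = [0.8 * ref[k] + (0.3 * ref[k - 1] if k > 0 else 0.0) for k in range(n)]
primary = [s + v for s, v in zip(speech, noise)]

enhanced, weights = nlms_cancel(primary, ref)
tail = range(n // 2, n)  # measure after the filter has converged
noise_power = sum(noise[k] ** 2 for k in tail) / len(tail)
residual_power = sum((enhanced[k] - speech[k]) ** 2 for k in tail) / len(tail)
```

A word-length study in the spirit of the abstract could rerun such a model with each variable rounded to a candidate number of fractional bits and compare `residual_power` against the floating-point result.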
Singing synthesis with an evolved physical model
A two-dimensional physical model of the human vocal tract is described. Such a system promises increased realism and control in the synthesis of both speech and singing. However, the parameters describing the shape of the vocal tract during phonation are not easily obtained, even with medical imaging techniques, so a genetic algorithm (GA) is instead applied to the model to find an appropriate configuration. This method produces realistic sounds. An analysis of these sounds and of the reliability of the technique (its convergence properties) is provided.
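The GA search for a vocal-tract configuration can be illustrated with a generic sketch. Everything here is an assumption for illustration: the operators (tournament selection, uniform crossover, Gaussian mutation, elitism) are common textbook choices rather than the paper's, and the fitness is a hypothetical stand-in that measures distance to a known target area vector. In the described system the fitness would instead compare sound produced by the physical model with a target.

```python
import random


def genetic_search(error, dim, lo, hi, pop_size=30, generations=100,
                   mutation_sigma=0.1, seed=1):
    """Minimize `error` over vectors in [lo, hi]^dim with a simple GA:
    tournament selection, uniform crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    initial_best = min(error(p) for p in pop)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if error(a) < error(b) else b

    for _ in range(generations):
        children = [min(pop, key=error)]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_sigma)))
                     for g in child]
            children.append(child)
        pop = children
    best = min(pop, key=error)
    return best, initial_best, error(best)


# Hypothetical target: 8 cross-sectional areas describing a tract shape.
target = [1.0, 1.5, 2.0, 2.5, 2.0, 1.5, 1.0, 0.5]


def area_error(candidate):
    return sum((c - t) ** 2 for c, t in zip(candidate, target))


best, start_err, final_err = genetic_search(area_error, len(target), 0.1, 3.0)
```

Because the best individual is always carried over, the best error never increases from one generation to the next, which is one simple way to reason about convergence.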
Robust tracking of glottal LF-model parameters by multi-estimate fusion
A new approach to robust tracking of glottal LF-model parameters is presented. The approach does not rely on a new glottal source estimation algorithm, but instead
introduces a new extensible multi-estimate fusion framework. Within this framework several existing algorithms are applied in parallel to extract glottal LF-model parameter estimates which are subsequently passed to quantitative data fusion procedures. The preliminary implementation of the fusion algorithm described here
incorporates three glottal inverse filtering methods and one time-domain LF-model fitting algorithm. Experimental results for both synthetic and natural speech signals
demonstrate the effectiveness of the fusion algorithm. The proposed method is flexible and can easily be extended to other speech processing applications such as speech synthesis, speaker identification and prosody analysis.
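The abstract leaves the "quantitative data fusion procedures" unspecified. As an assumed, deliberately simple example, the sketch below fuses the per-frame estimates of one LF-derived parameter (for instance an open quotient) coming from several parallel methods: it discards estimates far from the median, measured in scaled median absolute deviations, and averages the rest. The function name, the threshold `k` and the sample values are hypothetical.

```python
from statistics import mean, median


def fuse_estimates(estimates, k=3.0):
    """Fuse parallel estimates of one LF-model parameter for one frame.

    `None` marks a method that produced no estimate for this frame.
    Estimates farther than k robust standard deviations (scaled MAD) from
    the median are discarded; the remaining ones are averaged.
    """
    valid = [e for e in estimates if e is not None]
    if not valid:
        return None
    med = median(valid)
    scale = 1.4826 * median(abs(e - med) for e in valid)
    if scale == 0.0:
        return med
    return mean(e for e in valid if abs(e - med) <= k * scale)


# Four hypothetical estimates (e.g. three inverse-filtering methods and one
# time-domain fit); the last one is an outlier and gets rejected.
fused = fuse_estimates([0.52, 0.55, 0.50, 0.95])  # about 0.5233
```

A median-based rule is used here because a single failing method should not drag the fused value; weighted schemes based on per-method reliability are an obvious alternative.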
Modern Methods of Time-Frequency Warping of Sound Signals
This thesis deals with the representation of non-stationary harmonic signals with time-varying components. It focuses mainly on the Harmonic Transform and its variant with subquadratic computational complexity, the Fast Harmonic Transform. Two algorithms using the Fast Harmonic Transform are presented. The first uses the gathered log-spectrum to estimate the change in fundamental frequency; the second uses an analysis-by-synthesis approach. Both algorithms are applied to a speech segment to compare their outputs. Finally, the analysis-by-synthesis algorithm is applied to several real sound signals to measure how much the Harmonic Transform improves the representation of real frequency-modulated signals.
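One way to picture the idea behind the Harmonic Transform is as a Fourier transform taken along a warped time axis that follows the f0 contour, so that harmonics whose frequency changes over time become stationary. The sketch below assumes a linearly changing f0 with a known relative rate `alpha` (per second), resamples the signal onto a uniform grid in warped time by linear interpolation, and applies a plain DFT. It is not the Fast Harmonic Transform, and `alpha` is given rather than estimated (the thesis estimates the f0 change from the gathered log-spectrum or by analysis-by-synthesis); all names and test values are hypothetical.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def half_spectrum_energy(x):
    """|DFT|^2 of a real sequence for bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))) ** 2
            for k in range(n // 2 + 1)]


def warp_linear_f0(x, fs, alpha):
    """Resample x onto a uniform grid in warped time u = t + alpha * t**2 / 2.

    For f0(t) = f0 * (1 + alpha * t), harmonic h has phase 2*pi*h*f0*u,
    so in warped time the chirping harmonics become stationary sinusoids.
    """
    if alpha == 0.0:
        return list(x)
    n = len(x)
    t_end = (n - 1) / fs
    u_end = t_end + alpha * t_end ** 2 / 2.0
    out = []
    for k in range(n):
        u = k * u_end / (n - 1)
        t = (math.sqrt(1.0 + 2.0 * alpha * u) - 1.0) / alpha  # invert the warp
        pos = t * fs
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])  # linear interpolation
    return out


def top3_fraction(energy):
    """Share of the energy held by the three strongest bins."""
    return sum(sorted(energy, reverse=True)[:3]) / sum(energy)


# Hypothetical test signal: one harmonic whose f0 rises linearly from 200 Hz.
fs, n, f0, alpha = 8000, 512, 200.0, 20.0
x = [math.cos(2.0 * math.pi * f0 * (t + alpha * t * t / 2.0))
     for t in (m / fs for m in range(n))]
win = hann(n)
plain = top3_fraction(half_spectrum_energy([a * b for a, b in zip(x, win)]))
warped = top3_fraction(half_spectrum_energy(
    [a * b for a, b in zip(warp_linear_f0(x, fs, alpha), win)]))
```

With the correct warp the energy collapses into a few bins (`warped` close to 1), while the plain DFT spreads it across the swept band; this concentration is what makes the transform better suited to frequency-modulated signals.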
A Phase Vocoder based on Nonstationary Gabor Frames
We propose a new algorithm for time stretching music signals based on the
theory of nonstationary Gabor frames (NSGFs). The algorithm extends the
techniques of the classical phase vocoder (PV) by incorporating adaptive
time-frequency (TF) representations and adaptive phase locking. The adaptive TF
representations imply good time resolution for the onsets of attack transients
and good frequency resolution for the sinusoidal components. We estimate the
phase values only at peak channels and the remaining phases are then locked to
the values of the peaks in an adaptive manner. During attack transients, we keep
the stretch factor equal to one and we propose a new strategy for determining
which channels are relevant for reinitializing the corresponding phase values.
In contrast to previously published algorithms, we use a non-uniform NSGF to
obtain a low redundancy of the corresponding TF representation. We show that
with just three times as many TF coefficients as signal samples, artifacts such
as phasiness and transient smearing can be greatly reduced compared to the
classical PV. The proposed algorithm is tested on both synthetic and real-world
signals and compared with state-of-the-art algorithms in a reproducible manner.
Comment: 10 pages, 6 figures
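For reference, the classical phase-vocoder steps that the algorithm builds on can be sketched as two small functions: phase propagation from the instantaneous-frequency estimate of each channel, and identity phase locking of non-peak channels to a peak. This is a simplified sketch under stated assumptions, not the proposed method: it assumes a uniform STFT with fixed hop sizes `ha` and `hs` instead of nonstationary Gabor frames, assigns each channel to its nearest peak instead of the paper's adaptive locking, and omits transient handling. All function names and test values are hypothetical.

```python
import math


def princarg(phi):
    """Map a phase to the interval [-pi, pi)."""
    return phi - 2.0 * math.pi * math.floor((phi + math.pi) / (2.0 * math.pi))


def propagate_phases(prev_ana, cur_ana, prev_syn, bin_freqs, ha, hs):
    """Classical phase-vocoder phase propagation for one frame.

    bin_freqs: centre angular frequency of each channel (rad/sample).
    ha, hs:    analysis and synthesis hop sizes in samples (stretch = hs / ha).
    """
    out = []
    for p0, p1, ps, w in zip(prev_ana, cur_ana, prev_syn, bin_freqs):
        dev = princarg(p1 - p0 - w * ha)  # deviation from the expected advance
        inst_freq = w + dev / ha          # instantaneous frequency estimate
        out.append(ps + inst_freq * hs)
    return out


def lock_to_nearest_peak(syn_phases, ana_phases, peaks):
    """Identity phase locking: each channel keeps its analysis phase offset
    relative to the nearest peak channel, whose synthesis phase is used.
    Only the entries of syn_phases at peak channels are read."""
    locked = []
    for k in range(len(ana_phases)):
        p = min(peaks, key=lambda q: abs(q - k))
        locked.append(syn_phases[p] + ana_phases[k] - ana_phases[p])
    return locked


# A stationary sinusoid at 10.3 bins (N = 256) seen by channel 10,
# stretched by a factor of 2 (ha = 64, hs = 128).
N, ha, hs = 256, 64, 128
w10 = 2.0 * math.pi * 10 / N
w_true = 2.0 * math.pi * 10.3 / N
syn = propagate_phases([0.0], [princarg(w_true * ha)], [0.0], [w10], ha, hs)
```

Since only peak channels are read by `lock_to_nearest_peak`, phase propagation needs to run only at peaks, which mirrors the abstract's point that phases are estimated at peak channels and the rest are locked to them.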