
    FPGA-based Implementation of Concatenative Speech Synthesis Algorithm

    The main aim of a text-to-speech synthesis system is to convert ordinary text into an acoustic signal that is indistinguishable from human speech. This thesis presents an architecture for implementing a concatenative speech synthesis algorithm on FPGAs. Many current text-to-speech systems are based on the concatenation of acoustic units of recorded speech; concatenation is the simplest way to produce synthetic speech, joining prerecorded acoustic elements into a continuous speech signal. Current concatenative synthesizers are capable of producing highly intelligible speech, although the quality often suffers from discontinuities between the acoustic units caused by contextual differences. The software implementation of the algorithm is written in C, while the hardware implementation is described in structural VHDL. A database of acoustic elements is first built by recording sounds for the different phones. The architecture concatenates the acoustic elements corresponding to the phones that form the target word, i.e. the word to be synthesized. The architecture does not address the discontinuities that form between acoustic elements, since its ultimate goal is the synthesis of speech. The hardware implementation is verified on a Virtex (v800hq240-4) FPGA device.
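
    As a rough illustration of the concatenation step described above (and not the thesis's actual C or VHDL implementation), the following Python sketch joins prerecorded phone waveforms into one continuous signal. The phone-to-file database, filenames and sampling rate are hypothetical.

        import numpy as np
        from scipy.io import wavfile

        # Hypothetical database mapping phone symbols to prerecorded waveform files.
        PHONE_DB = {"h": "h.wav", "e": "e.wav", "l": "l.wav", "o": "o.wav"}

        def synthesize_word(phones, out_path="word.wav", rate=16000):
            """Concatenate prerecorded acoustic elements (phones) into one signal."""
            pieces = []
            for p in phones:
                sr, samples = wavfile.read(PHONE_DB[p])   # load the acoustic element
                assert sr == rate, "all elements must share one sampling rate"
                pieces.append(samples)
            word = np.concatenate(pieces)                 # plain joining, no smoothing
            wavfile.write(out_path, rate, word)           # the synthesized target word

        # synthesize_word(["h", "e", "l", "o"])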

    FPGA Implementation of an Adaptive Noise Canceller for Robust Speech Enhancement Interfaces

    This paper describes the design and implementation results of an adaptive noise canceller useful for the construction of robust speech enhancement interfaces. The algorithm used performs very well in real-time applications. Its main disadvantage is that it requires several division operations, which carry a high computational cost. In addition, the accuracy of the algorithm is critical in fixed-point representation due to the wide range between the upper and lower bounds of the variables involved in the algorithm. To address this, the accuracy is studied and, based on the results obtained, a specific word length is adopted for each variable. The algorithm has been implemented for Altera and Xilinx FPGAs using high-level synthesis tools. The results for a fixed 40-bit format for all variables and for a specific word length per variable are analyzed and discussed.
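
    The abstract does not give the filter update equations; as a minimal floating-point sketch of this class of algorithm, and of where the costly divisions arise, an NLMS-style adaptive noise canceller could look as follows in Python. The two-microphone setup, filter order and step size are assumptions; the actual design works with per-variable fixed-point word lengths on the FPGA.

        import numpy as np

        def nlms_canceller(primary, reference, order=32, mu=0.5, eps=1e-6):
            """Illustrative NLMS adaptive noise canceller (floating point).

            primary   : speech plus noise from the main microphone
            reference : correlated noise from the reference microphone
            Returns the enhanced (error) signal e[n] = primary[n] - y[n].
            """
            w = np.zeros(order)                      # adaptive filter coefficients
            e = np.zeros(len(primary))
            for n in range(order, len(primary)):
                x = reference[n - order:n][::-1]     # most recent reference samples
                y = w @ x                            # estimate of the noise component
                e[n] = primary[n] - y                # enhanced speech sample
                # the normalisation below is one of the costly division operations
                w = w + (mu / (eps + x @ x)) * e[n] * x
            return e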

    Singing synthesis with an evolved physical model

    A two-dimensional physical model of the human vocal tract is described. Such a system promises increased realism and control in the synthesis of both speech and singing. However, the parameters describing the shape of the vocal tract during use are not easily obtained, even with medical imaging techniques, so a genetic algorithm (GA) is instead applied to the model to find an appropriate configuration. Realistic sounds are produced by this method. An analysis of these sounds, and of the reliability of the technique (its convergence properties), is provided.
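
    As a sketch of how a GA can search for a tract configuration, the Python fragment below evolves an area-function vector until a stand-in spectral model matches a target envelope. The tract_spectrum stand-in, parameter ranges and GA settings are all assumptions; the paper's two-dimensional physical model is far more elaborate.

        import numpy as np

        rng = np.random.default_rng(0)

        def tract_spectrum(areas, n_bins=64):
            """Stand-in for the vocal tract model: maps an area function to a
            crude spectral envelope (each section contributes one resonance)."""
            freqs = np.linspace(0.0, 1.0, n_bins)
            spec = np.zeros(n_bins)
            for i, a in enumerate(areas):
                centre = (i + a) / (len(areas) + 1)
                spec += np.exp(-((freqs - centre) ** 2) / 0.002)
            return spec

        def fitness(areas, target):
            return -np.sum((tract_spectrum(areas) - target) ** 2)   # higher is better

        def evolve(target, pop_size=40, n_params=8, generations=200, sigma=0.05):
            pop = rng.uniform(0.1, 1.0, (pop_size, n_params))        # area functions
            for _ in range(generations):
                scores = np.array([fitness(ind, target) for ind in pop])
                parents = pop[np.argsort(scores)[-pop_size // 2:]]   # truncation selection
                children = parents[rng.integers(len(parents), size=pop_size - len(parents))]
                children = children + rng.normal(0.0, sigma, children.shape)   # mutation
                pop = np.vstack([parents, np.clip(children, 0.05, 1.0)])
            return pop[np.argmax([fitness(ind, target) for ind in pop])]

        # target = tract_spectrum(rng.uniform(0.1, 1.0, 8))   # e.g. a reference shape
        # best_areas = evolve(target)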

    Robust tracking of glottal LF-model parameters by multi-estimate fusion

    A new approach to robust tracking of glottal LF-model parameters is presented. The approach does not rely on a new glottal source estimation algorithm, but instead introduces a new, extensible multi-estimate fusion framework. Within this framework several existing algorithms are applied in parallel to extract glottal LF-model parameter estimates, which are subsequently passed to quantitative data fusion procedures. The preliminary implementation of the fusion algorithm described here incorporates three glottal inverse filtering methods and one time-domain LF-model fitting algorithm. Experimental results for both synthetic and natural speech signals demonstrate the effectiveness of the fusion algorithm. The proposed method is flexible and can be easily extended to other speech processing applications such as speech synthesis, speaker identification, and prosody analysis.
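
    The fusion procedure itself is not detailed in the abstract; as a rough sketch of quantitative fusion of parallel parameter estimates, one could weight each estimator by its deviation from the cross-estimator median, as below. The array layout and the weighting rule are assumptions, not the paper's method.

        import numpy as np

        def fuse_estimates(estimates, priors=None):
            """Fuse per-frame LF-model parameter estimates from several algorithms.

            estimates : array of shape (n_estimators, n_frames, n_params), one slice
                        per estimation algorithm run in parallel.
            priors    : optional per-estimator reliability weights.
            Returns a single fused track of shape (n_frames, n_params).
            """
            estimates = np.asarray(estimates, dtype=float)
            med = np.median(estimates, axis=0)          # robust central estimate
            w = 1.0 / (np.abs(estimates - med) + 1e-9)  # down-weight outlying estimators
            if priors is not None:
                w *= np.asarray(priors)[:, None, None]
            return np.sum(w * estimates, axis=0) / np.sum(w, axis=0)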

    Modern Methods of Time-Frequency Warping of Sound Signals

    This thesis deals with the representation of non-stationary harmonic signals with time-varying components. Its main focus is the Harmonic Transform and its variant with subquadratic computational complexity, the Fast Harmonic Transform. Two algorithms using the Fast Harmonic Transform are presented: the first estimates the change of the fundamental frequency from the gathered log-spectrum, while the second uses an analysis-by-synthesis approach. Both algorithms are applied to a speech segment to compare their outputs. Finally, the analysis-by-synthesis algorithm is applied to several real sound signals to measure the improvement in the representation of frequency-modulated signals obtained with the Harmonic Transform.
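
    As a rough sketch of the analysis-by-synthesis idea (not the thesis's Fast Harmonic Transform), the fragment below estimates the relative change of the fundamental frequency over one frame by trying candidate time warps and keeping the one whose warped spectrum is most concentrated. The candidate grid and the concentration measure are assumptions.

        import numpy as np

        def f0_change_by_synthesis(frame, candidates=np.linspace(-0.3, 0.3, 61)):
            """Analysis-by-synthesis estimate of the relative f0 change over a frame.

            Each candidate change c is tested by warping time so that a fundamental
            drifting linearly by the factor (1 + c) would become constant in pitch;
            the candidate giving the most concentrated warped spectrum wins.
            """
            n = len(frame)
            t = np.linspace(0.0, 1.0, n)              # normalised frame time
            win = np.hanning(n)
            best_c, best_score = 0.0, -np.inf
            for c in candidates:
                tau = t + 0.5 * c * t ** 2            # warped time (phase-proportional)
                tau /= tau[-1]                        # normalise to [0, 1]
                warped = np.interp(t, tau, frame)     # resample on a uniform warped grid
                spec = np.abs(np.fft.rfft(warped * win))
                score = spec.max() / (spec.mean() + 1e-12)   # spectral concentration
                if score > best_score:
                    best_c, best_score = c, score
            return best_c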

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations imply good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner.
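
    As a simplified, uniform-resolution illustration of the phase-locking mechanics referred to above, the Python sketch below propagates phase only at spectral peaks and locks every other channel to its nearest peak. It uses an ordinary STFT rather than nonstationary Gabor frames, omits the transient handling, and its analysis parameters are assumptions.

        import numpy as np

        def pv_timestretch(x, stretch, n_fft=2048, hop_a=512):
            """Minimal phase vocoder with identity phase locking at spectral peaks."""
            hop_s = int(round(hop_a * stretch))                  # synthesis hop size
            win = np.hanning(n_fft)
            bin_freq = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft   # rad per sample
            starts = range(0, len(x) - n_fft, hop_a)
            y = np.zeros(len(starts) * hop_s + n_fft)
            prev_X, phase = None, None
            for k, pos in enumerate(starts):
                X = np.fft.rfft(win * x[pos:pos + n_fft])
                mag = np.abs(X)
                if prev_X is None:
                    phase = np.angle(X)
                else:
                    # instantaneous frequency from the analysis phase increment
                    dphi = np.angle(X) - np.angle(prev_X) - hop_a * bin_freq
                    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # principal value
                    inst_freq = bin_freq + dphi / hop_a
                    new_phase = phase + hop_s * inst_freq        # propagate all channels
                    # lock non-peak channels to the propagated phase of the nearest peak
                    peaks = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])) + 1
                    if len(peaks):
                        nearest = peaks[np.argmin(np.abs(peaks[:, None] - np.arange(len(mag))), axis=0)]
                        new_phase = new_phase[nearest] + np.angle(X) - np.angle(X[nearest])
                    phase = new_phase
                y[k * hop_s:k * hop_s + n_fft] += win * np.fft.irfft(mag * np.exp(1j * phase))
                prev_X = X
            return y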
