6 research outputs found

    Lexicon, allophony and speech perception in articulatory

    Advisor: Eleonora Cavalcante Albano. Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem. Abstract: The positional variation of a phoneme is known as allophony. It is usually considered a phonetic process that speakers of a language apply automatically and unconsciously. However, preliminary results in the literature indicate that allophonic variants can be perceived differently in words and non-words. Could a phonetic phenomenon such as allophony, then, be influenced by lexical status? To test this hypothesis, two perceptual experiments were performed on the allophony of the vowel /a/ in stressed and post-stressed position in Brazilian Portuguese. The collected data consisted of reaction time and correct response rate in an ABX discrimination paradigm. The stimuli consisted of naturally produced vowels and vowels modified to match the stress context. Lexical neighborhood, frequency of occurrence, and phonotactic probability were also controlled in the second experiment. In total, data were collected from 18 native speakers of Brazilian Portuguese. The results suggest that allophones are easier to discriminate depending on lexical status and stress context, in line with the initial hypothesis. The results are discussed in light of Articulatory Phonology and Exemplar Theory. Master's degree. Linguistics. Master in Linguistics. 01/00137-9. CAPES. FAPES

    Revision of the phonetic transcription module for implementation in the speech synthesizer of Verbio Technologies SL

    Master's dissertation, Natural Language Processing and Language Industries, Faculty of Human and Social Sciences, Univ. of Algarve, 2013. The aim of this work is to help improve the quality of the text-to-speech conversion system designed for Brazilian Portuguese and developed by Verbio Technologies SL. These improvements were made possible by a thorough review of, and consequent modifications to, the synthesizer's phonetic transcription module. Because of the changes introduced by the New Orthography of Portuguese, the grapheme-to-phoneme transformation rules, an integral part of the phonetic transcriber in the company's system, were modified. The new orthographic agreement restructures the spelling of the Portuguese language; many words were therefore changed, and these changes must be covered by the grapheme-to-phoneme transcriber of the speech synthesis system. In addition to the new Portuguese spelling rules, a dictionary developed by the Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPqD), version 1.4 of May 2003, was also used. This dictionary served as the starting point for defining the phonemes and for the subsequent development of the new rules. The methodology consisted of a detailed analysis of two linguistic varieties of Brazilian Portuguese: the variety spoken in Rio de Janeiro and the variety spoken in São Paulo, economically developed regions where deploying a synthesizer is justified. Besides incorporating the new spelling rules, some rules were also defined to cover certain phonological processes that are frequent in Brazilian Portuguese, such as vowel epenthesis.

    Concatenative speech synthesis: a Framework for Reducing Perceived Distortion when using the TD-PSOLA Algorithm

    This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems make use of pre-recorded speech segments stored in a speech corpus. At synthesis time, the 'best' segments available to synthesise the new utterances are chosen from the corpus using a process known as unit selection. During the synthesis process, the pitch and duration of these segments may be modified to generate the desired prosody. The TD-PSOLA algorithm provides an efficient and essentially successful solution to perform these modifications, although some perceptible distortion, in the form of 'buzziness', may be introduced into the speech signal. Despite the popularity of the TD-PSOLA algorithm, little formal research has been undertaken to address this recognised problem of distortion. The approach in the thesis has been developed towards reducing the perceived distortion that is introduced when TD-PSOLA is applied to speech. To investigate the occurrence of this distortion, a psychoacoustic evaluation of the effect of pitch modification using the TD-PSOLA algorithm is presented. Subjective experiments in the form of a set of listening tests were undertaken using word-level stimuli that had been manipulated using TD-PSOLA. The data collected from these experiments were analysed for patterns of co-occurrence or correlations to investigate where this distortion may occur. From this, parameters were identified which may have contributed to increased distortion. These parameters were concerned with the relationship between the spectral content of individual phonemes, the extent of pitch manipulation, and aspects of the original recordings. Based on these results, a framework was designed for use in conjunction with TD-PSOLA to minimise the possible causes of distortion. The framework consisted of a novel speech corpus design, a signal processing distortion measure, and a selection process for especially problematic phonemes. Rather than phonetically balanced, the corpus is balanced to the needs of the signal processing algorithm, containing more of the adversely affected phonemes. The aim is to reduce the potential extent of pitch modification of such segments, and hence produce synthetic speech with less perceptible distortion. The signal processing distortion measure was developed to allow the prediction of perceptible distortion in pitch-modified speech. Different weightings were estimated for individual phonemes, trained using the experimental data collected during the listening tests. The potential benefit of such a measure for existing unit selection processes in a corpus-based system using TD-PSOLA is illustrated. Finally, the special-case selection process was developed for highly problematic voiced fricative phonemes to minimise the occurrence of perceived distortion in these segments. The success of the framework, in terms of generating synthetic speech with reduced distortion, was evaluated. A listening test showed that the TD-PSOLA balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated using a typical phonetically balanced corpus. The voiced fricative selection process was also shown to produce pitch-modified versions of these phonemes with less perceived distortion than a standard selection process. The listening test then indicated that the signal processing distortion measure was able to predict the resulting amount of distortion at the sentence-level after the application of TD-PSOLA, suggesting that it may be beneficial to include such a measure in existing unit selection processes. The framework was found to be capable of producing speech with reduced perceptible distortion in certain situations, although the effects seen at the sentence-level were less than those seen in the previous investigative experiments that made use of word-level stimuli. This suggests that the effect of the TD-PSOLA algorithm cannot always be easily anticipated due to the highly dynamic nature of speech, and that the reduction of perceptible distortion in TD-PSOLA-modified speech remains a challenge to the speech community.
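For readers unfamiliar with the pitch modification step this abstract refers to, the following is a minimal Python sketch of the general TD-PSOLA idea, not the thesis's framework or implementation. It assumes a fully voiced signal whose pitch marks are already known, leaves duration unchanged, and uses hypothetical names throughout (`td_psola`, `marks`, `factor`, and the synthetic test signal). Two-period Hann-windowed grains are cut around the analysis marks and overlap-added at a new spacing.

```python
import numpy as np

def td_psola(signal, marks, factor):
    """Simplified pitch scaling by overlap-add (assumed sketch, voiced speech only).

    signal: 1-D array of samples.
    marks:  sorted sample indices of pitch marks (at least three).
    factor: pitch scaling factor; values above 1 raise the pitch.
    """
    signal = np.asarray(signal, dtype=float)
    marks = np.asarray(marks, dtype=int)
    out = np.zeros_like(signal)
    t = float(marks[1])  # first synthesis mark
    while t <= marks[-2]:
        # Nearest analysis mark that has a neighbour on each side.
        i = 1 + int(np.argmin(np.abs(marks[1:-1] - t)))
        period = int(marks[i] - marks[i - 1])
        n = 2 * period + 1                    # grain spans two pitch periods
        a0 = marks[i] - period                # grain start in the input
        s0 = int(round(t)) - period           # grain start in the output
        if a0 >= 0 and s0 >= 0 and a0 + n <= len(signal) and s0 + n <= len(out):
            out[s0:s0 + n] += signal[a0:a0 + n] * np.hanning(n)
        t += period / factor                  # new spacing sets the new pitch
    return out

# Hypothetical test signal: period of 80 samples, three harmonics.
k = np.arange(1600)
signal = sum(np.cos(2 * np.pi * h * k / 80) for h in (1, 2, 3))
marks = np.arange(80, 1521, 80)

same = td_psola(signal, marks, factor=1.0)    # grains re-added at original spacing
raised = td_psola(signal, marks, factor=2.0)  # grains placed every 40 samples
```

With `factor=1.0` the Hann windows sum to one, so the interior of the output reproduces the input. With `factor=2.0` the same grains repeat every 40 samples. Overlapping grains more densely in this way is the kind of manipulation after which the abstract reports audible distortion.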

    Concatenative speech synthesis : a framework for reducing perceived distortion when using the TD-PSOLA algorithm

    EThOS - Electronic Theses Online Service, GB, United Kingdo

    A hybrid model for text-to-speech synthesis

    No full text
    This paper describes a hybrid model developed for high-quality, concatenation-based, text-to-speech synthesis. The speech signal is submitted to a pitch-synchronous analysis and decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids whose frequencies are multiples of the pitch. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. In the presence of pitch modifications, a new set of harmonic parameters is evaluated by resampling the spectral envelope at the new harmonic frequencies. For the synthesis of the harmonic component in the presence of duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The sinusoidal model of synthesis is used for the harmonic component, and the LPC model combined with an overlap-and-add procedure is used for the noise synthesis. This hybrid model enables independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests made in a text-to-speech environment have shown that the hybrid model assures better performance than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
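The harmonic part of such a model can be illustrated with a short Python sketch. This is an assumed simplification, not the paper's implementation: the envelope values, the function names (`harmonic_amplitudes`, `synth_harmonic`), and the use of linear interpolation are all hypothetical, and the LPC noise component and the phase correction are omitted. It shows only how changing the pitch amounts to resampling one spectral envelope at a new set of harmonic frequencies.

```python
import numpy as np

def harmonic_amplitudes(env_freqs, env_amps, f0, f_max):
    """Sample a spectral envelope at the harmonics of f0, up to f_max (Hz)."""
    n_harm = int(f_max // f0)
    harmonics = f0 * np.arange(1, n_harm + 1)
    # Linear interpolation is an assumption made for this sketch.
    return harmonics, np.interp(harmonics, env_freqs, env_amps)

def synth_harmonic(harmonics, amps, n_samples, sr):
    """Sum of zero-phase sinusoids at the given harmonic frequencies."""
    t = np.arange(n_samples) / sr
    return (amps[:, None] * np.cos(2 * np.pi * harmonics[:, None] * t)).sum(axis=0)

# Hypothetical envelope: frequency (Hz) against linear amplitude.
env_freqs = np.array([0.0, 500.0, 1500.0, 4000.0])
env_amps = np.array([0.2, 1.0, 0.5, 0.1])

# Original pitch of 100 Hz, then a modified pitch of 125 Hz, same envelope.
h_old, a_old = harmonic_amplitudes(env_freqs, env_amps, f0=100.0, f_max=4000.0)
h_new, a_new = harmonic_amplitudes(env_freqs, env_amps, f0=125.0, f_max=4000.0)
frame = synth_harmonic(h_new, a_new, n_samples=320, sr=16000)
```

Because the amplitudes are read from the same envelope at both pitches, the spectral shape is kept while the harmonic spacing changes.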