870 research outputs found

    Intonation in a text-to-speech conversion system


    Prosodic processing and its use in Verbmobil

    We present the prosody module of the VERBMOBIL speech-to-speech translation system, the world's first complete system that successfully uses prosodic information in linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.

    Suprasegmental transcription

    No abstract

    Acoustic correlates of encoded prosody in written conversation

    This thesis presents an analysis of certain punctuation devices such as parentheses, italics and emphatic spellings with respect to their acoustic correlates in read speech. The class of punctuation devices under investigation is referred to as prosodic markers. The thesis therefore presents an analysis of features of the spoken language which are represented symbolically in text. Hence it is a characterization of aspects of the spoken language which have been transcribed or symbolized in the written medium and then translated back into a spoken form by a reader. The thesis focuses in particular on the analysis of parentheses, the examination of encoded prominence and emphasis, and also addresses the use of paralinguistic markers which signal attitude or emotion. In an effort to avoid the use of self-constructed or artificial material containing arbitrary symbolic or prosodic encodings, all material used for empirical analysis was taken from examples of electronic written exchanges on the Internet, such as from electronic mail messages and from articles posted on electronic newsgroups and news bulletins. This medium of language, which is referred to here as written conversation, provides a rich source of material containing encoded prosodic markers. These occur in the form of 'smiley faces' expressing attitudes or feelings, words highlighted by a number of means such as capitalization, italics, underscore characters, or asterisks, and in the form of dashes or parentheses, which provide suggestions on how the information in a text or sentence may be structured with regard to its informational content. Chapter 2 investigates in detail the genre of written conversation with respect to its place in an emerging continuum between written and spoken language, concentrating on transcriptional devices and their function as indicators of prosody. The implications these symbolic representations bear on the task of reading, by humans as well as machines, are then examined. Chapters 3 and 4 turn to the acoustic analysis of parentheticals and emphasis markers respectively. The experimental work in this thesis is based on readings of a corpus of selected materials from written conversation with the acoustic analysis concentrating on the differences between readings of texts with prosodic markers and readings of the same texts from which prosodic markers have been removed. Finally, the effect of prosodic markers is tested in perception experiments involving both human and resynthesized utterances.

    Hesitations in Spoken Dialogue Systems

    Betz S. Hesitations in Spoken Dialogue Systems. Bielefeld: Universität Bielefeld; 2020

    Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

    The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on the introduced representations performs the statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, the problem occurs in disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English but is not suitable for languages such as Turkish. We then use a heuristic approach to mark up the phrase boundaries based on dependency parsing as a basis for phrase-level accentuation in Turkish TTS synthesizers.

    Syntax, morphology, and phonology in text-to-speech systems

    The paper is concerned with the integration of linguistic information in text-to-speech systems. Research in synthesis proper is at a stage where the need for systematic integration of comprehensive linguistic information in such systems is making itself felt more than ever. A surface structure parsing system is presented whose main virtue is that it permits linguists to express syntactic as well as lexical and morphological regularities and irregularities of a language in a simple and easy-to-learn formalism. Most aspects of the system are seen in the light of Danish and - sporadically - English and Finnish surface structure

    Prosodic Phrasing in Spontaneous Swedish

    One of the most important functions of prosody is to divide the flow of speech into chunks. The chunking, or prosodic phrasing, of speech plays an important role in both the production and perception of speech. This study represents a move away from the laboratory speech examined in previous, related studies on prosodic phrasing in Swedish, since a spontaneous, Southern Swedish speech material is investigated. The study is, however, not primarily intended as a study of the Southern Swedish dialect; rather Southern Swedish is used as a convenient object on which to test various hypotheses about the phrasing function of prosody in spontaneous speech. The study comprises both analyses of production data and perception experiments, and both the phonetics and phonology of prosodic phrasing are dealt with. First, the distribution of prosodic phrase boundaries in spontaneous speech is examined by considering it as a reflection of optimality theoretic constraints that restrain the production and perception of speech. Secondly, the phonetic realization of prosodic phrase boundaries is investigated in a study on articulation rate changes within the prosodic phrase. Evidence of phrase-final lengthening, a reduction of the articulation rate in the final part of the prosodic phrase, is found. The tonal means used to signal coherence within the prosodic phrase are subsequently investigated. An attempt is made to test the two Lund intonation models' capacities for describing spontaneous speech. The two approaches have different implications for the amount of preplanning needed, which makes them particularly interesting to compare by testing spontaneous data. The results indicate that no or little preplanning is needed to produce tonally coherent phrases. No evidence is found to suggest e.g. that speakers accommodate for the length of the upcoming phrase by starting longer phrases with a higher F0 than short phrases. An explanation is sought for variation in F0 starting points found in the data despite F0's insensitivity to phrase length. It is concluded that F0 is used to signal coherence even across prosodic phrase boundaries. It is furthermore found that tonal coherence signals are used to override strong boundary signals in spontaneous speech, thereby making initially unplanned additions possible. Finally, the perception of boundary strength is examined in two perception experiments. Listeners are found to agree well in their perceptual judgments of boundary strength, and it is shown that the main correlate to perceived boundary strength in spontaneous speech is pause length. The useful distinction between weak, prosodic phrase boundaries and strong, prosodic utterance boundaries in descriptions of read speech is found to be inappropriate for descriptions of spontaneous speech. It fails to capture the conflicting local and global signals of boundary strength and coherence that arise when strong boundary signals are overridden by coherence signals. The possibility to use conflicting signals in this way is seen as an important asset to the speaker as it makes changes in the speech plan possible, and it is regarded as a characteristic of prosodic phrasing in spontaneous speech.

    English-to-Malay Speaking Dictionary (E2MSpeaktionary)

    This report provides the necessary information pertaining to the Final Year Project carried out. Chapter 1 discusses the background, the problem statement, and the objective and scope of the study. It explains what the project is about, the target user, and the areas I attended to throughout the project. Chapter 2 briefly reviews the information, literature, theories, books, research results, and journals that I consulted earlier. Chapter 3 describes the methodology used, the prototype method, and lists all relevant project work. Chapter 4 presents my discussion and findings from the research carried out to execute the project, and Chapter 5 gives my conclusions and recommendations on the project and related matters.