
    Towards Understanding Spontaneous Speech: Word Accuracy vs. Concept Accuracy

    In this paper we describe an approach to the automatic evaluation of both the speech recognition and the understanding capabilities of a spoken dialogue system for train timetable information. We use word accuracy to judge recognition performance and concept accuracy to judge understanding performance. Both measures are calculated by comparing the respective module's output with a correct reference answer. We report evaluation results for a spontaneous speech corpus of about 10,000 utterances. We observed a nearly linear relationship between word accuracy and concept accuracy. Comment: 4 pages; to appear in the Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96).
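    As an illustration of the kind of measure described above, the following is a minimal sketch of word accuracy computed against a reference transcript. It assumes the common definition, one minus the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words; the abstract does not give the paper's exact formula, and the function name and example sentences are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (S + D + I) / N, with N the number of reference words.

    Assumption: this is the usual edit-distance-based definition, not
    necessarily the exact scoring used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance: one deletion ("to") and one substitution.
    print(word_accuracy("i want to go to hamburg", "i want go to homburg"))
```

    Note that with many insertions this value can become negative, which is a known property of the edit-distance-based definition.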

    Identification and correction of speech repairs in the context of an automatic speech recognition system

    Recent advances in automatic speech recognition systems for read (dictated) speech have led researchers to confront the problem of recognising more spontaneous speech. A number of problems, such as disfluencies, appear when read speech is replaced with spontaneous speech. In this work we deal specifically with what we class as speech-repairs. Most disfluency processes deal with speech-repairs at the sentence level, which is too late in the process of speech understanding, because speech recognition systems already have problems recognising speech that contains speech-repairs. The approach taken in this work is to deal with speech-repairs during the recognition process. Through an analysis of spontaneous speech, the grammatical structure of speech-repairs was identified as a possible source of information. This grammatical structure, along with some pattern matching to eliminate false positives, is used in the approach taken here. Repair structures are identified within a word lattice and, when found, result in a SKIP being added to the lattice so that the reparandum of the repair can be ignored during the hypothesis generation process. Word fragment information is included using a sub-word pattern matching process, and cue phrases are also identified within the lattice and used in the repair detection process. These simple yet effective techniques have proved very successful in identifying and correcting speech-repairs in a number of evaluations performed on a speech recognition system incorporating the repair procedure.
    On an unseen spontaneous lecture taken from the Durham corpus, using a dictionary of 2,275 words and phoneme corruption of 15%, the system achieved a correction recall rate of 72% and a correction precision rate of 75%. The achievements of the project include the automatic detection and correction of speech-repairs, including word fragments and cue phrases, in the sub-section of an automatic speech recognition system processing spontaneous speech.

    Raddoppiamento sintattico and glottalization phenomena in Italian

    This paper is a preliminary phonetic exploration of aspects of the well-known Italian sandhi phenomenon of Raddoppiamento sintattico (henceforth RS), which involves the gemination of word-initial consonants under certain conditions, e.g. dei [k]ani ‘some dogs’ but tre [kk]ani ‘three dogs’. It is often assumed that RS C-gemination is regular, but there is increasing evidence that it competes with other phenomena such as vowel lengthening. This paper first discusses results of our auditory study of RS contexts, which show that RS is far less frequent in spontaneous speech than is theoretically predicted. It then looks specifically at glottal stop insertion and creak in RS contexts, based on the results of an initial small-scale acoustic investigation. Glottal stop insertion has controversially been reported as occurring in RS environments, where it serves to block RS (Absalom & Hajek, 1997). In addition, glottal stops have been claimed to provide a coda to short word-final stressed vowels outside of RS environments (Vayra, 1994). We discuss our unexpected finding that glottalization characterizes phrase boundaries in our spontaneous speech data, and the implications that this evidence may have for the phonetic and phonological description of Italian and for our understanding of RS.

    Discourse Structure in Spoken Language: Studies on Speech Corpora

    A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new corpus of direction-giving monologues, collected in both spontaneous and read speaking styles. Preliminary analyses of the direction-giving corpus show that the availability of speech significantly affects the reliability of discourse segmentation for a set of trained discourse labelers.

    The interplay of linguistic structure and breathing in German spontaneous speech

    This paper investigates the relation between the linguistic structure of the breath group and breathing kinematics in spontaneous speech. 26 female speakers of German were recorded by means of an Inductance Plethysmograph. The breath group was defined as the interval of speech produced on a single exhalation. For each group several linguistic parameters (number and type of clauses, number of syllables, hesitations) were measured and the associated inhalation was characterized. The average duration of the breath group was ~3.5 s. Most of the breath groups consisted of 1-3 clauses; ~53% started with a matrix clause, ~24% with an embedded clause and ~23% with an incomplete clause (continuation, repetition, hesitation). The inhalation depth and duration varied as a function of the first clause type and with respect to the breath group length, showing some interplay between speech-planning and breathing control. Vocalized hesitations were speaker-specific and came with deeper inhalation. These results are informative for a better understanding of the interplay of speech-planning and breathing control in spontaneous speech. The findings are also relevant for applications in speech therapies and technologies.

    Detection of accents, phrase boundaries, and sentence modality in German

    In this paper, detectors for accents, phrase boundaries, and sentence modality are described which derive prosodic features only from the speech signal and its fundamental frequency to support other modules of a speech understanding system in an early analysis stage, or in cases where no word hypotheses are available. A new method for interpolating and decomposing the fundamental frequency is suggested. The detectors' underlying Gaussian distribution classifiers were trained and tested with approximately 50 minutes of spontaneous speech, yielding recognition rates of 78 percent for accents, 80 percent for phrase boundaries, and 85 percent for sentence modality.

    Subphonemic and suballophonic consonant variation : the role of the phoneme inventory

    Consonants exhibit more variation in their phonetic realization than is typically acknowledged, but that variation is linguistically constrained. Acoustic analysis of both read and spontaneous speech reveals that consonants are not necessarily realized with the manner of articulation they would have in careful citation form. Although the variation is wider than one would imagine, it is limited by the phoneme inventory: the inventory of the language restricts the range of variation to protect the system of phonemic contrast. That is, consonants may stray phonetically into unfilled areas of the language's sound space. Listeners are seldom consciously aware of the consonant variation, and perceive the consonants phonemically as in their citation forms. A better understanding of surface phonetic consonant variation can help make predictions in theoretical domains and advances in applied domains.

    Toward “English” phonetics: variability in the pre-consonantal voicing effect across English dialects and speakers

    Recent advances in access to spoken-language corpora and development of speech processing tools have made possible the performance of “large-scale” phonetic and sociolinguistic research. This study illustrates the usefulness of such a large-scale approach, using data from multiple corpora across a range of English dialects, collected and analyzed with the SPADE project, to examine how the pre-consonantal Voicing Effect (longer vowels before voiced than voiceless obstruents, e.g., in bead vs. beat) is realized in spontaneous speech, and how it varies across dialects and individual speakers. Compared with previous reports of controlled laboratory speech, the Voicing Effect was found to be substantially smaller in spontaneous speech, but still influenced by the expected range of phonetic factors. Dialects of English differed substantially from each other in the size of the Voicing Effect, whilst individual speakers varied little relative to their particular dialect. This study demonstrates the value of large-scale phonetic research as a means of developing our understanding of the structure of speech variability, and illustrates how large-scale studies, such as those carried out within SPADE, can be applied to other questions in phonetic and sociolinguistic research.