
    Fully exploiting the potential of speech dialog in automotive applications

    Today users are faced with infotainment devices and applications of increasing complexity, and the design of easy-to-use, intuitive interfaces becomes an ever more challenging task. Users are usually not aware of the underlying applications and their restrictions when they want to use certain functionalities. Hierarchical menu structures are therefore difficult to handle, especially in situations where eyes and hands are occupied with other tasks, such as driving. For quite a while, speech-enabled interfaces have been used to address this problem, since they allow users to control various applications without occupying hands and eyes. However, state-of-the-art multimodal applications often do not exploit the full potential that speech dialog offers, simply because this modality is not well integrated with "traditional" modalities such as graphics and haptics. The resulting speech interfaces do not run smoothly, exhibit many inconsistencies with the GUI, and are thus more or less tedious to use. Such interfaces meet with low acceptance because users do not see an immediate benefit. In this paper we present an approach that develops multimodal interfaces in an integrated way, ensuring highly consistent interfaces that closely couple the involved modalities and are thus easier to use.

    The Prosodic Marking of Phrase Boundaries: Expectations and Results

    Using sentence templates and a stochastic context-free grammar, a large corpus (10,000 sentences) has been created in which prosodic phrase boundaries are labeled automatically during sentence generation. With perception experiments on a subset of 500 utterances we verified that 92% of the automatically marked boundaries were perceived as prosodically marked. In initial automatic classification experiments for three levels of boundaries, recognition rates of up to 81% were achieved. A successful automatic detection of phrase boundaries can be of great help for parsing a word hypotheses graph in an automatic speech understanding (ASU) system. Our recognition paradigm lies within the statistical approach; we therefore need a large training database, i.e. a corpus with reference labels for prosodically marked phrase boundaries. In this paper we will…
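    The labeling-during-generation idea can be sketched in a few lines, assuming a toy stochastic grammar and a hypothetical `<B3>` boundary label (the grammar, rule probabilities, and label name here are illustrative, not those used in the paper):

    ```python
    import random

    # Toy stochastic context-free grammar: each nonterminal maps to a list
    # of (right-hand side, probability) pairs.  All rules are made up.
    GRAMMAR = {
        "S":  [(["NP", "VP"], 1.0)],
        "NP": [(["the", "train"], 0.5), (["the", "passenger"], 0.5)],
        "VP": [(["arrives"], 0.5), (["departs", "NP"], 0.5)],
    }

    def generate(symbol="S"):
        """Expand a nonterminal; a boundary label is emitted automatically
        after each top-level constituent of S, i.e. during generation."""
        if symbol not in GRAMMAR:
            return [symbol]                       # terminal word
        rules = [rhs for rhs, _ in GRAMMAR[symbol]]
        weights = [w for _, w in GRAMMAR[symbol]]
        rhs = random.choices(rules, weights=weights)[0]
        out = []
        for child in rhs:
            out.extend(generate(child))
            if symbol == "S":                     # constituent boundary
                out.append("<B3>")
        return out

    sentence = generate()
    ```

    Because the grammar itself marks where a constituent ends, every generated sentence carries its boundary labels for free, which is what makes a 10,000-sentence labeled corpus cheap to produce.
    
    
    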

    Pitch determination considering laryngealization effects in spoken dialogs

    A frequent phenomenon in spoken dialogs of the information-seeking type are short elliptic utterances whose mood (declarative or interrogative) can only be distinguished by intonation. The main acoustic evidence is conveyed by the fundamental frequency, or F0 contour. Many algorithms for F0 determination have been reported in the literature; a common problem is speech irregularities known as "laryngealizations". This article describes an approach based on neural network techniques for improved determination of the fundamental frequency. First, an improved version of our neural network algorithm for reconstructing the voice source signal (glottis signal) is presented. Second, the reconstructed voice source signal is used as input to another neural network distinguishing the three classes "voiceless", "voiced non-laryngealized", and "voiced laryngealized". Third, the results are used to improve an existing F0 algorithm. Results of this approach are presented and discussed in the context of its application in a spoken dialog system.
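    As a rough illustration of the second stage (the frame-wise three-way decision), here is a schematic in which a simple nearest-class-mean rule stands in for the neural network; the two features and all class-mean values are made-up placeholders, not values from the paper:

    ```python
    import numpy as np

    CLASSES = ["voiceless", "voiced non-laryngealized", "voiced laryngealized"]

    # Hypothetical class means over two toy features of a reconstructed
    # voice-source frame: (energy, periodicity).  Placeholder values only.
    MEANS = np.array([
        [0.05, 0.10],   # voiceless: low energy, low periodicity
        [0.80, 0.90],   # voiced, regular glottal pulses
        [0.60, 0.40],   # voiced but irregular (laryngealized)
    ])

    def classify_frame(features):
        """Assign a frame to the nearest class mean (stand-in for the
        second neural network described above)."""
        dists = np.linalg.norm(MEANS - np.asarray(features), axis=1)
        return CLASSES[int(np.argmin(dists))]
    ```

    The point of the real three-class step is that frames labeled "voiced laryngealized" can then be treated specially by the downstream F0 tracker instead of corrupting its contour.
    
    
    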

    Phonetic and prosodic analysis of speech

    Get PDF
    In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information

    An integrated model of acoustics and language using semantic classification trees

    We propose Multi-level Semantic Classification Trees to combine different information sources for predicting speech events (e.g. word chains, phrases, etc.). Traditionally, in speech recognition systems these information sources (acoustic evidence, language model) are calculated independently and combined via Bayes' rule. The proposed approach allows one to combine sources of different types: it is no longer necessary for each source to yield a probability. Moreover, the tree can look at several information sources simultaneously. The approach is demonstrated for the prediction of prosodically marked phrase boundaries, combining information about the spoken word chain, word category information, prosodic parameters, and the result of a neural network predicting the boundary on the basis of acoustic-prosodic features. The recognition rates of up to 90% for the two-class problem boundary vs. no boundary are already comparable to results achieved with the above-mentioned Bayes' rule approach that combines the acoustic classifier with a 5-gram categorical language model. This is remarkable, since so far only a small set of questions combining information from different sources has been implemented.
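    The core idea, a tree whose questions may address any source whether or not that source yields a probability, can be caricatured with a tiny hand-written tree; all feature names, thresholds, and categories below are invented for illustration:

    ```python
    def predict_boundary(features):
        """Hand-written stand-in for one path of a semantic classification
        tree.  Each question addresses a different information source, and
        none of the sources has to deliver a probability."""
        if features["pause_ms"] > 150:                # acoustic-prosodic evidence
            return "boundary"
        if features["nn_score"] > 0.7:                # neural-network predictor
            return "boundary"
        if features["word_category"] == "conjunction" and features["f0_fall"]:
            return "boundary"                         # word chain + prosody jointly
        return "no boundary"
    ```

    A learned tree would of course induce such questions from data rather than hard-code them, but the heterogeneity of the questions is exactly what distinguishes this from a Bayes'-rule combination of independently trained probabilistic models.
    
    
    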

    Going back to the source: inverse filtering of the speech signal with ANNs

    In this paper we present a new method for transforming speech signals into voice source signals (VSS) using artificial neural networks (ANNs). We will point out that the ANN mapping of speech signals into source signals is quite accurate, and that most of the irregularities in the speech signal lead to an irregularity in the source signal produced by the ANN (ANN-VSS). We will show that the mapping of the ANN is robust with respect to untrained speakers, different recording conditions and facilities, and different vocabularies. We will also present preliminary results which show that pitch periods can be determined accurately from the ANN source signal.

    "Roger", "Sorry", "I'm still listening": dialog guiding signals in information retrieval dialogs

    During any kind of information retrieval dialog, the repetition of parts of information just given by the dialog partner can often be observed. As these repetitions are usually elliptic, intonation is very important for determining the speaker's intention. In this paper, the times of day repeated by the customer in train timetable inquiry dialogs are investigated prototypically. A scheme is developed for the officer's reactions depending on the intonation of these repetitions; it has been integrated into our speech understanding and dialog system EVAR (cf. [6]). Gaussian classifiers were trained to distinguish the dialog guiding signals confirmation, question, and feedback; recognition rates of up to 87.5% were obtained.
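    A minimal sketch of such a Gaussian classifier (one full-covariance normal density per dialog-guiding signal, decision by maximum log-likelihood); the two prosodic features and all numeric values in the usage below are synthetic placeholders, not the paper's features or data:

    ```python
    import numpy as np

    def fit_gaussians(samples_by_class):
        """Estimate one (mean, covariance) pair per class; a small ridge
        keeps near-singular covariances invertible."""
        params = {}
        for label, X in samples_by_class.items():
            X = np.asarray(X, dtype=float)
            params[label] = (X.mean(axis=0),
                             np.cov(X.T) + 1e-6 * np.eye(X.shape[1]))
        return params

    def log_likelihood(x, mean, cov):
        """Exact log density of a multivariate normal."""
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.inv(cov) @ d
                       + logdet + d.size * np.log(2 * np.pi))

    def classify(x, params):
        """Maximum-likelihood decision among the fitted classes."""
        x = np.asarray(x, dtype=float)
        return max(params, key=lambda label: log_likelihood(x, *params[label]))
    ```

    With, say, (F0 slope, final rise) as features, a rising contour would land in "question" and a falling one in "confirmation"; the maximum-likelihood rule needs no prior if the classes are taken as equally likely.
    
    
    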

    A Millimeter Continuum Survey for Massive Protoclusters in the Outer Galaxy

    Our search for the earliest stages of massive star formation turned up twelve massive pre-protocluster candidates plus a few protoclusters. For this search, we selected 47 FIR-bright IRAS sources in the outer Galaxy. We mapped regions of several square arcminutes around each IRAS source in the millimeter continuum in order to find massive cold cloud cores possibly in a very early stage of massive star formation. Masses and densities are derived for the 128 molecular cloud cores found in the obtained maps. We present these maps together with near-infrared, mid-infrared, and radio data collected from the 2MASS, MSX, and NVSS catalogs. Further data from the literature on detections of high-density tracers, outflows, and masers are added. The multi-wavelength datasets are used to characterize each observed region. The massive cloud cores (M > 100 M_sun) are placed in a tentative evolutionary sequence depending on their emission at the investigated wavelengths. Candidates for the youngest stages of massive star formation are identified by the lack of detections in the above-mentioned near-infrared, mid-infrared, and radio surveys. Twelve massive cores prominent in the millimeter continuum fulfill this requirement. Since neither FIR nor radio emission has been detected from these cloud cores, massive protostars must be very deeply embedded in them. Some of these objects may actually be pre-protocluster cores: an as yet rare object class in which the initial conditions of massive star formation can be studied. (74 pages, 46 figures; to appear in ApJS, December 2005, v161)
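    For context, core masses in millimeter continuum surveys of this kind are commonly estimated from the optically thin dust emission; a standard form of that estimate (not necessarily the exact expression used in this paper) is

    ```latex
    M_{\mathrm{core}} = \frac{F_\nu \, d^{2}}{\kappa_\nu \, B_\nu(T_{\mathrm{d}})}
    ```

    where $F_\nu$ is the integrated flux density, $d$ the distance to the source, $\kappa_\nu$ the dust opacity per unit (gas plus dust) mass, and $B_\nu(T_{\mathrm{d}})$ the Planck function at the assumed dust temperature $T_{\mathrm{d}}$; the derived mass thus scales directly with the adopted distance squared and inversely with the assumed opacity and temperature.
    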