    The Microsoft 2017 Conversational Speech Recognition System

    We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
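The word error rate quoted above is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and it is not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenizes on whitespace and applies no text
    normalization, which real evaluation pipelines typically do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, a rate computed this way can exceed 100% when the hypothesis is much longer than the reference.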

    Fostering reflection in the training of speech-receptive action

    This article discusses the training of communicative skills by fostering reflection on one's own speech-receptive action, and the opportunities and problems of using computer-supported learning environments for this purpose. Most frameworks for the training of communicative behavior focus on fostering the observable speech-productive action (i.e. speaking); the individual cognitive processes underlying speech-receptive action (hearing and understanding utterances) are often neglected. The usual justification is that speech-receptive action is hard to access in a communicative situation and that fostering its individual processes is very time-consuming. Computer-supported learning environments employed as cognitive tools can help to foster speech-receptive action. Seven success factors for the integration of software into the training of soft skills have been derived from empirical research. The computer-supported learning environment CaiMan©, which is based on these ideas, is presented. One central learning principle in this learning environment, reflection on one's own action, is discussed from different perspectives. The article concludes with two empirical studies examining opportunities to foster reflection.

    Robots that Say ‘No’. Affective Symbol Grounding and the Case of Intent Interpretations

    © 2017 IEEE. This article has been accepted for publication in a forthcoming issue of IEEE Transactions on Cognitive and Developmental Systems. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Modern theories on early child language acquisition tend to focus on referential words, mostly nouns, labeling concrete objects or physical properties. In this experimental proof-of-concept study, we show how nonreferential negation words, typically belonging to a child's first ten words, may be acquired. A child-like humanoid robot is deployed in speech-wise unconstrained interaction with naïve human participants. In agreement with psycholinguistic observations, we corroborate the hypothesis that affect plays a pivotal role in the socially distributed acquisition process, where the adept conversation partner provides linguistic interpretations of the affective displays of the less adept speaker. Negation words are prosodically salient within intent interpretations that are triggered by the learner's display of affect. From there they can be picked up and used by the budding language learner, which may involve the grounding of these words in the very affective states that triggered them in the first place. The pragmatic analysis of the robot's linguistic performance indicates that the correct timing of negative utterances is essential for the listener to infer the meaning of otherwise ambiguous negative utterances. In order to assess the robot's performance thoroughly, comparative data from psycholinguistic studies of parent-child dyads is needed, highlighting the need for further interdisciplinary work.
    Peer reviewed