
    Learning Fault-tolerant Speech Parsing with SCREEN

    This paper describes a new approach and a system, SCREEN, for fault-tolerant speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing here means the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech-related errors. The goal of this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. The approach is examined in the SCREEN system using various hybrid connectionist techniques, which are chosen for their promising properties of inherent fault tolerance, learning, gradedness, and parallel constraint integration. The input to SCREEN consists of hypotheses about the recognized words of a spoken utterance, potentially produced by a speech recognition system; the output consists of hypotheses about the flat syntactic and semantic analysis of the utterance. In this paper we focus on the general approach, the overall architecture, and examples of learning flat syntactic speech parsing. Unlike most other speech-language architectures, SCREEN emphasizes an interactive rather than an autonomous position, learning rather than encoding, flat analysis rather than in-depth analysis, and fault-tolerant processing of phonetic, syntactic, and semantic knowledge.
    Comment: 6 pages, postscript, compressed, uuencoded; to appear in Proceedings of AAAI 9
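
    To make the processing style concrete, here is a minimal sketch in Python (an illustration of incremental, graded flat tagging; the categories and scores are toy assumptions, not the authors' connectionist networks): each incoming word hypothesis updates parallel, graded category hypotheses rather than building an in-depth parse tree.

        # Minimal sketch of incremental flat syntactic tagging in the spirit of
        # SCREEN: each word hypothesis updates graded category hypotheses in
        # parallel. All categories and scores here are toy assumptions.
        import numpy as np

        CATS = ["noun", "verb", "det", "prep"]

        # Toy "learned" emission scores P(category | word); in SCREEN these
        # come from trained networks, here they are hand-set for illustration.
        EMIT = {
            "the":    np.array([0.05, 0.05, 0.85, 0.05]),
            "train":  np.array([0.60, 0.35, 0.02, 0.03]),
            "leaves": np.array([0.45, 0.50, 0.02, 0.03]),
        }

        # Toy category-transition scores standing in for learned sequential
        # context (rows: previous category, columns: current category).
        TRANS = np.array([
            [0.20, 0.50, 0.10, 0.20],   # after noun
            [0.40, 0.10, 0.30, 0.20],   # after verb
            [0.80, 0.10, 0.05, 0.05],   # after det
            [0.30, 0.10, 0.50, 0.10],   # after prep
        ])

        def incremental_tag(words):
            """Yield graded category hypotheses after each word (flat analysis)."""
            context = np.ones(len(CATS)) / len(CATS)   # uniform start
            for w in words:
                emit = EMIT.get(w, np.ones(len(CATS)) / len(CATS))
                scores = emit * (context @ TRANS)      # parallel constraint integration
                scores /= scores.sum()                 # keep hypotheses graded
                context = scores
                yield w, dict(zip(CATS, scores.round(3)))

        for word, hyps in incremental_tag(["the", "train", "leaves"]):
            print(word, hyps)

    Because each word only refines a graded score vector, an unknown or misrecognized word degrades the analysis gracefully instead of breaking it, which is the fault-tolerance property the paper is after.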

    Speech-Gesture Mapping and Engagement Evaluation in Human Robot Interaction

    A robot needs contextual awareness, effective speech production, and complementary non-verbal gestures for successful communication in society. In this paper, we present our end-to-end system that tries to enhance the effectiveness of non-verbal gestures. To achieve this, we identified prominently used gestures in performances by TED speakers, mapped them to their corresponding speech context, and modulated speech based on the attention of the listener. The proposed method uses the Convolutional Pose Machine [4] to detect human gestures. Dominant gestures of TED speakers were used for learning the gesture-to-speech mapping, and their speeches were used for training the model. We also evaluated the engagement of the robot with people by conducting a social survey. The effectiveness of the performance was monitored by the robot, which self-improvised its speech pattern based on the attention level of the audience, calculated using visual feedback from the camera. The effectiveness of the interaction, as well as the decisions made during improvisation, was further evaluated based on head-pose detection and an interaction survey.
    Comment: 8 pages, 9 figures; under review in IRC 201
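
    The stages of this pipeline can be sketched roughly as follows (a toy Python illustration; the nearest-centroid classifier, the gesture labels, and the attention threshold are assumptions made for the sketch, not the authors' trained models):

        import numpy as np

        # Hypothetical gesture centroids in a pose-keypoint feature space,
        # standing in for gesture classes mined from TED-speaker performances.
        GESTURE_CENTROIDS = {
            "open_palms": np.array([0.2, 0.8, 0.5]),
            "pointing":   np.array([0.9, 0.1, 0.4]),
            "beat":       np.array([0.5, 0.5, 0.5]),
        }

        GESTURE_TO_SPEECH = {   # toy gesture -> speech-context mapping
            "open_palms": "welcoming remark",
            "pointing":   "emphasis on a referent",
            "beat":       "neutral narration",
        }

        def classify_gesture(pose_features):
            """Nearest-centroid stand-in for a learned gesture classifier."""
            return min(GESTURE_CENTROIDS,
                       key=lambda g: np.linalg.norm(pose_features - GESTURE_CENTROIDS[g]))

        def modulate_speech(context, attention):
            """Self-improvise: re-engage when audience attention drops."""
            if attention < 0.4:          # hypothetical threshold from visual feedback
                return "[re-engage audience] " + context
            return context

        pose = np.array([0.85, 0.15, 0.45])   # e.g. features from a pose estimator
        gesture = classify_gesture(pose)
        print(modulate_speech(GESTURE_TO_SPEECH[gesture], attention=0.3))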

    Sphinx 4 Speech Recognition in ATC

    Speech recognition plays an important role in everyday life. It is widely adopted because it allows users to communicate with computers in their own spoken language, which makes many tasks easier. There are many open-source speech recognition engines, and different types of applications are built with them. Here, an application is built for air traffic control: a new software library with a grammar tailored to Air Traffic Controller commands is proposed, and this library is used to build a working application.
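
    Sphinx 4 task grammars of this kind are typically written in JSGF. A minimal hypothetical sketch of what an ATC command grammar could look like (the rule names and phraseology are illustrative assumptions, not the library proposed in the paper):

        #JSGF V1.0;
        grammar atc;

        public <command> = <callsign> <instruction>;

        <callsign>    = (alpha | bravo | charlie) <digit> <digit>;
        <instruction> = climb and maintain <level>
                      | descend and maintain <level>
                      | turn (left | right) heading <digit> <digit> <digit>
                      | contact tower;
        <level>       = flight level <digit> <digit> <digit>;
        <digit>       = zero | one | two | three | four | five | six
                      | seven | eight | niner;

    Constraining the decoder to a closed command grammar like this, rather than a general language model, is what makes a restricted domain such as ATC tractable for recognition.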

    Subphonetic Modeling for Speech Recognition

    How to capture important acoustic cues and estimate essential parameters reliably is one of the central issues in speech recognition, since we will never have sufficient training data to model the full range of acoustic-phonetic phenomena. Successful examples include subword models with many smoothing techniques. Compared with subword models, subphonetic modeling may provide a finer level of detail. We propose to model subphonetic events with Markov states and to treat the state in phonetic hidden Markov models as our basic subphonetic unit, the senone. A word model is a concatenation of state-dependent senones, and senones can be shared across different word models. Senones not only allow parameter sharing but also enable pronunciation optimization and new-word learning, where the phonetic baseform is replaced by the senonic baseform. In this paper, we report preliminary subphonetic modeling results, which not only significantly reduced the word error rate for speaker-independent continuous speech recognition but also demonstrated a novel application to new-word learning.
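
    A toy illustration of the senone idea (plain Python, with made-up IDs and parameters, not the paper's models): word models are concatenations of senone IDs, and states of different words may point to the same senone, tying their parameters.

        SENONES = {                     # senone ID -> toy output-distribution parameters
            101: {"mean": 0.1, "var": 1.0},
            102: {"mean": 0.7, "var": 0.8},
            103: {"mean": 1.3, "var": 1.1},
            104: {"mean": 2.0, "var": 0.9},
        }

        # Senonic baseforms: each word is a sequence of senone IDs. "two" and
        # "to" share senones 102 and 103, so those parameters are tied.
        WORDS = {
            "two": [101, 102, 103],
            "to":  [102, 103],
            "ten": [101, 104],
        }

        def word_model(word):
            """Concatenate shared senone parameters into the word's HMM states."""
            return [SENONES[s] for s in WORDS[word]]

        # A new word can be added with a senonic baseform alone, replacing the
        # phonetic baseform: pick the best-matching existing senones for it.
        WORDS["too"] = [102, 103]
        print(word_model("too"))

    Sharing senones across words is what lets parameters be estimated reliably from limited data, which is the central issue the abstract opens with.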

    Comparing SPHINX vs. SONIC Italian Children Speech Recognition Systems

    Our previous experience has shown that both CSLR SONIC and CMU SPHINX are versatile and powerful tools for Automatic Speech Recognition (ASR). Encouraged by those good results, we compared the two systems on another important ASR challenge: the recognition of children's speech. In this work, SPHINX was used to build from scratch a recognizer for Italian children's speech, and the results were compared to those obtained with SONIC, both in previous experiments and in some new ones designed to ensure uniform experimental conditions between the two systems. This report describes the training process and the evaluation methodology for a speaker-independent phonetic-recognition task. First, we briefly describe the system architectures and their differences; then we analyze the task, the corpus, and the techniques adopted to face the recognition problem. The scores of multiple tests in terms of Phonetic Error Rate (PER) and an analysis of the differences between the two systems are given in the final discussion. SONIC turned out to have the best overall performance, obtaining a minimum PER of 12.4% with VTLN and SMAPLR adaptation. SPHINX was the easiest system to train and test, and its performance (PER of 17.2% with comparable adaptations) was only a few percentage points behind SONIC's.
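
    For reference, PER is computed like word error rate but over phone sequences: the Levenshtein distance between the reference and hypothesized phone strings, divided by the number of reference phones. A minimal Python sketch (the phone strings in the example are made up):

        def per(reference, hypothesis):
            """Phonetic Error Rate: edit distance over phones / reference length."""
            r, h = reference.split(), hypothesis.split()
            d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
            for i in range(len(r) + 1):
                d[i][0] = i                 # deletions
            for j in range(len(h) + 1):
                d[0][j] = j                 # insertions
            for i in range(1, len(r) + 1):
                for j in range(1, len(h) + 1):
                    sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                    d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
            return d[len(r)][len(h)] / len(r)

        print(per("k a z a", "k a s a"))    # one substitution over four phones -> 0.25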