    Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

    Multilingual models for Automatic Speech Recognition (ASR) are attractive because they have been shown to benefit from more training data and lend themselves better to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution, as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained with the CTC loss function to model a universal phone set based on the International Phonetic Alphabet (IPA). Learning Hidden Unit Contribution (LHUC) is investigated as a means of language adaptive training. In addition, dropout during cross-lingual adaptation is studied as a way to mitigate overfitting. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC, and that the system is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters shows consistent improvement on limited data. Applying dropout during adaptation further improves the system and achieves performance competitive with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited data.
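
The LHUC step named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used formulation in which each hidden unit's output is rescaled by a learned amplitude 2·sigmoid(r) in the range (0, 2), with one parameter vector r per language. The abstract does not give the exact configuration, so the function names, the toy activations, and the language keys below are hypothetical.

```python
import math

def lhuc_amplitudes(r):
    """Map unconstrained LHUC parameters to amplitudes in (0, 2)."""
    return [2.0 / (1.0 + math.exp(-value)) for value in r]

def apply_lhuc(hidden, r):
    """Rescale each hidden-unit activation by its language-specific amplitude."""
    return [a * h for a, h in zip(lhuc_amplitudes(r), hidden)]

# Toy activations of one shared hidden layer for a single frame (assumed values).
hidden = [0.5, -1.0, 2.0]

# One LHUC parameter vector per language; r = 0 gives amplitude 1 (no change).
lhuc_params = {
    "language_a": [0.0, 0.0, 0.0],
    "language_b": [1.5, -1.5, 0.0],
}

adapted_a = apply_lhuc(hidden, lhuc_params["language_a"])
adapted_b = apply_lhuc(hidden, lhuc_params["language_b"])
```

In this sketch only the small per-language vectors differ between languages, while the hidden layer itself is shared. That is the general appeal of LHUC-style adaptation when data for a language is limited.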

    Comparative Study on Sentence Boundary Prediction for German and English Broadcast News

    We present a comparative study on sentence boundary prediction for German and English broadcast news that explores generalization across languages. In the feature extraction stage, word pause duration is first extracted from word-aligned speech, and forward and backward language models are used to extract textual features. A gradient boosted machine, optimized by grid search, then maps these features to punctuation marks. Experimental results confirm that word pause duration is a simple yet effective feature for predicting whether a sentence boundary follows a word. We found that the Bayes risk derived from the pause duration distributions of sentence boundary words and non-boundary words is an effective measure of the inherent difficulty of sentence boundary prediction. The proposed method achieved F-measures of over 90% on reference text and around 90% on ASR transcripts for both a German broadcast news corpus and an English multi-genre broadcast news corpus. This demonstrates the state-of-the-art performance of the proposed method.
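
The Bayes risk measure mentioned above can be sketched as follows. This is a plug-in estimate under stated assumptions, not the authors' procedure: pause durations are binned into a histogram, class priors are taken from the sample counts, and the risk is the probability mass a Bayes-optimal classifier using only pause duration would still misclassify. The function name, the 50 ms bin width, and the toy durations are all hypothetical.

```python
from collections import Counter

def bayes_risk(boundary_ms, non_boundary_ms, bin_width_ms=50):
    """Histogram estimate of the Bayes risk for a one-feature, two-class problem.

    In each bin the optimal decision picks the more frequent class, so the
    less frequent class in that bin contributes to the unavoidable error.
    """
    boundary_hist = Counter(d // bin_width_ms for d in boundary_ms)
    non_boundary_hist = Counter(d // bin_width_ms for d in non_boundary_ms)
    total = len(boundary_ms) + len(non_boundary_ms)
    bins = boundary_hist.keys() | non_boundary_hist.keys()
    overlap = sum(min(boundary_hist[b], non_boundary_hist[b]) for b in bins)
    return overlap / total

# Toy pause durations in milliseconds (assumed values, not from the study).
boundary = [400, 520, 610, 30, 450]
non_boundary = [0, 10, 20, 0, 40, 420, 0, 15, 5, 35]

risk = bayes_risk(boundary, non_boundary)
```

A risk near 0 means the two pause distributions barely overlap and boundaries are easy to detect from pauses alone. A risk near the smaller class prior means pause duration carries little information.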

    Realisierung nutzeradaptiven Interaktionsverhaltens für mobile Assistenzroboter [Realization of User-Adaptive Interaction Behavior for Mobile Assistance Robots]

    This thesis concerns social assistive robotics. In recent years this branch of mobile robotics has attracted growing interest, and as robot capabilities and fields of application have diversified, the user group has shifted towards the general public, including people without technical expertise. This places extensive requirements on the interaction capabilities of social assistance robots. The thesis focuses on the multimodality of human-robot dialog and its adaptivity to the individual user's preferences and needs. First, the analysis and design process is described using the example of a service robot for health assistance in home environments, developed in a research project in which the author participated. It is then shown how a multi-layer system architecture can be derived from the system specification, one that is also applicable to other robotic applications. The main focus is on a modular realization of the control flow and dialog handling. To give the system a personality and produce dialog behavior that remains acceptable in long-term use, a frame-based dialog manager was designed and implemented. Design considerations include modularity through an app concept, easy extensibility, and the ability to adapt dialogs to users' quirks and demands. At the core of the dialog system is a novel method for online planning of dialog sequences, based on probabilistic reasoning in a factor graph model of the ongoing dialog. A purpose-designed real-world experiment showed that this online learning concept can optimize dialog behavior at runtime, within the freedoms specified by the designer, using both system-internal and user-driven reward signals. Further subsystems were developed to make the health assistant robot a likeable companion, among them two kinds of tactile sensors and an emotion model, which are also presented. Finally, very successful real-world user trials with 9 elderly people, some lasting several days, demonstrate that the interaction concept and the system architecture are viable.