Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model
Multilingual models for Automatic Speech Recognition (ASR) are attractive as
they have been shown to benefit from more training data, and better lend
themselves to adaptation to under-resourced languages. However, initialisation
from monolingual context-dependent models leads to an explosion of
context-dependent states. Connectionist Temporal Classification (CTC) is a
potential solution to this as it performs well with monophone labels.
We investigate multilingual CTC in the context of adaptation and
regularisation techniques that have been shown to be beneficial in more
conventional contexts. The multilingual model is trained to model a universal
International Phonetic Alphabet (IPA)-based phone set using the CTC loss
function. Learning Hidden Unit Contributions (LHUC) is investigated for
language-adaptive training. In addition, dropout during cross-lingual
adaptation is studied as a means of mitigating overfitting.
Experiments show that the performance of the universal phoneme-based CTC
system can be improved by applying LHUC, and that the system can be extended
to new phonemes during cross-lingual adaptation. Updating all the parameters
gives consistent improvements on limited data. Applying dropout during
adaptation improves the system further and achieves performance competitive
with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited
data.
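The LHUC re-scaling and adaptation-time dropout described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the common LHUC parameterisation a(r) = 2·sigmoid(r), applied element-wise to a hidden layer's outputs, with one LHUC vector per language. It also assumes a ReLU nonlinearity and inverted dropout. The function names `lhuc_scale` and `lhuc_hidden_layer` and the language keys are hypothetical.

```python
import numpy as np


def lhuc_scale(r):
    """LHUC amplitude 2*sigmoid(r): values lie in (0, 2), and r = 0 gives 1 (no change)."""
    return 2.0 / (1.0 + np.exp(-r))


def lhuc_hidden_layer(x, W, b, r, dropout_p=0.0, rng=None):
    """Hidden layer whose unit outputs are re-scaled by language-specific LHUC factors.

    x: (batch, d_in) inputs; W: (d_in, d_out) and b: (d_out,) are shared across
    languages; r: (d_out,) holds the LHUC parameters of one language (hypothetical layout).
    dropout_p > 0 applies inverted dropout, e.g. during cross-lingual adaptation.
    """
    h = np.maximum(0.0, x @ W + b)  # ReLU is an assumption of this sketch
    h = h * lhuc_scale(r)           # element-wise scaling of each hidden unit's contribution
    if dropout_p > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(h.shape) >= dropout_p
        h = h * keep / (1.0 - dropout_p)  # rescale so the expected activation is unchanged
    return h


# Usage: one LHUC vector per language, initialised to zeros (i.e. scale 1).
rng = np.random.default_rng(0)
W, b = rng.standard_normal((4, 3)), np.zeros(3)
lhuc = {"lang_a": np.zeros(3), "lang_b": np.zeros(3)}  # hypothetical language keys
x = rng.standard_normal((2, 4))
h_a = lhuc_hidden_layer(x, W, b, lhuc["lang_a"])
h_b_train = lhuc_hidden_layer(x, W, b, lhuc["lang_b"], dropout_p=0.2, rng=rng)
```

In this sketch, only the small per-language vectors `r` differ between languages, while `W` and `b` are shared. Dropout is switched on only when `dropout_p` is non-zero, for example while adapting to a new language on limited data.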
Comparative Study on Sentence Boundary Prediction for German and English Broadcast News
We present a comparative study on sentence boundary prediction for German and English broadcast news that explores generalization across languages. In the feature extraction stage, word pause durations are first extracted from word-aligned speech, and forward and backward language models are used to extract textual features. A gradient boosting machine, tuned by grid search, then maps these features to punctuation marks. Experimental results confirm that word pause duration is a simple yet effective feature for predicting whether a sentence boundary follows a word. We found that the Bayes risk derived from the pause duration distributions of sentence boundary words and non-boundary words is an effective measure of the inherent difficulty of sentence boundary prediction. The proposed method achieved F-measures of over 90% on reference text and around 90% on ASR transcripts for both the German broadcast news corpus and the English multi-genre broadcast news corpus. This demonstrates the state-of-the-art performance of the proposed method.
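One way to compute a Bayes-risk measure like the one described above is sketched below. It is a hedged illustration, not the paper's procedure. It assumes both pause-duration distributions are estimated as histograms on shared bins, and that the class priors are the relative class frequencies. Under these assumptions, prior × likelihood per bin is simply the bin count divided by the total number of words. The function name `bayes_risk`, its `bins` parameter, and the sample durations are hypothetical.

```python
import numpy as np


def bayes_risk(pause_boundary, pause_nonboundary, bins=50):
    """Estimate the Bayes risk of deciding 'sentence boundary' from pause duration alone.

    Assumptions of this sketch: both class-conditional pause-duration distributions
    are approximated by histograms on shared bins, and the class priors are the
    relative class frequencies. Then prior_c * p_c(bin) = n_c(bin) / N, and the
    minimum achievable error is sum over bins of min(n_boundary, n_nonboundary) / N.
    """
    pb = np.asarray(pause_boundary, dtype=float)
    pn = np.asarray(pause_nonboundary, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([pb, pn]), bins=bins)
    n_b, _ = np.histogram(pb, bins=edges)
    n_n, _ = np.histogram(pn, bins=edges)
    return np.minimum(n_b, n_n).sum() / (pb.size + pn.size)


# Usage with made-up pause durations in seconds (illustrative values, not data from the study).
separable = bayes_risk([0.5, 0.6, 0.7], [0.0, 0.01, 0.02], bins=10)
overlapping = bayes_risk([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], bins=10)
```

With empirical priors, the estimate ranges from 0 to the smaller class prior. A value of 0 means the pauses alone separate the classes, and the smaller class prior means the pauses carry no information. The number of histogram bins is a free choice that affects the estimate.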
Realization of User-Adaptive Interaction Behavior for Mobile Assistance Robots
The central topic of this thesis is social service robotics. In recent years, this branch of mobile robotics has attracted increasing interest. As the capabilities and fields of application of such robots have grown, the group of potential users has shifted toward the general public, including many technical laypeople. Such inexperienced users place extensive demands on the interaction capabilities of these robots. This thesis focuses on the multimodality of human-robot dialog and its adaptivity to users' preferences and needs.
First, the analysis and specification process for such a system is explained using the example of a service robot for health assistance in home environments, which was developed in a research project in which the author participated. It is then shown how a multi-layer system architecture, also applicable to other robotic applications, is derived from that specification.
The main focus, however, is on a modular realization of the control structures and the dialog handling. To give the system a personality and ensure its acceptability in long-term use, a frame-based dialog manager was designed and is explained in detail. Aspects of interest include modularity by means of an app concept, extensibility, and adaptivity of the interaction skills to users' quirks and demands.
At the core of the presented dialog system is a novel planning mechanism based on probabilistic reasoning in a factor graph model of the ongoing dialog. A specially designed real-world experiment showed that this online learning concept can optimize dialog behavior at runtime, within the degrees of freedom set by the designer, with respect to both system-internal and user-driven reward signals.
During the implementation of the health assistant robot, further system components were developed to realize a likeable companion. Among them are two kinds of tactile sensors and an emotion model, which are also presented in this thesis.
Finally, successful real-world user trials of the health assistant robot, some lasting several days and involving nine elderly people, are described; they show that the presented concepts for system architecture and dialog modelling are viable.