6 research outputs found

    Automatic detection of hyperarticulated speech

    Hyperarticulation is a speech adaptation in which speakers adopt a clearer form of speech in an attempt to improve recognition. However, it has the opposite effect when talking to ASR systems, because they are not trained on this kind of speech. We present approaches for the automatic detection of hyperarticulation, which can be used to improve the performance of spoken dialog systems. We performed experiments on Let’s Go data, using multiple feature sets and two classification approaches. Many relevant features are speaker dependent. Thus, we used the first turn in each dialog as the reference for the speaker, since it is typically not hyperarticulated. Our best results were above 80% accuracy, an improvement of at least 11.6 percentage points over previously obtained results on similar data. We also assessed the classifiers’ performance in scenarios where hyperarticulation is rare, achieving around 98% accuracy using different confidence thresholds.

    An investigation of grammar design in natural-language speech-recognition.

    With the growing interest in and demand for human-machine interaction, much work on speech recognition has been carried out over the past three decades. Although a variety of approaches have been proposed, including stochastic (statistical) techniques, grammar-based techniques, and techniques integrated with linguistic features, recognition accuracy and robustness remain among the major problems to be addressed. Currently, most commercial speech products are built on grammar-based speech-recognition technology. In this thesis, we investigate a number of features involved in grammar design for natural-language speech recognition. We hypothesize that, for the same domain, a semantic grammar, which directly encodes some semantic constraints into the recognition grammar, achieves better accuracy but less robustness; a syntactic grammar defines a larger language and therefore has better robustness but less accuracy; and a word-sequence grammar, which includes neither semantics nor syntax, defines the largest language and is therefore the most robust but has very poor recognition accuracy. In this Master's thesis, we claim that proper grammar design can achieve an appropriate compromise between recognition accuracy and robustness. We support this claim with experiments using the IBM Voice-Server SDK, which consists of a VoiceXML browser, IBM ViaVoice Speech Recognition and Text-To-Speech (TTS) engines, sample applications, and other tools for developing and testing VoiceXML applications. The experimental grammars are written in the Java Speech Grammar Format (JSGF), and the testing applications are written in VoiceXML. The tentative experimental results suggest that grammar design is a good area for further study. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .S555.
Source: Masters Abstracts International, Volume: 43-01, page: 0244. Adviser: Richard A. Frost. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    An investigation of the electrolytic plasma oxidation process for corrosion protection of pure magnesium and magnesium alloy AM50.

    In this study, silicate and phosphate EPO coatings were produced on pure magnesium using an AC power source. The silicate coatings were found to possess good wear resistance, while the phosphate coatings provide better corrosion protection. A Design of Experiment (DOE) technique, the Taguchi method, was used to systematically investigate the effect of the EPO process parameters on the corrosion protection properties of a coated magnesium alloy, AM50, using a DC power source. The experimental design consisted of four factors (treatment time, current density, KOH concentration, and NaAlO2 concentration), with three levels of each factor. Potentiodynamic polarization measurements were conducted to determine the corrosion resistance of the coated samples. The optimized processing parameters are a treatment time of 12 minutes, a current density of 12 mA/cm², 0.9 g/l KOH, and 15.0 g/l NaAlO2. The percentage contribution of each factor, determined by analysis of variance (ANOVA), implies that the KOH concentration is the most significant factor affecting the corrosion resistance of the coatings, while treatment time is a major factor affecting their thickness. (Abstract shortened by UMI.) Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M323. Source: Masters Abstracts International, Volume: 44-03, page: 1479. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005

    Collaborative Human-Machine Communication: User- and situation-oriented design of automotive Speech Dialog Systems

    This work addresses the evaluation of speech dialog systems (SDS) that apply collaborative strategies from human dialog by providing continuous, appropriate feedback and adaptive interaction structures. Although speech recognition technology has improved considerably in recent decades, users’ experience with today’s spoken dialog systems is characterized by interaction structures that do not meet their expectations. The discomfort users feel when interacting with current systems can be explained by failed grounding processes, in which users lack the evidence they need to coordinate their knowledge states with the SDS. This thesis proposes non-technical solutions to the difficulties with in-vehicle speech dialog systems, namely adapting the system behavior to existing communication strategies. Three grounding strategies were applied to the human-machine dialog. First, a system was implemented that gave a visual representation of the dialog content and states. Second, a flexible system grounding criterion was realized, so that the system asked for confirmation only when it was uncertain, as humans do. The third implementation addressed alignment by adapting the system’s output syntactically and lexically to the user’s input. Extensive user studies were conducted in a driving simulator to examine the impact of these three implementations on usability ratings; while driving, subjects used the different SDS for several address-book tasks. The results show that adapting the SDS to existing communication strategies can lead to improved user satisfaction despite the persisting shortcomings of state-of-the-art speech technology. The implementation of a flexible grounding criterion, which could enhance the efficiency and effectiveness of the interaction, was the most successful transfer from human communication strategies to human-machine dialog.

    On the influence of hyperarticulated speech on recognition performance

    Because speech recognizers will sometimes fail, it is important to examine how users react to recognition errors. In correction situations, speaking style becomes more accentuated in order to disambiguate the original mistake. We examine the effect of speaking style in such situations on speech recognition performance. Our results indicate that hyperarticulation effects occur in correction situations and decrease word accuracy significantly.