
    Prediction, detection, and correction of misunderstandings in interactive tasks

    Technology has allowed all kinds of devices and software to come into our lives. Advances in GPS, Virtual Reality, and wearable computers with increased computing power and Internet connectivity open the doors for interactive systems that were considered science fiction less than a decade ago, and that are capable of guiding us in a variety of environments. This increased accessibility comes with an increase in both the scale of problems that can be realistically tackled and the capabilities that we expect from such systems. Indoor navigation is an example of such a task: although guiding a car is a solved problem, guiding humans, for instance inside a museum, is much more challenging. Unlike cars, pedestrians use landmarks rather than absolute distances. They must discriminate among a larger number of distractors, and they expect sentences of higher complexity than those appropriate for a car driver. A car driver prefers short, simple instructions that do not distract them from traffic. A tourist inside a museum, by contrast, can afford the mental effort that a detailed grounding process requires. Both car and indoor navigation are specific examples of a wider family of collaborative tasks known as "Instruction Following". In these tasks, agents with the two clearly defined roles of Instruction Giver and Instruction Follower must cooperate to achieve a joint objective. The former has access to all required information about the environment, including (but not limited to) a detailed map of the environment, a clear list of objectives, and a profound understanding of the effect that specific actions have on the environment. The latter is tasked with following the instructions, interacting with the environment, and moving the undertaking forward. It is then the Instruction Giver's responsibility to devise a detailed plan of action, segment it into smaller subgoals, and present instructions to the Instruction Follower in language that is clear and understandable. No matter how carefully crafted the Instruction Giver's utterances are, it is expected that misunderstandings will take place. Although some of these misunderstandings are easy to detect and repair, others can be very difficult or even impossible to resolve. It is therefore important for the Instruction Giver to generate instructions that are as clear as possible, to detect misunderstandings as early as possible, and to correct them in the most effective way. This thesis introduces several algorithms and strategies designed to tackle the aforementioned problems from end to end, presenting the individual aspects of a system that successfully predicts, detects, and corrects misunderstandings in interactive Instruction Following tasks. We focus on one particular type of instruction: those involving Referring Expressions. A Referring Expression identifies a single object out of many, such as "the red button" or "the tall plant". Generating Referring Expressions is a key component of Instruction Following tasks, since any kind of object manipulation is likely to require a description of the object. Due to its importance and complexity, this is one of the most widely studied areas of Natural Language Generation. In this thesis we use Semantically Interpreted Grammars, an approach that integrates both Referring Expression Generation (identifying which properties are required for a unique description) and Surface Realization (combining those properties into a concrete Noun Phrase).
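
To make the property-selection step of Referring Expression Generation concrete, here is a minimal Python sketch in the spirit of a simple incremental attribute-selection algorithm; it is not the Semantically Interpreted Grammars approach used in the thesis, and the scene, attribute names, and preference order are invented for illustration.

    # Minimal sketch of attribute selection for a Referring Expression.
    # The scene, attribute names, and preference order are illustrative only.

    def select_properties(target, distractors, preference=("type", "color", "size")):
        """Pick attributes of `target` until no distractor matches all of them."""
        remaining = list(distractors)
        chosen = {}
        for attr in preference:
            if not remaining:
                break
            value = target.get(attr)
            if value is None:
                continue
            ruled_out = [d for d in remaining if d.get(attr) != value]
            if ruled_out:                                 # attribute helps to discriminate
                chosen[attr] = value
                remaining = [d for d in remaining if d.get(attr) == value]
        return chosen if not remaining else None          # None: no distinguishing description

    # Example scene: a red button among a blue button and a red lamp.
    target = {"type": "button", "color": "red"}
    scene = [{"type": "button", "color": "blue"}, {"type": "lamp", "color": "red"}]
    print(select_properties(target, scene))               # {'type': 'button', 'color': 'red'}

A grammar-based generator such as the one described above would additionally perform surface realization, turning the selected properties into a concrete noun phrase like "the red button".
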
The complexity of performing, recording, and analyzing Instruction Following tasks in the real world is one of the major challenges of Instruction Following research. In order to simplify both the development of new algorithms and the access to those results by the research community, our work is evaluated in what we call a Virtual Environment: an environment that mimics the main aspects of the real world and abstracts away distractions, while preserving enough characteristics of the real world to be useful for research. Selecting the appropriate virtual environment for a research task ensures that results will be applicable in the real world. We have selected the Virtual Environment of the GIVE Challenge, an environment designed for an Instruction Following task in which a human Instruction Follower is paired with an automated Instruction Giver in a maze-like 3D world. Completing the task requires navigating the space, avoiding alarms, interacting with objects, generating instructions in Natural Language, and preventing mistakes that can bring the task to a premature end. Even under these simplified conditions, the task presents several computational challenges: performing these tasks in real time requires fast algorithms, and ensuring the efficiency of our approaches remains a priority at every step. Our first experimental study identifies the most challenging types of mistakes that our system is expected to find. Creating an Instruction Following system that leverages previously recorded human data and follows instructions using a simple greedy algorithm, we clearly separate those situations for which no further study is warranted from those that are of interest for our research. We test our algorithm with similarity metrics of varying complexity, ranging from overlap measures such as Jaccard and edit distances to advanced machine learning algorithms such as Support Vector Machines. The best-performing algorithms not only achieve good accuracy; we also show that their mistakes are highly correlated with situations that are challenging for human annotators as well. Going a step further, we also study the type of improvement that can be expected from our system if we give it the chance to retry after a mistake has been made. This system has no prior beliefs about which actions are more likely to be selected next, and our results make a good case that this assumption is one of its weakest points. Moving away from a paradigm where all actions are considered equally likely, and towards a model in which the Instruction Follower's own actions are taken into account, our subsequent step is the development of a system that explicitly models the listener's understanding. Given an instruction containing a Referring Expression, we approach the Instruction Follower's understanding of it with a combination of two probabilistic models. The Semantic model uses features of the Referring Expression to identify which object is more likely to be selected: if the instruction mentions a red button, it is unlikely that the Instruction Follower will select a blue one. The Observational model, on the other hand, predicts which object will be selected by the Instruction Follower based on their behavior: if the user is walking straight towards a specific object, it is very likely that this object will be selected.
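
As a rough illustration of how the Semantic and Observational models described above can be combined, the following Python sketch scores candidate objects with two log-linear models and normalises the summed scores into a single distribution; the feature functions and weights are invented placeholders, not the ones trained in the thesis.

    import math

    # Illustrative log-linear combination of a Semantic and an Observational model.
    # Feature values and weights below are invented; the thesis trains its weights
    # on recorded GIVE Challenge data.

    def log_linear_score(features, weights):
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    def combined_distribution(candidates, semantic_feats, observational_feats,
                              semantic_w, observational_w):
        """Normalise exp(semantic score + observational score) over the candidates."""
        scores = {
            obj: log_linear_score(semantic_feats[obj], semantic_w)
                 + log_linear_score(observational_feats[obj], observational_w)
            for obj in candidates
        }
        z = sum(math.exp(s) for s in scores.values())
        return {obj: math.exp(s) / z for obj, s in scores.items()}

    # Hypothetical scene: the instruction mentions "red", and the listener is
    # walking towards b1.
    candidates = ["b1_red_button", "b2_blue_button"]
    semantic_feats = {"b1_red_button": {"color_match": 1.0},
                      "b2_blue_button": {"color_match": 0.0}}
    observational_feats = {"b1_red_button": {"walking_towards": 1.0},
                           "b2_blue_button": {"walking_towards": 0.0}}
    print(combined_distribution(candidates, semantic_feats, observational_feats,
                                {"color_match": 2.0}, {"walking_towards": 1.5}))
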
Both models are log-linear and were trained with recorded human data from the GIVE Challenge, resulting in a model that can effectively predict that a misunderstanding is about to take place several seconds before it actually happens. Using our Combined model, we can easily detect and predict misunderstandings: if the Instruction Giver tells the Instruction Follower to "click the red button", and the Combined model detects that the Instruction Follower will select a blue one, we know that a misunderstanding took place, we know what the misunderstood object is, and we know both facts early enough to generate a correction that will stop the Instruction Follower from making the mistake in the first place. A follow-up study extends the Observational model with features based on the gaze of the Instruction Follower. Gaze has been shown to correlate with human attention, and our study explores whether gaze-based features can improve the accuracy of the Observational model. Using previously collected data from the GIVE Environment in which gaze was recorded with eye-tracking equipment, the resulting Extended Observational model improves the accuracy of predictions in challenging scenes where the number of distractors is high. Having a reliable method for the detection of misunderstandings, we turn our attention towards corrections. A corrective Referring Expression is one designed not merely to identify a single object out of many, but to re-identify an object that was previously misidentified. The simplest possible corrective Referring Expression is repetition: if the user misunderstood the expression "the red button" the first time, it is possible that they will understand it correctly the second time. A smarter approach, however, is to reformulate the Referring Expression in a way that makes it easier for the Instruction Follower to understand. We designed and evaluated two different strategies for the generation of corrective feedback. The first of these strategies exploits the pragmatic concept of a Context Set, according to which the objects in a scene can be segmented into those that are being attended to (that is, those inside the Context Set) and those that are ignored. According to our theory, we could virtually ignore all objects outside the Context Set and generate Referring Expressions that would not be uniquely identifying with respect to the entire context, but would still be identifying enough for the Instruction Follower. As an example, if the user is undecided between a red button and a blue one, we could generate the Referring Expression "the red one" even if there are other red buttons in the scene that the user is not paying attention to. Using our probabilistic models as a measure of which elements to include in the Context Set, we modified our Referring Expression Generation algorithm to build sentences that explicitly account for this behavior. We performed experiments in the GIVE Challenge Virtual Environment, crowdsourcing the data collection process, with mixed results: even if our definition of a Context Set were correct (a point that our results can neither confirm nor deny), our strategy generates Referring Expressions that prevent some mistakes but are in general harder to understand than those of the baseline approach. The results are presented along with an extensive error analysis of the algorithm.
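
To make the Context Set strategy above concrete, here is a minimal Python sketch that distinguishes the target only from the objects the listener model still considers plausible; the probability threshold and the attribute order are simplifying assumptions of this sketch, not details fixed by the thesis.

    # Sketch: generate a description that is distinguishing only within the
    # Context Set, i.e. the objects the listener model still considers likely.
    # The probability threshold and attribute order are illustrative.

    def context_set(distribution, threshold=0.1):
        return {obj for obj, p in distribution.items() if p >= threshold}

    def describe_within(target_id, scene, candidate_ids, attributes=("type", "color")):
        """Add attributes of the target only while they rule out rival candidates."""
        target = scene[target_id]
        rivals = [scene[i] for i in candidate_ids if i != target_id]
        chosen = {}
        for attr in attributes:
            if not rivals:
                break
            ruled_out = [r for r in rivals if r.get(attr) != target[attr]]
            if ruled_out:
                chosen[attr] = target[attr]
                rivals = [r for r in rivals if r.get(attr) == target[attr]]
        return chosen

    # Three buttons, two of them red; the listener model only hesitates between
    # b1 and b2, so "red" alone is contrastive enough inside the Context Set.
    scene = {"b1": {"type": "button", "color": "red"},
             "b2": {"type": "button", "color": "blue"},
             "b3": {"type": "button", "color": "red"}}
    listener_model = {"b1": 0.55, "b2": 0.40, "b3": 0.05}
    print(describe_within("b1", scene, context_set(listener_model)))   # {'color': 'red'}
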
These results imply that corrections can cause the Instruction Follower to re-evaluate the entire situation in a new light, making our previous definition of the Context Set impractical. Our approach also fails to identify previously grounded referents, compounding the number of pragmatic effects that conspire against it. The second strategy for corrective feedback consists of adding contrastive focus to a second, corrective Referring Expression. In a scenario in which the user receives the Referring Expression "the red button" and yet mistakenly selects a blue one, an approach with contrastive focus would generate "no, the RED button" as a correction. Such a Referring Expression makes it clear to the Instruction Follower that, on the one hand, their selection of an object of type "button" was correct, and that, on the other hand, it is the property "color" that needs re-evaluation. In our approach, we model a misunderstanding as a noisy-channel corruption: the Instruction Giver generates a correct Referring Expression for a given object, but it is corrupted in transit and reaches the Instruction Follower in the form of an altered, incorrect Referring Expression. We correct this misconstrual by generating a new, corrective Referring Expression: starting from the original Referring Expression and the misunderstood object, we identify the constituents of the Referring Expression that were corrupted and place contrastive focus on them. Our hypothesis states that the minimum edit sequence between the original and the misunderstood Referring Expression correctly identifies the constituents requiring contrastive focus, a claim that we verify experimentally. We perform crowdsourced preference tests over several variations of this idea, evaluating Referring Expressions that either present the contrast side by side (as in "no, not the BLUE button, the RED button") or attempt to remove redundant information (as in "no, the RED one"). We evaluate our approaches using both simple scenes from the GIVE Challenge and more complicated ones showing pictures from the more challenging TUNA people corpus. Our results show that human users significantly prefer our most straightforward contrastive algorithm. In addition to detailing models and strategies for misunderstanding detection and correction, this thesis also includes practical considerations that must be taken into account when dealing with tasks similar to those discussed here. We pay special attention to Crowdsourcing, a practice in which data about tasks can be collected from participants all over the world at a lower cost than traditional alternatives. Researchers interested in using crowdsourced data must often deal both with unmotivated players and with players whose main motivation is to complete as many tasks as possible in the least amount of time. Designing a crowdsourced experiment requires a multifaceted approach: the task must be designed in such a way as to motivate honest players, discourage cheating, implement technical measures to detect bad data, and prevent undesired behavior by looking at the entire pipeline with a security mindset. We dedicate a chapter to this issue, presenting a full example that will undoubtedly be of help for future research. We also include sections dedicated to the theory behind our implementations.
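
The minimum-edit-sequence hypothesis above lends itself to a small illustration. The sketch below aligns the intended Referring Expression with a description of the misunderstood object using Python's difflib and marks the diverging words with capitalisation as a stand-in for contrastive focus; this is a word-level simplification of the constituent-level, grammar-based procedure of the thesis.

    import difflib

    # Sketch: mark contrastive focus on the words where the intended Referring
    # Expression and a description of the misunderstood object diverge.
    # Capitalisation stands in for prosodic/typographic focus marking.

    def contrastive_correction(intended, misunderstood):
        intended_words = intended.split()
        misunderstood_words = misunderstood.split()
        matcher = difflib.SequenceMatcher(a=misunderstood_words, b=intended_words)
        corrected = []
        for op, _, _, j1, j2 in matcher.get_opcodes():
            words = intended_words[j1:j2]
            if op == "equal":
                corrected.extend(words)
            else:                              # replaced or inserted words receive focus
                corrected.extend(w.upper() for w in words)
        return "no, " + " ".join(corrected)

    # The listener was told "the red button" but selected a blue button.
    print(contrastive_correction("the red button", "the blue button"))
    # -> "no, the RED button"
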
Background literature includes the pragmatics of dialogue, misunderstandings, and focus, the link between gaze and visual attention, the evolution of approaches to Referring Expression Generation, and reports on the motivations of crowdsourced workers that borrow from fields such as psychology and economics. This background contextualizes our methods and results with respect to wider fields of study, enabling us to explain not only that our methods work but also why they work. We finish our work with a brief overview of future areas of study. Research on the prediction, detection, and correction of misunderstandings for a multitude of environments is already underway. With the introduction of more advanced virtual environments, with modern spoken, dialogue-based tools revolutionizing the market of home devices, and with computing power and data easily available, we expect that the results presented here will prove useful for researchers in several areas of Natural Language Processing for many years to come.

    Augmenting Situated Spoken Language Interaction with Listener Gaze

    Collaborative task solving in a shared environment requires referential success. Human speakers follow the listener's behavior in order to monitor language comprehension (Clark, 1996). Furthermore, a natural language generation (NLG) system can exploit listener gaze to realize an effective interaction strategy by responding to it with verbal feedback in virtual environments (Garoufi, Staudte, Koller, & Crocker, 2016). We augment situated spoken language interaction with listener gaze and investigate its role in human-human and human-machine interactions. Firstly, we evaluate its impact on the prediction of reference resolution using a multimodal corpus collected in virtual environments. Secondly, we explore if and how a human speaker uses listener gaze in an indoor guidance task, while spontaneously referring to real-world objects in a real environment. Thirdly, we consider an object identification task for assembly under system instruction. We developed a multimodal interactive system and two NLG systems that integrate listener gaze into their generation mechanisms. The NLG system "Feedback" reacts to gaze with verbal feedback, either underspecified or contrastive. The NLG system "Installments" uses gaze to incrementally refer to an object in the form of installments. Our results showed that gaze features improved the accuracy of automatic prediction of reference resolution. Further, we found that human speakers are very good at producing referring expressions, and showing listener gaze did not improve performance, but elicited more negative feedback. In contrast, we showed that an NLG system that exploits listener gaze benefits the listener's understanding. Specifically, combining a short, ambiguous instruction with contrastive feedback resulted in faster interactions compared to underspecified feedback, and even outperformed following long, unambiguous instructions. Moreover, alternating the underspecified and contrastive responses in an interleaved manner led to better engagement with the system and efficient information uptake, and resulted in equally good performance. Somewhat surprisingly, when gaze was incorporated more indirectly in the generation procedure and used to trigger installments, the non-interactive approach that outputs an instruction all at once was more effective. However, if the spatial expression was mentioned first, referring in gaze-driven installments was as efficient as following an exhaustive instruction. In sum, we provide a proof of concept that listener gaze can effectively be used in situated human-machine interaction. An assistance system using gaze cues is more attentive and adapts to listener behavior to ensure communicative success.
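
As a rough sketch of how listener gaze can drive the choice between underspecified and contrastive feedback as described above, the following Python snippet compares the currently fixated object with the target; the object representation, the contrast attribute, and the wording templates are invented for this example and are not taken from the Feedback system itself.

    # Sketch: choose verbal feedback from listener gaze. The attribute used for
    # contrast and the phrasing templates are illustrative assumptions only.

    def feedback(target, gazed, contrast_attr="color"):
        """Return a feedback utterance given the target and the currently fixated object."""
        if gazed is None:
            return None                          # no fixation yet: stay silent
        if gazed["id"] == target["id"]:
            return "Yes, that one."              # underspecified confirmation
        if gazed.get(contrast_attr) != target.get(contrast_attr):
            return f"No, the {target[contrast_attr].upper()} one."   # contrastive feedback
        return "No, not that one."               # fall back to a plain rejection

    target = {"id": "b1", "color": "red"}
    print(feedback(target, {"id": "b2", "color": "blue"}))   # No, the RED one.
    print(feedback(target, {"id": "b1", "color": "red"}))    # Yes, that one.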

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Novel Pitch Detection Algorithm With Application to Speech Coding

    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction and ACR applied to Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
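
A minimal Python sketch of the general idea of applying autocorrelation to an LPC residual for pitch estimation, using numpy and scipy; the frame construction, LPC order, and pitch search range are illustrative assumptions, and the multi-feature and wavelet refinement steps of MAWT are not reproduced here.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    # Sketch: autocorrelation-based pitch estimation on an LPC residual.
    # Frame construction, LPC order, and the pitch search range are assumptions.

    def lpc_residual(frame, order=12):
        """Inverse-filter a frame with LPC coefficients (autocorrelation method)."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz(r[:order], r[1:order + 1])
        predicted = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
        return frame - predicted

    def pitch_from_acr(residual, fs, fmin=60.0, fmax=400.0):
        """Return the pitch whose lag maximises the residual autocorrelation."""
        acr = np.correlate(residual, residual, mode="full")[len(residual) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        lag = lo + int(np.argmax(acr[lo:hi]))
        return fs / lag

    # Synthetic voiced-like frame: a ~120 Hz pulse train through a damped resonance.
    fs = 16000
    n = np.arange(int(0.04 * fs))
    pulses = (n % (fs // 120) == 0).astype(float)
    resonance = np.exp(-np.arange(200) / 40.0) * np.cos(0.3 * np.arange(200))
    frame = np.convolve(pulses, resonance)[:len(n)]
    print(round(pitch_from_acr(lpc_residual(frame), fs), 1))   # roughly 120 Hz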

    Incremental Disfluency Detection for Spoken Learner English

    Dialogue-based computer-assisted language learning (CALL) concerns the application and analysis of automated systems that engage with a language learner through dialogue. Rooted in an interactionist perspective of second language acquisition, dialogue-based CALL systems assume the role of a speaking partner, providing learners the opportunity for spontaneous production of their second language. One area of interest for such systems is the implementation of corrective feedback. However, the feedback strategies employed by such systems remain fairly limited. In particular, there are currently no provisions for learners to initiate the correction of their own errors, despite this being the most frequently occurring and most preferred type of error correction in learner speech. To address this gap, this thesis proposes a framework for implementing such functionality, identifying incremental self-initiated self-repair (i.e. disfluency) detection as a key area for research. Taking an interdisciplinary approach to the exploration of this topic, this thesis outlines the steps taken to optimise an incremental disfluency detection model for use with spoken learner English. To begin, a comparative linguistic analysis of native and learner disfluency corpora explored the differences between the disfluency behaviour of native and learner speech, highlighting key features of learner speech not previously explored in disfluency detection model analysis. Following this, in order to identify a suitable baseline model for further experimentation, two state-of-the-art incremental self-repair detection models were trained and tested with a learner speech corpus. An error analysis of the models' outputs found an LSTM model using word embeddings and part-of-speech tags to be the most suitable for learner speech, thanks to its lower number of false positives triggered by learner errors in the corpus. Following this, several adaptations to the model were tested to improve performance. Namely, the inclusion of character embeddings, silence and laughter features, the separation of edit term detection from disfluency detection, lemmatization, and the inclusion of learners' prior proficiency scores led to an improvement of over eight percent over the baseline. Findings from this thesis illustrate how the analysis of language characteristics specific to learner speech can positively inform model adaptation, and they provide a starting point for further investigation into the implementation of effective corrective feedback strategies in dialogue-based CALL systems.
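
A minimal PyTorch sketch of an incremental LSTM tagger over word and part-of-speech embeddings, of the general kind used as the baseline described above; vocabulary sizes, dimensions, and the label set are placeholders rather than the configuration used in the thesis.

    import torch
    import torch.nn as nn

    # Sketch of an incremental disfluency tagger over word and POS embeddings.
    # Vocabulary sizes, embedding dimensions, and the label set are placeholders.

    class DisfluencyTagger(nn.Module):
        def __init__(self, n_words, n_pos, n_labels, word_dim=100, pos_dim=25, hidden=128):
            super().__init__()
            self.word_emb = nn.Embedding(n_words, word_dim)
            self.pos_emb = nn.Embedding(n_pos, pos_dim)
            # Unidirectional LSTM so each prediction only sees the words spoken so far.
            self.lstm = nn.LSTM(word_dim + pos_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_labels)

        def forward(self, words, pos, state=None):
            x = torch.cat([self.word_emb(words), self.pos_emb(pos)], dim=-1)
            h, state = self.lstm(x, state)
            return self.out(h), state            # return state to continue incrementally

    # Feed one token at a time, reusing the LSTM state between steps
    # (the model is untrained here, so the predicted tags are arbitrary).
    model = DisfluencyTagger(n_words=5000, n_pos=50, n_labels=3)
    state = None
    for w, p in [(12, 4), (85, 7), (85, 7)]:     # toy ids for a repeated word
        logits, state = model(torch.tensor([[w]]), torch.tensor([[p]]), state)
        print(logits.argmax(dim=-1).item())      # tag predicted for the latest word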

    Modeling speech intelligibility based on the signal-to-noise envelope power ratio
