The effects of speaking style, noise, and semantic context on speech segmentation : evidence from artificial language learning and eye-tracking experiments
This dissertation reports a series of three experimental studies that investigated how variation in speech clarity and intelligibility, as well as the presence of semantic context, affects listeners’ speech segmentation. An artificial language learning experiment in Study 1 (Chapter 2) showed that speaking clearly, relative to speaking conversationally, improved segmentation of nonsense words by statistical learning. However, the improvement was observed only in the quiet listening condition, not in noise. Using the visual-world eye-tracking paradigm, Study 2 (Chapter 3) examined the clear speech segmentation benefit during real-time processing of meaningful sentences in which the target word was temporarily ambiguous with a competitor across a word boundary. The results revealed that, relative to conversational speech, clear speech facilitated target word segmentation even before the target and competitor could be disambiguated on the basis of phonemic information. The facilitation not only emerged in quiet but also extended to the noisy listening condition. Finally, building upon Study 2, Study 3 (Chapter 4) employed eye-tracking to explore how this clear speech facilitation effect is modulated by semantic cues from the preceding context. While the clear speech segmentation benefit was eliminated when the context already biased listeners towards the target, it was still present when the context favored the unintended competitor. Taken together, the key findings from the three studies advance our understanding of the relative importance of signal-dependent and signal-independent sources of information during segmentation in realistic communicative settings. The results also provide novel insight into the well-documented clear speech processing benefits by demonstrating that improved segmentation may in part underlie these benefits.
The dissertation also has theoretical implications, suggesting directions for refining current models of spoken word recognition and segmentation.
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer issues of consistency. Conversely, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker’s ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, it should be possible to extract individual-view
information. The relationships between the predictions of each system are evaluated to examine
whether they are, in fact, extracting different information about the speaker. Three methods
of combining the systems to predict holistic score are investigated, namely averaging their
predictions and concatenating and attending over their intermediate representations. The
combined graders are compared to each other and to baseline approaches.
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous, not least because human annotation of
pronunciation errors is inconsistent. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate’s speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation investigated and a method presented for detecting individual accent and
lexical errors and diagnosing accent error tendencies at the speaker level.
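The substitution/deletion/insertion view of accent errors can be illustrated with a simple alignment between a canonical phone sequence and the recognized one. This is a sketch using Python's difflib, not the thesis's actual detection method, and the phone labels are illustrative:

```python
from difflib import SequenceMatcher

def phone_edits(canonical, realized):
    """Align a canonical phone sequence with the realized one and return
    (operation, canonical_phone, realized_phone) tuples for each edit."""
    ops = []
    sm = SequenceMatcher(a=canonical, b=realized, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        canon, real = canonical[i1:i2], realized[j1:j2]
        if tag == "replace":
            n = min(len(canon), len(real))
            ops += [("substitution", c, r) for c, r in zip(canon[:n], real[:n])]
            ops += [("deletion", c, None) for c in canon[n:]]
            ops += [("insertion", None, r) for r in real[n:]]
        elif tag == "delete":
            ops += [("deletion", c, None) for c in canon]
        elif tag == "insert":
            ops += [("insertion", None, r) for r in real]
    return ops

# "dh" realized as "d", a common L1-driven substitution pattern.
edits = phone_edits(["dh", "ih", "s"], ["d", "ih", "s"])
# → [('substitution', 'dh', 'd')]
```

Under this framing, an accent error tendency would show up as the same edit recurring across many words by one speaker, while a lexical error would be an edit pattern confined to a single word.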
Melody as Prosody: Toward a Usage-Based Theory of Music
Thomas M. Pooley
Gary A. Tomlinson
Rationalist modes of inquiry have dominated the cognitive science of music over the past several decades. This dissertation contests many rationalist assumptions, including its core tenets of nativism, modularity, and computationism, by drawing on a wide range of evidence from psychology, neuroscience, linguistics, and cognitive music theory, as well as original data from a case study of Zulu song prosody. An alternative biocultural approach to the study of music and mind is outlined that takes account of musical diversity by attending to shared cognitive mechanisms. Grammar emerges through use, and cognitive categories are learned and constructed in particular social contexts. This usage-based theory of music shows how domain-general cognitive mechanisms for pattern-finding and intention-reading are crucial to acquisition, and how Gestalt principles are invoked in perception. Unlike generative and other rationalist approaches that focus on a series of idealizations, and the cognitive 'competences' codified in texts and musical scores, the usage-based approach investigates actual performances in everyday contexts by using instrumental measures of process.
The study focuses on song melody because it is a property of all known musics. Melody is used for communicative purposes in both song and speech. Vocalized pitch patterning conveys a wide range of affective, propositional, and syntactic information through prosodic features that are shared by the two domains. The study of melody as prosody shows how gradient pitch features are crucial to the design and communicative functions of song melodies. The prosodic features shared by song and speech include speech tone, intonation, and pitch-accent. A case study of ten Zulu memulo songs shows that pitch is not used in the discrete or contrastive fashion proposed by many cognitive music theorists and most (generative) phonologists. Instead, there is a range of pitch categories that includes pitch targets, glides, and contours. These analyses also show that song melody has a multi-dimensional pitch structure, and that it is a dynamic adaptive system that is irreducible in its complexity.
Faces and hands : modeling and animating anatomical and photorealistic models with regard to the communicative competence of virtual humans
In order to be believable, virtual human characters must be able to communicate realistically, in a human-like fashion. This dissertation contributes to improving and automating several aspects of virtual conversations. We have proposed techniques to add non-verbal speech-related facial expressions to audiovisual speech, such as head nods for emphasis. During conversation, humans experience shades of emotions much more frequently than the strong Ekmanian basic emotions. This prompted us to develop a method that interpolates between facial expressions of emotions to create new ones based on an emotion model. In the area of facial modeling, we have presented a system to generate plausible 3D face models from vague mental images. It makes use of a morphable model of faces and exploits correlations among facial features. The hands also play a major role in human communication. Since the basis for every realistic animation of gestures must be a convincing model of the hand, we devised a physics-based anatomical hand model, where a hybrid muscle model drives the animations. The model was used to visualize complex hand movement captured using multi-exposure photography.
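One simple way to realize interpolation over an emotion model is inverse-distance weighting of expression parameters in a 2-D valence/arousal space. This is purely a sketch, not the dissertation's actual method: the anchor emotions, coordinates, and blendshape weights below are all invented for illustration.

```python
import numpy as np

# Hypothetical anchors: each known emotion has a position in a 2-D
# valence/arousal emotion space and a vector of blendshape weights
# (all values illustrative, not from any real rig).
anchors = {
    "joy":     (np.array([ 0.8,  0.5]), np.array([0.9, 0.1, 0.0, 0.6])),
    "sadness": (np.array([-0.7, -0.4]), np.array([0.0, 0.8, 0.3, 0.0])),
    "anger":   (np.array([-0.6,  0.7]), np.array([0.1, 0.2, 0.9, 0.1])),
}

def blend_expression(target, power=2.0):
    """Inverse-distance weighting in emotion space: anchor expressions
    near the target point dominate the blended blendshape weights."""
    coords = np.stack([c for c, _ in anchors.values()])
    shapes = np.stack([s for _, s in anchors.values()])
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d < 1e-9):                 # target sits exactly on an anchor
        return shapes[np.argmin(d)]
    w = 1.0 / d ** power
    w /= w.sum()                         # convex weights summing to 1
    return w @ shapes

# A point between neutral and joy yields a toned-down joyful expression.
mild_joy = blend_expression(np.array([0.4, 0.2]))
```

Because the weights are convex, each blended blendshape value stays within the range spanned by the anchor expressions, which keeps intermediate emotions plausible by construction.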
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.