The effects of speaking style, noise, and semantic context on speech segmentation : evidence from artificial language learning and eye-tracking experiments
This dissertation reports a series of three experimental studies that investigated how variation in speech clarity and intelligibility, as well as the presence of semantic context, affects listeners’ speech segmentation. An artificial language learning experiment in Study 1 (Chapter 2) showed that speaking clearly, relative to speaking conversationally, improved segmentation of nonsense words by statistical learning. However, the improvement was observed only in the quiet listening condition, not in noise. Using the visual-world eye-tracking paradigm, Study 2 (Chapter 3) examined the clear speech segmentation benefit during real-time processing of meaningful sentences in which the target word was temporarily ambiguous with a competitor across a word boundary. The results revealed that, relative to conversational speech, clear speech facilitated target word segmentation even before the target and competitor could be disambiguated on the basis of phonemic information. The facilitation not only emerged in quiet but also extended to the noisy listening condition. Finally, building upon Study 2, Study 3 (Chapter 4) employed eye-tracking to explore how this clear speech facilitation effect is modulated by semantic cues from the preceding context. While the clear speech segmentation benefit was eliminated when the context already biased listeners towards the target, it was still present when the context favored the unintended competitor. Taken together, the key findings from the three studies advance our understanding of the relative importance of signal-dependent and signal-independent sources of information during segmentation in realistic communicative settings. The results also provide novel insight into the well-documented clear speech processing benefits by demonstrating that improved segmentation may in part underlie these benefits.
The dissertation also has theoretical implications, suggesting directions for refining current models of spoken word recognition and segmentation.
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer issues of consistency. Conversely, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker’s ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, it should be possible to extract individual-view
information. The relationships between the predictions of each system are evaluated to examine
whether they are, in fact, extracting different information about the speaker. Three methods
of combining the systems to predict holistic score are investigated, namely averaging their
predictions and concatenating and attending over their intermediate representations. The
combined graders are compared to each other and to baseline approaches.
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous, not least because human annotation of
pronunciation errors is inconsistent. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate’s speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation investigated and a method presented for detecting individual accent and
lexical errors and diagnosing accent error tendencies at the speaker level.
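The substitution/deletion/insertion view of accent errors can be illustrated with a simple alignment between a canonical phone sequence and the recognized one. This is a sketch using Python's difflib, not the thesis's actual detection method, and the phone labels are illustrative:

```python
from difflib import SequenceMatcher

def phone_edits(canonical, realized):
    """Align a canonical phone sequence with the realized one and return
    (operation, canonical_phone, realized_phone) tuples for each edit."""
    ops = []
    sm = SequenceMatcher(a=canonical, b=realized, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        canon, real = canonical[i1:i2], realized[j1:j2]
        if tag == "replace":
            n = min(len(canon), len(real))
            ops += [("substitution", c, r) for c, r in zip(canon[:n], real[:n])]
            ops += [("deletion", c, None) for c in canon[n:]]
            ops += [("insertion", None, r) for r in real[n:]]
        elif tag == "delete":
            ops += [("deletion", c, None) for c in canon]
        elif tag == "insert":
            ops += [("insertion", None, r) for r in real]
    return ops

# "dh" realized as "d", a common L1-driven substitution pattern.
edits = phone_edits(["dh", "ih", "s"], ["d", "ih", "s"])
# → [('substitution', 'dh', 'd')]
```

Under this framing, an accent error tendency would show up as the same edit recurring across many words by one speaker, while a lexical error would be an edit pattern confined to a single word.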
Melody as Prosody: Toward a Usage-Based Theory of Music
Thomas M. Pooley
Gary A. Tomlinson
Rationalist modes of inquiry have dominated the cognitive science of music over the past several decades. This dissertation contests many rationalist assumptions, including its core tenets of nativism, modularity, and computationism, by drawing on a wide range of evidence from psychology, neuroscience, linguistics, and cognitive music theory, as well as original data from a case study of Zulu song prosody. An alternative biocultural approach to the study of music and mind is outlined that takes account of musical diversity by attending to shared cognitive mechanisms. Grammar emerges through use, and cognitive categories are learned and constructed in particular social contexts. This usage-based theory of music shows how domain-general cognitive mechanisms for pattern-finding and intention-reading are crucial to acquisition, and how Gestalt principles are invoked in perception. Unlike generative and other rationalist approaches that focus on a series of idealizations, and the cognitive 'competences' codified in texts and musical scores, the usage-based approach investigates actual performances in everyday contexts by using instrumental measures of process.
The study focuses on song melody because it is a property of all known musics. Melody is used for communicative purposes in both song and speech. Vocalized pitch patterning conveys a wide range of affective, propositional, and syntactic information through prosodic features that are shared by the two domains. The study of melody as prosody shows how gradient pitch features are crucial to the design and communicative functions of song melodies. The prosodic features shared by song and speech include speech tone, intonation, and pitch-accent. A case study of ten Zulu memulo songs shows that pitch is not used in the discrete or contrastive fashion proposed by many cognitive music theorists and most (generative) phonologists. Instead, there is a range of pitch categories that includes pitch targets, glides, and contours. These analyses also show that song melody has a multi-dimensional pitch structure, and that it is a dynamic adaptive system that is irreducible in its complexity.
Faces and hands : modeling and animating anatomical and photorealistic models with regard to the communicative competence of virtual humans
In order to be believable, virtual human characters must be able to communicate realistically, in a human-like fashion. This dissertation contributes to improving and automating several aspects of virtual conversations. We have proposed techniques to add non-verbal speech-related facial expressions to audiovisual speech, such as head nods for emphasis. During conversation, humans experience shades of emotions much more frequently than the strong Ekmanian basic emotions. This prompted us to develop a method that interpolates between facial expressions of emotions to create new ones based on an emotion model. In the area of facial modeling, we have presented a system to generate plausible 3D face models from vague mental images. It makes use of a morphable model of faces and exploits correlations among facial features. The hands also play a major role in human communication. Since the basis for every realistic animation of gestures must be a convincing model of the hand, we devised a physics-based anatomical hand model, where a hybrid muscle model drives the animations. The model was used to visualize complex hand movement captured using multi-exposure photography.
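One simple way to realize interpolation over an emotion model is inverse-distance weighting of expression parameters in a 2-D valence/arousal space. This is purely a sketch, not the dissertation's actual method: the anchor emotions, coordinates, and blendshape weights below are all invented for illustration.

```python
import numpy as np

# Hypothetical anchors: each known emotion has a position in a 2-D
# valence/arousal emotion space and a vector of blendshape weights
# (all values illustrative, not from any real rig).
anchors = {
    "joy":     (np.array([ 0.8,  0.5]), np.array([0.9, 0.1, 0.0, 0.6])),
    "sadness": (np.array([-0.7, -0.4]), np.array([0.0, 0.8, 0.3, 0.0])),
    "anger":   (np.array([-0.6,  0.7]), np.array([0.1, 0.2, 0.9, 0.1])),
}

def blend_expression(target, power=2.0):
    """Inverse-distance weighting in emotion space: anchor expressions
    near the target point dominate the blended blendshape weights."""
    coords = np.stack([c for c, _ in anchors.values()])
    shapes = np.stack([s for _, s in anchors.values()])
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d < 1e-9):                 # target sits exactly on an anchor
        return shapes[np.argmin(d)]
    w = 1.0 / d ** power
    w /= w.sum()                         # convex weights summing to 1
    return w @ shapes

# A point between neutral and joy yields a toned-down joyful expression.
mild_joy = blend_expression(np.array([0.4, 0.2]))
```

Because the weights are convex, each blended blendshape value stays within the range spanned by the anchor expressions, which keeps intermediate emotions plausible by construction.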
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.