
    Individual Differences in the Perceptual Learning of Degraded Speech: Implications for Cochlear Implant Aural Rehabilitation

    In the noise and commotion of daily life, people achieve effective communication partly because spoken messages are replete with redundant information. Listeners exploit available contextual, linguistic, phonemic, and prosodic cues to decipher degraded speech. When other cues are absent or ambiguous, phonemic and prosodic cues are particularly important because they help identify word boundaries, a process known as lexical segmentation. Individuals vary in the degree to which they rely on phonemic or prosodic cues for lexical segmentation in degraded conditions. Deafened individuals who use a cochlear implant have diminished access to fine frequency information in the speech signal, and show resulting difficulty perceiving phonemic and prosodic cues. Auditory training on phonemic elements improves word recognition for some listeners. Little is known, however, about the potential benefits of prosodic training, or the degree to which individual differences in cue use affect outcomes. The present study used simulated cochlear implant stimulation to examine the effects of phonemic and prosodic training on lexical segmentation. Participants completed targeted training with either phonemic or prosodic cues, and received passive exposure to the non-targeted cue. Results show that acuity to the targeted cue improved after training. In addition, both targeted attention and passive exposure to prosodic features led to increased use of these cues for lexical segmentation. Individual differences in degree and source of benefit point to the importance of personalizing clinical intervention to increase flexible use of a range of perceptual strategies for understanding speech.
    Doctoral Dissertation, Speech and Hearing Science, 201
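
The study above relies on simulated cochlear implant stimulation, but the abstract does not say how the simulation was built. A common approach in this literature is noise-vocoded speech, and the sketch below illustrates that technique only as an assumption: the channel count, band edges, and filter settings are arbitrary choices, not the study's actual stimulus pipeline.

```python
# Minimal noise-vocoder sketch (assumed stand-in for "simulated cochlear
# implant stimulation"; parameters are illustrative, not from the study).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    edges = np.geomspace(lo, hi, n_channels + 1)      # log-spaced band edges
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))              # slow amplitude envelope
        carrier = sosfiltfilt(sos, noise)             # band-limited noise carrier
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-9)

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy signal
degraded = noise_vocode(tone, fs, n_channels=4)       # fewer channels, coarser detail
```

Reducing the number of channels removes fine frequency information, which is the kind of degradation that pushes listeners toward the phonemic and prosodic segmentation cues discussed above.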

    Adaptive Cognitive Interaction Systems

    Adaptive cognitive interaction systems observe and model the state of their user and adapt the system's behavior accordingly. Such a system consists of three components: the empirical cognitive model, the computational cognitive model, and the adaptive interaction manager. This thesis makes numerous contributions to the development of these components and to their combination. The results are validated in numerous user studies.
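
As a rough illustration of the three-component architecture named above, the sketch below wires an empirical cognitive model, a computational cognitive model, and an adaptive interaction manager into a single decision loop. All class names, fields, and the workload heuristic are hypothetical placeholders, not interfaces taken from the thesis.

```python
# Hedged sketch of the three components described above; names and logic are illustrative.
from dataclasses import dataclass

@dataclass
class UserState:
    workload: float      # e.g. estimated from physiological or behavioural signals
    confusion: float

class EmpiricalCognitiveModel:
    """Estimates the user's current state from observed data."""
    def estimate(self, observations: dict) -> UserState:
        return UserState(workload=observations.get("workload", 0.0),
                         confusion=observations.get("confusion", 0.0))

class ComputationalCognitiveModel:
    """Predicts how the user's state would change under a candidate system action."""
    def predict(self, state: UserState, action: str) -> UserState:
        delta = -0.1 if action == "slow_down" else 0.1
        return UserState(max(0.0, state.workload + delta), state.confusion)

class AdaptiveInteractionManager:
    """Adapts system behaviour by choosing the action with the best predicted state."""
    def choose_action(self, state: UserState, model: ComputationalCognitiveModel) -> str:
        candidates = ["slow_down", "continue"]
        return min(candidates, key=lambda a: model.predict(state, a).workload)

ecm, ccm, aim = EmpiricalCognitiveModel(), ComputationalCognitiveModel(), AdaptiveInteractionManager()
state = ecm.estimate({"workload": 0.8})
print(aim.choose_action(state, ccm))                  # "slow_down" for a heavily loaded user
```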

    Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus

    Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 137-149).
    With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months. Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and an average of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders.
    By Sophia Yuditskaya, S.M.
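
A minimal sketch of the modelling step described above: Partial Least Squares regression from acoustic feature vectors to a perceived-emotion rating, scored with adjusted R-squared. The data, feature count, and component count are synthetic placeholders rather than Speechome values.

```python
# Hedged PLS-regression sketch with adjusted R^2; synthetic data, not Speechome features.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n, p = 200, 12                                        # utterances x acoustic features
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)   # perceived-emotion rating

pls = PLSRegression(n_components=4).fit(X, y)
r2 = r2_score(y, pls.predict(X).ravel())
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)         # penalise for the number of predictors
print(f"R^2 = {r2:.2f}, adjusted R^2 = {adj_r2:.2f}")
```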

    An Ordinal Approach to Affective Computing

    Both depression prediction and emotion recognition systems are often based on ordinal ground truth due to subjectively annotated datasets. Yet, both have so far been posed as classification or regression problems. These naive approaches have fundamental issues because they are not focused on ordering, unlike ordinal regression, which is the most appropriate for truly ordinal ground truth. Ordinal regression to date offers comparatively fewer, more limited methods when compared with other branches in machine learning, and its usage has been limited to specific research domains. Accordingly, this thesis presents investigations into ordinal approaches for affective computing by describing a consistent framework to understand all ordinal system designs, proposing ordinal systems for large datasets, and introducing tools and principles to select suitable system designs and evaluation methods. First, three learning approaches are compared using the support vector framework to establish the empirical advantages of ordinal regression, which is lacking from the current literature. Results on depression and emotion corpora indicate that ordinal regression with proper tuning can improve existing depression and emotion systems. Ordinal logistic regression (OLR), which is an extension of logistic regression for ordinal scales, contributes to a number of model structures, from which the best structure must be chosen. Exploiting the newly proposed computationally efficient greedy algorithm for model structure selection (GREP), OLR outperformed or was comparable with state-of-the-art depression systems on two benchmark depression speech datasets. Deep learning has dominated many affective computing fields, and hence ordinal deep learning is an attractive prospect. However, it is under-studied even in the machine learning literature, which motivates an in-depth analysis of appropriate network architectures and loss functions. One of the significant outcomes of this analysis is the introduction of RankCNet, a novel ordinal network which utilises a surrogate loss function of rank correlation. Not only the modelling algorithm but the choice of evaluation measure depends on the nature of the ground truth. Rank correlation measures, which are sensitive to ordering, are more apt for ordinal problems than common classification or regression measures that ignore ordering information. Although rank-based evaluation for ordinal problems is not new, so far in affective computing, ordinality of the ground truth has been widely ignored during evaluation. Hence, a systematic analysis in the affective computing context is presented, to provide clarity and encourage careful choice of evaluation measures. Another contribution is a neural network framework with a novel multi-term loss function to assess the ordinality of ordinally-annotated datasets, which can guide the selection of suitable learning and evaluation methods. Experiments on multiple synthetic and affective speech datasets reveal that the proposed system can offer reliable and meaningful predictions about the ordinality of a given dataset. Overall, the novel contributions and findings presented in this thesis not only improve prediction accuracy but also encourage future research towards ordinal affective computing: a different paradigm, but often the most appropriate
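
The evaluation argument above can be made concrete with a toy comparison: two systems with identical accuracy on ordinal labels can differ sharply on a rank-correlation measure once the magnitude of ordinal errors is taken into account. The labels below are synthetic, and Spearman's rho is used only as one common rank-correlation measure, not necessarily the one adopted in the thesis.

```python
# Accuracy ignores ordering; rank correlation does not. Synthetic illustration only.
import numpy as np
from scipy.stats import spearmanr

y_true = np.array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])     # e.g. severity levels
pred_a = np.array([0, 1, 2, 3, 4, 3, 4, 1, 2, 0])     # errors are off by one level
pred_b = np.array([0, 1, 2, 3, 4, 0, 0, 4, 4, 0])     # same accuracy, but large ordinal errors

for name, pred in [("A", pred_a), ("B", pred_b)]:
    acc = np.mean(pred == y_true)
    rho, _ = spearmanr(y_true, pred)
    print(f"system {name}: accuracy = {acc:.2f}, Spearman rho = {rho:.2f}")
```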

    Observations on the dynamic control of an articulatory synthesizer using speech production data

    This dissertation explores the automatic generation of gestural-score-based control structures for a three-dimensional articulatory speech synthesizer. The gestural scores are optimized in an articulatory resynthesis paradigm using a dynamic programming algorithm and a cost function which measures the deviation from a gold standard in the form of natural speech production data. This data had been recorded using electromagnetic articulography, from the same speaker to which the synthesizer's vocal tract model had previously been adapted. Future work to create an English voice for the synthesizer and integrate it into a text-to-speech platform is outlined.
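
A hedged sketch of the optimisation idea described above: a Viterbi-style dynamic programme picks one candidate gestural target per frame so that the resulting trajectory stays close to a reference articulatory trajectory (e.g. an EMA recording) while penalising frequent gesture switches. The target inventory, switch penalty, and toy trajectory are illustrative assumptions, not the thesis's actual gestural-score parameters or cost function.

```python
# Dynamic-programming sketch: choose a target per frame to track a reference trajectory.
import numpy as np

def optimise_score(reference, targets, switch_penalty=0.5):
    T, K = len(reference), len(targets)
    cost = np.full((T, K), np.inf)
    back = np.zeros((T, K), dtype=int)
    cost[0] = (reference[0] - targets) ** 2
    for t in range(1, T):
        for k in range(K):
            trans = cost[t - 1] + switch_penalty * (np.arange(K) != k)   # switching cost
            back[t, k] = int(np.argmin(trans))
            cost[t, k] = trans[back[t, k]] + (reference[t] - targets[k]) ** 2
    path = [int(np.argmin(cost[-1]))]                 # backtrace the cheapest target sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))

reference = np.sin(np.linspace(0, np.pi, 20))         # toy articulator trajectory
targets = np.array([0.0, 0.5, 1.0])                   # candidate gestural targets
print(optimise_score(reference, targets))
```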

    Language In My Mouth: Linguistic Variation in the Nmbo Speech Community of Southern New Guinea

    This thesis is a mixed-methods investigation into the question of the sociolinguistics of linguistic diversity in Papua New Guinea. Social and cultural traits of New Guinean speech communities have been hypothesised as conducive to language differentiation and diversification (Laycock 1991, Thurston 1987, 1992, Foley 2000, Ross 2001); however, there have been few empirical studies to support these hypotheses. In this thesis I investigate linguistic micro-variations within a contemporary New Guinean speech community, with the goal of identifying socio-cultural pressures that affect language variation and change. The community under investigation is the Nmbo speech community located in the Morehead area of Southern New Guinea. It is a highly multilingual community in the middle of the Nambu branch dialect chain, and consists primarily of the three villages Govav, Bevdvn, and Arovwe. The ideologically licensed speakers of Nmbo are the Kerake tribe people, but due to the practice of marriage exogamy, a large portion of non-Kerake people speak Nmbo as an additional language learnt from their parents or spouse. This thesis embraces the complexities of the multilingual ecology by including data from Kerake women who have married out of the Nmbo villages into the neighbouring Nen language village of Bimadbn. The empirical investigations bring data from three directions. First are the qualitative descriptions based on my own ethnographic fieldwork supported by prior ethnographic descriptions. The picture to emerge is of an egalitarian multilingual speech community. The qualitative descriptions also provide basic facts about demographics and social structures of the community. Second is the linguistic description of the Nmbo language. Nmbo is an under-described language without substantial prior description, and this thesis contains a sketch grammar covering the basic aspects of Nmbo grammar. Finally, there are three quantitative studies of variation. The vowel sociophonetic study and the word-initial [h]-drop study are classic Labovian variationist studies that investigate patterns of variation across a sample of speakers. The former is based on elicited word list data, and the latter on naturalistic speech data. The third quantitative study takes a grammaticalisation approach to an emergent topic marker in a topicalising construction from a relative clause construction. This is the first thesis ever produced providing qualitative, descriptive, and quantitative data from a New Guinean speech community within a language ecology of vital indigenous multilingualism. The contributions of the thesis are twofold. Firstly, this thesis brings grammatical and sociolinguistic descriptions from an under-studied language. It is a socio-grammar (Nagy 2009) that considers language ecology, sociolinguistics, and grammatical description. Secondly, this thesis contributes empirical data on the sociolinguistics of small-scale speech communities. The classic sociolinguistic variable of gender is not found to be particularly significant in the variables studied, despite the community being highly gendered in other social domains. Village, however, shows some significance. As far as the three variables are concerned, Nmbo speakers show little community-internal variation and paint a picture of a tight-knit society of intimates (Trudgill 2011).
The conclusion to the question of the sociolinguistics of diversification is that while there is some evidence of sociolinguistic differentiation within the Nmbo speech community, the most important social groups to orient against are the other sister language groups in the Morehead area. The nascent variation within the Nmbo speech community, combined with the ethnographic evidence of a cluster of dense and multiplex social networks, suggest that should the social need to differentiate between other Kerake arise, linguistic differentiation may occur rapidly

    The Linguistic Expression of Affective Stance in Yaminawa (Pano, Peru)

    This dissertation explores affective expression in Yaminawa, a Panoan language of Peruvian Amazonia. In this study, ‘affect’ is used to refer broadly to the English language concepts of ‘emotion’ and ‘feeling’. Affective expression is approached as an interactional phenomenon and it is analyzed in terms of affective stancetaking, i.e., the way speakers position themselves to objects in the discourse as well as their interlocutors via linguistic performance. This study considers affective resources at the levels of the lexicon, morphology, prosody, acoustics (voice quality, speech rate and volume, etc.), and interactional features (turn duration, complexity of backchannels, etc.). This study contextualizes affective expression in Yaminawa with a detailed description of Yaminawa ethnopsychology and the lexical resources that describe affective states, as well as behaviors and bodily sensations that are associated with particular affects by the Yaminawa. Using methods from Cognitive Anthropology, I investigate the ways that native Yaminawa speakers categorize emotion terms, and show that prosociality vs. antisociality is a major cultural axis along which emotion terms are conceptually organized. This dissertation also provides both a general ethnographic sketch of daily life among the Yaminawa community of Sepahua and a grammar sketch of the Yaminawa language. Yaminawa is notable for its rich inventory of bound morphemes that are used in affective expression. Some of the affective categories expressed by these bound morphemes, such as sadness, appear to be typologically unusual. In everyday conversation, certain morphological, acoustic, and interactional features cluster together in recurrent affective ways of speaking that are identifiable by speakers even when the propositional content of the utterances cannot be clearly heard. This dissertation describes two salient affective ways of speaking in detail: shĩ́nã̀ì ‘sad’ speech and sídàì ‘angry’ speech. Shĩ́nã̀ì ‘sad’ speech is characterized by creaky voice, low speech volume, and high frequency and complexity of backchannelling by co-participants, among other features. Sídàì ‘angry’ speech is characterized by breathy voice, slow and rhythmic speech rate, and scarcity and simplicity of backchannels. I also briefly describe the key features of three additional, minor affective stances: dúì ‘affection’, rátèì ‘surprise’, and bésèì ‘fear’. Some affective resources are used in more than one type of affective speech, for example, high pitch is used in affectionate speech, surprised speech, and commands issued in angry speech. Other affective resources appear to be unique to a single affective type, such as delayed stop release in fearful speech. While previous descriptions of affective expression in individual languages have tended to focus on single levels of analysis, such as metaphor or morphology, this dissertation aims to provide a model for the holistic description of affective expression in an individual language.

    Electroacoustical simulation of listening room acoustics for project ARCHIMEDES


    Semantic radical consistency and character transparency effects in Chinese: an ERP study

    BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...