
    Final Rises in task-oriented and conversational dialogue


    Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning

    Various psychological factors affect how individuals express emotions. Yet, when we collect data intended for use in building emotion recognition systems, we often do so with paradigms designed solely to elicit emotional behavior. Algorithms trained on such data are unlikely to function outside of controlled environments, because our emotions naturally change as a function of these other factors. In this work, we study how the multimodal expressions of emotion change when an individual is under varying levels of stress. We hypothesize that stress produces modulations that can hide the true underlying emotions of individuals, and that we can make emotion recognition algorithms more generalizable by controlling for variations in stress. To this end, we use adversarial networks to decorrelate stress modulations from emotion representations. We study how stress alters acoustic and lexical emotional predictions, paying special attention to how modulations due to stress affect the transferability of learned emotion recognition models across domains. Our results show that stress is indeed encoded in trained emotion classifiers and that this encoding varies across levels of emotions and across the lexical and acoustic modalities. Our results also show that emotion recognition models that control for stress during training generalize better to new domains than models that do not. We conclude that it is necessary to consider the effect of extraneous psychological factors when building and testing emotion recognition models.
    Comment: 10 pages, ICMI 201
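The adversarial decorrelation described above can be illustrated with a gradient-reversal-style update: a shared encoder descends on the emotion loss while ascending on the adversary's stress loss, so the learned representation stays predictive of emotion but uninformative about stress. Below is a minimal toy sketch in plain numpy; the synthetic data, dimensions, linear encoder, and the `lam` weight are all hypothetical illustrations, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 10 acoustic/lexical features,
# binary emotion and stress labels (all synthetic).
X = rng.normal(size=(200, 10))
y_emotion = (X[:, 0] > 0).astype(float)
y_stress = (X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linear encoder W, emotion head v, adversarial stress head u.
W = rng.normal(scale=0.1, size=(10, 4))
v = rng.normal(scale=0.1, size=4)
u = rng.normal(scale=0.1, size=4)
lr, lam = 0.1, 0.5  # lam scales the reversed adversarial gradient

for _ in range(200):
    Z = X @ W                    # shared representation
    g_emo = sigmoid(Z @ v) - y_emotion  # dLoss/dlogit (cross-entropy)
    g_str = sigmoid(Z @ u) - y_stress
    # Both heads take ordinary gradient steps on their own losses.
    v -= lr * Z.T @ g_emo / len(X)
    u -= lr * Z.T @ g_str / len(X)
    # Encoder: descend on the emotion loss but *ascend* on the stress
    # loss (gradient reversal), decorrelating stress from the encoding.
    gW_emo = X.T @ np.outer(g_emo, v) / len(X)
    gW_str = X.T @ np.outer(g_str, u) / len(X)
    W -= lr * (gW_emo - lam * gW_str)
```

After training, the composed emotion predictor `sigmoid(X @ W @ v)` should classify emotion well above chance, while the reversed gradient discourages `W` from carrying stress information.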

    Unusual Prosodic Descriptors in Young, Verbal Children with Autism Spectrum Disorders

    This study aimed to determine which prosodic descriptors best characterize the speech of children with autism spectrum disorders (ASD) and whether these descriptors (e.g., sing-song and monotone) are acoustically different. Two listeners' auditory perceptions of the speech of the children with ASD and the pitch of the speech samples were analyzed. The results suggest that individual children are characterized by a variety of prosodic descriptors. Some thought groups were described as both sing-song and monotone; however, most children appear to be either more monotone or more sing-song. Furthermore, the subjective and acoustic data suggest a strong relationship between atypical intonation and sing-song perceptions, as well as between atypical rhythm and monotone perceptions. Implications for an earlier diagnosis of ASD and for the development of therapy tasks to target these deficits are discussed.

    Prepositional Phrase Attachment Ambiguities in Declarative and Interrogative Contexts: Oral Reading Data

    Certain English sentences containing multiple prepositional phrases (e.g., She had planned to cram the paperwork in the drawer into her briefcase) have been reported to be prone to mis-parsing of a kind that is standardly called a “garden path.” The mis-parse stems from the temporary ambiguity of the first prepositional phrase (PP1: in the drawer), which tends to be interpreted initially as the goal argument of the verb cram. If the sentence ended there, that analysis would be correct. But it is overridden when the second prepositional phrase (PP2: into her briefcase) is encountered, since the into phrase can only be interpreted as the goal argument of the verb. Thus, PP2 necessarily supplants PP1’s initially assigned position as goal, and PP1 must be reanalyzed as a modifier of the object NP (the paperwork). Interrogative versions of the same sentence structure (Had she planned to cram the paperwork in the drawer into her briefcase?) may have a different profile. They have been informally judged to be easier to process than their declarative counterparts, because they are less susceptible to the initial garden path analysis. The study presented here represents an attempt to find a behavioral correlate of this intuitive difference in processing difficulty. The experiment employs the Double Reading Paradigm (Fodor, Macaulay, Ronkos, Callahan, and Peckenpaugh, 2019). Participants were asked to read aloud a visually presented sentence twice, first without taking any time at all to preview the sentence content (Reading 1), and then again after unlimited preview (Reading 2). The experimental items were created in a 2 x 2 design, with one factor being Speech Act (declarative vs. interrogative) and the other being PP2 Status: PP2 could only be an argument of the verb (Arg), as above, or else PP2 could be interpreted as a modifier (Mod) of the NP within the preceding PP, as in She had / Had she planned to cram the paperwork in the drawer of her filing cabinet(?). Participants’ recordings of Reading 1 and Reading 2 were subjected to prosodic coding by a linguist who was naive to the research question. Distributions of prosodic boundaries were statistically analyzed to extract any significant differences in prosodic boundary patterns as a function of Speech Act, Reading, or PP2 Status. Logistic mixed effects regression models indicated, as anticipated, a significant effect of PP2 Status across all analyses of prosodic phrasing, and a significant effect of Reading for both analyses of prosodic phrasing that included boundary strength. Speech Act was a significant predictor in one model of prosodic phrasing, but the hypothesized interaction (between Speech Act and PP2 Status) was not significant in any model. Another analysis concerned the amount of time a participant spent silently studying a sentence after Reading 1, to be confident they had understood it, before reading it aloud again (Reading 2). This time between readings is referred to as the inter-reading time (IRT). It was assumed that a longer IRT signifies greater processing difficulty. Thus, IRT was hypothesized to provide a behavioral correlate of the intuitive judgement that the interrogative garden paths are easier to process than the declarative ones. If a correlate had been found, it would have taken the form of an interaction between the two factors (Speech Act and PP2 Status) such that the IRT difference between Arg and Mod sentence versions was smaller for interrogatives than for declaratives. Ultimately, however, no statistically significant interaction between Speech Act and PP2 Status was found.
Further studies seeking behavioral evidence of the informal intuition motivating this research are proposed. Also offered are possible explanations for why the intuition is apparently so strong for some English speakers, and why, if so, it is not reflected in IRT. Significant ancillary findings are that interrogatives are in general more difficult to process than corresponding declaratives. Also, inter-reading time (IRT) in the Double Reading paradigm is confirmed as a useful measure of sentence processing difficulty, given that within the declarative sentences the garden-path (Arg) versions showed significantly longer IRTs than the non-garden-path (Mod) versions.
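For concreteness, the 2 x 2 item design and the IRT measure described above can be sketched as follows. The sentence frames are the examples quoted in the abstract; the function name and the timestamp values are hypothetical illustrations.

```python
from itertools import product

# Reconstruction of the 2 x 2 design: Speech Act (declarative vs.
# interrogative) crossed with PP2 Status (Arg: PP2 is the verb's goal
# argument; Mod: PP2 modifies the NP inside PP1).
frames = {
    ("declarative", "Arg"):
        "She had planned to cram the paperwork in the drawer into her briefcase.",
    ("declarative", "Mod"):
        "She had planned to cram the paperwork in the drawer of her filing cabinet.",
    ("interrogative", "Arg"):
        "Had she planned to cram the paperwork in the drawer into her briefcase?",
    ("interrogative", "Mod"):
        "Had she planned to cram the paperwork in the drawer of her filing cabinet?",
}
conditions = list(product(["declarative", "interrogative"], ["Arg", "Mod"]))

def inter_reading_time(reading1_end_s, reading2_start_s):
    """IRT: silent study time between the end of Reading 1 and the start
    of Reading 2; a longer IRT is taken to signal greater difficulty."""
    return reading2_start_s - reading1_end_s
```

The hypothesized (but unsupported) interaction would have shown the Arg-minus-Mod IRT difference shrinking in the interrogative cells relative to the declarative ones.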

    Speech monitoring and phonologically-mediated eye gaze in language perception and production: a comparison using printed word eye-tracking

    The Perceptual Loop Theory of speech monitoring assumes that speakers routinely inspect their inner speech. In contrast, Huettig and Hartsuiker (2010) observed that listening to one's own speech during language production drives eye-movements to phonologically related printed words with a similar time-course as listening to someone else's speech does in speech perception experiments. This suggests that speakers use their speech perception system to listen to their own overt speech, but not to their inner speech. However, a direct comparison between production and perception with the same stimuli and participants has so far been lacking. The current printed word eye-tracking experiment therefore used a within-subjects design, combining production and perception. Displays showed four words, of which one, the target, either had to be named or was presented auditorily. Accompanying words were phonologically related, semantically related, or unrelated to the target. There were small increases in looks to phonological competitors with a similar time-course in both production and perception. Phonological effects in perception, however, lasted longer and had a much larger magnitude. We conjecture that this difference is related to a difference in predictability of one's own and someone else's speech, which in turn has consequences for lexical competition in other-perception and possibly suppression of activation in self-perception.
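Time-course comparisons like the one described above are typically summarized as the proportion of trials fixating each word type per time bin. The following is a hypothetical analysis sketch, not the authors' actual pipeline; the function name, bin count, and label set are illustrative assumptions.

```python
import numpy as np

def fixation_proportions(fix_labels, n_bins=20):
    """Per-bin fixation proportions for the four display word types.

    fix_labels: one label sequence per trial, one label per time bin
    (e.g., 50 ms bins), drawn from the four word types shown on screen.
    """
    labels = ("target", "phonological", "semantic", "unrelated")
    counts = {lab: np.zeros(n_bins) for lab in labels}
    for trial in fix_labels:
        for b, lab in enumerate(trial[:n_bins]):
            if lab in counts:
                counts[lab][b] += 1
    n_trials = len(fix_labels)
    return {lab: c / n_trials for lab, c in counts.items()}
```

Computing these curves separately for the production (naming) and perception (auditory) conditions would expose the longer-lasting, larger phonological competitor effect reported for perception.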

    The development of children's ability to track and predict turn structure in conversation

    Children begin developing turn-taking skills in infancy but take several years to fluidly integrate their growing knowledge of language into their turn-taking behavior. In two eye-tracking experiments, we measured children’s anticipatory gaze to upcoming responders while controlling linguistic cues to turn structure. In Experiment 1, we showed English and non-English conversations to English-speaking adults and children. In Experiment 2, we phonetically controlled lexicosyntactic and prosodic cues in English-only speech. Children spontaneously made anticipatory gaze switches by age two and continued improving through age six. In both experiments, children and adults made more anticipatory switches after hearing questions. Consistent with prior findings on adult turn prediction, prosodic information alone did not increase children’s anticipatory gaze shifts. But, unlike prior work with adults, lexical information alone was not sufficient either—children’s performance was best overall with lexicosyntax and prosody together. Our findings support an account in which turn tracking and turn prediction emerge in infancy and then gradually become integrated with children’s online linguistic processing.
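The anticipatory-gaze measure used in such studies can be made concrete: a gaze switch to the upcoming responder counts as anticipatory only if it occurs before that responder begins speaking. The sketch below is a hypothetical scoring helper, not the authors' code; function names and the example timestamps are assumptions.

```python
def is_anticipatory(switch_time_s, response_onset_s):
    """A gaze switch to the upcoming responder is anticipatory if it
    lands before the responder's speech onset."""
    return switch_time_s < response_onset_s

def anticipatory_rate(switch_times, response_onsets):
    """Proportion of trials with an anticipatory switch.

    switch_times: per-trial time of the first switch to the responder,
    or None if the child never switched on that trial.
    """
    hits = sum(
        1 for t, onset in zip(switch_times, response_onsets)
        if t is not None and t < onset
    )
    return hits / len(switch_times)
```

Comparing this rate across age groups, cue conditions (prosody only, lexicosyntax only, both), and question vs. non-question turns mirrors the contrasts reported in the abstract.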