
    Stop Release in Polish English — Implications for Prosodic Constituency

    Although there is little consensus on the relevance of non-contrastive allophonic processes in L2 speech acquisition, EFL pronunciation textbooks cover the suppression of stop release in coda position. The tendency for held stops in English stands in stark opposition to a number of other languages, including Polish, in which plosive release is obligatory. This paper presents phonetic data on the acquisition of English unreleased stops by Polish learners. Results show that, in addition to adopting the target-language pattern of unreleased plosives, advanced learners may acquire more native-like VC formant transitions. From the functional perspective, languages with unreleased stops may be expected to have robust formant patterns in the final portion of the preceding vowel, which allow listeners to identify the final consonant when it lacks an audible release burst (see e.g. Wright 2004). From the perspective of syllabic positions, it may be said that ‘coda’ stops are obligatorily released in Polish, yet may be unreleased in English. Thus, the traditional term ‘coda’ is insufficient to describe the prosodic properties of post-vocalic stops in Polish and English. These differences may be captured in the Onset Prominence framework (Schwartz 2013). In languages with unreleased stops, the mechanism of submersion places post-vocalic stops at the bottom of the representational hierarchy, where they may be subject to weakening. Submersion produces larger prosodic constituents and thus has phonological consequences beyond ‘coda’ behavior.

    Cracking the social code of speech prosody using reverse correlation

    Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker's traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker's perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word "Hello," which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers' physical characteristics, such as sex and mean pitch. By characterizing how any given individual's mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders such as autism spectrum disorder and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals.
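    The reverse-correlation logic this abstract describes — presenting randomly perturbed pitch trajectories and averaging the ones a listener preferred against the ones rejected — can be sketched as follows. The simulated listener, trial counts, and variable names below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_points = 500, 7  # trials and pitch samples along the utterance

# Each trial: a random pitch trajectory (in cents around the mean pitch).
trajectories = rng.normal(0.0, 50.0, size=(n_trials, n_points))

# Hypothetical listener: judges "dominant" when pitch falls over the word.
slope = np.linspace(1.0, -1.0, n_points)
chosen = trajectories @ slope > 0

# First-order reverse-correlation kernel: mean trajectory of chosen
# trials minus mean trajectory of rejected trials.
kernel = trajectories[chosen].mean(axis=0) - trajectories[~chosen].mean(axis=0)
print(kernel.round(1))  # recovers the falling shape the listener used
```

    With enough trials, the kernel converges on the internal pitch template driving the judgment, which is what allows the authors to compare prototypes across listeners and speakers.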

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Faked Speech Detection with Zero Knowledge

    Audio is one of the most common modes of human communication, but at the same time it can easily be misused to deceive people. With the AI revolution, the relevant technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a neural network method to develop a classifier that will blindly classify an input audio clip as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. The proposed model was trained on a set of important features extracted from a large dataset of audio clips, yielding a classifier that was then tested on the same set of features from different clips. The data was extracted from two raw datasets composed especially for this work: an all-English dataset and a mixed dataset (Arabic plus English). These datasets have been made available, in raw form, through GitHub for the use of the research community at https://github.com/SaSs7/Dataset. For the purpose of comparison, the audio clips were also classified through human inspection, with native speakers as subjects. The ensuing results were interesting and exhibited formidable accuracy.
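    The pipeline the abstract outlines — extract a fixed-length feature vector per clip, then train a binary real-vs-mimicked classifier on those features — can be sketched with a minimal logistic model. The synthetic features, class shift, and hyperparameters below are stand-in assumptions, not the authors' actual architecture or dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 13  # clips and feature dimension (e.g. 13 per-clip MFCC means)

# Synthetic stand-in for extracted features: real and mimicked clips
# differ by a small shift in feature space.
X = rng.normal(0.0, 1.0, (n, d))
y = rng.integers(0, 2, n)          # 1 = real, 0 = mimicked
X[y == 1] += 0.8                   # make the classes separable-ish

# Minimal logistic-regression classifier trained by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y                      # gradient of the cross-entropy loss
    w -= 0.1 * X.T @ g / n
    b -= 0.1 * g.mean()

acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

    A real system would replace the synthetic matrix with features computed from the audio itself and the logistic layer with a deeper network, but the train-on-features, test-on-held-out-features structure is the same.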

    Towards Tutoring an Interactive Robot

    Wrede B, Rohlfing K, Spexard TP, Fritsch J. Towards tutoring an interactive robot. In: Hackel M, ed. Humanoid Robots, Human-like Machines. ARS; 2007: 601-612.

    Many classical approaches developed so far for learning in a human-robot interaction setting have focussed on rather low-level motor learning by imitation. Some doubts, however, have been cast on whether this approach will achieve higher-level functioning. Higher-level processes include, for example, the cognitive capability to assign meaning to actions in order to learn from the tutor. Such capabilities require that an agent not only mimic the motor movement of the action performed by the tutor, but also understand the constraints, the means, and the goal(s) of an action in the course of its learning process. Further support for this hypothesis comes from parent-infant instruction, where it has been observed that parents are very sensitive and adaptive tutors who modify their behavior to the cognitive needs of their infant. Based on these insights, we have started our research agenda on analyzing and modeling learning in a communicative situation by analyzing parent-infant instruction scenarios with automatic methods. Results confirm the well-known observation that parents modify their behavior when interacting with their infant. We assume that these modifications do not only serve to keep the infant's attention but indeed help the infant to understand the actual goal of an action, including relevant information such as constraints and means, by enabling it to structure the action into smaller, meaningful chunks. We were able to determine first objective measurements from video as well as audio streams that can serve as cues for this information in order to facilitate learning of actions.

    French Face-to-Face Interaction: Repetition as a Multimodal Resource

    In this chapter, after presenting the corpus as well as some of the annotations developed in the OTIM project, we focus on the specific phenomenon of repetition. After briefly discussing this notion, we show that different degrees of convergence can be achieved by speakers depending on the multimodal complexity of the repetition and on the timing between the repeated element and the model. Although we focus more specifically on the gestural level, we present a multimodal analysis of gestural repetitions in which we met several issues linked to multimodal annotations of any type. This gives an overview of crucial issues in cross-level linguistic annotation, such as the definition of a phenomenon including formal and/or functional categorization.