    From Holistic to Discrete Speech Sounds: The Blind Snow-Flake Maker Hypothesis

    Sound is a medium used by humans to carry information. The existence of this kind of medium is a pre-requisite for language. It is organized into a code, called speech, which provides a repertoire of forms that is shared in each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units re-used systematically in other syllables); phoneme inventories have precise regularities as well as great diversity in human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form? These are the questions we will approach in the paper. We will study them using the method of the artificial. We will build a society of artificial agents, and study what mechanisms may provide answers. This will not prove directly what mechanisms were used for humans, but rather give ideas about what kind of mechanism may have been used. This allows us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary. The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it pre-supposes neither a functional pressure for communication, nor the ability to have coordinated social interactions (they do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production both within agents, and on the interactions between agents.
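    To make the kind of mechanism described above concrete, here is a minimal sketch of a coupled perception-production dynamic in a population of agents. It is not the paper's actual model: the acoustic space is reduced to one dimension, and the agent count, prototype count, learning rate, and noise level are illustrative assumptions.

```python
# Sketch of self-organization of a shared, discrete "speech code" from
# random holistic prototypes. All numeric choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_PROTOS, STEPS, LR, NOISE = 10, 20, 20000, 0.05, 0.02

# Each agent starts with random, holistic perceptuo-motor prototypes
# in a one-dimensional acoustic space [0, 1].
agents = [rng.uniform(0, 1, N_PROTOS) for _ in range(N_AGENTS)]

for _ in range(STEPS):
    speaker = rng.integers(N_AGENTS)
    # The speaker vocalizes one of its prototypes, with motor noise.
    sound = np.clip(rng.choice(agents[speaker]) + rng.normal(0, NOISE), 0, 1)
    # Every agent that hears the sound attracts its closest prototype
    # toward it: a generic perception-production coupling, with no
    # imitation game, scoring, or explicit communicative pressure.
    for protos in agents:
        k = np.argmin(np.abs(protos - sound))
        protos[k] += LR * (sound - protos[k])

# The prototypes of all agents tend to collapse onto a small shared set
# of discrete targets: a population-level "speech code".
shared = np.round(np.concatenate(agents), 2)
print(sorted(set(shared)))
```

    Because every agent pulls its nearest prototype toward whatever it hears, initially scattered prototypes cluster, and the clusters align across agents; this is the self-organizing coupling the abstract describes, reduced to a toy.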

    Perceptual Calibration of F0 Production: Evidence from Feedback Perturbation

    Hearing one’s own speech is important for language learning and maintenance of accurate articulation. For example, people with postlinguistically acquired deafness often show a gradual deterioration of many aspects of speech production. In this manuscript, data are presented that address the role played by acoustic feedback in the control of voice fundamental frequency (F0). Eighteen subjects produced vowels under a control (normal F0 feedback) and two experimental conditions: F0 shifted up and F0 shifted down. In each experimental condition subjects produced vowels during a training period in which their F0 was slowly shifted without their awareness. Following this exposure to transformed F0, their acoustic feedback was returned to normal. Two effects were observed. Subjects compensated for the change in F0 and showed negative aftereffects. When F0 feedback was returned to normal, the subjects modified their produced F0 in the opposite direction to the shift. The results suggest that fundamental frequency is controlled using auditory feedback and with reference to an internal pitch representation. This is consistent with current work on internal models of speech motor control.
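    The control scheme these results point to can be sketched as a simple negative-feedback loop: produced F0 is adjusted so that perceived (possibly shifted) F0 matches an internal pitch target. This is an illustrative toy, not the study's analysis; the target F0, correction gain, and shift size are assumed values.

```python
# Toy feedback-control loop reproducing the direction of the two
# observed effects: compensation and a negative aftereffect.
target_f0 = 200.0   # internal pitch reference (Hz), assumed
produced = 200.0
gain = 0.3          # fraction of perceived error corrected per trial, assumed

def trial(produced, shift_semitones):
    # Auditory feedback, possibly shifted by the apparatus.
    perceived = produced * 2 ** (shift_semitones / 12)
    error = target_f0 - perceived
    return produced + gain * error

# Training phase: feedback shifted up -> produced F0 drifts down
# (compensation), settling near 200 / 2**(1/12) ~ 188.8 Hz.
for _ in range(50):
    produced = trial(produced, shift_semitones=1.0)
print(f"end of shifted feedback: {produced:.1f} Hz")

# Feedback returned to normal -> F0 starts below target (negative
# aftereffect) and only gradually climbs back toward 200 Hz.
for _ in range(3):
    produced = trial(produced, shift_semitones=0.0)
    print(f"aftereffect trial: {produced:.1f} Hz")
```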

    (Un)markedness of trills: the case of Slavic r-palatalisation

    This paper evaluates trills [r] and their palatalized counterparts [rj] from the point of view of markedness. It is argued that [r]s are unmarked sounds in comparison to [rj]s, which follows from the examination of the following parameters: (a) frequency of occurrence, (b) articulatory and aerodynamic characteristics, (c) perceptual features, (d) emergence in the process of language acquisition, (e) stability from a diachronic point of view, (f) phonotactic distribution, and (g) implications. Several markedness aspects of [r]s and [rj] are analyzed on the basis of Slavic languages, which offer excellent material for the evaluation of trills. Their phonetic characteristics, incorporated into phonetically grounded constraints, are employed for a phonological OT-analysis of r-palatalization in two selected languages: Polish and Czech.
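    As a generic illustration of the OT machinery invoked here (and emphatically not the paper's constraint set or the actual Polish or Czech repairs), candidates can be compared on violation profiles under a strict constraint ranking, highest-ranked constraint first:

```python
# Minimal Optimality Theory evaluator. Constraints return violation
# counts; tuples are compared lexicographically, so a higher-ranked
# constraint always outweighs any number of lower-ranked violations.
def winner(input_form, candidates, ranking):
    def profile(cand):
        return tuple(con(input_form, cand) for con in ranking)
    return min(candidates, key=profile)

# Toy constraints over strings in which "'" marks palatalization:
star_r_pal = lambda inp, cand: cand.count("r'")                     # markedness: *r'
ident_pal  = lambda inp, cand: int(("'" in inp) != ("'" in cand))   # faithfulness

candidates = ["r'a", "ra"]
# Markedness over faithfulness: the palatalized trill is repaired away.
print(winner("r'a", candidates, [star_r_pal, ident_pal]))   # ra
# Faithfulness over markedness: input palatalization survives.
print(winner("r'a", candidates, [ident_pal, star_r_pal]))   # r'a
```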

    Speaker-normalized sound representations in the human auditory cortex

    The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
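    A contrast enhancement model of this kind can be illustrated with a toy computation: the percept of a target formant is pushed away from the average of the preceding speaker's context, so identical acoustics are categorized differently after different voices. The formant values and the enhancement weight below are invented; only the direction of the effect follows the abstract.

```python
# Toy contrast-enhancement model: the percept shifts away from the
# preceding context's mean formant value. All numbers are illustrative.
def perceived_f1(target_f1, context_mean_f1, weight=0.4):
    """Push the percept away from the preceding speaker's average F1."""
    return target_f1 + weight * (target_f1 - context_mean_f1)

ambiguous_target = 500.0   # Hz, midway between two vowel categories
low_f1_speaker, high_f1_speaker = 420.0, 580.0

# The same acoustics yield different percepts after different voices:
print(perceived_f1(ambiguous_target, low_f1_speaker))    # 532.0 Hz
print(perceived_f1(ambiguous_target, high_f1_speaker))   # 468.0 Hz
```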

    The self-organization of combinatoriality and phonotactics in vocalization systems

    This paper shows how a society of agents can self-organize a shared vocalization system that is discrete, combinatorial and has a form of primitive phonotactics, starting from holistic inarticulate vocalizations. The originality of the system is that: (1) it does not include any explicit pressure for communication; (2) agents do not possess capabilities of coordinated interactions, in particular they do not play language games; (3) agents possess no specific linguistic capacities; and (4) initially there exists no convention that agents can use. As a consequence, the system shows how a primitive speech code may bootstrap in the absence of a communication system between agents, i.e. before the appearance of language.
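    As a rough illustration of what it would mean to verify the two claimed properties, the sketch below quantifies discreteness (targets fall into a few clusters) and combinatoriality (the same clusters are re-used across vocalizations) for a toy data set; the clustering tolerance and the vocalizations themselves are arbitrary assumptions.

```python
# Toy measures of discreteness and combinatoriality for a set of
# vocalizations, each a sequence of acoustic targets in [0, 1].
from collections import Counter

def unit_inventory(vocalizations, tol=0.05):
    """Greedily cluster all targets into a small inventory of units."""
    centers = []
    for voc in vocalizations:
        for t in voc:
            if not any(abs(t - c) < tol for c in centers):
                centers.append(t)
    return centers

def reuse_counts(vocalizations, centers):
    """How often each unit is re-used across vocalizations."""
    def label(t):
        return min(range(len(centers)), key=lambda i: abs(t - centers[i]))
    return Counter(label(t) for voc in vocalizations for t in voc)

vocs = [[0.11, 0.52, 0.90], [0.09, 0.91, 0.50], [0.53, 0.10, 0.88]]
centers = unit_inventory(vocs)
# Few units (discreteness), each re-used in several vocalizations
# (combinatoriality):
print(len(centers), reuse_counts(vocs, centers))
```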

    From Analogue to Digital Vocalizations

    Sound is a medium used by humans to carry information. The existence of this kind of medium is a pre-requisite for language. It is organized into a code, called speech, which provides a repertoire of forms that is shared in each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units re-used systematically in other syllables); phoneme inventories have precise regularities as well as great diversity in human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form? These are the questions we will approach in the paper. We will study them using the method of the artificial. We will build a society of artificial agents, and study what mechanisms may provide answers. This will not prove directly what mechanisms were used for humans, but rather give ideas about what kind of mechanism may have been used. This allows us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary. The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it pre-supposes neither a functional pressure for communication, nor the ability to have coordinated social interactions (they do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production both within agents, and on the interactions between agents.

    /u/ fronting and /t/ aspiration in Māori and New Zealand English

    This article examines the relationship between the frontness of /u/ and the aspiration of /t/ in both Māori and New Zealand English (NZE). In both languages, these processes have been observable since the earliest recordings, which date from the latter part of the nineteenth century. We report analyses of these developments for three groups of male speakers of Māori spanning the twentieth century. We compare the Māori analyses with analyses of related features of the speakers' English and of the English of monolingual contemporaries. The occurrence of these processes in Māori cannot be seen simply as interference from NZE as the Māori-speaking population became increasingly bilingual. We conclude that it was the arrival of English with its contrast between aspirated and unaspirated plosives, rather than direct borrowing, that was the trigger for the fronting of the hitherto stable back Māori /u/ vowel together with increased aspiration of /t/ before both /i/ and /u/.

    Speech vocoding for laboratory phonology

    Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal. Our goal is to take a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology. We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP, the most compact phonological speech representation, performs comparably to the systems with a higher number of phonological features. The parametric TTS based on phonological speech representation, and trained from an unlabelled audiobook in an unsupervised manner, achieves 85% of the intelligibility of state-of-the-art parametric speech synthesis. We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models, and on the other hand, the speech processing community may utilize the concepts developed for the theoretical phonological models for improvements of the current state-of-the-art applications.
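    A hypothetical miniature may help fix what "featural representations" means here: a few phones encoded as SPE-style binary feature vectors of the sort a phonological vocoder would synthesize from. The feature set and values below are simplified for illustration; the systems in the paper (GP elements, SPE, eSPE) use richer, theory-specific inventories.

```python
# Toy SPE-style featural encoding; features and values are illustrative.
SPE_FEATURES = ("voice", "nasal", "continuant", "labial", "coronal", "high")

phones = {              # 1 = feature present, 0 = absent
    "p": (0, 0, 0, 1, 0, 0),
    "b": (1, 0, 0, 1, 0, 0),
    "m": (1, 1, 0, 1, 0, 0),
    "s": (0, 0, 1, 0, 1, 0),
    "i": (1, 0, 1, 0, 0, 1),
}

def encode(word):
    """Map a phone string to the per-phone feature vectors from which
    a phonological vocoder would generate speech parameters."""
    return [phones[ph] for ph in word]

for vec in encode("ibm"):    # hypothetical phone sequence
    print(dict(zip(SPE_FEATURES, vec)))
```

    On this view, comparing GP and eSPE vocoding amounts to asking how few such dimensions suffice to resynthesize intelligible speech.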

    Engaging the articulators enhances perception of concordant visible speech movements

    PURPOSE This study aimed to test whether (and how) somatosensory feedback signals from the vocal tract affect concurrent unimodal visual speech perception. METHOD Participants discriminated pairs of silent visual utterances of vowels under 3 experimental conditions: (a) normal (baseline) and while holding either (b) a bite block or (c) a lip tube in their mouths. To test the specificity of somatosensory-visual interactions during perception, we assessed discrimination of vowel contrasts optically distinguished based on their mandibular (English /ɛ/-/æ/) or labial (English /u/-French /u/) postures. In addition, we assessed perception of each contrast using dynamically articulating videos and static (single-frame) images of each gesture (at vowel midpoint). RESULTS Engaging the jaw selectively facilitated perception of the dynamic gestures optically distinct in terms of jaw height, whereas engaging the lips selectively facilitated perception of the dynamic gestures optically distinct in terms of their degree of lip compression and protrusion. Thus, participants perceived visible speech movements in relation to the configuration and shape of their own vocal tract (and possibly their ability to produce covert vowel production-like movements). In contrast, engaging the articulators had no effect when the speaking faces did not move, suggesting that the somatosensory inputs affected perception of time-varying kinematic information rather than changes in target (movement end point) mouth shapes. CONCLUSIONS These findings suggest that orofacial somatosensory inputs associated with speech production prime premotor and somatosensory brain regions involved in the sensorimotor control of speech, thereby facilitating perception of concordant visible speech movements. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.9911846

    Short- and medium-term plasticity for speaker adaptation seem to be independent

    The author wishes to thank James McQueen and Elizabeth Johnson for comments made on an earlier draft of this paper. In a classic paper, Ladefoged and Broadbent [1] showed that listeners adapt to speakers based on short-term exposure of a single phrase. Recently, Norris, McQueen, and Cutler [2] presented evidence for a lexically conditioned medium-term adaptation to a particular speaker based on an exposure of 40 critical words among 200 items. In two experiments, I investigated whether there is a connection between the two findings. To this end, a vowel-normalization paradigm (similar to [1]) was used with a carrier phrase that consisted of either words or nonwords. The range of the second formant was manipulated and this affected the perception of a target vowel in a compensatory fashion: A low F2-range made it more likely that a target vowel was perceived as a front vowel, that is, with an inherently high F2. Manipulation of the lexical status of the carrier phrase, however, did not affect vowel normalization. In contrast, the range of vowels in the carrier phrase did influence vowel normalization. If the carrier phrase consisted of high-front vowels only, vowel categories shifted only for high-front vowels. This may indicate that the short-term and medium-term adaptations are brought about by different mechanisms.
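    The normalization effect at issue can be caricatured in a few lines: a target vowel's F2 is judged relative to the F2 range of the carrier phrase rather than in absolute terms, so a low-F2 carrier makes the same target sound front. The numbers below are invented; only the direction of the effect follows the experiment (and Ladefoged and Broadbent's original demonstration).

```python
# Toy extrinsic vowel normalization: categorize a target F2 relative to
# the carrier phrase's apparent F2 range. All values are illustrative.
def classify_vowel(target_f2, carrier_f2_values):
    """Front vowels have inherently high F2; judge the target against
    the speaker's F2 range rather than in absolute terms."""
    lo, hi = min(carrier_f2_values), max(carrier_f2_values)
    relative = (target_f2 - lo) / (hi - lo)
    return "front" if relative > 0.5 else "back"

target = 1600.0                               # Hz, acoustically ambiguous
low_range_carrier = [900.0, 1200.0, 1800.0]   # low F2-range speaker
high_range_carrier = [1400.0, 2000.0, 2600.0] # high F2-range speaker

print(classify_vowel(target, low_range_carrier))    # front
print(classify_vowel(target, high_range_carrier))   # back
```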