3,789 research outputs found
From Holistic to Discrete Speech Sounds: The Blind Snow-Flake Maker Hypothesis
Sound is a medium used by humans to carry information. The existence of such a medium is a prerequisite for language. It is organized into a code, called speech, which provides a repertoire of forms shared within each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How, then, may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units systematically re-used in other syllables); phoneme inventories show precise regularities as well as great diversity across human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form? These are the questions we approach in this paper. We study them using the method of the artificial: we build a society of artificial agents and study which mechanisms may provide answers. This does not directly prove which mechanisms were used by humans, but rather suggests what kinds of mechanism may have been used, allowing us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary. The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it presupposes neither a functional pressure for communication nor the ability to have coordinated social interactions (agents do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents.
Perceptual Calibration of F0 Production: Evidence from Feedback Perturbation
Hearing one's own speech is important for language learning and for the maintenance of accurate articulation. For example, people with postlinguistically acquired deafness often show a gradual deterioration of many aspects of speech production. In this manuscript, data are presented that address the role played by acoustic feedback in the control of voice fundamental frequency (F0). Eighteen subjects produced vowels under a control condition (normal F0 feedback) and two experimental conditions: F0 shifted up and F0 shifted down. In each experimental condition, subjects produced vowels during a training period in which their F0 was slowly shifted without their awareness. Following this exposure to transformed F0, their acoustic feedback was returned to normal. Two effects were observed. Subjects compensated for the change in F0 and showed negative aftereffects: when F0 feedback was returned to normal, the subjects modified their produced F0 in the opposite direction to the shift. The results suggest that fundamental frequency is controlled using auditory feedback and with reference to an internal pitch representation. This is consistent with current work on internal models of speech motor control.
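The control scheme these results suggest, comparing auditory feedback against an internal pitch target, can be sketched as a toy simulation. This is an illustrative model, not the authors' analysis; the target, gain, and shift values are invented.

```python
# Toy model of F0 control via auditory feedback (an illustration of the
# idea in the abstract, not the authors' analysis). The internal pitch
# target, feedback gain, and shift size are invented values.

def simulate(n_baseline=20, n_shifted=60, n_after=20,
             target=120.0, shift=10.0, gain=0.1):
    """Return the produced-F0 trajectory (Hz) across the three phases."""
    produced = target
    trajectory = []
    for trial in range(n_baseline + n_shifted + n_after):
        # Feedback is perturbed upward only during the training phase.
        perturbed = n_baseline <= trial < n_baseline + n_shifted
        heard = produced + (shift if perturbed else 0.0)
        # Reduce the perceived error relative to the internal target.
        produced -= gain * (heard - target)
        trajectory.append(produced)
    return trajectory

traj = simulate()
# Compensation: during the shifted phase, produced F0 drifts opposite to
# the +10 Hz shift (toward ~110 Hz). Negative aftereffect: when feedback
# returns to normal, production starts below baseline and only gradually
# recovers toward the internal target.
```

The negative aftereffect falls out of the same update rule: once feedback is veridical again, the compensated production itself is the error that is slowly unwound.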
(Un)markedness of trills: the case of Slavic r-palatalisation
This paper evaluates trills [r] and their palatalized counterparts [rj] from the point of view of markedness. It is argued that [r]s are unmarked sounds in comparison to [rj]s, which follows from the examination of the following parameters: (a) frequency of occurrence, (b) articulatory and aerodynamic characteristics, (c) perceptual features, (d) emergence in the process of language acquisition, (e) stability from a diachronic point of view, (f) phonotactic distribution, and (g) implications. Several markedness aspects of [r] and [rj] are analyzed on the basis of Slavic languages, which offer excellent material for the evaluation of trills. Their phonetic characteristics, incorporated into phonetically grounded constraints, are employed for a phonological OT-analysis of r-palatalisation in two selected languages: Polish and Czech.
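An OT-analysis of the kind mentioned here ranks violable constraints and selects the candidate with the best violation profile. The toy evaluator below illustrates that mechanism only; the candidates and the two constraints are hypothetical stand-ins, not the paper's phonetically grounded constraints for Polish or Czech.

```python
# Toy Optimality Theory evaluator. Constraints are functions returning
# violation counts; ranking is the order of the constraint list, and the
# winner has the lexicographically least violation profile.

def evaluate(candidates, ranked_constraints):
    """Return the optimal candidate under the given constraint ranking."""
    return min(candidates, key=lambda c: [con(c) for con in ranked_constraints])

# Hypothetical candidates for an underlying palatalized trill /rj/:
# the faithful form, a fricative trill, and a plain fricative.
candidates = ["rj", "r\u0325".replace("\u0325", "\u031d"), "\u0290"]
candidates = ["rj", "r\u031d", "\u0290"]  # "rj", fricative trill, fricative

def no_palatalized_trill(c):
    """Markedness: penalize the palatalized trill itself (*[rj])."""
    return 1 if c == "rj" else 0

def faithfulness(c):
    """Toy faithfulness: rough distance of each candidate from input /rj/."""
    return {"rj": 0, "r\u031d": 1, "\u0290": 2}[c]

# With markedness ranked above faithfulness, the faithful trill loses.
winner = evaluate(candidates, [no_palatalized_trill, faithfulness])
```

Reversing the ranking (faithfulness first) makes the faithful "rj" win, which is how OT models cross-linguistic variation in whether palatalized trills survive.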
Speaker-normalized sound representations in the human auditory cortex
The acoustic dimensions that distinguish speech sounds (like the vowel differences in "boot" and "boat") also differentiate speakers' voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners' perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as the target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener's perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
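A contrast-enhancement account of the kind invoked here can be sketched in a few lines: the percept of a target formant is pushed away from the mean of the preceding context, so the same acoustic token is heard differently after different voices. The function, weight, and frequency values below are illustrative assumptions, not the study's fitted model.

```python
# Illustrative contrast-enhancement sketch for speaker normalization.
# All frequencies are invented round numbers in Hz.

def perceived(target_hz, context_hz, weight=0.5):
    """Shift the percept of a target away from the context mean."""
    context_mean = sum(context_hz) / len(context_hz)
    return target_hz + weight * (target_hz - context_mean)

token = 500.0                        # an ambiguous formant value
low_voice = [400.0, 420.0, 410.0]    # preceding speaker with a low range
high_voice = [600.0, 620.0, 610.0]   # preceding speaker with a high range

# After a low-range voice the token is pushed upward (contrast), and
# after a high-range voice it is pushed downward.
after_low = perceived(token, low_voice)    # 545.0
after_high = perceived(token, high_voice)  # 445.0
```

The point of the sketch is only the direction of the effect: identical acoustics yield speaker-dependent percepts, which is the behavioral signature the neural responses were compared against.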
The self-organization of combinatoriality and phonotactics in vocalization systems
This paper shows how a society of agents can self-organize a shared vocalization system that is
discrete, combinatorial and has a form of primitive phonotactics, starting from holistic inarticulate
vocalizations. The originality of the system is that: (1) it does not include any explicit pressure for
communication; (2) agents do not possess capabilities of coordinated interactions, in particular they
do not play language games; (3) agents possess no specific linguistic capacities; and (4) initially
there exists no convention that agents can use. As a consequence, the system shows how a primitive
speech code may bootstrap in the absence of a communication system between agents, i.e. before the
appearance of language
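The self-organizing dynamic described in this abstract can be caricatured in a few lines of code. The sketch below is an illustrative toy, not Oudeyer's actual coupled neural maps: agents hold continuous vocalization targets in a one-dimensional space, and every time any agent babbles, each hearer nudges its closest target toward the heard sound. No communication pressure, coordination, or initial convention is assumed; all parameters are invented.

```python
# Toy self-organization of a shared vocalization system (illustrative
# only). Perception-production coupling: hearing a sound moves each
# agent's nearest production target toward it.
import random

random.seed(0)
N_AGENTS, N_TARGETS, ROUNDS, RATE = 10, 20, 20000, 0.1

# Each agent starts with random targets in a 1-D "vocalization space".
agents = [[random.random() for _ in range(N_TARGETS)]
          for _ in range(N_AGENTS)]

for _ in range(ROUNDS):
    speaker = random.choice(agents)
    sound = random.choice(speaker) + random.gauss(0, 0.02)  # noisy babble
    for hearer in agents:  # every agent, including the speaker, adapts
        i = min(range(N_TARGETS), key=lambda k: abs(hearer[k] - sound))
        hearer[i] += RATE * (sound - hearer[i])

def modes(targets, tol=0.05):
    """Collapse a sorted target list into clusters separated by > tol."""
    reps = []
    for t in sorted(targets):
        if not reps or t - reps[-1] > tol:
            reps.append(t)
    return reps

# After many rounds, each agent's targets typically pile up on a small
# number of shared modes: a discrete, population-level code.
mode_counts = [len(modes(a)) for a in agents]
```

The positive feedback is the whole mechanism: regions where targets happen to be dense produce more sounds, which attracts yet more targets, so the initially continuous space crystallizes into a few shared categories.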
From Analogue to Digital Vocalizations
/u/ fronting and /t/ aspiration in Māori and New Zealand English
This article examines the relationship between the frontness of /u/ and the aspiration of /t/ in both Māori and New Zealand English (NZE). In both languages, these processes can be observed since the earliest recordings, dating from the latter part of the nineteenth century. We report analyses of these developments for three groups of male speakers of Māori spanning the twentieth century. We compare the Māori analyses with analyses of related features of the speakers' English and of the English of monolingual contemporaries. The occurrence of these processes in Māori cannot be seen simply as interference from NZE as the Māori-speaking population became increasingly bilingual. We conclude that it was the arrival of English, with its contrast between aspirated and unaspirated plosives, rather than direct borrowing, that was the trigger for the fronting of the hitherto stable back Māori /u/ vowel, together with increased aspiration of /t/ before both /i/ and /u/.
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing and, in broader terms, between the abstract and physical structures of a speech signal. Our goal is to take a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology. We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems, and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP, the most compact phonological speech representation, performs comparably to the systems with a higher number of phonological features. The parametric TTS based on the phonological speech representation, trained from an unlabelled audiobook in an unsupervised manner, achieves 85% of the intelligibility of the state-of-the-art parametric speech synthesis. We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models; on the other hand, the speech processing community may utilize the concepts developed for the theoretical phonological models to improve current state-of-the-art applications.
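Featural representations of the kind compared above encode each phoneme as a vector of phonological feature values. The sketch below illustrates the general idea with a textbook-style SPE feature set; the feature inventory and the assignments are simplified assumptions, not the paper's actual GP, SPE, or eSPE specifications.

```python
# Illustrative SPE-style featural phoneme representation. Features take
# +1 (present), -1 (absent), or 0 (unspecified). Assignments below are
# textbook simplifications, not the paper's feature systems.

FEATURES = ["syllabic", "consonantal", "voice", "continuant", "nasal",
            "labial", "coronal", "high", "low", "back", "round"]

PHONEMES = {
    "p": dict(consonantal=1, labial=1, voice=-1, continuant=-1),
    "b": dict(consonantal=1, labial=1, voice=1, continuant=-1),
    "m": dict(consonantal=1, labial=1, voice=1, nasal=1),
    "i": dict(syllabic=1, voice=1, high=1, back=-1, round=-1),
    "u": dict(syllabic=1, voice=1, high=1, back=1, round=1),
}

def feature_vector(phoneme):
    """Expand a sparse specification into a full +1/-1/0 vector."""
    spec = PHONEMES[phoneme]
    return [spec.get(f, 0) for f in FEATURES]

def differing(a, b):
    """List features on which two phonemes are specified and disagree."""
    return [f for f, x, y in zip(FEATURES, feature_vector(a),
                                 feature_vector(b))
            if x != 0 and y != 0 and x != y]

# /p/-/b/ is a one-feature (voicing) contrast; /i/-/u/ differs in two.
pb_contrast = differing("p", "b")  # ['voice']
iu_contrast = differing("i", "u")  # ['back', 'round']
```

A vocoder built on such vectors synthesizes from the feature trajectory rather than from phoneme identities, which is what makes the representation compositional and lets systems with different feature inventories be compared on the same speech.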
Engaging the articulators enhances perception of concordant visible speech movements
PURPOSE
This study aimed to test whether (and how) somatosensory feedback signals from the vocal tract affect concurrent unimodal visual speech perception.
METHOD
Participants discriminated pairs of silent visual utterances of vowels under 3 experimental conditions: (a) normal (baseline) and while holding either (b) a bite block or (c) a lip tube in their mouths. To test the specificity of somatosensory-visual interactions during perception, we assessed discrimination of vowel contrasts optically distinguished based on their mandibular (English /ɛ/-/æ/) or labial (English /u/-French /u/) postures. In addition, we assessed perception of each contrast using dynamically articulating videos and static (single-frame) images of each gesture (at vowel midpoint).
RESULTS
Engaging the jaw selectively facilitated perception of the dynamic gestures optically distinct in terms of jaw height, whereas engaging the lips selectively facilitated perception of the dynamic gestures optically distinct in terms of their degree of lip compression and protrusion. Thus, participants perceived visible speech movements in relation to the configuration and shape of their own vocal tract (and possibly their ability to produce covert vowel production-like movements). In contrast, engaging the articulators had no effect when the speaking faces did not move, suggesting that the somatosensory inputs affected perception of time-varying kinematic information rather than changes in target (movement end point) mouth shapes.
CONCLUSIONS
These findings suggest that orofacial somatosensory inputs associated with speech production prime premotor and somatosensory brain regions involved in the sensorimotor control of speech, thereby facilitating perception of concordant visible speech movements.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.9911846
Short- and medium-term plasticity for speaker adaptation seem to be independent
The author wishes to thank James McQueen and Elizabeth Johnson for comments made on an earlier draft of this paper.
In a classic paper, Ladefoged and Broadbent [1] showed that listeners adapt to speakers based on short-term exposure to a single phrase. Recently, Norris, McQueen, and Cutler [2] presented evidence for a lexically conditioned medium-term adaptation to a particular speaker based on an exposure of 40 critical words among 200 items. In two experiments, I investigated whether there is a connection between the two findings. To this end, a vowel-normalization paradigm (similar to [1]) was used with a carrier phrase that consisted of either words or nonwords. The range of the second formant was manipulated, and this affected the perception of a target vowel in a compensatory fashion: a low F2 range made it more likely that a target vowel was perceived as a front vowel, that is, with an inherently high F2. Manipulation of the lexical status of the carrier phrase, however, did not affect vowel normalization. In contrast, the range of vowels in the carrier phrase did influence vowel normalization. If the carrier phrase consisted of high-front vowels only, vowel categories shifted only for high-front vowels. This may indicate that the short-term and medium-term adaptations are brought about by different mechanisms.
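The compensatory pattern described here, where a low F2 range in the carrier makes a front-vowel (high-F2) percept more likely, can be sketched as a simple range-normalization rule. This is an illustrative toy, not the study's model; all frequency values are invented.

```python
# Illustrative extrinsic vowel normalization: the listener judges a
# target vowel's F2 relative to the F2 range of the carrier phrase,
# so the same token sounds fronter after a low-F2 carrier.

def frontness(target_f2, carrier_f2):
    """Position of the target in the carrier's F2 range (0 back, 1 front)."""
    lo, hi = min(carrier_f2), max(carrier_f2)
    return (target_f2 - lo) / (hi - lo)

target = 1500.0                        # ambiguous between front and back
low_range = [900.0, 1200.0, 1600.0]    # carrier with a low F2 range
high_range = [1400.0, 1800.0, 2200.0]  # carrier with a high F2 range

after_low = frontness(target, low_range)    # = 0.125? no: high in range
after_high = frontness(target, high_range)  # low in range
```

The sketch captures only the compensatory direction: after the low-range carrier the token sits near the top of the range (heard as front), after the high-range carrier near the bottom (heard as back); the lexical status of the carrier plays no role in the rule, matching the null result reported above.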
- …