A computational model of perceptuo-motor processing in speech perception: learning to imitate and categorize synthetic CV syllables
This paper presents COSMO, a Bayesian computational model expressive enough to carry out syllable production, perception and imitation tasks using motor, auditory or perceptuo-motor information. An imitation algorithm makes it possible to learn the articulatory-to-acoustic mapping and the link between syllables and corresponding articulatory gestures from acoustic inputs only: synthetic CV syllables generated with a human vocal tract model. We compare purely auditory, purely motor and perceptuo-motor syllable categorization under various noise levels.
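The perceptuo-motor fusion the abstract describes can be illustrated with a toy Bayesian categorizer. This is a minimal sketch, not the actual COSMO model: the two syllable categories, the Gaussian cue distributions, and all parameter values are invented for illustration, and fusion is taken to be the product of the auditory and motor posteriors under a uniform prior.

```python
import math

# Toy Gaussian likelihood for one scalar cue (e.g. a formant value).
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two syllable categories with auditory and motor cue distributions
# (all means and variances are made up for illustration).
CATEGORIES = {
    "ba": {"aud": (500.0, 80.0), "mot": (0.2, 0.1)},
    "da": {"aud": (1500.0, 80.0), "mot": (0.7, 0.1)},
}

def posterior(obs, branch):
    """Normalized P(category | observation) under one information branch."""
    lik = {c: gauss(obs, *p[branch]) for c, p in CATEGORIES.items()}
    z = sum(lik.values())
    return {c: v / z for c, v in lik.items()}

def fused_posterior(aud_obs, mot_obs):
    """Perceptuo-motor fusion: product of branch posteriors, uniform prior."""
    pa = posterior(aud_obs, "aud")
    pm = posterior(mot_obs, "mot")
    prod = {c: pa[c] * pm[c] for c in CATEGORIES}
    z = sum(prod.values())
    return {c: v / z for c, v in prod.items()}

p = fused_posterior(aud_obs=700.0, mot_obs=0.3)
best = max(p, key=p.get)
```

Comparing `posterior(x, "aud")`, `posterior(x, "mot")` and `fused_posterior(...)` under increasing cue noise loosely mirrors the paper's comparison of auditory, motor and perceptuo-motor categorizers.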
From Holistic to Discrete Speech Sounds: The Blind Snow-Flake Maker Hypothesis
Sound is a medium used by humans to carry information. The existence of such a medium is a prerequisite for language. It is organized into a code, called speech, which provides a repertoire of forms shared within each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How, then, may a speech code be formed prior to the existence of linguistic interactions?

Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units re-used systematically in other syllables); phoneme inventories show precise regularities as well as great diversity across human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form?

These are the questions we approach in this paper, studying them with the method of the artificial: we build a society of artificial agents and study which mechanisms may provide answers. This does not directly prove which mechanisms were used by humans, but rather suggests what kind of mechanism may have been used, shaping the search space of possible answers, in particular by showing what is sufficient and what is not necessary.

The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it presupposes neither a functional pressure for communication nor the ability to have coordinated social interactions (the agents do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents.
From Analogue to Digital Vocalizations
Articulatory optimisation in perturbed vowel articulation
A two-week perturbation EMA experiment was carried out with palatal prostheses. Articulatory effort for five speakers was assessed by means of peak acceleration and jerk during the tongue-tip gestures from /t/ towards /i, e, o, y, u/. After a period of no change, speakers showed an increase in these values. Towards the end of the experiment the values decreased. The results are interpreted as three phases of carrying out changes in the internal model. At first, the complete production system is shifted in relation to the palatal change; afterwards, speakers explore different production mechanisms, which involves more articulatory effort. This second phase can be seen as a training phase where several articulatory strategies are explored. In the third phase, speakers start to select an optimal movement strategy to produce the sounds, so that the values decrease.
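The effort measures used here, peak acceleration and jerk, can be computed from a sampled articulator trajectory by finite differences (velocity is the first derivative of position, acceleration the second, jerk the third). A minimal sketch, assuming a 1-D tongue-tip position signal and a sampling rate chosen for illustration; the abstract does not specify the paper's actual processing pipeline:

```python
import numpy as np

def effort_metrics(position, fs):
    """Peak absolute acceleration and jerk of a 1-D articulator
    trajectory sampled at fs Hz, via successive finite differences."""
    dt = 1.0 / fs
    velocity = np.gradient(position, dt)
    acceleration = np.gradient(velocity, dt)
    jerk = np.gradient(acceleration, dt)
    return np.max(np.abs(acceleration)), np.max(np.abs(jerk))

# Synthetic example: a smooth half-cosine gesture from 0 to 1 cm in 100 ms,
# sampled at 500 Hz (a plausible EMA rate, chosen here for illustration).
fs = 500.0
t = np.arange(0.0, 0.1, 1.0 / fs)
pos = 0.5 * (1.0 - np.cos(np.pi * t / 0.1))
peak_acc, peak_jerk = effort_metrics(pos, fs)
```

In practice one would low-pass filter the EMA signal before differentiating, since each derivative amplifies measurement noise.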
Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.
The study reported here uses articulatory data to investigate Korean place assimilation of coronal stops followed by labial or velar stops, both within words and across words. The results show that this place-assimilation process is highly variable, both within and across speakers, and is also sensitive to factors such as the place of articulation of the following consonant, the presence of a word boundary and, to some extent, speech rate. Gestures affected by the process are generally reduced categorically (deleted), while sporadic gradient reduction of gestures is also observed. We further compare the results for coronals to our previous findings on the assimilation of labials, discussing implications of the results for grammatical models of phonological/phonetic competence. The results suggest that speakers’ language-particular knowledge of place assimilation has to be relatively detailed and context-sensitive, and has to encode systematic regularities about its obligatory/variable application as well as categorical/gradient realisation.
Defective neural motor speech mappings as a source for apraxia of speech: evidence from a quantitative neural model of speech processing
This unique resource reviews research evidence pertaining to best practice in the clinical assessment of established areas such as intelligibility and physiological functioning, as well as introducing recently developed topics such as conversational analysis, participation measures, and telehealth. In addition, new and established research methods from areas such as phonetics, kinematics, imaging, and neural modeling are reviewed in relation to their applicability and value for the study of disordered speech. Based on the broad coverage of topics and methods, the textbook represents a valuable resource for a wide-ranging audience, including clinicians and researchers, as well as students with an interest in speech pathology and clinical phonetics.
The Self-Organization of Speech Sounds
The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form?

We try to approach these questions in the paper, using the "methodology of the artificial". We build a society of artificial agents, and detail a mechanism that shows the formation of a discrete speech code without pre-supposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices leads to the formation of a speech code that has properties similar to the human speech code. This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us to develop better intuitions on how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.
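The perception-production coupling described in this abstract can be sketched with a deliberately simplified agent model. Everything here is an illustrative assumption (the 1-D vocalization space, the number of agents and neurons, the learning rate), not the paper's actual architecture: each agent produces sounds near the preferred values of its neurons and plastically shifts its best-matching neuron toward any sound it hears, and repeated interactions tend to collapse the initially scattered preferences into a small number of clusters shared across the population.

```python
import random

random.seed(0)

class Agent:
    """One agent: a bank of neurons with preferred targets in an abstract
    1-D vocalization space, coupled for both perception and production."""
    def __init__(self, n_neurons=20):
        self.prefs = [random.random() for _ in range(n_neurons)]

    def produce(self):
        # Produce a vocalization near a randomly chosen neuron's preference.
        return random.choice(self.prefs) + random.gauss(0.0, 0.01)

    def perceive(self, sound, lr=0.1):
        # Plasticity: the best-matching neuron moves toward the heard sound.
        i = min(range(len(self.prefs)), key=lambda j: abs(self.prefs[j] - sound))
        self.prefs[i] += lr * (sound - self.prefs[i])

agents = [Agent() for _ in range(5)]
for _ in range(20000):
    speaker, listener = random.sample(agents, 2)
    sound = speaker.produce()
    listener.perceive(sound)
    speaker.perceive(sound)   # agents also hear their own vocalizations

def n_clusters(prefs, tol=0.05):
    """Count groups of preferences separated by more than tol."""
    uniq = []
    for p in sorted(prefs):
        if not uniq or p - uniq[-1] > tol:
            uniq.append(p)
    return len(uniq)

clusters = [n_clusters(a.prefs) for a in agents]
```

Note that nothing in the loop rewards communication: the clustering (discreteness) emerges purely from the positive feedback between production and perceptual plasticity, which is the point the abstract makes.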
The self-organization of combinatoriality and phonotactics in vocalization systems
This paper shows how a society of agents can self-organize a shared vocalization system that is discrete, combinatorial and has a form of primitive phonotactics, starting from holistic inarticulate vocalizations. The originality of the system is that: (1) it does not include any explicit pressure for communication; (2) agents do not possess capabilities of coordinated interactions, in particular they do not play language games; (3) agents possess no specific linguistic capacities; and (4) initially there exists no convention that agents can use. As a consequence, the system shows how a primitive speech code may bootstrap in the absence of a communication system between agents, i.e. before the appearance of language.
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable speech sounds by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, such as Long-term Recurrent Convolutional Network (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in model performance for speech classification as compared with generic sequence or video classification tasks.

Comment: To appear in the INTERSPEECH 2018 Proceedings.
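The LRCN idea, a per-frame convolutional feature extractor feeding a recurrent network over the frame sequence, can be sketched in plain NumPy. This shows only the data flow and tensor shapes, with random untrained weights and toy dimensions (an 8-frame clip of 16x16 single-channel images, 4 filters, hidden size 8, 3 VCV classes); the paper's actual models use deep pretrained CNNs and LSTMs, none of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Naive valid-mode 2-D convolution for a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def frame_features(frame, kernels):
    """CNN stage: convolve, ReLU, then global-average-pool each filter map."""
    return np.array([np.maximum(conv2d_valid(frame, k), 0.0).mean()
                     for k in kernels])

def lrcn_logits(clip, kernels, Wxh, Whh, Why):
    """Recurrent stage: feed per-frame CNN features through a vanilla RNN
    and classify from the final hidden state."""
    h = np.zeros(Whh.shape[0])
    for frame in clip:
        x = frame_features(frame, kernels)
        h = np.tanh(Wxh @ x + Whh @ h)
    return Why @ h

# Toy, untrained parameters and a random "real-time MRI clip".
kernels = rng.standard_normal((4, 3, 3))      # 4 conv filters, 3x3
Wxh = rng.standard_normal((8, 4))             # feature -> hidden
Whh = rng.standard_normal((8, 8)) * 0.1       # hidden -> hidden
Why = rng.standard_normal((3, 8))             # hidden -> 3 VCV classes
clip = rng.standard_normal((8, 16, 16))       # 8 frames of 16x16 pixels
logits = lrcn_logits(clip, kernels, Wxh, Whh, Why)
```

The recurrent stage is what makes the network "spatio-temporally deep": the CNN features of each frame are folded into a hidden state that summarizes the vocal tract's shaping over time before the final classification.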