
    The Self-Organization of Speech Sounds

    The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form? We approach these questions in this paper using the "methodology of the artificial": we build a society of artificial agents and detail a mechanism that shows the formation of a discrete speech code without presupposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices leads to the formation of a speech code that has properties similar to the human speech code. This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us to develop better intuitions about how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.
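
    A minimal sketch of this kind of self-organization, assuming a one-dimensional acoustic space and illustrative parameter values (it is not the paper's neural-map model): each agent holds a set of continuous vocalization targets, and the only coupling is that hearing a sound pulls the listener's nearest target toward it. No imitation game, explicit feedback, or prior convention is assumed, yet the repertoire crystallizes into a shared discrete code.

    import random

    N_AGENTS, N_TARGETS, STEPS = 10, 20, 20000
    LEARNING_RATE, NOISE = 0.05, 0.02

    # Each agent holds preferred vocalization targets in a 1-D acoustic space.
    agents = [[random.random() for _ in range(N_TARGETS)] for _ in range(N_AGENTS)]

    for _ in range(STEPS):
        speaker, listener = random.sample(range(N_AGENTS), 2)
        # The speaker produces a noisy vocalization near one of its targets.
        sound = random.choice(agents[speaker]) + random.gauss(0.0, NOISE)
        # Perception-production coupling: the listener's target closest to the
        # heard sound drifts toward it, with no communicative goal or feedback.
        targets = agents[listener]
        i = min(range(N_TARGETS), key=lambda k: abs(targets[k] - sound))
        targets[i] += LEARNING_RATE * (sound - targets[i])

    # After many such episodes the agents' targets align on a shared set of
    # clusters: an initially continuous repertoire has become discrete.
    for a in agents[:3]:
        print(sorted(round(t, 2) for t in a))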

    From Holistic to Discrete Speech Sounds: The Blind Snow-Flake Maker Hypothesis

    Sound is a medium used by humans to carry information. The existence of this kind of medium is a prerequisite for language. It is organized into a code, called speech, which provides a repertoire of forms that is shared in each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units re-used systematically in other syllables); phoneme inventories have precise regularities as well as great diversity across human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form? These are the questions we will approach in the paper. We will study them using the method of the artificial. We will build a society of artificial agents, and study what mechanisms may provide answers. This will not prove directly what mechanisms were used by humans, but rather give ideas about what kind of mechanism may have been used. This allows us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary. The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it presupposes neither a functional pressure for communication, nor the ability to have coordinated social interactions (the agents do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents.

    From Analogue to Digital Vocalizations

    Sound is a medium used by humans to carry information. The existence of this kind of medium is a prerequisite for language. It is organized into a code, called speech, which provides a repertoire of forms that is shared in each language community. This code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is characterized by several properties: speech is digital and compositional (vocalizations are made of units re-used systematically in other syllables); phoneme inventories have precise regularities as well as great diversity across human languages; all the speakers of a language community categorize sounds in the same manner, but each language has its own system of categorization, possibly very different from every other. How can a speech code with these properties form? These are the questions we will approach in the paper. We will study them using the method of the artificial. We will build a society of artificial agents, and study what mechanisms may provide answers. This will not prove directly what mechanisms were used by humans, but rather give ideas about what kind of mechanism may have been used. This allows us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary. The mechanism we present is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices allows a population of agents to build a speech code that has the properties mentioned above. The originality is that it presupposes neither a functional pressure for communication, nor the ability to have coordinated social interactions (the agents do not play language or imitation games). It relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents.

    The self-organization of combinatoriality and phonotactics in vocalization systems

    This paper shows how a society of agents can self-organize a shared vocalization system that is discrete, combinatorial and has a form of primitive phonotactics, starting from holistic inarticulate vocalizations. The originality of the system is that: (1) it does not include any explicit pressure for communication; (2) agents do not possess capabilities for coordinated interaction, in particular they do not play language games; (3) agents possess no specific linguistic capacities; and (4) initially there exists no convention that agents can use. As a consequence, the system shows how a primitive speech code may bootstrap in the absence of a communication system between agents, i.e. before the appearance of language.
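
    A comparably minimal sketch, assuming a discrete unit inventory already exists, of how skewed and broadly shared sequence preferences (a primitive phonotactics) could emerge without any explicit agreement; the unit inventory, scores, and update rule below are illustrative assumptions, not the paper's mechanism.

    import random
    from collections import defaultdict

    UNITS = ["a", "i", "u", "b", "d", "g"]   # assumed shared discrete inventory
    N_AGENTS, STEPS, RATE = 10, 20000, 0.5
    PAIRS = [(x, y) for x in UNITS for y in UNITS if x != y]

    # Each agent scores every ordered pair of units; high-scoring pairs are
    # produced more often, which is a primitive form of phonotactics.
    agents = [defaultdict(lambda: 1.0) for _ in range(N_AGENTS)]

    def produce(prefs):
        return random.choices(PAIRS, weights=[prefs[p] for p in PAIRS])[0]

    for _ in range(STEPS):
        speaker, listener = random.sample(range(N_AGENTS), 2)
        pair = produce(agents[speaker])
        # Hearing a sequence strengthens the listener's preference for it,
        # without any coordinated interaction or language game.
        agents[listener][pair] += RATE

    # Rich-get-richer amplification skews the preferences, and the skew tends
    # to be shared across the population.
    for prefs in agents[:2]:
        print(sorted(prefs.items(), key=lambda kv: -kv[1])[:3])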

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).
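
    As a rough illustration of what speaker normalization buys (a simple stand-in, not the cortical strip-map model described above): representing a vowel by the log ratio of its formants removes much of the uniform vocal-tract scaling that separates speakers, and the result does not depend on pitch, so a simple classifier can categorize vowels across speakers. The formant values below are invented for illustration; the paper itself uses the Peterson and Barney (1952) vowels.

    import math

    # Hypothetical pitch (F0) and formant (F1, F2) values in Hz, for illustration only.
    samples = [
        ("male",   {"F0": 130, "F1": 270, "F2": 2290}, "iy"),
        ("female", {"F0": 220, "F1": 310, "F2": 2790}, "iy"),
        ("male",   {"F0": 130, "F1": 730, "F2": 1090}, "aa"),
        ("female", {"F0": 220, "F1": 850, "F2": 1220}, "aa"),
    ]

    def normalize(f):
        # One simple stand-in for speaker normalization: the log formant ratio
        # F2/F1 removes much of the uniform vocal-tract scaling across speakers,
        # ignores F0 (pitch), and keeps vowel identity.
        return round(math.log(f["F2"] / f["F1"]), 2)

    # Tokens of the same vowel from different speakers (and different pitches)
    # get similar normalized values, so a nearest-centroid classifier suffices.
    for speaker, formants, vowel in samples:
        print(speaker, vowel, normalize(formants))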

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).

    Neural Dynamics of Phonetic Trading Relations for Variable-Rate CV Syllables

    The perception of CV syllables exhibits a trading relationship between the voice onset time (VOT) of a consonant and the duration of a vowel. Percepts of [ba] and [wa] can, for example, depend on the durations of the consonant and vowel segments, with an increase in the duration of the subsequent vowel switching the percept of the preceding consonant from [w] to [b]. A neural model, called PHONET, is proposed to account for these findings. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to transient and sustained properties of the acoustic signal, as in vision. These streams are represented by working memories that adjust their processing rates to cope with variable acoustic input rates. More rapid transient inputs can cause greater activation of the transient stream which, in turn, can automatically gain-control the processing rate in the sustained stream. An invariant percept obtains when the relative activations of C and V representations in the two streams remain unchanged. The trading relation may be simulated as a result of how different experimental manipulations affect this ratio. It is suggested that the brain can use the duration of a subsequent vowel to make the [b]/[w] distinction because the speech code is a resonant event that emerges between working memory activation patterns and the nodes that categorize them. Advanced Research Projects Agency (90-0083); Air Force Office of Scientific Research (F19620-92-J-0225); Pacific Sierra Research Corporation (91-6075-2).
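
    A toy illustration of the trading relation itself (not the PHONET model): suppose the [b]/[w] decision depends on the consonant transition duration relative to the following vowel's duration rather than on its absolute value; then lengthening the vowel flips the percept of an identical transition. The 0.25 threshold and the durations below are invented for illustration.

    def percept(transition_ms, vowel_ms, threshold=0.25):
        # A short transition relative to the vowel sounds abrupt -> [b];
        # a long relative transition sounds gradual -> [w].
        return "b" if transition_ms / vowel_ms < threshold else "w"

    # The same 40 ms transition is heard as [w] before a short vowel but as [b]
    # once the following vowel is lengthened.
    print(percept(40, 120))   # -> w
    print(percept(40, 220))   # -> b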

    A Constructive Model of Mother-Infant Interaction towards Infant’s Vowel Articulation

    Human infants come to acquire phonemes in common with adults without mature articulatory capabilities or any explicit knowledge. To understand this still-unexplained course of human cognitive development, building a robot that reproduces such a developmental process seems effective; it would also contribute to a design principle for robots that can communicate with human beings. Based on implications from behavioral studies, this paper hypothesizes that the caregiver’s parrotry of the robot’s cooing plays an important role in the phoneme acquisition process, and proposes a constructive model of it. We validate the proposed model by examining whether a real robot can acquire Japanese vowels through interactions with its caregiver.
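
    A minimal sketch of the parrotry hypothesis, assuming a one-dimensional acoustic space and invented adult vowel targets (it is not the paper's robot or caregiver model): the infant explores motor commands at random, the caregiver echoes each coo as the nearest adult vowel, and the infant's commands end up partitioned by the adult categories.

    import random

    ADULT_VOWELS = {"a": 0.9, "i": 0.1, "u": 0.5}   # hypothetical 1-D acoustic targets

    associations = {}
    for _ in range(1000):
        motor = random.random()                    # infant articulation parameter
        coo = motor + random.gauss(0.0, 0.05)      # immature, noisy acoustic result
        # The caregiver "parrots" the coo as the closest adult vowel.
        echo = min(ADULT_VOWELS, key=lambda v: abs(ADULT_VOWELS[v] - coo))
        associations.setdefault(echo, []).append(motor)

    # The infant's motor commands are now grouped by the adult categories,
    # giving it targets to refine toward the caregiver's vowels.
    for vowel, motors in sorted(associations.items()):
        print(vowel, len(motors), round(sum(motors) / len(motors), 2))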

    A Neural Model for Self Organizing Feature Detectors and Classifiers in a Network Hierarchy

    Many models of early cortical processing have shown how local learning rules can produce efficient, sparse-distributed codes in which nodes have responses that are statistically independent and low probability. However, it is not known how to develop a useful hierarchical representation, containing sparse-distributed codes at each level of the hierarchy, that incorporates predictive feedback from the environment. We take a step in that direction by proposing a biologically plausible neural network model that develops receptive fields, and learns to make class predictions, with or without the help of environmental feedback. The model is a new type of predictive adaptive resonance theory network called Receptive Field ARTMAP, or RAM. RAM self-organizes internal category nodes that are tuned to activity distributions in topographic input maps. Each receptive field is composed of multiple weight fields that are adapted via local, on-line learning to form smooth receptive fields that reflect the statistics of the activity distributions in the input maps. When RAM generates incorrect predictions, its vigilance is raised, amplifying subtractive inhibition and sharpening receptive fields until the error is corrected. Evaluation on several classification benchmarks shows that RAM outperforms a related (but neurally implausible) model called Gaussian ARTMAP, as well as several standard neural network and statistical classifiers. A topographic version of RAM is proposed, which is capable of self-organizing hierarchical representations. Topographic RAM is a model for receptive field development at any level of the cortical hierarchy, and provides explanations for a variety of perceptual learning data. Defense Advanced Research Projects Agency and Office of Naval Research (N00014-95-1-0409).
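
    A much-simplified, hypothetical ART-style learner in the spirit of the model above (not Receptive Field ARTMAP itself, which uses topographic receptive fields and subtractive inhibition): category prototypes are created or refined online, and a match below vigilance or a wrong prediction commits a new, sharper category. The match function, vigilance value, and data are illustrative assumptions.

    import math

    class TinyART:
        def __init__(self, vigilance=0.8, rate=0.3):
            self.vigilance, self.rate = vigilance, rate
            self.protos, self.labels = [], []

        def _match(self, x, p):
            return 1.0 / (1.0 + math.dist(x, p))    # 1.0 means a perfect match

        def train(self, x, label):
            if self.protos:
                i = max(range(len(self.protos)),
                        key=lambda k: self._match(x, self.protos[k]))
                resonant = self._match(x, self.protos[i]) >= self.vigilance
                if resonant and self.labels[i] == label:
                    # Resonance with a correct prediction: refine the prototype.
                    self.protos[i] = [p + self.rate * (xi - p)
                                      for p, xi in zip(self.protos[i], x)]
                    return
            # No resonant category, or a wrong prediction: commit a new category,
            # which is the effect that raised vigilance has in ARTMAP.
            self.protos.append(list(x))
            self.labels.append(label)

        def predict(self, x):
            i = max(range(len(self.protos)),
                    key=lambda k: self._match(x, self.protos[k]))
            return self.labels[i]

    model = TinyART()
    for x, y in [([0.1, 0.1], "A"), ([0.9, 0.9], "B"), ([0.15, 0.05], "A")]:
        model.train(x, y)
    print(model.predict([0.12, 0.08]))   # -> A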

    Complex systems and the history of the English language

    Complexity theory (Mitchell 2009, Kretzschmar 2009) is something that historical linguists not only can use but should use in order to improve the relationship between the speech we observe in historical settings and the generalizations we make from it. Complex systems, as described in physics, ecology, and many other sciences, are made up of massive numbers of components interacting with one another, and this results in self-organization and emergent order. For speech, the “components” of a complex system are all of the possible variant realizations of linguistic features as they are deployed by human agents, speakers and writers. The order that emerges in speech is simply the fact that our use of words and other linguistic features is significantly clustered in the spatial, social, and textual groups in which we actually communicate. Order emerges from such systems by means of self-organization, but the order that arises from speech is not the same as what linguists study under the rubric of linguistic structure. In both texts and regional/social groups, the frequency distribution of features follows the same pattern: an asymptotic hyperbolic curve (or “A-curve”). Formal linguistic systems, grammars, are thus not the direct result of the complex system, and historical linguists must use complexity to mediate between the language production observed in the community and the grammars we describe. The history of the English language does not proceed as regularly as clockwork, and an understanding of complex systems helps us to see why and how, and suggests what we can do about it. First, the scaling property of complex systems tells us that there are no representative speakers, and so our observation of any small group of speakers is unlikely to represent any group at a larger scale; limited evidence is the necessary condition of many of our historical studies. The fact that underlying complex distributions follow the 80/20 rule, i.e. 80% of the word tokens in a data set will be instances of only 20% of the word types, while the other 80% of the word types will amount to only 20% of the tokens, gives us an effective tool for estimating the status of historical states of the language. Such a frequency-based technique is opposed to the typological “fit” technique that relies on a few texts that can be reliably located in space, and which may not account for the cross-cutting effects of text type, another dimension in which the 80/20 rule applies. Besides issues of sampling, the frequency-based approach also affects how we can think about change. The A-curve immediately translates to the S-curve now used to describe linguistic change, and explains that “change” cannot reasonably be considered to be a qualitative shift. Instead, we can use the model of “punctuated equilibrium” from evolutionary biology (e.g., see Gould and Eldredge 1993), which suggests that multiple changes occur simultaneously and compete, rather than the older idea of “phyletic gradualism” in evolution that corresponds to the traditional method of historical linguistics. The Great Vowel Shift, for example, is a useful overall generalization, but complex systems and punctuated equilibrium explain why we should not expect it ever to be “complete” or to appear in the same form in different places. These applications of complexity can help us to understand and interpret our existing studies better, and suggest how new studies in the history of the English language can be made more valid and reliable.
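
    The 80/20 estimate described above is easy to check on any corpus; the sketch below (with a hypothetical file corpus.txt standing in for whatever text is at hand) ranks word types by token frequency, which yields the A-curve, and reports the share of tokens covered by the top 20% of types.

    from collections import Counter

    # Rank word types by token frequency (the asymptotic hyperbolic "A-curve")
    # and measure how many tokens the top 20% of types account for.
    words = open("corpus.txt", encoding="utf-8").read().lower().split()
    counts = Counter(words).most_common()
    total = sum(c for _, c in counts)
    top = counts[: max(1, len(counts) // 5)]       # top 20% of word types
    share = sum(c for _, c in top) / total
    print(f"{len(counts)} types; top 20% of types cover {share:.0%} of tokens")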