
    A Dynamic Approach to Rhythm in Language: Toward a Temporal Phonology

    It is proposed that the theory of dynamical systems offers appropriate tools to model many phonological aspects of both speech production and perception. A dynamic account of speech rhythm is shown to be useful for describing both Japanese mora timing and English timing in a phrase repetition task. This orientation contrasts fundamentally with the more familiar symbolic approach to phonology, in which time is modeled only with sequentially arrayed symbols. It is proposed that an adaptive oscillator offers a useful model for perceptual entrainment (or 'locking in') to the temporal patterns of speech production. This helps to explain why speech is often perceived to be more regular than experimental measurements seem to justify. Because dynamic models deal with real time, they also help us understand how languages can differ in their temporal detail, contributing to foreign accents, for example. The fact that languages differ greatly in their temporal detail suggests that these effects are not mere motor universals, but that dynamical models are intrinsic components of the phonological characterization of language.
    Comment: 31 pages; compressed, uuencoded Postscript
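
    As a concrete illustration of the adaptive-oscillator idea, the Python sketch below entrains a simple oscillator to a slightly irregular pulse train: each onset nudges the oscillator's phase and period, and the resulting beat track is more regular than the input, echoing the perceived regularity of speech. The function and gain values are illustrative assumptions, not the paper's implementation.

        # Minimal sketch of an adaptive oscillator entraining to event onsets.
        # All names and gains are illustrative, not the paper's model.
        def entrain(onsets, period0, eta_phase=0.5, eta_period=0.2):
            """Phase-lock a simple oscillator to a list of onset times (s)."""
            period = period0
            next_beat = onsets[0]          # start the oscillator on the first onset
            beats = []
            for t in onsets[1:]:
                # advance the oscillator to the beat nearest the observed onset
                while next_beat + period / 2 < t:
                    beats.append(next_beat)
                    next_beat += period
                error = t - next_beat              # positive: onset came late
                next_beat += eta_phase * error     # partial phase correction
                period += eta_period * error       # slow period (tempo) adaptation
            return beats, period

        # A slightly irregular ~500 ms pulse train: the oscillator's output is
        # more regular than the input it locks onto.
        onsets = [0.0, 0.52, 0.98, 1.55, 2.01, 2.49]
        beats, period = entrain(onsets, period0=0.5)
        print(beats, round(period, 3))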

    A Relational Event Approach to Modeling Behavioral Dynamics

    This chapter provides an introduction to the analysis of relational event data (i.e., actions, interactions, or other events involving multiple actors that occur over time) within the R/statnet platform. We begin by reviewing the basics of relational event modeling, with an emphasis on models with piecewise constant hazards. We then discuss estimation for dyadic and more general relational event models using the relevent package, with an emphasis on hands-on applications of the methods and interpretation of results. Statnet is a collection of packages for the R statistical computing system that supports the representation, manipulation, visualization, modeling, simulation, and analysis of relational data. Statnet packages are contributed by a team of volunteer developers, and are made freely available under the GNU Public License. These packages are written for the R statistical computing environment, and can be used with any computing platform that supports R (including Windows, Linux, and Mac).
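
    For readers new to relational event models, the sketch below illustrates the likelihood structure that piecewise constant hazards give rise to: between events, every dyad at risk accumulates survival mass at its current hazard, and the observed event contributes its log hazard. The data, the toy reciprocity statistic, and the function names are hypothetical; actual estimation would use the relevent package in R.

        # Minimal sketch of the log-likelihood of a relational event sequence
        # under piecewise constant hazards. Hypothetical data and statistic.
        import math

        def dyad_loglik(events, actors, log_rate):
            """events: time-ordered list of (time, sender, receiver).
            log_rate(d, history): log hazard of dyad d given past events."""
            risk_set = [(i, j) for i in actors for j in actors if i != j]
            ll, t_prev, history = 0.0, 0.0, []
            for (t, i, j) in events:
                # survival term: every dyad at risk accumulates hazard over (t_prev, t]
                ll -= sum(math.exp(log_rate(d, history)) for d in risk_set) * (t - t_prev)
                # event term: log hazard of the dyad that actually fired
                ll += log_rate((i, j), history)
                history.append((t, i, j))
                t_prev = t
            return ll

        # Toy covariate effect: a reciprocity bump of +1 on the log scale.
        def log_rate(d, history):
            i, j = d
            return -2.0 + (1.0 if any(s == j and r == i for (_, s, r) in history) else 0.0)

        print(dyad_loglik([(0.7, "a", "b"), (1.9, "b", "a")], ["a", "b", "c"], log_rate))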

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
    National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
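
    A minimal sketch of the normalization idea, under the assumption that it can be caricatured as alignment on a log-frequency strip: shifting a spectral pattern so the fundamental lands at a fixed slot yields a pitch-independent shape, while the shift itself retains the speaker's pitch as a separate cue. The paper's neural circuits are far richer; all names and numbers here are illustrative.

        # Minimal sketch: pitch normalization as a shift on a log-frequency strip.
        import numpy as np

        BINS_PER_OCTAVE = 24
        F_REF = 100.0  # Hz, reference position for the fundamental

        def to_strip(freqs_hz, amps, n_bins=120):
            """Place spectral peaks on a log-frequency strip starting at F_REF."""
            strip = np.zeros(n_bins)
            for f, a in zip(freqs_hz, amps):
                k = int(round(BINS_PER_OCTAVE * np.log2(f / F_REF)))
                if 0 <= k < n_bins:
                    strip[k] += a
            return strip

        def normalize(strip, f0_hz):
            """Shift the strip so the fundamental sits at bin 0; return the
            pitch-independent shape and the shift (the speaker/pitch cue)."""
            shift = int(round(BINS_PER_OCTAVE * np.log2(f0_hz / F_REF)))
            return np.roll(strip, -shift), shift

        # The same spectral pattern spoken an octave apart maps to the same
        # normalized shape; only the returned shift (pitch cue) differs.
        low  = normalize(to_strip([120, 730, 1090], [1, .8, .6]), 120.0)
        high = normalize(to_strip([240, 1460, 2180], [1, .8, .6]), 240.0)
        print(np.array_equal(low[0], high[0]), low[1], high[1])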

    Moving in time: simulating how neural circuits enable rhythmic enactment of planned sequences

    Many complex actions are mentally pre-composed as plans that specify orderings of simpler actions. To be executed accurately, planned orderings must become active in working memory, and then enacted one-by-one until the sequence is complete. Examples include writing, typing, and speaking. In cases where the planned complex action is musical in nature (e.g. a choreographed dance or a piano melody), it appears to be possible to deploy two learned sequences at the same time, one composed from actions and a second composed from the time intervals between actions. Despite this added complexity, humans readily learn and perform rhythm-based action sequences. Notably, people can learn action sequences and rhythmic sequences separately, and then combine them with little trouble (Ullén & Bengtsson 2003). Related functional MRI data suggest that there are distinct neural regions responsible for the two different sequence types (Bengtsson et al. 2004). Although research on musical rhythm is extensive, few computational models exist to extend and inform our understanding of its neural bases. To that end, this article introduces the TAMSIN (Timing And Motor System Integration Network) model, a systems-level neural network model capable of performing arbitrary item sequences in accord with any rhythmic pattern that can be represented as a sequence of integer multiples of a base interval. In TAMSIN, two Competitive Queuing (CQ) modules operate in parallel. One represents and controls item order (the ORD module) and the second represents and controls the sequence of inter-onset-intervals (IOIs) that define a rhythmic pattern (RHY module). Further circuitry helps these modules coordinate their signal processing to enable performative output consistent with a desired beat and tempo.
    Accepted manuscript
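
    The sketch below illustrates the Competitive Queuing readout that the ORD and RHY modules share: a stored activation gradient over planned items is read out by repeatedly selecting the most active item and then suppressing it. Pairing the two queues by index is an illustrative assumption, not TAMSIN's coordination circuitry.

        # Minimal sketch of Competitive Queuing (CQ) readout.
        import numpy as np

        def cq_readout(gradient):
            """Read items in order of descending activation: a choice layer
            picks the winner, which then self-inhibits out of the queue."""
            act = np.array(gradient, dtype=float)
            order = []
            while act.max() > 0:
                w = int(act.argmax())   # winner-take-all selection
                order.append(w)
                act[w] = 0.0            # suppress the winner after execution
            return order

        # ORD queue over items, RHY queue over inter-onset intervals (IOIs)
        # expressed as integer multiples of a base interval.
        notes = ["C", "E", "G", "C'"]
        ord_grad = [1.0, 0.8, 0.6, 0.4]    # primacy gradient encodes serial order
        rhy_iois = [1, 1, 2, 4]            # rhythmic pattern, in base intervals

        t = 0
        for idx in cq_readout(ord_grad):
            print(f"t={t}: play {notes[idx]}")
            t += rhy_iois[idx]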

    Transient Information Flow in a Network of Excitatory and Inhibitory Model Neurons: Role of Noise and Signal Autocorrelation

    We investigate the performance of sparsely-connected networks of integrate-and-fire neurons for ultra-short term information processing. We exploit the fact that the population activity of networks with balanced excitation and inhibition can switch from an oscillatory firing regime to a state of asynchronous irregular firing or quiescence, depending on the rate of external background spikes. We find that in terms of information buffering the network performs best for a moderate, non-zero, amount of noise. Analogous to the phenomenon of stochastic resonance, the performance decreases for higher and lower noise levels. The optimal amount of noise corresponds to the transition zone between a quiescent state and a regime of stochastic dynamics. This provides a potential explanation of the role of non-oscillatory population activity in a simplified model of cortical micro-circuits.
    Comment: 27 pages, 7 figures, to appear in J. Physiology (Paris) Vol. 9
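
    The sketch below, loosely in the style of standard balanced-network models, shows the kind of simulation involved: a sparse network of leaky integrate-and-fire neurons with strong inhibition, driven by Poisson background spikes whose rate (nu_ext) moves the population between quiescence and irregular firing. All parameters are illustrative, not the paper's.

        # Minimal sketch of a sparse balanced E/I network of LIF neurons
        # driven by Poisson background input. Parameters are illustrative.
        import numpy as np

        rng = np.random.default_rng(1)
        N_E, N_I = 400, 100                    # excitatory / inhibitory neurons
        N = N_E + N_I
        p, J, g = 0.1, 0.2, 5.0                # sparseness, EPSP (mV), inhibition ratio
        tau, theta, v_reset = 20.0, 20.0, 10.0 # membrane const (ms), threshold, reset (mV)
        dt, T = 0.1, 500.0                     # time step and duration (ms)
        nu_ext = 6.0                           # background rate per neuron (spikes/ms)

        W = (rng.random((N, N)) < p) * J       # W[i, j]: weight from j onto i
        W[:, N_E:] *= -g                       # inhibitory columns: strong, negative

        v = rng.uniform(0.0, theta, N)
        last_spikes = np.zeros(N, dtype=bool)
        spike_count = 0
        for _ in range(int(T / dt)):
            # recurrent input from last step's spikes plus Poisson background
            drive = W @ last_spikes + J * rng.poisson(nu_ext * dt, N)
            v += dt * (-v / tau) + drive       # leaky integration
            last_spikes = v >= theta
            v[last_spikes] = v_reset           # reset neurons that fired
            spike_count += last_spikes.sum()

        print(f"mean rate: {spike_count / N / (T / 1000):.1f} Hz")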

    Example Based Caricature Synthesis

    The likeness of a caricature to the original face image is an essential and often overlooked part of caricature production. In this paper we present an example based caricature synthesis technique, consisting of shape exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set of caricature face pairs, our shape exaggeration step is based on only one or a small number of examples of facial features. The relationship exaggeration step introduces two definitions which facilitate global facial feature synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an intuitive manner. The second is the so-called proportions, which characterize the facial features in proportional form. Finally, we introduce a similarity metric as the likeness metric based on the Modified Hausdorff Distance (MHD), which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a number of constraints. The effectiveness of our algorithm is demonstrated with experimental results.
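
    Two of the paper's ingredients are simple enough to sketch directly: the Modified Hausdorff Distance used as the likeness metric, and shape exaggeration, shown here in its classic deviation-from-the-mean form as a stand-in for the example-based step. The point sets and the gain are illustrative.

        # Minimal sketch: MHD likeness metric plus deviation-from-the-mean
        # shape exaggeration. Illustrative data, not the paper's pipeline.
        import numpy as np

        def mhd(A, B):
            """Modified Hausdorff Distance between point sets A (n,2), B (m,2):
            max of the two directed mean nearest-neighbor distances."""
            d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
            return max(d.min(axis=1).mean(), d.min(axis=0).mean())

        def exaggerate(shape, mean_shape, k=1.5):
            """Push a face shape away from the mean shape by gain k > 1."""
            return mean_shape + k * (shape - mean_shape)

        face = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
        mean = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
        cari = exaggerate(face, mean)
        print(mhd(face, cari))   # likeness cost incurred by the exaggeration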

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.