222 research outputs found

    Network dynamics in the neural control of birdsong

    Full text link
    Sequences of stereotyped actions are central to the everyday lives of humans and animals, from the kingfisher's dive to the performance of a piano concerto. Nearly six decades ago, Lashley asked how neural circuits manage this feat, and to this day it remains a fundamental question in neuroscience. Toward answering this question, vocal performance in the songbird was used as a model for studying the performance of learned, stereotyped motor sequences. The first component of this work considers the song motor cortical zone HVC in the zebra finch, an area that sends precise timing signals both to the descending motor pathway, responsible for stereotyped vocal performance in the adult, and to the basal ganglia, which are responsible for both motor variability and song learning. Despite intense interest in HVC, previous research has focused exclusively on describing the activity of small numbers of neurons recorded serially as the bird sings. To better understand HVC network dynamics, both single units and local field potentials were sampled across multiple electrodes simultaneously in awake, behaving zebra finches. The local field potential and spiking data reveal a stereotyped spatio-temporal pattern of inhibition operating on a 30 ms time-scale that coordinates the neural sequences in principal cells underlying song. The second component addresses the resilience of the song circuit by cutting the motor cortical zone HVC in half along one axis. Despite this large-scale perturbation, the finch quickly recovers and sings a near-perfect song within a single day. These first two studies suggest that HVC is functionally organized to robustly generate the neural dynamics that enable vocal performance. The final component is a statistical study of the complex, flexible songs of the domesticated canary.
This study revealed that canary song is characterized by specific long-range correlations up to 7 seconds long, a time-scale more typical of human music than of animal vocalizations. Thus, the neural sequences underlying birdsong must be capable of generating more structure and complexity than previously thought.

    Hearing the Moment: Measures and Models of the Perceptual Centre

    Get PDF
    The perceptual centre (P-centre) is the hypothetical specific moment at which a brief event is perceived to occur. Several P-centre models are described in the literature, and this thesis presents the first collective implementation and rigorous evaluation of these models on a common corpus, thus addressing a significant open question: which model should one use? The results indicate that none of the models reliably handles all sound types. Possibly this is because the data for model development are too sparse, because inconsistent measurement methods have been used, or because the assumptions underlying the measurement methods are untested. To address this, measurement methods are reviewed and two of them, rhythm adjustment and tap asynchrony, are evaluated alongside a new method based on the phase correction response (PCR) in a synchronized tapping task. Rhythm adjustment and the PCR method yielded consistent P-centre estimates and showed no evidence of P-centre context dependence. Moreover, the PCR method appears the most time-efficient for generating accurate P-centre estimates. Additionally, the magnitude of the PCR is shown to vary systematically with the onset complexity of speech sounds, which presumably reflects the perceived clarity of a sound's P-centre. The ideal outcome of any P-centre measurement technique is to detect the true moment of perceived event occurrence. To this end, a novel P-centre measurement method, based on auditory evoked potentials, is explored as a possible objective alternative to the conventional approaches examined earlier. The results are encouraging and suggest that a neuroelectric correlate of the P-centre does exist, thus opening up a new avenue of P-centre research. Finally, an up-to-date and comprehensive review of the P-centre is included, integrating recent findings and reappraising previous research. The main open questions are identified, particularly those most relevant to P-centre modelling.

    Learning [Voice]

    Get PDF
    The [voice] distinction between homorganic stops and fricatives is made by a number of acoustic correlates including voicing, segment duration, and preceding vowel duration. The present work looks at [voice] from a number of multidimensional perspectives. This dissertation's focus is a corpus study of the phonetic realization of [voice] in two English-learning infants aged 1;1--3;5. While preceding vowel duration has been studied before in infants, the other correlates of post-vocalic voicing investigated here (preceding F1, consonant duration, and closure voicing intensity) had not previously been measured in infant speech. The study makes empirical contributions regarding the development of the production of [voice] in infants, not just from a surface-level perspective but also with implications for the phonetics-phonology interface in the adult and developing linguistic systems. Additionally, several methodological contributions are made in the use of large corpora and data modeling techniques. The study revealed that even in infants, F1 at the midpoint of a vowel preceding a voiced consonant was lower by roughly 50 Hz compared to a vowel before a voiceless consonant, which is in line with the effect found in adults. But while the effect has been considered most likely to be a physiological and nonlinguistic phenomenon in adults, here it appeared to be correlated in the wrong direction with other aspects of [voice], casting doubt on a physiological explanation. Some of the consonant pairs had statistically significant differences in duration and closure voicing. Additionally, a preceding vowel duration difference was found, as well as a preliminary indication of a developmental trend suggesting that the preceding vowel duration difference is being learned. The phonetics of adult speech is also considered.
Results are presented from a dialectal corpus study of North American English and a lab speech experiment, which clarify the relationship between preceding vowel duration and flapping and the relationship between [voice] and F1 in preceding vowels. Fluent adult speech is also described, and machine learning algorithms are applied to learning the [voice] distinction from multidimensional acoustic input plus some lexical knowledge.

    Measuring perceptual centers using the phase correction response

    Get PDF
    The perceptual center (P-center) is fundamental to the timing of heterogeneous event sequences, including music and speech. Unfortunately, there is currently no comprehensive and reliable model of P-centers in acoustic events, so P-centers must instead be measured empirically. This study reviews existing measurement methods and evaluates two methods in detail: the rhythm adjustment method and a new method based on the phase correction response (PCR) in a synchronous tapping task. The two methods yielded consistent P-center estimates and showed no evidence of P-center context dependence. The PCR method appears promising because it is accurate and efficient and does not require explicit perceptual judgments. As a secondary result, the magnitude of the PCR is shown to vary systematically with the onset complexity of speech sounds, which presumably reflects the perceived clarity of a sound's P-center.
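    As a minimal sketch of the logic behind PCR-based estimation (not the study's actual procedure), suppose each tap corrects a fraction alpha of the previous tap-stimulus asynchrony. Then alpha can be recovered by regressing successive asynchronies on one another. The function names, the toy simulation, and all parameter values below are illustrative assumptions.

```python
import random

def simulate_taps(alpha, n, noise=5.0, seed=1):
    """Toy tapping model: each asynchrony shrinks by the corrected
    fraction alpha, a[k+1] = (1 - alpha) * a[k] + motor noise (ms)."""
    rng = random.Random(seed)
    a = [40.0]  # initial tap-stimulus asynchrony in ms
    for _ in range(n - 1):
        a.append((1.0 - alpha) * a[-1] + rng.gauss(0.0, noise))
    return a

def estimate_pcr_gain(asyn):
    """PCR gain = 1 - slope of the least-squares regression
    of a[k+1] on a[k]."""
    x, y = asyn[:-1], asyn[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return 1.0 - sxy / sxx

asyn = simulate_taps(alpha=0.6, n=400)
alpha_hat = estimate_pcr_gain(asyn)  # close to the simulated 0.6
```

    In this framing, a sound's P-center estimate corresponds to the stimulus-onset offset that drives the mean asynchrony to zero; the regression above only illustrates how the correction gain itself can be measured without explicit perceptual judgments.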

    Syllable-based constraints on properties of English sounds

    Get PDF
    Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1989. Includes bibliographical references (p. 169-174). Work sponsored in part by the Office of Naval Research, contract N00014-82-K-0727. By Mark A. Randolph.

    Functional timing or rhythmical timing, or both? A corpus study of English and Mandarin duration

    Get PDF
    It has long been held that the languages of the world divide into rhythm classes, each language being stress-timed, syllable-timed or mora-timed. It has also long been known that duration serves various informational functions in speech. But it is unclear whether these two uses of duration are complementary to each other, or whether they are actually one and the same. Much empirical research has raised questions about the rhythm class hypothesis, owing to the lack of evidence of the suggested isochrony in any language. Yet the alleged cross-language rhythm classification is still widely taken for granted and continues to be researched. Here we conducted a corpus study of English, an archetype of a stress-timed language, and Mandarin, an alleged syllable-timed language, to look for evidence of at least a tendency toward isochrony when much of the informational use of duration is controlled for. We examined the relationship between segment and syllable duration and the relationship between syllable and phrase duration in the two languages. The results show that English syllables are largely incompressible, ruling out stress-timing, because segment duration is too inflexible to allow syllable duration to vary beyond its functional use. Surprisingly, Mandarin does show a small tendency toward both equal syllable duration and equal phrase duration. Additionally, the duration of pre-boundary syllables in English increases linearly with break index, whereas in Mandarin the duration increase stops after break index 2, accompanied by the insertion of silent pauses. We conclude, therefore, that timing and duration in speech are predominantly used for encoding information rather than being controlled by a rhythmic principle, and that the residual equal-duration tendencies in the two languages examined here show exactly the opposite patterns from the predictions of the rhythm class hypothesis.
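    Tendencies toward (or away from) isochrony of the kind this abstract tests are commonly quantified with duration-variability metrics. As a generic illustration only (the abstract does not say which measure the study used), the standard normalized Pairwise Variability Index (nPVI) scores a duration sequence by how strongly successive units alternate:

```python
def npvi(durations):
    """Normalized Pairwise Variability Index: the mean absolute difference
    of successive durations, normalized by their local mean, times 100.
    0 means perfect isochrony; larger values mean stronger alternation."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations[:-1], durations[1:])]
    return 100.0 * sum(terms) / len(terms)

iso = npvi([100, 100, 100, 100])  # 0.0: perfectly isochronous syllables
alt = npvi([100, 200, 100, 200])  # ~66.7: strong long-short alternation
```

    Under the rhythm class hypothesis, a syllable-timed language would be expected to score lower on syllable durations than a stress-timed one; the study's conclusion is that such residual tendencies run opposite to that prediction.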

    Neural mechanisms of early motor control in the vocal behavior of juvenile songbirds

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 201-211). An infant reaches out for her new toy, struggling day after day simply to close her fingers around it. A few years later, she hits a tennis serve, perfect in the timing of its intricately choreographed movements. How does a young brain learn to use the muscles it controls, to properly coordinate motor gestures into complex behavioral sequences? To a surprising extent, for many advanced vertebrate behaviors this knowledge is neither innately programmed nor acquired via deterministic developmental rules, but must be learned through trial-and-error exploration. This thesis is an investigation of the neural mechanisms that underlie the production and maturation of one exploratory behavior: the babbling, or subsong, of a juvenile zebra finch. Using lesions and inactivations of brain areas in the song system, I identified neural circuits involved in the production of subsong. Remarkably, subsong did not require the high vocal center (HVC), a premotor structure long known as the key region for controlling singing in adult birds, but did require the lateral magnocellular nucleus of the nidopallium (LMAN), the output region of basal ganglia-forebrain circuitry previously considered a modulatory area. Recordings in LMAN during subsong revealed premotor activity related to the vocal output on a fast timescale. These results show, for the first time, the existence of a specialized circuit for driving exploratory motor control, distinct from the one that produces the adult behavior. The existence of two neural pathways for singing raises the question of how motor control is transferred from one pathway to another and, in particular, how the control of song timing develops in these pathways.
I found that early singing can be decomposed into mechanistically distinct "modes" of syllable and silent-gap timing: randomly timed modes that are LMAN-dependent, and developmentally acquired, consistently timed modes that are HVC-dependent. Combining acoustic analysis with respiratory measurements, I found that the consistently timed mode in gap durations is formed by brief inspiratory pressure pulses, indicating an early involvement of HVC in coordinating singing with respiration. Using mild localized cooling, a manipulation that slows down biophysical processes in a targeted brain area, I found that the circuit dynamics intrinsic to HVC and LMAN are actively involved in controlling the timescales of distinct behavioral modes. In summary, this work demonstrates the existence of two motor circuits in the song system. These circuits are specialized for the generation of distinct types of neural dynamics: random exploratory dynamics, which are dominant early in life, and stereotyped sequential dynamics, which become dominant during development. Characterization of behaviorally relevant dynamics produced by neural circuits may be a general framework for understanding motor control and learning. By Dmitriy Aronov, Ph.D.

    Prosody generation for text-to-speech synthesis

    Get PDF
    The absence of convincing intonation makes current parametric speech synthesis systems sound dull and lifeless, even when trained on expressive speech data. Typically, these systems use regression techniques to predict the fundamental frequency (F0) frame by frame. This approach leads to overly smooth pitch contours and fails to construct an appropriate prosodic structure across the full utterance. In order to capture and reproduce larger-scale pitch patterns, we propose a template-based approach to automatic F0 generation, in which per-syllable pitch-contour templates (from a small, automatically learned set) are predicted by a recurrent neural network (RNN). The use of syllable templates mitigates the over-smoothing problem and reproduces pitch patterns observed in the data. The use of an RNN, paired with connectionist temporal classification (CTC), enables the prediction of structure in the pitch contour spanning the entire utterance. This novel F0 prediction system is used alongside separate LSTMs for predicting phone durations and the other acoustic features to construct a complete text-to-speech system. Later, we investigate the benefits of including long-range dependencies in frame-level duration prediction using uni-directional recurrent neural networks. Since prosody is a supra-segmental property, we consider an alternative approach to intonation generation that exploits long-term dependencies of F0 by effectively modelling linguistic features with recurrent neural networks. For this purpose, we propose a hierarchical encoder-decoder and a multi-resolution parallel encoder, in which the encoder takes word-level and higher-level linguistic features as input and upsamples them to the phone level through a series of hidden layers; this model is integrated into a hybrid system submitted to the Blizzard Challenge workshop.
We then highlight some of the issues with current approaches and outline a plan for future directions of investigation, along with ongoing work.
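    The "small, automatically learned set" of per-syllable templates can be sketched in miniature: resample each syllable's F0 contour to a fixed length and cluster the results with plain k-means (the RNN/CTC prediction stage is omitted). The function names, the 8-point resampling, and the toy contours are illustrative assumptions, not the thesis's implementation.

```python
import random

def resample(contour, m=8):
    """Linearly resample an F0 contour (a list of Hz values) to m points."""
    n = len(contour)
    out = []
    for i in range(m):
        t = i * (n - 1) / (m - 1)
        lo = int(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def learn_templates(contours, k=2, iters=25, seed=0):
    """Plain k-means over resampled contours -> k pitch-contour templates."""
    pts = [resample(c) for c in contours]
    centers = random.Random(seed).sample(pts, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda c: sqdist(p, centers[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def assign(contour, templates):
    """Index of the nearest template for one syllable's contour."""
    p = resample(contour)
    return min(range(len(templates)), key=lambda j: sqdist(p, templates[j]))

# Toy data: two rising and two falling syllable contours.
contours = [[100, 150, 200], [110, 155, 200], [200, 150, 100], [200, 145, 100]]
templates = learn_templates(contours, k=2)
rise = assign([100, 150, 200], templates)
fall = assign([200, 150, 100], templates)  # lands on a different template
```

    In the thesis's full system, a classifier (the RNN) would predict a template index like `rise` or `fall` per syllable from text features, rather than from the audio itself.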

    Seeing sound: a new way to illustrate auditory objects and their neural correlates

    Full text link
    This thesis develops a new method for time-frequency signal processing and examines the relevance of the new representation in studies of neural coding in songbirds. The method groups together associated regions of the time-frequency plane into objects defined by time-frequency contours. By combining information about structurally stable contour shapes over multiple time-scales and angles, a signal decomposition is produced that distributes resolution adaptively. As a result, distinct signal components are represented in their own most parsimonious forms. Next, through neural recordings in singing birds, it was found that activity in song premotor cortex is significantly correlated with the objects defined by this new representation of sound. In this process, an automated way of finding sub-syllable acoustic transitions in birdsongs was first developed, and increased spiking probability was then found at the boundaries of these acoustic transitions. Finally, a new approach to studying auditory cortical sequence processing more generally is proposed. In this approach, songbirds were trained to discriminate Morse-code-like sequences of clicks, and the neural correlates of this behavior were examined in primary and secondary auditory cortex. It was found that a distinct transformation of auditory responses to the sequences of clicks occurs as information is transferred from primary to secondary auditory areas. Neurons in secondary auditory areas respond asynchronously and selectively, in a manner that depends on the temporal context of the click. This transformation from a temporal to a spatial representation of sound provides a possible basis for the songbird's natural ability to discriminate complex temporal sequences.