46 research outputs found

    Changes in neuronal representations of consonants in the ascending auditory system and their role in speech recognition

    A fundamental task of the ascending auditory system is to produce representations that facilitate the recognition of complex sounds. This is particularly challenging in the context of acoustic variability, such as that between different talkers producing the same phoneme. These representations are transformed as information is propagated throughout the ascending auditory system from the inner ear to the auditory cortex (AI). Investigating these transformations and their role in speech recognition is key to understanding hearing impairment and the development of future clinical interventions. Here, we obtained neural responses to an extensive set of natural vowel-consonant-vowel phoneme sequences, each produced by multiple talkers, in three stages of the auditory processing pathway. Auditory nerve (AN) representations were simulated using a model of the peripheral auditory system and extracellular neuronal activity was recorded in the inferior colliculus (IC) and primary auditory cortex (AI) of anaesthetized guinea pigs. A classifier was developed to examine the efficacy of these representations for recognizing the speech sounds. Individual neurons convey progressively less information from AN to AI. Nonetheless, at the population level, representations are sufficiently rich to facilitate recognition of consonants with a high degree of accuracy at all stages, indicating a progression from a dense, redundant representation to a sparse, distributed one. We examined the timescale of the neural code for consonant recognition and found that optimal timescales increase throughout the ascending auditory system from a few milliseconds in the periphery to several tens of milliseconds in the cortex. Despite these longer timescales, we found little evidence to suggest that representations up to the level of AI become increasingly invariant to across-talker differences. Instead, our results support the idea that the role of the subcortical auditory system is one of dimensionality expansion, which could provide a basis for flexible classification of arbitrary speech sounds.

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

    Statistical methods for sparse functional object data: elastic curves, shapes and densities

    Many applications naturally yield data that can be viewed as elements in non-linear spaces. Consequently, there is a need for non-standard statistical methods capable of handling such data. The work presented here deals with the analysis of data in complex spaces derived from functional L2-spaces as quotient spaces (or subsets of such spaces). These data types include elastic curves represented as d-dimensional functions modulo re-parametrization, planar shapes represented as 2-dimensional functions modulo rotation, scaling and translation, and elastic planar shapes combining all of these invariances. Moreover, probability densities can also be thought of as non-negative functions modulo scaling. Since these functional object data spaces lack a natural Hilbert space structure, this work proposes specialized methods that integrate techniques from functional data analysis with those for metric and manifold data. In particular, but not exclusively, novel regression methods for specific metric quotient spaces are discussed. Special attention is given to handling discrete observations, since in practice curves and shapes are typically observed only as a discrete (often sparse or irregular) set of points. Similarly, density functions are usually not directly observed, but a (small) sample from the corresponding probability distribution is available. Overall, this work comprises six contributions that propose new methods for sparse functional object data and apply them to relevant real-world datasets, predominantly in a biomedical context.

    Characteristic time courses of electrocorticographic signals during speech

    Electrophysiology has produced a wealth of information concerning characteristic patterns of neural activity underlying movement control in non-human primates. Such patterns differentiate functional classes of neurons and illuminate neural computations underlying different stages of motor planning and execution. The scarcity of high-resolution electrophysiological recordings in humans has hindered such descriptions of brain activity during uniquely human acts such as speech production. The goal of this dissertation was to identify and quantitatively characterize canonical temporal profiles of neural activity measured using surface and depth electrocorticography electrodes while pre-surgical epilepsy patients read aloud monosyllabic utterances. An unsupervised iterative clustering procedure was combined with a novel Kalman filter-based trend analysis to identify characteristic activity time courses that occurred across multiple subjects. A nonlinear distance measure was used to emphasize similarity at key portions of the activity profiles, including signal peaks. Eight canonical activity patterns were identified. These activity profiles fell broadly into two classes: symmetric profiles in which activity rises and falls at approximately the same rate, and ramp profiles in which activity rises relatively quickly and falls off gradually. Distinct characteristic time courses were found during four different task stages: early processing of the orthographic stimulus, phonological-to-motor processing, motor execution, and auditory processing of self-produced speech, with activity offset ramps in earlier stages approximately matching activity onset rates in later stages. The addition of an anatomical constraint to the distance measure to encourage clusters to form within local brain regions did not significantly change results. The anatomically constrained results showed a further subdivision of the eight canonical activity patterns, with the subdivisions primarily stemming from sub-clusters that are anatomically distinct across different brain regions, but maintained the base activity pattern of their parent cluster from the analysis without the anatomically constrained distance measure. The analysis tools developed herein provide a powerful means for identifying and quantitatively characterizing the neural computations underlying human speech production and may apply to other cognitive and behavioral domains.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.

    Temporal integration of loudness as a function of level


    Complex systems approach to natural language

    The review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science and documents their applicability in identifying both universal and system-specific features of language in its written representation. Three main complexity-related research trends in quantitative linguistics are covered. The first part addresses the issue of word frequencies in texts and demonstrates that taking punctuation into consideration restores the scaling whose violation in Zipf's law is often observed for the most frequent words. The second part introduces methods inspired by time series analysis, used in studying various kinds of correlations in written texts. The related time series are generated on the basis of text partition into sentences or into phrases between consecutive punctuation marks. It turns out that these series develop features often found in signals generated by complex systems, like long-range correlations or (multi)fractal structures. Moreover, it appears that the distances between punctuation marks comply with the discrete variant of the Weibull distribution. In the third part, the application of the network formalism to natural language is reviewed, particularly in the context of the so-called word-adjacency networks. Parameters characterizing the topology of such networks can be used for classification of texts, for example, from a stylometric perspective. The network approach can also be applied to represent the organization of word associations. The structure of word-association networks turns out to be significantly different from that observed in random networks, revealing genuine properties of language. Finally, punctuation seems to have a significant impact not only on the language's information-carrying ability but also on its key statistical properties, hence it is recommended to consider punctuation marks on a par with words. Comment: 113 pages, 49 figures