
    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
    Funding: National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
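    As a point of reference for what speaker normalization must accomplish, the sketch below classifies vowels after a simple token-level log-formant-ratio normalization, which cancels a uniform vocal-tract scale factor. This is a deliberately minimal stand-in, not the paper's cortical strip-map model, and the formant values are textbook-style approximations rather than the actual Peterson and Barney measurements.

```python
import numpy as np

def normalize_token(formants_hz):
    """Center a token's log formants: if both formants are scaled by a common
    factor (e.g., a shorter vocal tract), the centered values are unchanged."""
    logf = np.log(np.asarray(formants_hz, dtype=float))
    return logf - logf.mean()

# Toy data: an adult's /i/ and /a/, and a child whose formants are uniformly
# scaled up by 30% (hypothetical values, not Peterson & Barney data).
adult = {"i": [270.0, 2290.0], "a": [730.0, 1090.0]}
child = {v: [1.3 * f for f in fs] for v, fs in adult.items()}

centroids = {v: normalize_token(fs) for v, fs in adult.items()}

for vowel, fs in child.items():
    x = normalize_token(fs)
    best = min(centroids, key=lambda v: np.linalg.norm(x - centroids[v]))
    print(f"child {vowel!r} -> classified as {best!r}")   # both match
```

    Uniform scaling is of course only a first approximation of inter-speaker variation; the point of the sketch is merely to show what a pitch- and scale-independent representation buys a downstream categorizer.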

    Listening-Mode-Centered Sonification Design for Data Exploration

    Grond F. Listening-Mode-Centered Sonification Design for Data Exploration. Bielefeld: Bielefeld University; 2013. From the Introduction to this thesis: With the ever-growing amount of data and the desire to make them accessible through the sense of listening, sonification, the representation of data by sound, has been the subject of active research in computer science and HCI for the last 20 years. During this time, the field of sonification has diversified into different application areas: today, sound in auditory display informs the user about states and actions on the desktop and in mobile devices; sonification has been applied in monitoring applications, where sound can range from informative to alarming; sonification has been used to give sensory feedback in order to close the action-perception loop; and, last but not least, sonifications have been developed for exploratory data analysis, where sound represents data with unknown structures for hypothesis building.

    Coming from computer science and HCI, the conceptualization of sonification has been driven mostly by application areas. The sonic arts, on the other hand, which have always contributed to the auditory display community, have a genuine focus on sound. Despite this close interdisciplinary relationship between communities of sound practitioners, a rich, sound- (or listening-)centered concept of sonification is still missing as a point of departure for design guidelines that cut across applications and tasks. Complementary to the useful organization along fields of application, a conceptual framework that is proper to sound needs to abstract from applications and, to some degree, from tasks, as neither is directly related to sound. In this thesis I therefore propose to conceptualize sonifications along two poles, where sound serves either a normative or a descriptive purpose.

    At the beginning of auditory display research, a continuum between a symbolic and an analogic pole was proposed by Kramer (1994a, page 21). In this continuum, symbolic stands for sounds that coincide with existing schemas and are more denotative; analogic stands for sounds that are informative through their connotative aspects (compare Worrall (2009, page 315)). The notions of symbolic and analogic illustrate the struggle to find apt descriptions of how the intention of the listener subjects audible phenomena to a process of meaning-making and interpretation. Complementing the analogic-symbolic continuum with descriptive and normative display purposes is proposed in light of the recently increased research interest in listening modes and intentions. Like the terms symbolic and analogic, listening modes have been discussed in auditory display from the beginning, usually in dichotomous terms, identified either with the words listening and hearing or with musical listening and everyday listening as proposed by Gaver (1993a). More than 25 years earlier, four direct listening modes were introduced by Schaeffer (1966), together with a fifth, synthetic mode of reduced listening, which leads to the well-known sound object. Interestingly, Schaeffer's listening modes have remained largely unnoticed by the auditory display community. The notion of reduced listening in particular goes beyond the connotative and denotative poles of the continuum proposed by Kramer and justifies the new terms descriptive and normative.
    Recently, a new taxonomy of listening modes motivated by an embodied cognition approach has been proposed by Tuuri and Eerola (2012). The main contribution of their taxonomy is that it convincingly diversifies the connotative and denotative aspects of listening modes. In the recently published Sonification Handbook, Hunt and Hermann (2011) discuss multimodal and interactive aspects in combination with sonification as promising options for expanding and advancing the field, and point out that a better theoretical foundation is urgently needed in order to integrate these aspects systematically. The main contribution of this thesis is to address this need by providing design guidelines that are alternative and complementary to existing approaches, all of which were conceived before the recent surge of research interest in listening modes. None of the existing design frameworks integrates multimodality and listening modes with a focus on exploratory data analysis, where sonification is conceived to support the understanding of complex data, potentially helping to identify new structures therein. To structure this field, the thesis addresses the following questions:
    • How do natural listening modes and reduced listening relate to the proposed normative and descriptive display purposes?
    • What is the relationship of multimodality and interaction with listening modes and display purposes?
    • How can the potential of embodied-cognition-based listening modes be put to use for exploratory data sonification?
    • How can listening modes and display purposes be connected to questions of aesthetics in the display?
    • How do data complexity and parameter-mapping sonification relate to exploratory data analysis and listening modes?
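    To make the parameter-mapping pole of this discussion concrete, here is a minimal parameter-mapping sonification in the usual baseline style: each data value controls the pitch of a short sine tone. The mapping range, note duration, and file name are arbitrary illustrative choices, not recommendations from the thesis.

```python
import wave
import numpy as np

def sonify(data, path="sonification.wav", sr=44100, note_s=0.15):
    """Render each value as a short sine tone; pitch rises with the value."""
    lo, hi = float(min(data)), float(max(data))
    notes = []
    for x in data:
        freq = 220.0 + (x - lo) / (hi - lo) * 660.0   # map to 220-880 Hz
        t = np.arange(int(sr * note_s)) / sr
        tone = 0.5 * np.sin(2 * np.pi * freq * t)
        tone *= np.hanning(tone.size)                 # fade in/out to avoid clicks
        notes.append(tone)
    pcm = (np.concatenate(notes) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())

# Listen to a sine-shaped contour rising and falling over 60 notes.
sonify(np.sin(np.linspace(0, 4 * np.pi, 60)))
```

    Even this tiny example exposes the design questions the thesis raises: the pitch mapping is normative (it imposes a reading) rather than descriptive, and nothing about it is grounded in a particular listening mode.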

    Non-hexagonal neural dynamics in vowel space

    Are the grid cells discovered in rodents relevant to human cognition? Following up on two seminal studies by others, we aimed to check whether an approximately 6-fold, grid-like symmetry shows up in the cortical activity of humans who "navigate" between vowels, given that vowel space can be approximated by a continuous trapezoidal 2D manifold spanned by the first and second formant frequencies. We created 30 vowel trajectories in the putatively flat central portion of the trapezoid. Each trajectory had a duration of 240 milliseconds, with steady start and end points on the perimeter of a "wheel". We hypothesized that if the neural representation of this "box" is similar to that of rodent grid units, there should be at least a partial hexagonal (6-fold) symmetry in the EEG response of participants who navigate it. We found no dominant n-fold symmetry, however; instead, using PCA, we found indications that the vowel representation may reflect phonetic features, as positioned on the vowel manifold. The suggestion, therefore, is that vowels are encoded in relation to their salient sensory-perceptual variables, and are not assigned to arbitrary grid-like abstract maps. Finally, we explored the relationship between the first PCA eigenvector and putative vowel attractors for native Italian speakers, who served as the subjects in our study.
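    For readers unfamiliar with how n-fold symmetry is typically tested, the toy analysis below regresses a direction-dependent response on cos(nθ) and sin(nθ), the quadrature-regression approach common in grid-code studies; the amplitude of the fitted modulation indexes the strength of n-fold symmetry. The "responses" are synthetic, planted with a 6-fold component, and are not the study's EEG data.

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = np.deg2rad(np.arange(0, 360, 12))   # 30 movement directions

# Synthetic responses with a planted 6-fold modulation plus noise.
resp = 1.0 + 0.4 * np.cos(6 * thetas + 0.5) + 0.1 * rng.standard_normal(thetas.size)

for n in (4, 5, 6, 7):
    # Quadrature regressors capture n-fold modulation at any phase.
    X = np.column_stack([np.cos(n * thetas), np.sin(n * thetas), np.ones_like(thetas)])
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    print(f"{n}-fold modulation amplitude: {np.hypot(beta[0], beta[1]):.3f}")
# The 6-fold amplitude (~0.4) stands out; the other folds reflect noise only.
```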

    Maturing Temporal Bones as Non-Neural Sites for Transforming the Speech Signal during Language Development

    Developmental events in the temporal bones shift a given speech sound's acoustic profile throughout the period when children are mapping linguistic sound systems. Before age 5, frequency information in vowels is differentially accessible across the years when children are acquiring the sound systems of their native language(s). To model the acoustic effects of developing temporal bones, steady-state vowels elicited from adult native speakers of English and Diné were modified to reflect children's hearing sensitivities at different ages, based on patterns established in the psychoacoustic literature. Following the work of psychoacousticians (e.g., Werner, Fay, & Popper 2012; Werner & Marean 1996), it was assumed that the effects caused by immature temporal bones were conductive immaturities, and the age-sensitive filters were constructed on the basis of psychoacoustic research into the hearing of infants and children. Data were partitioned by language, sex, and individual vowel, and compared for points of similarity and difference in how information in vowels is filtered under the constraints imposed by immature temporal bones. Results show that the early formant pattern is successively modified in a constrained way that reflects maturational processes. Results also suggest that children may well switch strategies for processing vowels, using a more adult-like process after 18 months. Future research should explore whether early hearing affects not only individual speech sounds but also their relationships to one another in the vowel space. Additionally, there is an interesting artifact in the observed gradual progression to full adult hearing, which may be the effect of the foramen of Huschke contributing to the filters at 1 year and 18 months. Given that immature temporal bones reflect brain expansion and rotational birth in hominids, these results contribute to the discussion of the biological underpinnings of the evolution of language.
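    A minimal sketch of the modeling step described above: treat an immature conductive pathway as a frequency-dependent attenuation applied at an adult vowel's formant frequencies. The attenuation values and age groups here are hypothetical placeholders, not the filters the study derived from the psychoacoustic literature.

```python
import numpy as np

# Hypothetical conductive attenuation (dB) at 250 Hz and 4 kHz per age group,
# flattening toward adulthood; NOT the study's fitted filter values.
AGE_ATTEN_DB = {
    "6 months": (20.0, 8.0),
    "18 months": (12.0, 5.0),
    "5 years": (4.0, 2.0),
    "adult": (0.0, 0.0),
}

def attenuation_db(freqs_hz, age):
    """Interpolate attenuation across log frequency between the two anchors."""
    lo_db, hi_db = AGE_ATTEN_DB[age]
    pos = np.log2(np.asarray(freqs_hz, dtype=float) / 250.0) / np.log2(4000.0 / 250.0)
    return lo_db + np.clip(pos, 0.0, 1.0) * (hi_db - lo_db)

formants = {"F1": 730.0, "F2": 1090.0, "F3": 2440.0}   # adult /a/-like values
for age in AGE_ATTEN_DB:
    db = attenuation_db(list(formants.values()), age)
    print(age, {name: f"-{d:.1f} dB" for name, d in zip(formants, db)})
```

    The qualitative point survives the toy numbers: the same vowel presents a different relative formant-level pattern to a 6-month-old than to an adult, so the input over which a child builds vowel categories changes with age.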

    A Tool for Differential Diagnosis of Childhood Apraxia of Speech and Dysarthria in Children: A Tutorial

    Purpose: While there has been mounting research centered on the diagnosis of childhood apraxia of speech (CAS), little has focused on differentiating CAS from pediatric dysarthria. Because CAS and dysarthria share overlapping speech symptoms and some children have both motor speech disorders, differential diagnosis can be challenging. There is a need for clinical tools that facilitate assessment of both CAS and dysarthria symptoms in children. The goals of this tutorial are to (a) determine the confidence levels of clinicians in differentially diagnosing dysarthria and CAS and (b) provide a systematic procedure for differentiating CAS and pediatric dysarthria in children. Method: Evidence related to the differential diagnosis of CAS and dysarthria is reviewed. Next, a web-based survey of 359 pediatric speech-language pathologists is used to determine clinical confidence levels in diagnosing CAS and dysarthria. Finally, a checklist of pediatric auditory-perceptual motor speech features is presented, along with a procedure to identify CAS and dysarthria in children with suspected motor speech impairments. Case studies illustrate the application of this protocol, and treatment implications for complex cases are discussed. Results: The majority (60%) of clinician respondents reported low or no confidence in diagnosing dysarthria in children, and 40% reported that they tend not to make this diagnosis as a result. Going forward, clinicians can use the feature checklist and protocol in this tutorial to support the differential diagnosis of CAS and dysarthria in clinical practice. Conclusions: Incorporating this diagnostic protocol into clinical practice should help increase confidence and accuracy in diagnosing motor speech disorders in children. Future research should test the sensitivity and specificity of this protocol in a large sample of children with varying speech sound disorders. Graduate programs and continuing education trainings should provide opportunities to practice rating speech features for children with dysarthria and CAS.
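    To illustrate the shape such a tool might take, the toy encoding below represents a feature checklist in which each observed auditory-perceptual feature counts toward CAS, dysarthria, or both. The feature names are an illustrative subset of commonly cited signs, not the tutorial's actual checklist, and no tally of this kind substitutes for clinical judgment.

```python
# Each feature maps to the diagnosis (or diagnoses) it is associated with.
FEATURES = {
    "inconsistent errors on repeated words": {"CAS"},
    "disrupted coarticulatory transitions": {"CAS"},
    "equal or excess stress": {"CAS", "dysarthria"},
    "consistent hypernasality": {"dysarthria"},
    "breathy or strained voice quality": {"dysarthria"},
    "slow rate with imprecise consonants": {"dysarthria"},
}

def tally(observed_features):
    """Count how many observed features point toward each diagnosis."""
    counts = {"CAS": 0, "dysarthria": 0}
    for feature in observed_features:
        for diagnosis in FEATURES[feature]:
            counts[diagnosis] += 1
    return counts

print(tally([
    "inconsistent errors on repeated words",
    "equal or excess stress",
    "consistent hypernasality",
]))  # {'CAS': 2, 'dysarthria': 2} -- overlapping profiles need follow-up
```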

    Phonaesthetic Phonological Iconicity in Literary Analysis Illustrated by Angela Carter’s “The Bloody Chamber”

    The article offers a phonosemantic analysis of Angela Carter’s “The Bloody Chamber.” The investigation is based on a corpus of nineteen relevant sound-related descriptions of the sea. Although most of the excerpts identified contain aural metaphors and are not phonologically iconic per se, at least three fragments are particularly interesting from a phonosemantic point of view. Most notably, the phonaesthemes /gl/, /l/, and /r/ were found to carry substantial meaning, contributing to the overall interpretation of the story. To account for the inevitable subjectivity of judgments about iconicity, and in this case phonological iconicity, a few theories are presented in support of the author’s reading of each phonaestheme’s contextual significance. The paper briefly reviews the chronological development of the field of phonosemantics and then combines the aural-images theory (proposed by Richard Rhodes) with the “aural semiotic process” theory (a term coined by the author). Each analysis is further supplemented with scholarly views on the respective phonaesthemes. On the whole, the paper does not aim to dispute the well-established definition of the phoneme or its generally accepted arbitrariness. Nevertheless, it has been observed that a speculative phonosemantic analysis of a literary work may yield noteworthy results.
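    As a rough illustration of how such a corpus count might be automated, the sketch below tallies candidate phonaesthemes by orthographic word onset, a crude proxy for the phonological forms. The sample sentence is invented for illustration, not a quotation from the story.

```python
import re
from collections import Counter

PHONAESTHEMES = ["gl", "l", "r"]   # candidates from the analysis above

def count_onsets(text):
    """Count words beginning with each candidate, crediting the longest match."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        for p in sorted(PHONAESTHEMES, key=len, reverse=True):
            if word.startswith(p):
                counts[p] += 1
                break
    return counts

sample = "The glimmering sea rolled and lapped, a low glassy roar."
print(count_onsets(sample))   # Counter({'gl': 2, 'l': 2, 'r': 2})
```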

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has received increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. To make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis takes a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music.

    In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework, the first of its kind, is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results of these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval.

    Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study uncovered relationships between users’ demographic backgrounds and their perception of expressive and structural features of music. Such a multi-level approach is exceptional in that it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population.

    Finally, the multi-purpose material provided by the theoretical background and the results of the empirical investigations are put into practice in three music information retrieval applications: a prototype of a taxonomy-based user interface, an annotated database of experimental findings, and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.
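    As a small illustration of the low-level/high-level split the framework coordinates, the sketch below computes one system-derived descriptor (spectral centroid, a common "brightness" measure) and pairs it with user-supplied semantic labels. The signals and tags are synthetic examples, not material from the thesis.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Amplitude-weighted mean frequency of one audio frame (Hz)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(frame.size)))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sr)
    return float((freqs * mag).sum() / mag.sum())

sr = 22050
t = np.arange(sr) / sr
signals = {
    "dark": np.sin(2 * np.pi * 220 * t),   # low sine: "dark" timbre
    "bright": np.sin(2 * np.pi * 220 * t) + 0.8 * np.sin(2 * np.pi * 3520 * t),
}
# High-level content as users might annotate the same two items.
user_tags = {"dark": ["calm", "warm"], "bright": ["energetic", "sharp"]}

for name, sig in signals.items():
    centroid = spectral_centroid(sig[:2048], sr)
    print(f"{name}: centroid {centroid:.0f} Hz, tags {user_tags[name]}")
```

    A coordinating framework of the kind the thesis proposes would sit between these two columns, relating machine-computed descriptors to the vocabulary users actually employ.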