12 research outputs found

    Speech segmentation and clustering methods for a new speech recognition architecture

    To reduce the gap between the performance of traditional speech recognition systems and human speech recognition skills, a new architecture is required. A system that is capable of incremental learning offers one such solution to this problem. This thesis introduces a bottom-up approach for such a speech processing system, consisting of a novel blind speech segmentation algorithm, a segmental feature extraction methodology, and data classification by incremental clustering. All methods were evaluated by extensive experiments with a broad range of test material, and the evaluation methodology was itself also scrutinized. The segmentation algorithm achieved results that compare favorably with those reported in the current literature on blind segmentation. Possibilities for follow-up research on memory structures and intelligent top-down feedback in speech processing are also outlined.

    Spoken language processing: piecing together the puzzle

    Attempting to understand the fundamental mechanisms underlying spoken language processing, whether it is viewed as behaviour exhibited by human beings or as a faculty simulated by machines, is one of the greatest scientific challenges of our age. Despite tremendous achievements over the past 50 or so years, there is still a long way to go before we reach a comprehensive explanation of human spoken language behaviour and can create a technology with performance approaching or exceeding that of a human being. It is argued that progress is hampered by the fragmentation of the field across many different disciplines, coupled with a failure to create an integrated view of the fundamental mechanisms that underpin one organism's ability to communicate with another. This paper weaves together accounts from a wide variety of different disciplines concerned with the behaviour of living systems - many of them outside the normal realms of spoken language - and compiles them into a new model: PRESENCE (PREdictive SENsorimotor Control and Emulation). It is hoped that the results of this research will provide a sufficient glimpse into the future to give breath to a new generation of research into spoken language processing by mind or machine. (c) 2007 Elsevier B.V. All rights reserved

    Manipulations of List Type in the DRM Paradigm: A Review of How Structural and Conceptual Similarity Affect False Memory

    The use of list-learning paradigms to explore false memory has revealed several critical findings about the contributions of similarity and relatedness in memory phenomena more broadly. Characterizing the nature of “similarity and relatedness” can inform researchers about factors contributing to memory distortions and about the underlying associative and semantic networks that support veridical memory. Similarity can be defined in terms of semantic properties (e.g., shared conceptual and taxonomic features), lexical/associative properties (e.g., shared connections in associative networks), or structural properties (e.g., shared orthographic or phonological features). By manipulating the type of list and its relationship to a non-studied critical item, we review the effects of these types of similarity on veridical and false memory. All forms of similarity reviewed here result in reliable error rates, and the effects on veridical memory are variable. The results across a variety of paradigms and tests provide partial support for a number of theoretical explanations of false memory phenomena, but none of the theories readily accounts for all results.

    Investigating Hybrid Models Of Speech Perception

    The ability to perceive sounds as words involves a transformation from detailed speech signals to invariant meanings, which are separate from information about the speaker of a particular word. The nature of this transformation is a central issue in the field of speech perception. A particular focus of ongoing debate concerns talker-specific details: are they causally relevant to lexical perception, or are they useful only for tasks like speaker recognition? One common way to investigate the impact of voice information is to examine the time-course of its effects on future perceptual events. Early research reported no consistent long-lasting effects, implying that speech representations do not contain talker-specific detail (Jackson & Morton, 1984). However, subsequent work reported long-lasting effects, leading to a focus on modelling speech representations as abstractions over detail-rich episodic memories (Goldinger, 1996). Current hybrid models (Church & Schacter, 1994; McLennan & Luce, 2005; Goldinger, 2007) incorporate abstract and detail-rich speech representations but differ in the relative importance assigned each. Two types of hybrid models are differentiated: a) models with combined representations, where abstraction occurs over detailed memories of speech episodes; versus b) models with separate representations, where different processing paths exist from the speech signal to word and speaker recognition. To investigate these models, this thesis reports multiple experiments investigating the time-course of the decay patterns of voice effects in repetition priming. Results from auditory lexical decision indicate that voice information only affects the speed of future perceptual processes within a short time window: until around three items intervene between prime and target. This finding clarifies previous results, which found no long-lasting effects, by providing an exact time-course of voice information’s impact. 
    Nevertheless, the results reported here differ from the predictions of studies investigating recognition accuracy, where long-lasting effects are commonly found. To address these differences, additional experiments using continuous and blocked word recognition paradigms were conducted. Again, talker-specific effects only persist within the same short time window, while abstract repetition priming effects persist much longer. By de-emphasizing the contribution of voice information, these findings assert the importance of abstract linguistic representations in hybrid models with separate representations.

    Semantic Memory

    How is it that we know what a dog and a tree are, or, for that matter, what knowledge is? Our semantic memory consists of knowledge about the world, including concepts, facts and beliefs. This knowledge is essential for recognizing entities and objects, and for making inferences and predictions about the world. In essence, our semantic knowledge determines how we understand and interact with the world around us. In this chapter, we examine semantic memory from cognitive, sensorimotor, cognitive neuroscientific, and computational perspectives. We consider the cognitive and neural processes (and biases) that allow people to learn and represent concepts, and discuss how and where in the brain sensory and motor information may be integrated to allow for the perception of a coherent “concept”. We suggest that our understanding of semantic memory can be enriched by considering how semantic knowledge develops across the lifespan within individuals.

    Proceedings of KogWis 2012. 11th Biannual Conference of the German Cognitive Science Society

    The German cognitive science conference is an interdisciplinary event where researchers from different disciplines -- mainly from artificial intelligence, cognitive psychology, linguistics, neuroscience, philosophy of mind, and anthropology -- and application areas -- such as education, clinical psychology, and human-machine interaction -- bring together different theoretical and methodological perspectives to study the mind. The 11th Biannual Conference of the German Cognitive Science Society took place from September 30 to October 3, 2012 at Otto-Friedrich-Universität in Bamberg. The proceedings cover all contributions to this conference, that is, five invited talks, seven invited symposia and two symposia, a satellite symposium, a doctoral symposium, three tutorials, 46 abstracts of talks and 23 poster abstracts.

    Hearing the message and seeing the messenger: The role of talker information in spoken language comprehension

    The acoustic signal consists of various layers of information that we often process unconsciously. Most importantly, it contains both linguistic and indexical information, the two fundamental components of the sound input. Even though the meaning of a word does not change when it is spoken by multiple speakers, the same word never sounds exactly the same, because individuals introduce all kinds of variation to the speech input. Hence, through segmental and suprasegmental information, listeners can discern the nativeness (native vs. non-native) of the talker and the age of the talker (adult vs. child). Both non-native talkers and child talkers deviate from the standard norms of pronunciation of native adults and show variation both within and between talkers. The main difference between non-native adults and native children is that, for non-native talkers, variation is driven by their native language, meaning that the phonological structures of their native language interact with their second language; therefore, they maintain a foreign accent. For children, however, variation is driven by development, such that children's competencies in their motor skills depend on their current stage of language development. While there has been extensive research on foreign-accented speech, there is little knowledge about child speech. In particular, the processing of child speech has been investigated by only a few studies so far. Hence, the central question of the dissertation is "What is the role of talker information in spoken language comprehension?" This question was investigated from three distinct angles: the first project examined talker information from an auditory-only perspective, the second project investigated talker information from an audio-visual perspective, and the third project studied the impact of talker information on listeners' credibility ratings in the socio-linguistic context.

    AXMEDIS 2008

    The AXMEDIS International Conference series aims to explore all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, protection and rights management, and to address the latest developments and future trends of the technologies and their applications, impacts and exploitation. The AXMEDIS events offer venues for exchanging concepts, requirements, prototypes, research ideas, and findings which could contribute to academic research and also benefit business and industrial communities. In the Internet as well as in the digital era, cross-media production and distribution represent key developments and innovations that are fostered by emergent technologies to ensure better value for money while optimising productivity and market coverage.

    Semantic, phonological and episodic representations in verbal immediate serial recall

    Psycholinguistic frameworks provide contemporary accounts of immediate serial recall (e.g., N. Martin & Saffran, 1997; R. C. Martin, Lesch, & Bartha, 1999). These models emphasise the inclusion of semantic/associative and phonological representations in verbal short-term memory but have difficulty explaining how serial order is represented and maintained. Conversely, computational models of immediate serial recall (e.g., Brown, Preece, & Hulme, 2000; Henson, 1998b; Lewandowsky & Farrell, 2008b; Page & Norris, 1998) have typically concentrated on the role of temporary episodic representations in short-term recall but have trouble accounting for the influence of multiple representations on performance. The aim of this research was to combine these two lines of research to form a more integrative approach to immediate serial recall. The intention was to contribute to current understandings of verbal short-term memory by exploring how the binding of semantic/associative, phonological and episodic representations would influence immediate serial recall.