17 research outputs found

    Sequential grouping constraints on across-channel auditory processing

    Søren Buus. Thirty years of psychoacoustic inspiration

    Sequential grouping constraints on across‐channel auditory processing

    A speaker classification framework for non-intrusive user modeling : speech-based personalization of in-car services

    Speaker Classification, i.e. the automatic detection of certain characteristics of a person based on his or her voice, has a variety of applications in modern computer technology and artificial intelligence: as a non-intrusive source for user modeling, it can be employed for personalization of human-machine interfaces in numerous domains. This dissertation presents a principled approach to the design of a novel Speaker Classification system for automatic age and gender recognition which meets these demands. Based on literature studies, methods and concepts dealing with the underlying pattern recognition task are developed. The final system consists of an incremental GMM-SVM supervector architecture with several optimizations. An extensive data-driven experiment series explores the parameter space and serves as evaluation of the component. Further experiments investigate the language-independence of the approach. As an essential part of this thesis, a framework is developed that implements all tasks associated with the design and evaluation of Speaker Classification in an integrated development environment that is able to generate efficient runtime modules for multiple platforms. Applications from the automotive field and other domains demonstrate the practical benefit of the technology for personalization, e.g. by increasing local danger warning lead time for elderly drivers.

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Towards multi-domain speech understanding with flexible and dynamic vocabulary

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 201-208). In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis will describe our work towards realizing such a vision, using a multi-stage architecture. Our work is focused on organizing the application of linguistic constraints in order to accommodate multiple domain topics and dynamic vocabulary at the spoken input. The philosophy is to exclusively apply below word-level linguistic knowledge at the initial stage. Such knowledge is domain-independent and general to all of the English language. Hence, this is broad enough to support any unknown words that may appear at the input, as well as input from several topic domains. At the same time, the initial pass narrows the search space for the next stage, where domain-specific knowledge that resides at the word-level or above is applied. In the second stage, we envision several parallel recognizers, each with higher order language models tailored specifically to its domain. A final decision algorithm selects a final hypothesis from the set of parallel recognizers. Part of our contribution is the development of a novel first stage which attempts to maximize linguistic constraints, using only below word-level information. The goals are to prevent sequences of unknown words from being pruned away prematurely while maintaining performance on in-vocabulary items, as well as reducing the search space for later stages. Our solution coordinates the application of various subword level knowledge sources. The recognizer lexicon is implemented with an inventory of linguistically motivated units called morphs, which are syllables augmented with spelling and word position. This first stage is designed to output a phonetic network so that we are not committed to the initial hypotheses. This adds robustness, as later stages can propose words directly from phones. To maximize performance on the first stage, much of our focus has centered on the integration of a set of hierarchical sublexical models into this first pass. To do this, we utilize the ANGIE framework which supports a trainable context-free grammar, and is designed to acquire subword-level and phonological information statistically. Its models can generalize knowledge about word structure, learned from in-vocabulary data, to previously unseen words. We explore methods for collapsing the ANGIE models into a finite-state transducer (FST) representation which enables these complex models to be efficiently integrated into recognition. The ANGIE-FST needs to encapsulate the hierarchical knowledge of ANGIE and replicate ANGIE's ability to support previously unobserved phonetic sequences ... by Grace Chung. Ph.D.

    Concept & form : post-philosophical studies in contemporary art

    This thesis identifies a problem within current philosophical perspectives concerning contemporary visual art, namely, the underestimation of the unique qualities of a concept in visual form. There is a related deficit in the literature about both the practice of contemporary art making as a cognitive manipulation of concept and form, and the ways in which the viewer might dissect the relationship between concept and form in philosophical inquiry. This thesis explores two central claims. First, that visual art allows for a spatial and temporal conflation of concept that manufactures a unique philosophical realm more readily cognitively assimilated than with the written or spoken word. Second, that a post-philosophical reading of some contemporary art works is possible whereby both pursuits might inform each other, forging expanded potential in inquiry. The thesis takes the form of detailed case studies of single works of art and their relationship with particular models/instances/paradigms of philosophical thinking. Presenting select works of art by Joseph Beuys, Anselm Kiefer and Hanne Darboven, the thesis explores how this range of contemporary works of art engage concurrently produced works of philosophy. This thesis ends with the author's personal account of the cognitive manipulation of concept and form as an insight into the creation of a work of art. The thesis submits that a greater understanding of contemporary art practice - from conception to exhibition - can vitalize philosophical inquiry by illuminating the cognitive process beyond written and spoken language. Scope for further research might incorporate questions concerning the emancipatory qualities of a more accessible philosophical realm, particularly concerning pedagogical or political engagement with visual representation. Such research would necessitate ongoing attention to the method and practice of 'reading' visual representation.