Hierarchical methods for large population speaker identification using telephone speech

Author: Lerato Lerato
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

This study focuses on speaker identification (SiD). Several problems, such as acoustic noise, channel noise, speaker variability, a large population of known speakers enrolled within the system, and many others, limit SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form a speaker template known as a speaker model. As the number of speakers enrolled in the system grows, more models accumulate and interspeaker confusion results. This study proposes hierarchical methods that split the large population of enrolled speakers into smaller groups of model databases in order to minimise interspeaker confusion.

Cape Town University OpenUCT

Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht (Two-stage context-sensitive speaker classification using the example of age and gender)

Author: Müller Christian
Publication date: 30/08/2007

This dissertation describes a two-layered speaker classification approach, using age and gender as examples. First, the results of comprehensive corpus analyses are presented; these are suitable as a reference basis for further studies in the human sciences. It is shown that the models trained on these data are able to recognize the above-mentioned characteristics with an accuracy that, in some cases, reaches five times the respective chance level. In addition, the presented approach is distinguished above all by the so-called Second Layer, on which a context-sensitive fusion of multiple classification results is performed using Dynamic Bayesian Networks. The dissertation also describes a concrete speaker classification system that was developed for the application scenario of mobile spoken dialog systems.

Universaar

Abstracts of the 2014 Brains, Minds, and Machines Summer School

Author: Amir Nadav
Besold Tarek R.
Camoriano Rafaello
de Brito Carlos Stein N.
Erdogan Goker
Flynn Thomas
Gillary Grant
Gomez Jesse
Herbert-Voss Ariel
Hotan Gladia
Kadmon Jonathan
Linderman Scott W.
Liu Tina T.
Marantan Andrew
Olson Joseph
Orchard Garrick
Pal Dipan K.
Pasquale Giulia
Sanders Honi
Silberer Carina
Smith Kevin A.
Suchow Jordan W.
Tessler M. H.
Viejo Guillaume
Walker Drew
Wehbe Leila
Publication venue: Center for Brains, Minds and Machines (CBMM)
Publication date: 26/09/2014

A compilation of abstracts from the student projects of the 2014 Brains, Minds, and Machines Summer School, held at the Woods Hole Marine Biological Lab, May 29 to June 12, 2014. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

DSpace@MIT

Positive emotions in the voice: Towards an ethological understanding

Author: Kamiloğlu R.G.
Publication date: 01/01/2023

International Migration, Integration and Social Cohesion online publications

Extracting Spatiotemporal Word and Semantic Representations from Multiscale Neurophysiological Recordings in Humans

Author: Chan Alexander Mark
Publication venue: Harvard University Botany Libraries
Publication date: 21/06/2014

With the recent advent of neuroimaging techniques, most research on the neural basis of language processing has focused on localizing various lexical and semantic functions. Unfortunately, the limited time resolution of functional neuroimaging prevents a detailed analysis of the dynamics involved in word recognition, and the hemodynamic basis of these techniques prevents the study of the underlying neurophysiology. Compounding this problem, current techniques for the analysis of high-dimensional neural data are mainly sensitive to large effects in a small area, preventing a thorough study of the distributed processing involved in representing semantic knowledge. This thesis demonstrates the use of multivariate machine-learning techniques for studying the neural representation of semantic and speech information in electro/magneto-physiological recordings with high temporal resolution. Support vector machines (SVMs) allow semantic category and word-specific information to be decoded from non-invasive electroencephalography (EEG) and magnetoencephalography (MEG), and demonstrate the consistent but spatially and temporally distributed nature of such information. Moreover, the anteroventral temporal lobe (avTL) may be important for coordinating these distributed representations, as supported by the presence of supramodal category-specific information in intracranial recordings from the avTL as early as 150 ms after auditory or visual word presentation. Finally, to study the inputs to this lexico-semantic system, recordings are obtained from a high-density microelectrode array in the anterior superior temporal gyrus (aSTG), and the recorded spiking activity demonstrates the presence of single neurons that respond specifically to speech sounds.
The successful decoding of word identity from this firing-rate information suggests that the aSTG may be involved in the population coding of acousto-phonetic speech information that is likely on the pathway for mapping speech sounds to meaning in the avTL. The feasibility of extracting semantic and phonological information from multichannel neural recordings using machine learning techniques provides a powerful method for studying language using large datasets and has potential implications for the development of fast and intuitive communication prostheses.
Engineering and Applied Science

Harvard University - DASH

Let’s lie together: Co-presence effects on children’s deceptive skills

Author: Swerts M.G.J.
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2012

Tilburg University Repository

Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future

Author: Muckli L.
Petro L.S.
Smith F.W.
Publication venue: Cambridge University Press (CUP)
Publication date: 10/05/2013

Clark provides a powerful description of the brain as a prediction machine, which offers progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models).

Enlighten

University of East Anglia digital repository

Emotions in archetypal media content

Author: Chang Huang-Ming
Publication venue: Universitat Politècnica de Catalunya
Publication date: 16/09/2014

Emotion is an intriguing and mysterious psychological phenomenon. While everyone seems to know what it is, researchers have not yet reached a consensus on its definition, and many questions remain unanswered. Although the nature of emotion is yet to be discovered, the design community has noticed its importance and poses the challenge of how emotion could inform design. We see the need to follow the state of the art in psychology and begin by exploring the emotional qualities of various types of media content. The first part of this thesis aims to construct a theoretical framework. In recent years, empirical studies have suggested that emotion could be unconscious. While this remains to be further justified, scientists are motivated to reconsider current theories of emotion to account for this phenomenon. In light of this, we integrate these studies of unconscious emotion into our literature review. An overview from theory to practice is given as a reference for the current state of application domains such as affective computing and emotional design. This review offers a holistic understanding of emotion from various perspectives, which allows us to look for new directions in future studies. Based on our review, we see a promising direction in applying psychoanalytic methods to analyze media content as affective stimuli; these stimuli can then be evaluated with quantitative measures to investigate the connection between the content and the corresponding emotions. The analysis of the media content is based on a psychoanalytic theory, the theory of archetypes, proposed by Carl Jung. He argues that there exists a universal pattern in humans' unconscious thoughts, which can be manifested as symbolic content in various forms of narrative, such as myths and fairy tales. Today, this archetypal symbolic content can be seen in modern media, particularly in movies.
By applying the Jungian approach, we analyzed the symbolic meaning of movie scenes and edited these feature scenes into a collection of archetypal media content, which serves as the experimental material for later explorations. In the second part of this thesis, we present three experimental studies that aim to determine whether archetypal media content can be differentiated based on emotional responses. We adopted the psychoanalytical approach described earlier to collect feature scenes in movies as archetypal media content. Affective stimuli of explicit emotions, such as sadness and joy, are also included as benchmarks for comparison. Both self-reports and physiological signals are used to measure emotional responses. The three studies follow a similar experimental design: presenting stimuli and measuring emotion concurrently. The results confirm that emotions induced by archetypal content differ from explicit emotions, and the statistical analysis further indicates that, while viewing archetypal media content, the predictive model obtained from physiological signals outperforms the model generated from self-reports. These results, however, are the opposite of those obtained with affective stimuli of explicit emotions, leading us to conclude that archetypal media content might induce unconscious emotions, and that physiological signals are more effective than self-reports for recognizing emotions induced by archetypal media content. In the third part of this thesis, we explore how archetypal content could be used to design media content by means of mood boards. Two studies were conducted with designers to answer the research question of whether emotionally rich content can be produced through the automatic generation of archetypal content with mood boards, compared with non-archetypal media content.

Tesis Doctorals en Xarxa