109 research outputs found
Hierarchical methods for large population speaker identification using telephone speech
This study focuses on speaker identification (SiD). Several problems, such as acoustic noise, channel noise, speaker variability, and a large population of known speakers within the system, among many others, limit SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template, known as a speaker model. As the number of speakers enrolled in the system grows, more models accumulate and interspeaker confusion results. This study proposes hierarchical methods that split the large population of enrolled speakers into smaller groups of model databases in order to minimise interspeaker confusion.
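The idea of splitting enrolled speakers into smaller groups can be illustrated with a minimal sketch. It is not the study's method: the abstract does not say how groups are formed or how models are scored, so the sketch assumes each speaker model is a single feature vector, each group is represented by the mean of its members, and matching uses Euclidean distance. All names and data (`identify_hierarchical`, `groups`, `spk1`, and so on) are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def identify_flat(features, models):
    """Baseline: compare the test features against every model in `models`."""
    return min(models, key=lambda name: distance(features, models[name]))

def identify_hierarchical(features, groups):
    """Two-stage search: pick the nearest group, then the nearest speaker in it."""
    # Stage 1: represent each group by the centroid of its speaker models (assumption).
    group_centroids = {
        g: centroid(list(members.values())) for g, members in groups.items()
    }
    best_group = min(
        group_centroids, key=lambda g: distance(features, group_centroids[g])
    )
    # Stage 2: search only the models inside the selected group.
    return best_group, identify_flat(features, groups[best_group])

# Hypothetical toy data: each speaker model is a single 2-D vector.
groups = {
    "group_a": {"spk1": [1.0, 1.0], "spk2": [1.5, 0.5]},
    "group_b": {"spk3": [8.0, 9.0], "spk4": [9.0, 8.0]},
}
result = identify_hierarchical([8.2, 8.9], groups)
print(result)
```

In this sketch the second stage compares the test vector only with the models of one group, so each final decision involves fewer competing models. A wrong group choice in the first stage cannot be recovered later, which is the usual trade-off of such a two-stage search.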
Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht
This dissertation describes a two-layered speaker classification approach, using age and gender as an example. First, the results of comprehensive corpus analyses are presented; these are suitable as a reference basis for further studies in the human sciences. It is shown that the models trained on these data are able to recognize the above-mentioned characteristics with an accuracy that is, in some cases, up to five times the respective chance level. In addition, the presented approach is distinguished above all by the so-called Second Layer, on which a context-sensitive fusion of multiple classification results is accomplished using Dynamic Bayesian Networks, taking the auditory context into account. The dissertation also describes a concrete speaker classification system that was developed for the application scenario of mobile spoken dialog systems.
Abstracts of the 2014 Brains, Minds, and Machines Summer School
A compilation of abstracts from the student projects of the 2014 Brains, Minds, and Machines Summer School, held at Woods Hole Marine Biological Lab, May 29 - June 12, 2014. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
Extracting Spatiotemporal Word and Semantic Representations from Multiscale Neurophysiological Recordings in Humans
With the recent advent of neuroimaging techniques, the majority of the research studying the neural basis of language processing has focused on the localization of various lexical and semantic functions. Unfortunately, the limited time resolution of functional neuroimaging prevents a detailed analysis of the dynamics involved in word recognition, and the hemodynamic basis of these techniques prevents the study of the underlying neurophysiology. Compounding this problem, current techniques for the analysis of high-dimensional neural data are mainly sensitive to large effects in a small area, preventing a thorough study of the distributed processing involved in representing semantic knowledge. This thesis demonstrates the use of multivariate machine-learning techniques for the study of the neural representation of semantic and speech information in electro/magneto-physiological recordings with high temporal resolution. Support vector machines (SVMs) allow for the decoding of semantic category and word-specific information from non-invasive electroencephalography (EEG) and magnetoencephalography (MEG) and demonstrate the consistent, but spatially and temporally distributed, nature of such information. Moreover, the anteroventral temporal lobe (avTL) may be important for coordinating these distributed representations, as supported by the presence of supramodal category-specific information in intracranial recordings from the avTL as early as 150 ms after auditory or visual word presentation. Finally, to study the inputs to this lexico-semantic system, recordings from a high-density microelectrode array in anterior superior temporal gyrus (aSTG) are obtained, and the recorded spiking activity demonstrates the presence of single neurons that respond specifically to speech sounds.
The successful decoding of word identity from this firing rate information suggests that the aSTG may be involved in the population coding of acousto-phonetic speech information that is likely on the pathway for mapping speech sounds to meaning in the avTL. The feasibility of extracting semantic and phonological information from multichannel neural recordings using machine learning techniques provides a powerful method for studying language using large datasets and has potential implications for the development of fast and intuitive communication prostheses. Engineering and Applied Science
Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future
Clark offers a powerful description of the brain as a prediction machine, which offers progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models).
Emotions in archetypal media content
Emotion is an intriguing and mysterious psychological phenomenon. While everyone
seems to know what it is, researchers have not yet come to consensus on its definition, and
many questions still remain unanswered. While the nature of emotion is yet to be discovered,
the design community has noticed its importance, and poses the challenge of how emotion
could inform design. We see the necessity to follow the state of the art in psychology and
initiate the undertaking by exploring the emotional qualities in various types of media
content. The first part of this thesis aims at constructing a theoretical framework. Recent
years have seen empirical studies suggest that emotion could be unconscious. While this is
to be further justified, scientists are motivated to reconsider current theories of emotion to
account for this phenomenon. In light of this, we integrate these studies about unconscious
emotion into our literature review. An overview from theory to practice is illustrated to
provide a reference for viewing the current states in application domains, such as affective
computing and emotional design. This review offers a holistic understanding about
emotion from various perspectives, which allows us to look for new directions in future
studies.
Based on our review, we see a promising direction by applying psychoanalysis methods
to analyze the media content as affective stimuli, and these stimuli can be evaluated
by using quantitative measures to investigate the connection between the content and the
corresponding emotions. The analysis on the media content is based on a psychoanalysis
theory, the theory of archetypes, proposed by Carl Jung. He argues that there exists a
universal pattern in humans' unconscious thoughts, which can be manifested as symbolic
content in various forms of narratives, such as myth and fairy tales. Today, this archetypal
symbolic content can be seen in modern media, particularly in movies. By applying the
Jungian approach, we analyzed the symbolic meaning in movie scenes and edited these feature
scenes into a collection of archetypal media content, which serves as the experimental
materials for later explorations.
In the second part of this thesis, we present three experimental studies that aim at determining
if archetypal media content can be differentiated based on emotional responses.
We adopted the psychoanalytical approach described earlier to collect feature scenes in
movies as archetypal media content. Meanwhile, affective stimuli of explicit emotions are
also included as benchmarks for comparison, such as sadness and joy. Self-reports and
physiological signals are both adopted for measuring emotional responses. These three
studies follow a similar experimental design: presenting stimuli and measuring emotion
concurrently. The results of these studies confirm that emotions induced by archetypal
content are different from explicit emotions, and the statistical analysis further indicates
that the predictive model obtained from physiological signals outperforms the model generated
from self-reports while viewing archetypal media content. These results, however,
are opposite to the results gained from affective stimuli of explicit emotions, leading us
to the conclusion that archetypal media content might induce unconscious emotions, and
physiological signals are more effective than self-reports for recognizing emotions induced
by archetypal media content.
In the third part of this thesis, we explore how archetypal content could be used to design media content by means of mood boards. Two studies were conducted with designers to answer the research question of whether emotionally rich content can be produced through the automatic generation of archetypal content for mood boards, compared with non-archetypal media content.
- …