784 research outputs found

    Perception of emotion in music in adults with cochlear implants

    Music is an integral aspect of culture that is uniquely tied to our emotions. Previous studies have shown that hearing loss and cochlear implantation have deleterious effects on music and emotion perception, particularly for cues related to pitch, melody, and mode. The purpose of this study is to examine the acoustic cues, such as tempo and pitch range, that adults with cochlear implants (CIs) and adults with normal hearing (NH) might use to perceive emotion in music. One adult (aged 18-50 years) with a cochlear implant and 15 adults with normal hearing were tested. Participants listened to a series of 40 melodies that varied in tempo and pitch range. Ten melodies conveyed sadness (small pitch range; slow tempo) and 10 conveyed happiness (large pitch range; fast tempo). The remaining 20 presented conflicting cues (small pitch range + fast tempo, or large pitch range + slow tempo). We asked participants to rate the emotion of each musical excerpt on a 7-point Likert scale along three dimensions: happy-sad, pleasant-unpleasant, and engaged-unengaged. Results showed that adults with NH and CIs relied on tempo more than pitch range when perceiving emotion in music, although in two instances adults with NH took pitch range into account when rating. The results from this study will help shed light on how effectively cochlear implants convey musical emotions, and could eventually lead to improvements in music perception in listeners with hearing loss.
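    As a purely illustrative sketch of the 2x2 cue design described above (the function simulated_rating and all its values are hypothetical stand-ins, not the study's stimuli or data), the Python fragment below builds the 40-trial condition set and averages happy-sad ratings per condition:

        # Hypothetical sketch of the 2x2 cue design: 40 melodies crossing
        # pitch range (small/large) with tempo (slow/fast), rated on a
        # 7-point happy-sad scale (1 = very sad, 7 = very happy).
        import itertools
        import random
        import statistics

        conditions = list(itertools.product(["small", "large"], ["slow", "fast"]))
        trials = [cond for cond in conditions for _ in range(10)]  # 10 melodies each

        # Stand-in ratings for a tempo-dominated listener (an assumption for
        # illustration, loosely mirroring the reported result).
        def simulated_rating(pitch_range, tempo):
            base = 5.5 if tempo == "fast" else 2.5           # tempo carries most weight
            nudge = 0.5 if pitch_range == "large" else -0.5  # pitch range, a small effect
            return max(1, min(7, round(base + nudge + random.uniform(-1, 1))))

        ratings = {cond: [] for cond in conditions}
        for pitch_range, tempo in trials:
            ratings[(pitch_range, tempo)].append(simulated_rating(pitch_range, tempo))

        for (pitch_range, tempo), rs in ratings.items():
            print(f"pitch range={pitch_range:5s} tempo={tempo:4s} "
                  f"mean happy-sad rating={statistics.mean(rs):.2f}")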

    On the mechanism of response latencies in auditory nerve fibers

    Despite structural differences in the middle and inner ears, the latency pattern of auditory nerve fibers in response to an identical sound has been found to be similar across numerous species. Studies have shown this similarity even in species with markedly distinct cochleae, or without a basilar membrane at all. This stimulus-, neuron-, and species-independent similarity of latency cannot be explained simply by the concept of cochlear traveling waves, which is generally accepted as the main cause of the neural latency pattern. An original concept, the Fourier pattern, is defined to characterize a feature of temporal processing, specifically phase encoding, that is not readily apparent in more conventional analyses. The pattern is created by marking the first amplitude maximum of each sinusoidal component of the stimulus, thereby encoding phase information. The hypothesis is that the hearing organ serves as a running analyzer whose output reflects synchronization of auditory neural activity consistent with the Fourier pattern. A combination of experimental, correlational, and meta-analytic approaches is used to test the hypothesis. Phase encoding and stimulus parameters were manipulated to test their effects on the predicted latency pattern. Animal studies in the literature using the same stimuli were then compared to determine the degree of relationship. The results show that each marking accounts for a large percentage of a corresponding peak latency in the peristimulus-time histogram. For each of the stimuli considered, the latency predicted by the Fourier pattern is highly correlated with the observed latency in the auditory nerve fibers of representative species. The results suggest that the hearing organ analyzes not only the amplitude spectrum but also phase information in its Fourier analysis, so as to distribute specific spikes among auditory nerve fibers and within a single unit. This phase-encoding mechanism is proposed as the common mechanism that, in the face of species differences in peripheral auditory hardware, accounts for the considerable similarities across species in their latency-by-frequency functions, in turn assuring optimal phase encoding across species. The mechanism also has the potential to improve the phase encoding of cochlear implants.
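    As a rough sketch of the Fourier-pattern idea (an illustration under stated assumptions, not the author's exact procedure), the Python fragment below decomposes a stimulus into sinusoidal components with an FFT and marks each component's first amplitude maximum, yielding a predicted latency per frequency; the sampling rate, stimulus, and amplitude threshold are placeholders:

        # Decompose a stimulus into sinusoids and mark the first amplitude
        # maximum of each component, as the Fourier pattern prescribes.
        import numpy as np

        fs = 44100.0                       # sampling rate (Hz), assumed
        t = np.arange(0, 0.01, 1.0 / fs)   # 10 ms stimulus
        stimulus = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

        spectrum = np.fft.rfft(stimulus)
        freqs = np.fft.rfftfreq(len(stimulus), 1.0 / fs)

        # A component A*cos(2*pi*f*t + phi) first peaks at the smallest t >= 0
        # with 2*pi*f*t + phi = 0 (mod 2*pi).
        for f, c in zip(freqs, spectrum):
            if f > 0 and np.abs(c) > 0.1 * np.max(np.abs(spectrum)):
                phi = np.angle(c)
                first_max = ((-phi) % (2 * np.pi)) / (2 * np.pi * f)  # seconds
                print(f"{f:7.1f} Hz component: first maximum at {first_max * 1e3:.3f} ms")

    For this stimulus the marks fall at 0.250 ms (1 kHz) and 0.083 ms (3 kHz), a latency-by-frequency function that decreases with frequency, in line with the pattern the abstract describes.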

    Sound, mind and emotion - research and aspects

    Sound, mind and emotion. Report no. 8 (seminars 1 January, 11 April, and 25 May 2008). Contents: Patrik Juslin: Sound of music - Seven ways in which the brain can evoke emotions from sounds; Ulf Rosenhall: Auditory problems - not only an issue of impaired hearing; Sören Nielzén, Olle Olsson, Johan Källstrand & Sara Nehlstedt: The role of psychoacoustics for the research on neuro-psychiatric states - Theoretical basis of the S-Detect method; Gerhard Andersson: Tinnitus and Hypersensitivity to Sounds; Kerstin Persson Waye: "It sounds like a buzzing wasp in my head" - children's perspective of the sound environment in pre-schools; Björn Lyxell, Erik Borg & Inga-Stina Olsson: Cognitive skills and perceived effort in active and passive listening in a naturalistic sound environment; Åke Iwar: Sound, Catastrophe and Trauma; Kerstin Bergh Johannesson: Sounds as triggers - How traumatic memories can be processed by Eye Movement Desensitization and Reprocessing.

    Idealized computational models for auditory receptive fields

    This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled, axiomatic manner from a set of structural properties that enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters, as well as a novel family of generalized Gammatone filters with additional degrees of freedom that permit different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, the framework leads to two canonical families of spectro-temporal receptive fields, expressed in terms of spectro-temporal derivatives of either (i) spectro-temporal Gaussian kernels for non-causal time, or (ii) the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can either be separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for the computation of basic auditory features for audio processing, and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.
    Comment: 55 pages, 22 figures, 3 tables
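    For reference, here is a minimal sketch of the classical gammatone filter that the generalized family extends, g(t) = t^(n-1) exp(-2*pi*b*ERB(fc)*t) cos(2*pi*fc*t) for t >= 0, using the common Glasberg-Moore ERB bandwidth convention (a conventional choice assumed here; the paper's generalized Gammatone filters add degrees of freedom not shown):

        import numpy as np

        def erb(fc):
            """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
            return 24.7 * (4.37 * fc / 1000.0 + 1.0)

        def gammatone_ir(fc, fs, n=4, b=1.019, duration=0.025):
            """Impulse response of an n-th order gammatone filter centred at fc."""
            t = np.arange(0, duration, 1.0 / fs)
            g = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) \
                * np.cos(2 * np.pi * fc * t)
            return g / np.max(np.abs(g))  # crude peak normalization

        # Filter a click by convolution with the impulse response.
        fs = 16000
        ir = gammatone_ir(fc=1000.0, fs=fs)
        click = np.zeros(512); click[0] = 1.0
        response = np.convolve(click, ir)[:512]
        print(len(ir), response[:5])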

    A transfer learning framework for predicting the emotional content of generalized sound events

    Predicting the emotions evoked by generalized sound events is a relatively recent research domain that still needs attention. In this work, a framework is presented that aims to reveal potential similarities in how emotions evoked by sound events and by songs are perceived. To this end, the following are proposed: (a) the use of temporal modulation features, (b) a transfer learning module based on an echo state network, and (c) a k-medoids clustering algorithm predicting the valence and arousal measurements associated with generalized sound events. The effectiveness of the proposed solution is demonstrated through a thoroughly designed experimental phase employing both sound and music data. The results demonstrate the importance of transfer learning in this field and encourage further research on approaches that address the problem in a synergistic way.
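    A minimal sketch of the echo-state-network ingredient follows, under illustrative assumptions (reservoir sizes, the ridge readout, and the stand-in data are hypothetical; the framework's transfer-learning module and k-medoids stage are not reproduced): a fixed random reservoir summarizes an input feature sequence, and only a linear readout to (valence, arousal) is trained.

        import numpy as np

        rng = np.random.default_rng(0)
        n_in, n_res = 8, 200                       # input features, reservoir units
        W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

        def reservoir_state(sequence):
            """Run a (T, n_in) feature sequence through the reservoir; return final state."""
            x = np.zeros(n_res)
            for u in sequence:
                x = np.tanh(W_in @ u + W @ x)
            return x

        # Train a ridge readout mapping reservoir states to (valence, arousal).
        sequences = [rng.normal(size=(50, n_in)) for _ in range(100)]  # stand-in data
        targets = rng.uniform(-1, 1, (100, 2))                         # stand-in labels
        X = np.stack([reservoir_state(s) for s in sequences])
        W_out = np.linalg.solve(X.T @ X + 1e-2 * np.eye(n_res), X.T @ targets)
        pred = X @ W_out                                               # (100, 2) predictions
        print(pred[0])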

    Apprentissage automatique pour le codage cognitif de la parole (Machine learning for cognitive speech coding)

    Since the 1980s, speech codecs have relied on short-term coding strategies that operate at the subframe or frame level (typically 5 to 20 ms). Researchers have essentially adjusted and combined a limited number of available technologies (transforms, linear prediction, quantization) and strategies (waveform matching, noise shaping) to build increasingly complex coding architectures. In this thesis, rather than relying on short-term coding strategies, we develop an alternative framework for speech compression by encoding speech attributes, which are perceptually important characteristics of speech signals. To achieve this objective, we solve three problems of increasing complexity: classification, prediction, and representation learning. Classification is a common element in modern codec designs. As a first step, we design a classifier to identify emotions, which are among the most complex long-term speech attributes. As a second step, we design a speech sample predictor, another common element in modern codec designs, to highlight the benefits of long-term, non-linear speech signal processing. We then explore latent variables, a space of speech representations, to encode both short-term and long-term speech attributes. Finally, we propose a decoder network to synthesize speech signals from these representations, which constitutes our last step towards building a complete, end-to-end machine-learning-based speech compression method. Although each development step proposed in this thesis could form part of a codec on its own, each step also provides insight and a foundation for the next, until a fully machine-learning-based codec is reached.
    The first two steps, classification and prediction, provide new tools that could replace and improve elements of existing codecs. In the first step, we use a combination of a source-filter model and a liquid state machine (LSM) to demonstrate that emotion-related features can be easily extracted and classified using a simple classifier. In the second step, a single end-to-end network using long short-term memory (LSTM) is shown to produce speech frames with high subjective quality for packet loss concealment (PLC) applications. In the last steps, we build on these results to design a fully machine-learning-based codec. An encoder network, formulated as a deep neural network (DNN) and trained on multiple public databases, extracts and encodes speech representations using prediction in a latent space. An unsupervised learning approach based on several principles of cognition is proposed to extract representations from both short and long frames of speech using mutual information and a contrastive loss. The ability of these learned representations to capture various short- and long-term speech attributes is demonstrated. Finally, a decoder structure is proposed to synthesize speech signals from these representations. Adversarial training is used as an approximation to subjective speech quality measures in order to synthesize natural-sounding speech samples. The high perceptual quality of the synthesized speech demonstrates that the extracted representations are effective at preserving all kinds of speech attributes, and hence that a complete compression method is achieved with the proposed approach.
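    A sketch of an InfoNCE-style contrastive objective of the kind the unsupervised stage describes follows; the encoder, batch shapes, and temperature are illustrative assumptions rather than the thesis's architecture. Such a loss lower-bounds the mutual information between a context and its matching target representation.

        # Minimal InfoNCE-style contrastive loss (PyTorch); illustrative only.
        import torch
        import torch.nn.functional as F

        def info_nce(context, targets, temperature=0.1):
            """context: (B, D) predicted representations; targets: (B, D) encodings.
            Each context's matching target is the positive; the rest of the
            batch serve as negatives."""
            context = F.normalize(context, dim=-1)
            targets = F.normalize(targets, dim=-1)
            logits = context @ targets.T / temperature  # (B, B) similarity matrix
            labels = torch.arange(context.size(0))      # positives on the diagonal
            return F.cross_entropy(logits, labels)

        B, D = 32, 128                                  # assumed batch and embedding sizes
        loss = info_nce(torch.randn(B, D), torch.randn(B, D))
        print(float(loss))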

    Music Listening as Therapy

    Music is a universal phenomenon and a real, physical thing. It is processed in neural circuits that overlap with language circuits, and it exerts cognitive, emotional, and physiological effects on humans. Many of those effects are therapeutic, such as reduced symptoms of physical and mental ailments. Music is built from the elements of rhythm, melody, harmony, timbre, dynamics, and form. Rhythm is the focus of pop music, and melody is the focus of classical music. The mind perceives and organizes music in learned, consistent ways in order to generate predictions and extract meaning, a process subject to perceptual laws and information-processing limitations. Predictions are based on schematic and veridical approaches, which give rise to expectations. Frustrated expectations result in an affective response. Music has meaning only unto itself; any extra-musical meaning, including emotional meaning, is ascribed by the listener. The unfolding of a song is much like how Gestalt Therapy theory conceptualizes human experience. Mindfulness offers a clear definition of how one can frame and approach experience to support health and well-being. MinMuList (pronounced "min-mew-list") is an evidence-based workshop that offers a concise discussion and straightforward methods for implementing these aspects of music and psychology.