56,304 research outputs found

    Why I tense up when you watch me: inferior parietal cortex mediates an audience’s influence on motor performance

    Get PDF
    The presence of an evaluative audience can alter skilled motor performance through changes in force output. To investigate how this is mediated within the brain, we emulated real-time social monitoring of participants’ performance of a fine grip task during functional magnetic resonance neuroimaging. We observed an increase in force output during social evaluation that was accompanied by focal reductions in activity within bilateral inferior parietal cortex. Moreover, deactivation of the left inferior parietal cortex predicted both inter- and intra-individual differences in socially-induced change in grip force. Social evaluation also enhanced activation within the posterior superior temporal sulcus, which conveys visual information about others’ actions to the inferior parietal cortex. Interestingly, functional connectivity between these two regions was attenuated by social evaluation. Our data suggest that social evaluation can vary force output through the altered engagement of inferior parietal cortex; a region implicated in sensorimotor integration necessary for object manipulation, and a component of the action-observation network which integrates and facilitates performance of observed actions. Social-evaluative situations may induce high-level representational incoherence between one’s own intentioned action and the perceived intention of others which, by uncoupling the dynamics of sensorimotor facilitation, could ultimately perturbe motor output

    Music Therapy Techniques for Memory Stabilization in Diverse Dementias

    Get PDF
    Music contains certain unmistakable healing properties pertaining specifically to the matured body and soul affected by various types of dementia. Music therapy aids in memory retention or the retarding of the loss of mental function as a result of Alzheimer\u27s disease, Dementia with Lewy bodies, and Senile Dementia. Music can help subjects access lost memories through interaction with a music therapist. Certain music therapy techniques have been shown to yield additional physical, communicative, and psychological benefits. The disease progress of Alzheimer\u27s disease, Dementia with Lewy bodies, and Senile Dementia may be further delayed by music therapy when paired with pharmaceutical interventions such as previously established memory enhancing medications

    Object Referring in Visual Scene with Spoken Language

    Full text link
    Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions and speech descriptions. This makes the datasets ideal for multi-modality learning. The approach is developed by carefully taking down ORSpoken problem into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise.Comment: 10 pages, Submitted to WACV 201

    EMID: An Emotional Aligned Dataset in Audio-Visual Modality

    Full text link
    In this paper, we propose Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that primarily focus on semantic correlations or roughly divided emotional relations, EMID emphasizes the significance of emotional consistency between music and images using an advanced 13-dimension emotional model. By incorporating emotional alignment into the dataset, it aims to establish pairs that closely align with human perceptual understanding, thereby raising the performance of auditory-visual cross-modal tasks. We also design a supplemental module named EMI-Adapter to optimize existing cross-modal alignment methods. To validate the effectiveness of the EMID, we conduct a psychological experiment, which has demonstrated that considering the emotional relationship between the two modalities effectively improves the accuracy of matching in abstract perspective. This research lays the foundation for future cross-modal research in domains such as psychotherapy and contributes to advancing the understanding and utilization of emotions in cross-modal alignment. The EMID dataset is available at https://github.com/ecnu-aigc/EMID

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201

    Generating Realistic Images from In-the-wild Sounds

    Full text link
    Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. In experiments, it shows that our model is able to generate high quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.Comment: Accepted to ICCV 202

    Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

    Full text link
    Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The first one is the training of two different neural networks, one for speech detection and another for music detection. The second approach consists on training a single neural network to tackle both tasks at the same time. The studied architectures include fully connected, convolutional and LSTM (long short-term memory) recurrent networks. Comparative results are provided in terms of classification performance and model complexity. We would like to highlight the performance of convolutional architectures, specially in combination with an LSTM stage. The hybrid convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore, a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most harmful for the performance of the models, showing some difficult scenarios for the detection of music and speechThis work has been supported by project “DSSL: Redes Profundas y Modelos de Subespacios para Deteccion y Seguimiento de Locutor, Idioma y Enfermedades Degenerativas a partir de la Voz” (TEC2015-68172-C2-1-P), funded by the Ministry of Economy and Competitivity of Spain and FEDE
    • 

    corecore