
    Multimodal music information processing and retrieval: survey and future challenges

    To improve performance on various music information processing tasks, recent studies exploit different modalities that capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
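    The survey's discussion of information fusion can be grounded with a minimal sketch. The code below contrasts early (feature-level) and late (decision-level) fusion for two music modalities; the audio/lyrics split, feature shapes, and default weighting are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of two common multimodal fusion strategies.
# The audio/lyrics modality pair and the 0.5 weight are assumptions.

def early_fusion(audio_feat: np.ndarray, lyrics_feat: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate modality features so a single
    downstream model sees one joint representation."""
    return np.concatenate([audio_feat, lyrics_feat])

def late_fusion(p_audio: np.ndarray, p_lyrics: np.ndarray,
                w_audio: float = 0.5) -> np.ndarray:
    """Decision-level fusion: combine per-modality class probabilities
    produced by separately trained models."""
    return w_audio * p_audio + (1.0 - w_audio) * p_lyrics
```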

    Collaborative creativity: The Music Room

    In this paper, we reflect on our experience of designing, developing, and evaluating interactive spaces for collaborative creativity. In particular, we are interested in designing spaces that allow everyone to compose and play original music. The Music Room is an interactive installation in which couples can compose original music by moving through the space. Following the metaphor of love, the music is automatically generated and modulated in terms of pleasantness and intensity according to proxemic cues extracted by a visual tracking algorithm. The Music Room was exhibited during the EU Researchers' Night in Trento, Italy.
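    To make the mapping concrete, here is a hypothetical sketch of how proxemic cues might drive the generated music's pleasantness and intensity; the function, the 5 m range, and the linear mappings are illustrative assumptions rather than the installation's actual code.

```python
# Hypothetical proxemics-to-music mapping: closer couples yield more
# pleasant, more intense music. Ranges and scaling are assumptions.

def proxemics_to_music(distance_m: float, max_distance_m: float = 5.0) -> dict:
    """Map the tracked distance between two visitors to control values in [0, 1]."""
    closeness = 1.0 - min(distance_m / max_distance_m, 1.0)
    return {
        "pleasantness": closeness,           # 0 = dissonant, 1 = consonant
        "intensity": 0.2 + 0.8 * closeness,  # floor keeps the music audible
    }
```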

    Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

    We devise a cascade GAN approach to generate talking face video that is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose to first transfer audio to a high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on those landmarks. Compared to a direct audio-to-image approach, the cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. Because humans are sensitive to temporal discontinuities and subtle artifacts in video, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism to avoid pixel jittering and to force the network to focus on audiovisually correlated regions. Furthermore, to generate sharper images with well-synchronized facial movements, we propose a novel regression-based discriminator that considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate that our method achieves significantly better results than the state-of-the-art methods in both quantitative and qualitative comparisons.
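    One way to picture the dynamically adjustable pixel-wise loss is as an attention-weighted reconstruction error. The sketch below, in PyTorch, assumes an L1 base loss and an externally supplied attention map; the paper's exact formulation, and how the map is produced, may differ.

```python
import torch

def attentive_pixel_loss(generated: torch.Tensor,
                         target: torch.Tensor,
                         attention: torch.Tensor) -> torch.Tensor:
    """Weight per-pixel reconstruction error by an attention map.

    generated, target: (B, C, H, W) frames; attention: (B, 1, H, W),
    larger where audiovisual correlation is expected (e.g., the mouth).
    """
    per_pixel = torch.abs(generated - target)  # L1 error per pixel
    return (attention * per_pixel).mean()      # attended mean over all pixels
```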

    Connecting people through physiosocial technology

    Social connectedness is one of the most important predictors of health and well-being. The goal of this dissertation is to investigate technologies that can support social connectedness. Such technologies can build upon the notion that disclosing emotional information has a strong positive influence on social connectedness. As physiological signals are strongly related to emotions, they might provide a solid base for emotion communication technologies. Moreover, physiological signals are largely lacking in unmediated communication, have been used successfully by machines to recognize emotions, and can be measured relatively unobtrusively with wearable sensors. Therefore, this doctoral dissertation examines the following research question: how can we use physiological signals in affective technology to improve social connectedness?

    First, a series of experiments was conducted to investigate whether computer interpretations of physiological signals can be used to automatically communicate emotions and improve social connectedness (Chapters 2 and 3). These experiments showed that computers can be more accurate at recognizing emotions than humans are, and that physiological signals are the most effective information source for machine emotion recognition. One advantage of machine-based emotion recognition for communication technology may be an increase in the rate at which emotions can be communicated. As expected, the experiments showed that increasing the number of communicated emotions increased feelings of closeness between interacting people. Nonetheless, these effects on feelings of closeness are limited if users attribute the increase in communicated emotions to the technology rather than to their interaction partner. I therefore discuss several ways to incorporate emotion recognition technologies into applications such that users attribute the communication to their interaction partner.

    Instead of having machines interpret physiological signals, the signals can also be represented to a user directly, leaving the interpretation to the user. To explore this, I conducted several studies that employed heartbeat representations as a direct physiological communication signal. These studies showed that people can interpret such signals in terms of emotions (Chapter 4) and that perceiving someone's heartbeat increases feelings of closeness between the perceiver and the sender of the signal (Chapter 5). Finally, a field study (Chapter 6) investigated the potential of heartbeat communication mechanisms in practice. This again confirmed that a heartbeat can provide an intimate connection to another person, showing the potential of communicating physiological signals directly to improve connectedness.

    The last part of the dissertation builds upon the notion that empathy has a positive influence on social connectedness. I developed a framework for empathic computing that employs automated empathy measurement based on physiological signals (Chapter 7), and applied it in a system that can train empathy (Chapter 8). The results showed that giving users frequent feedback about their physiological synchronization with others can help them improve empathy, as measured through self-report and physiological synchronization. In turn, this improves understanding of the other and helps people signal validation and caring, types of communication that improve social connectedness.

    Taking the results presented in this dissertation together, I argue that physiological signals form a promising modality for communication technology (Chapter 9). This dissertation provides a basis for future communication applications that aim to improve social connectedness.
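    As one plausible reading of the synchronization feedback described above, the sketch below scores physiological synchrony as the mean windowed Pearson correlation between two heart-rate series; the window length and the choice of Pearson correlation are assumptions, not the dissertation's exact measure.

```python
import numpy as np

def physiological_synchrony(hr_a: np.ndarray, hr_b: np.ndarray,
                            window: int = 30) -> float:
    """Mean windowed correlation of two equal-length heart-rate series."""
    scores = []
    for start in range(0, len(hr_a) - window + 1, window):
        a = hr_a[start:start + window]
        b = hr_b[start:start + window]
        if a.std() > 0 and b.std() > 0:  # skip flat windows (undefined correlation)
            scores.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(scores)) if scores else 0.0
```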

    The dancer in the eye: Towards a multi-layered computational framework of qualities in movement

    This paper presents a conceptual framework for the analysis of expressive qualities of movement. Our perspective is to model an observer of a dance performance. The conceptual framework consists of four layers, ranging from the physical signals that sensors capture to the qualities that movement communicates (e.g., emotions). The framework aims to provide a conceptual background upon which the development of computational systems can build, with particular reference to systems that analyze a vocabulary of expressive movement qualities and translate them to other sensory channels, such as the auditory modality. Such systems enable their users to "listen to a choreography" or to "feel a ballet", in a new kind of cross-modal mediated experience.
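    The four-layer structure lends itself to a simple pipeline view. The sketch below composes layer functions from physical signals up to communicated qualities; the layer names follow the paper's description, while the composition helper and the placeholder stages are illustrative assumptions.

```python
from typing import Callable, List

Layer = Callable[[object], object]

def build_pipeline(layers: List[Layer]) -> Layer:
    """Compose layer functions in order: raw sensor data in, qualities out."""
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run

# Hypothetical stages mirroring the framework's four layers:
# pipeline = build_pipeline([capture_physical_signals, extract_low_level_features,
#                            model_mid_level_segments, infer_expressive_qualities])
```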