17,480 research outputs found

    Sensing and mapping for interactive performance

    Get PDF
    This paper describes a trans-domain mapping (TDM) framework for translating meaningful activities from one creative domain onto another. The multi-disciplinary framework is designed to facilitate an intuitive and non-intrusive interactive multimedia performance interface that offers the users or performers real-time control of multimedia events using their physical movements. It is intended to be a highly dynamic real-time performance tool, sensing and tracking activities and changes, in order to provide interactive multimedia performances. From a straightforward definition of the TDM framework, this paper reports several implementations and multi-disciplinary collaborative projects using the proposed framework, including a motion and colour-sensitive system, a sensor-based system for triggering musical events, and a distributed multimedia server for audio mapping of a real-time face tracker, and discusses different aspects of mapping strategies in their context. Plausible future directions, developments and exploration with the proposed framework, including stage augmenta tion, virtual and augmented reality, which involve sensing and mapping of physical and non-physical changes onto multimedia control events, are discussed

    Semi-Supervised Speech Emotion Recognition with Ladder Networks

    Full text link
    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be solved by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is by regularizing the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require the use of labeled data which is computationally expensive to collect for emotion recognition (gender, speaker identity, age or other emotional descriptors). This study proposes the use of ladder networks for emotion recognition, which utilizes an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving superior performance than fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that ladder networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within corpus evaluations, and between 16.1% and 74.1% for cross corpus evaluations, highlighting the power of the architecture

    3D Time-Based Aural Data Representation Using D4 Library’s Layer Based Amplitude Panning Algorithm

    Get PDF
    Presented at the 22nd International Conference on Auditory Display (ICAD-2016)The following paper introduces a new Layer Based Amplitude Panning algorithm and supporting D4 library of rapid prototyping tools for the 3D time-based data representation using sound. The algorithm is designed to scale and support a broad array of configurations, with particular focus on High Density Loudspeaker Arrays (HDLAs). The supporting rapid prototyping tools are designed to leverage oculocentric strategies to importing, editing, and rendering data, offering an array of innovative approaches to spatial data editing and representation through the use of sound in HDLA scenarios. The ensuing D4 ecosystem aims to address the shortcomings of existing approaches to spatial aural representation of data, offers unique opportunities for furthering research in the spatial data audification and sonification, as well as transportable and scalable spatial media creation and production

    Perceptually-Driven Video Coding with the Daala Video Codec

    Full text link
    The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, which ones worked and which did not. We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), 201
    corecore