67 research outputs found

    Extraction and representation of semantic information in digital media


    Melody extraction using deep-learning-based blind source separation for the analysis of Noh singing (utai)

    The purpose of this study is to extract the singing melody from mixed sounds related to Noh performances. Noh sounds include singing, accompaniment, and other elements. Analyzing Noh singing requires solo singing recordings, but these are hard to collect since only a few sources of solo passages exist. We therefore focus on extracting the singing melody from mixtures of accompaniment and singing. In this paper, we demonstrate that source separation can serve as an efficient preprocessing step for Noh singing melody extraction. In addition, we compare melody extraction based on convolutional neural network (CNN) and long short-term memory (LSTM) approaches with Melodia, a melody-extraction plug-in that is particularly accurate for music with wide pitch fluctuations. Raw Pitch Accuracy and Overall Accuracy are used as evaluation metrics. Our experimental results show that introducing source separation is effective for melody extraction, and that deep-learning-based melody estimation can be trained efficiently on singing obtained after source separation.
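
    A minimal sketch of how Raw Pitch Accuracy and Overall Accuracy can be computed for such an experiment, assuming frame-level reference and estimated pitch tracks and using the mir_eval library; this illustrates the metrics themselves, not the authors' evaluation code, and the test signal below is hypothetical.

        import numpy as np
        import mir_eval

        def score_melody(ref_time, ref_freq, est_time, est_freq):
            """Compute Raw Pitch Accuracy and Overall Accuracy for one excerpt.

            ref_freq/est_freq are frame-level F0 values in Hz, with 0 marking
            unvoiced frames; ref_time/est_time are the corresponding timestamps.
            """
            # Resample both tracks onto a common time base and convert to cents.
            ref_v, ref_c, est_v, est_c = mir_eval.melody.to_cent_voicing(
                ref_time, ref_freq, est_time, est_freq)
            rpa = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c, est_v, est_c)
            oa = mir_eval.melody.overall_accuracy(ref_v, ref_c, est_v, est_c)
            return rpa, oa

        # Hypothetical example: a 10-second excerpt sampled every 10 ms.
        t = np.arange(0, 10.0, 0.01)
        ref = 220.0 * np.ones_like(t)                                 # steady A3 reference
        est = 220.0 * 2 ** np.random.uniform(-0.02, 0.02, t.size)     # slightly noisy estimate
        print(score_melody(t, ref, t, est))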

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Virtual Reality Games for Motor Rehabilitation

    This paper presents a fuzzy-logic-based method for tracking user satisfaction without devices that monitor users' physiological conditions. User satisfaction is key to any product's acceptance, and computer applications and video games provide a unique opportunity to tailor the environment to each user's needs. We have implemented a non-adaptive fuzzy-logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature, which suggests that physiological measurements are needed; we show that it is possible to estimate user emotion with a software-only method.
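
    As a hypothetical illustration of the general fuzzy-logic approach (not the FLAME-based model actually implemented in the paper), the sketch below maps two made-up gameplay observations onto a satisfaction estimate using triangular membership functions, a small rule base, and weighted-average defuzzification.

        def tri(x, a, b, c):
            """Triangular membership function peaking at b over [a, c]."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def estimate_satisfaction(damage_rate, kill_rate):
            """Fuzzy estimate of player satisfaction in [0, 1].

            damage_rate and kill_rate are normalised to [0, 1]; the membership
            functions and rules are illustrative placeholders only.
            """
            low_dmg, high_dmg = tri(damage_rate, -0.5, 0.0, 0.6), tri(damage_rate, 0.4, 1.0, 1.5)
            low_kill, high_kill = tri(kill_rate, -0.5, 0.0, 0.6), tri(kill_rate, 0.4, 1.0, 1.5)

            # Rule base: doing well -> satisfied, being dominated -> frustrated.
            satisfied = min(high_kill, low_dmg)
            neutral = max(min(high_kill, high_dmg), min(low_kill, low_dmg))
            frustrated = min(low_kill, high_dmg)

            # Weighted-average defuzzification over crisp anchor values.
            total = satisfied + neutral + frustrated
            if total == 0:
                return 0.5
            return (1.0 * satisfied + 0.5 * neutral + 0.0 * frustrated) / total

        print(estimate_satisfaction(damage_rate=0.2, kill_rate=0.8))  # -> 1.0 (player doing well)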

    Digital neuromorphic auditory systems

    This dissertation presents several digital neuromorphic auditory systems. Neuromorphic systems can run in real time at a lower computing cost and with lower power consumption than widely available general-purpose computers. The auditory systems presented here are considered neuromorphic because they are modelled after computational models of the mammalian auditory pathway and can run on digital hardware, specifically a field-programmable gate array (FPGA). The models introduced fall into three parts: a cochlear model, an auditory pitch model, and a functional primary auditory cortical (A1) model. The cochlear model is the primary interface for an input sound signal and transmits a 2D time-frequency representation of the sound to the pitch model and the A1 model. The pitch model extracts pitch information from the sound signal in the form of a fundamental frequency, while the A1 model extracts timbre information in the form of the signal's time-frequency envelope. Since these computational auditory models must be implemented on FPGAs, which have fewer computational resources than general-purpose computers, the algorithms are optimised so that each model fits on a single FPGA; the optimisation includes using simplified, hardware-implementable signal processing algorithms. Computational resource usage of each model on the FPGA is reported to establish the minimum resources required to run it, including the number of logic modules, the number of registers used, and power consumption. Similarity comparisons are also made between the output responses of the models in software and in hardware, using pure tones, chirp signals, frequency-modulated signals, moving ripple signals, and musical signals as input. The limitations of the models' responses to musical signals at multiple intensity levels are presented, along with the use of an automatic gain control algorithm to alleviate them. With real-world musical signals as input, the model responses are also tested with classifiers: the response of the auditory pitch model is used to classify monophonic musical notes, and the response of the A1 model is used to classify musical instruments from their monophonic signals. Classification accuracy is reported for model outputs in both software and hardware. With the hardware-implementable auditory pitch model, classification accuracy reaches 100% for musical notes from the 4th and 5th octaves (24 note classes); with the hardware-implementable auditory timbre model, accuracy is 92% for 12 classes of musical instruments. The difference in memory requirements of the model output responses in software and hardware is also presented: the pitch and timbre responses used for the classification experiments require 24 and 2 times less memory, respectively, in hardware than in software.
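
    As an illustrative, software-level stand-in for a simplified hardware-implementable pitch algorithm (not the dissertation's actual FPGA pitch model), the sketch below estimates a fundamental frequency by frame-wise autocorrelation, an operation built from the multiply-accumulate primitives that FPGA DSP blocks provide.

        import numpy as np

        def autocorr_f0(frame, fs, fmin=80.0, fmax=1000.0):
            """Estimate a fundamental frequency for one frame by autocorrelation.

            Illustrative only: the real pitch model is modelled on the auditory
            pathway, but autocorrelation shows the kind of multiply-accumulate
            workload that maps directly onto FPGA hardware.
            """
            frame = frame - np.mean(frame)
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lag_min, lag_max = int(fs / fmax), int(fs / fmin)
            lag = lag_min + np.argmax(ac[lag_min:lag_max])
            return fs / lag

        # Hypothetical test: a 440 Hz tone sampled at 16 kHz.
        fs = 16000
        t = np.arange(0, 0.05, 1 / fs)
        tone = np.sin(2 * np.pi * 440 * t)
        print(round(autocorr_f0(tone, fs), 1))  # close to 440 Hz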

    Harmonic duality : from interval ratios and pitch distance to spectra and sensory dissonance

    Dissonance curves are the starting point for an investigation into a psychoacoustically informed harmony. The main hypothesis is that harmony consists of two independent but intertwined aspects operating simultaneously: proportionality and linear pitch distance. The former aspect relates to intervallic character, the latter to ‘high’, ‘low’, ‘bright’ and ‘dark’, and therefore to timbre. This research derives from the development of tools for algorithmic composition that extract pitch materials from sound signals, analyse them according to their timbral and harmonic properties, and put them into motion through diverse rhythmic and textural procedures. The tools, and the reflections derived from their use, offer fertile ideas for the generation of instrumental scores, electroacoustic soundscapes and interactive live-electronic systems. LEI Universiteit Leiden. Research in and through artistic practice.
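
    Dissonance curves of this kind are commonly computed from the Plomp-Levelt roughness model in Sethares' parameterisation; the sketch below is a generic illustration under that assumption, not necessarily the tool developed in this research, and sums pairwise partial roughness for a harmonic spectrum transposed across an octave.

        import numpy as np

        def pair_dissonance(f1, f2, a1, a2):
            """Sethares' parameterisation of Plomp-Levelt roughness for two partials."""
            b1, b2, dstar, s1, s2 = 3.5, 5.75, 0.24, 0.0207, 18.96
            fmin, df = min(f1, f2), abs(f2 - f1)
            s = dstar / (s1 * fmin + s2)
            return a1 * a2 * (np.exp(-b1 * s * df) - np.exp(-b2 * s * df))

        def dissonance_curve(partials, amps, ratios):
            """Total sensory dissonance of a spectrum against its own transpositions."""
            curve = []
            for r in ratios:
                freqs = np.concatenate([partials, r * partials])
                ams = np.concatenate([amps, amps])
                d = sum(pair_dissonance(freqs[i], freqs[j], ams[i], ams[j])
                        for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
                curve.append(d)
            return np.array(curve)

        # Hypothetical harmonic spectrum: 6 partials of a 261.6 Hz tone (middle C).
        partials = 261.6 * np.arange(1, 7)
        amps = 0.88 ** np.arange(6)
        ratios = np.linspace(1.0, 2.0, 201)
        curve = dissonance_curve(partials, amps, ratios)
        print(ratios[np.argmin(curve[1:]) + 1])  # minima fall near simple ratios, e.g. the octave at 2.0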

    Ultrasound cleaning of microfilters
