8 research outputs found

    Deep Learning for Audio Segmentation and Intelligent Remixing

    Audio segmentation divides an audio signal into homogeneous sections such as music and speech. It is useful as a preprocessing step to index, store, and modify audio recordings, radio broadcasts, and TV programmes. Machine learning models for audio segmentation are generally trained on copyrighted material, which cannot be shared across research groups. Furthermore, annotating these datasets is a time-consuming and expensive task. In this thesis, we present a novel approach that artificially synthesises data resembling radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters such as fade curves and audio ducking. Using this approach, we obtained state-of-the-art performance for music-speech detection on in-house and public datasets. After demonstrating the efficacy of training-set synthesis, we investigated how audio ducking of background music affects the precision and recall of the machine learning algorithm. Interestingly, the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. Furthermore, our proposed synthesis technique outperformed real-world data in some cases and serves as a promising alternative. This project also proposes a novel deep learning system called You Only Hear Once (YOHO), inspired by the YOLO algorithm widely adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. The relative improvement in F-measure for YOHO, compared to the state-of-the-art convolutional recurrent neural network, ranged from 1% to 6% across multiple datasets. As YOHO predicts acoustic boundaries directly, inference and post-processing are six times faster than frame-based classification. Furthermore, we investigate domain generalisation methods such as transfer learning and adversarial training.
    We demonstrated that these methods helped our algorithm perform better in unseen domains. In addition to audio segmentation, another objective of this project is to explore real-time radio remixing. This is a step towards building a customised radio station and, consequently, integrating it with the listener's schedule. The system would remix music from the user's personal playlist and play snippets of diary reminders at appropriate transition points. The intelligent remixing is governed by the underlying audio segmentation and other deep learning methods. We also explore how individuals can communicate with intelligent mixing systems through non-technical language, and demonstrated that word embeddings help in understanding representations of semantic descriptors.
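The DJ-style mixing workflow described above (fade curves plus audio ducking) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the thesis's actual pipeline: the sample rate, linear fade shape, and -12 dB duck level are arbitrary choices, and noise/sine signals stand in for real speech and music recordings.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (an arbitrary choice for this sketch)

def fade_in(n):
    """Linear fade-in curve of n samples (other fade shapes are possible)."""
    return np.linspace(0.0, 1.0, n)

def synthesise_transition(speech, music, fade_s=1.0, duck_db=-12.0):
    """Mix speech over music the way a radio DJ might: 'duck' the music by
    duck_db while the speech plays, fading the music down at the start and
    back up at the end.  Returns the mix and the music gain envelope."""
    n = len(speech)
    n_fade = int(fade_s * SR)
    duck = 10 ** (duck_db / 20.0)                           # dB -> linear gain
    gain = np.full(n, duck)
    gain[:n_fade] = 1.0 - (1.0 - duck) * fade_in(n_fade)    # fade down
    gain[-n_fade:] = duck + (1.0 - duck) * fade_in(n_fade)  # fade back up
    return speech + gain * music[:n], gain

# Stand-ins for real recordings: noise for speech, a sine for music.
rng = np.random.default_rng(0)
t = np.arange(3 * SR) / SR
speech = 0.1 * rng.standard_normal(len(t))
music = 0.5 * np.sin(2 * np.pi * 440.0 * t)
mix, gain = synthesise_transition(speech, music)
```

Because the mix is generated programmatically, frame-level labels (music present, speech present, duck level) come for free, which is what makes synthesis attractive compared with hand-annotating broadcast audio.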

    Investigation into Stand-alone Brain-computer Interfaces for Musical Applications

    Brain-computer interfaces (BCIs) aim to establish a communication medium that is independent of muscle control. This project investigates how BCIs can be harnessed for musical applications. The impact of such systems is twofold: (i) they offer a novel mechanism of control for musicians during performance, and (ii) they are beneficial for patients suffering from motor disabilities. Several challenges are encountered when attempting to move these technologies from laboratories to real-world scenarios. Additionally, BCIs are significantly different from conventional computer interfaces and achieve low communication rates. This project considers these challenges and uses a dry, wireless electroencephalogram (EEG) headset to detect neural activity. It adopts a paradigm called steady-state visually evoked potential (SSVEP) to provide the user with control. It aims to encapsulate all brain-computer music interface (BCMI) operations into a stand-alone application, which would improve the portability of BCMIs. This project addresses various engineering problems faced while developing a stand-alone BCMI. Efficiently presenting the visual stimulus for SSVEP requires hardware-accelerated rendering. EEG data is received from the headset over Bluetooth, so a dedicated thread is designed to receive signals. As this thesis does not use medical-grade equipment to detect EEG, signal processing techniques need to be examined to improve the signal-to-noise ratio (SNR) of brain waves. The project adopts canonical correlation analysis (CCA), a multivariate statistical technique, and explores filtering algorithms to improve the communication rates of BCMIs. Furthermore, this project delves into optimising biomedical engineering parameters, such as the placement of the EEG headset and the size of the visual stimulus.
    After implementing the optimisations, the mean accuracies of the BCMI for time windows of 4 s and 2 s are 97.92 ± 2.22% and 88.02 ± 9.30% respectively. The obtained information transfer rate (ITR) is 36.56 ± 9.17 bits/min, which surpasses the communication rates of earlier BCMIs. This thesis concludes by building a system encompassing a novel control flow, which allows the user to play a musical instrument by gazing at it. (The School of Humanities and Performing Arts, University of Plymouth)
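The CCA-based SSVEP detection described above can be sketched with NumPy: the selected stimulus frequency is the one whose sine/cosine reference template has the highest canonical correlation with the multi-channel EEG window. The sample rate, channel count, harmonic count, and candidate frequencies below are hypothetical choices for illustration, and synthetic data stands in for real EEG.

```python
import numpy as np

def canonical_corr(X, Y):
    """Largest canonical correlation between the column spaces of X and Y
    (both samples x variables), computed via QR decompositions and an SVD."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return min(1.0, s[0])  # clip tiny numerical overshoot above 1

def ssvep_detect(eeg, sr, freqs, harmonics=2):
    """Pick the candidate stimulus frequency whose sinusoidal reference
    template correlates best with the EEG window (samples x channels)."""
    t = np.arange(eeg.shape[0]) / sr
    scores = []
    for f in freqs:
        ref = np.column_stack(
            [fn(2 * np.pi * f * h * t)
             for h in range(1, harmonics + 1)
             for fn in (np.sin, np.cos)])
        scores.append(canonical_corr(eeg, ref))
    return freqs[int(np.argmax(scores))]

# Synthetic 4-channel "EEG": a 10 Hz response of varying amplitude per
# channel, buried in Gaussian noise, over a 2 s window at 256 Hz.
rng = np.random.default_rng(0)
sr, dur, target = 256, 2.0, 10.0
t = np.arange(int(sr * dur)) / sr
eeg = (np.outer(np.sin(2 * np.pi * target * t), rng.uniform(0.5, 1.0, 4))
       + 0.5 * rng.standard_normal((len(t), 4)))
detected = ssvep_detect(eeg, sr, [8.57, 10.0, 12.0, 15.0])
```

Including harmonics in the reference templates is a common refinement, since SSVEP responses typically contain energy at multiples of the flicker frequency.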

    Word Embeddings for Automatic Equalization in Audio Mixing

    In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects such as gain adjustment, stereo panning, equalization, and reverberation. These systems can be controlled through visual interfaces, by providing audio examples, using knobs, and via semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. However, artists sometimes use non-technical words that may not be understood by the mixing system, or even by a mixing engineer. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of the neural network to create a translation from words to EQ settings. Using this technique, the machine learning model can also generate EQ settings for semantic descriptors that it has never seen before. We perform experiments to demonstrate the feasibility of this idea. In addition, we compare the EQ settings of humans with the predictions of the neural network to evaluate the quality of the predictions. The results showed that the embedding layer enables the neural network to understand semantic descriptors. We observed that models with embedding layers perform better than those without, though not as well as human labels.
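The core idea above, an embedding input layer that lets the model generalise to unseen descriptors, can be illustrated with a toy linear model. The three-dimensional "embeddings" and EQ targets below are invented for illustration (a real system would use word2vec/GloVe vectors and a trained neural network), and ridge regression stands in for the paper's network.

```python
import numpy as np

# Toy 3-d "embeddings" -- invented vectors; real systems would use
# word2vec/GloVe embeddings trained on large text corpora.
emb = {
    "bright":    np.array([0.90, 0.10, 0.00]),
    "warm":      np.array([0.00, 0.80, 0.20]),
    "boomy":     np.array([0.10, 0.20, 0.90]),
    "brilliant": np.array([0.85, 0.15, 0.05]),  # unseen during training
}

# Illustrative EQ targets: gains in dB for (low, mid, high) bands.
eq = {
    "bright": np.array([-2.0,  0.0,  4.0]),
    "warm":   np.array([ 3.0,  1.0, -2.0]),
    "boomy":  np.array([ 5.0, -1.0, -3.0]),
}

# Ridge regression from embedding to EQ settings -- a linear stand-in
# for a neural network with an embedding input layer.
seen = ["bright", "warm", "boomy"]
X = np.stack([emb[w] for w in seen])
Y = np.stack([eq[w] for w in seen])
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(3), X.T @ Y)

# Because "brilliant" lies near "bright" in embedding space, the model
# produces a similar treble-boosting EQ despite never having seen it.
pred = emb["brilliant"] @ W
```

The point of the sketch is the generalisation mechanism: the model never saw an EQ label for "brilliant", but its embedding's proximity to "bright" yields a sensible prediction, which is exactly why embeddings help with unseen semantic descriptors.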

    RadioMe: Adaptive Radio with Music Intervention and Reminder System for People with Dementia in Their Own Home

    The world's population is continuously growing older, leading to more people with dementia who need support while living in their own homes. Our RadioMe system was designed to adapt a live radio stream with reminders and music intervention for agitation mitigation for people with dementia living in their own home. In this demonstration we present our prototype, with features to record reminders and schedule them to be played during the live radio stream, and a music intervention system that activates when agitation is detected.

    RadioMe: Supporting Individuals with Dementia in Their Own Home... and Beyond?

    Dementia is an illness with complex health needs, varying between individuals and increasing in severity over time. Technologies to aid people with dementia are often designed for a specific environment and/or purpose, such as the RadioMe system, which is designed to detect agitation in people with mild dementia living in their own home and to calm them with music when agitation is detected. Both the monitoring and intervention components could potentially be used outside the home to benefit people with dementia and their carers in everyday life. But such an adaptation could place additional burdens on the carer, as many decisions, along with the handling of the data and software, could rely on their input. In this paper we discuss the potential role of the carer in expanding such a system to a larger ecosystem, using RadioMe as an example.

    Dataset for: Effect of Densely Ionizing Radiation on Cardiomyocyte Differentiation from Human Induced Pluripotent Stem Cells

    The process of human cardiac development can be faithfully recapitulated in a culture dish with human pluripotent stem cells, where the impact of environmental stressors can be evaluated. The consequences of ionizing radiation exposure on human cardiac differentiation are largely unknown. In this study, human induced pluripotent stem cell (hiPSC) cultures were subjected to an external beam of 3.7 MeV α-particles at low mean absorbed doses of 0.5, 3, and 10 cGy. Subsequently, the hiPSCs were differentiated into beating cardiac myocytes (hiPSC-CMs). Pluripotent and cardiac markers and morphology did not reveal differences between the irradiated and non-irradiated groups. While cell number was not affected during CM differentiation, the number of differentiated CMs was severely reduced by ionizing radiation in a dose-responsive manner. β-adrenergic stimulation causes calcium (Ca2+) overload and oxidative stress. Although no significant increase in Ca2+ transient amplitude was observed in any group after treatment with 1 µM isoproterenol, spontaneous Ca2+ waves/releases were more frequent in hiPSC-CMs of the irradiated groups, indicating arrhythmogenic activity at the single-cell level. Increased transcript expression of mitochondrial biomarkers (LONP1, TFAM) and mtDNA-encoded genes (MT-CYB, MT-RNR1) was detected upon differentiation of hiPSC-CMs, suggesting increased organelle biogenesis. Exposure of hiPSC-CM cultures to 10 cGy significantly upregulated MT-CYB and MT-RNR1 expression, which may reflect an adaptive response to ionizing radiation. Our results indicate that important aspects of the differentiation of hiPSCs into cardiac myocytes may be affected by low fluences of densely ionizing radiation such as α-particles.