
    Live coding machine learning and machine listening: a survey on the design of languages and environments for live coding

    The MIMIC (Musically Intelligent Machines Interacting Creatively) project explores how the techniques of machine learning and machine listening can be communicated and implemented in simple terms for composers, instrument makers and performers. The potential for machine learning to support musical composition and performance is high, and with novel techniques in machine listening we see a technology emerging that can shift from being instrumental to being conversational and collaborative. By leveraging the internet as a live software ecosystem, the MIMIC project explores how such technology can best reach artists and live up to its collaborative potential to fundamentally change creative practice in the field. The project involves creating a high-level language that can be used for live coding, creative coding and quick prototyping. Implementing a language that interfaces with technically complex problems, such as the design of machine learning neural networks or the temporal and spectral algorithms applied in machine listening, is not a simple task, but we can build upon decades of research and practice in programming language design (Ko 2016), and computer music language design in particular, as well as a plethora of inventive new approaches in the design of live coding systems for music (Reina et al. 2019). The language and user interface design will build on recent research in creative coding and interactive machine learning, exemplified by the Rapid Mix project (Bernardo et al. 2016; Zbyszynski et al. 2017). Machine learning continues to be at the forefront of new innovations in computer music (e.g. new sound synthesis techniques in NSynth (Engel et al. 2017) and WaveNet (van den Oord 2016)); the language will seek to integrate models based around these new techniques into live coding performance, and also explore the efficacy of live coding as an approach to training and exploiting these systems for analysing and generating sound. Existing live coding systems and languages are often reported on, with descriptions of clever solutions as well as weaknesses, as given, for example, in accounts of the development of Tidal (McLean 2014), Extramuros (Ogborn et al. 2015) and Gibber (Roberts and Kuchera-Morin 2012). Researchers are typically reflective and openly critical of their own systems when analysing them, and often report on their design with wider implications (Aaron 2011; Sorensen 2018). However, they rarely speculate freely and uninhibitedly about possible solutions or alternative paths; the focus is typically on the system described. Before defining the design of our own system, we were therefore interested in opening up a channel through which we could learn from other practitioners in language design, machine learning and machine listening. We created a survey that we sent out to relevant communities of practice - such as live coding, machine learning, machine listening, creative coding and deep learning - and asked open questions about how they imagine a future system might be implemented, given the knowledge we have today. Below we report on the questionnaire and its findings.
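
    As a rough illustration of the kind of high-level interactive machine learning workflow such a language might expose (record examples, train, run), here is a minimal sketch in Python; the class and method names are hypothetical and are not part of MIMIC or the Rapid Mix API.

```python
# Minimal sketch of an interactive machine learning mapper of the kind a
# high-level live coding language might wrap; all names are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


class Mapper:
    """Learns a mapping from input features to synthesis parameters."""

    def __init__(self, k=3):
        self.model = KNeighborsRegressor(n_neighbors=k)
        self.inputs, self.outputs = [], []

    def record(self, features, params):
        # Collect an example pair while the performer demonstrates a mapping.
        self.inputs.append(features)
        self.outputs.append(params)

    def train(self):
        self.model.fit(np.array(self.inputs), np.array(self.outputs))

    def run(self, features):
        # Predict synthesis parameters for unseen input during performance.
        return self.model.predict(np.array([features]))[0]


# Map two analysis features (e.g. brightness, loudness) to pitch and amplitude.
m = Mapper()
m.record([0.2, 0.8], [440.0, 0.5])
m.record([0.9, 0.1], [880.0, 0.2])
m.record([0.5, 0.5], [660.0, 0.35])
m.train()
print(m.run([0.4, 0.6]))
```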

    Towards musical interaction : 'Schismatics' for e-violin and computer.

    This paper discusses the evolution of the Max/MSP patch used in schismatics (2007, rev. 2010) for electric violin (Violectra) and computer, by composer Sam Hayden in collaboration with violinist Mieko Kanno. schismatics involves a standard performance paradigm of a fixed notated part for the e-violin with sonically unfixed live computer processing. Hayden was unsatisfied with the early version of the piece: the use of attack detection on the live e-violin playing to trigger stochastic processes led to an essentially reactive behaviour in the computer, resulting in a somewhat predictable one-to-one sonic relationship between them. It demonstrated little internal relationship between the two beyond an initial e-violin ‘action’ causing a computer ‘event’. The revisions in 2010, enabled by an AHRC Practice-Led research award, aimed to achieve 1) a more interactive performance situation and 2) a subtler and more ‘musical’ relationship between live and processed sounds. This was realised through the introduction of sound analysis objects, in particular machine listening and learning techniques developed by Nick Collins. One aspect of the programming was the mapping of analysis data to synthesis parameters, enabling the computer transformations of the e-violin to be directly related to Kanno’s interpretation of the piece in performance.
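
    The contrast between attack-triggered events and continuous analysis-to-synthesis mapping can be sketched offline in Python (this is an illustrative analogue, not the actual Max/MSP patch; the file name and parameter ranges are placeholders).

```python
# Offline sketch: derive continuous analysis data from a violin recording and
# map it to synthesis parameters, rather than only triggering events on attacks.
import numpy as np
import librosa

y, sr = librosa.load("eviolin_take.wav", mono=True)  # placeholder file name

# Attack (onset) detection: the purely "reactive" layer of the early version.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)

# Continuous spectral and dynamic analysis: the kind of data the 2010 revision
# maps directly to processing parameters.
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
rms = librosa.feature.rms(y=y)[0]

# Map analysis data to a hypothetical filter cutoff (Hz) and grain density,
# so the transformation follows the performer's timbre and dynamics.
cutoff = np.interp(centroid, (centroid.min(), centroid.max()), (200.0, 8000.0))
density = np.interp(rms, (rms.min(), rms.max()), (1.0, 50.0))

print(f"{len(onset_frames)} attacks detected; "
      f"cutoff range {cutoff.min():.0f}-{cutoff.max():.0f} Hz")
```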

    The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows that GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.
    Comment: 29 pages, 7 figures, 6 tables, 128 references
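
    A minimal sketch of what "using GTZAN with consideration of its contents" might look like in practice: excluding catalogued faulty excerpts before scoring a system. The excerpt identifiers below are placeholders, not entries from the paper's actual fault catalog.

```python
# Sketch: compare raw accuracy with a fault-aware accuracy that excludes
# excerpts flagged as repetitions or mislabelings; fault list is hypothetical.
from sklearn.metrics import accuracy_score

faulty_excerpts = {"blues.00012", "metal.00067", "pop.00031"}  # placeholders


def filtered_accuracy(excerpt_ids, y_true, y_pred, exclude):
    kept = [(t, p) for e, t, p in zip(excerpt_ids, y_true, y_pred)
            if e not in exclude]
    truths, preds = zip(*kept)
    return accuracy_score(truths, preds)


# Toy predictions from some MGR system.
ids = ["blues.00012", "blues.00013", "metal.00067", "pop.00030"]
truth = ["blues", "blues", "metal", "pop"]
pred = ["metal", "blues", "metal", "pop"]

print("raw accuracy:", accuracy_score(truth, pred))
print("fault-aware accuracy:",
      filtered_accuracy(ids, truth, pred, faulty_excerpts))
```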

    Predicting Audio Advertisement Quality

    Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high-quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ad ranking and better audio ad creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), defined as the amount of time a user engages with the follow-up display ad (which is shown while the audio ad is playing) divided by the number of impressions. We then focus on predicting audio ad quality using only acoustic features, such as the harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad's message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.
    Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
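
    As a rough illustration, the LCR proxy and the kind of hand-crafted harmony/rhythm/timbre features a baseline might use can be sketched as below; the paper's actual feature set, engagement definitions and deep model are not reproduced here, and the file name is a placeholder.

```python
# Sketch: LCR proxy metric plus simple acoustic proxies for harmony, rhythm
# and timbre extracted from the raw waveform; all specifics are illustrative.
import numpy as np
import librosa


def long_click_rate(engagement_seconds, impressions):
    # LCR: time spent with the follow-up display ad divided by impressions.
    return sum(engagement_seconds) / impressions


print(long_click_rate([4.0, 0.0, 12.5], impressions=3))

y, sr = librosa.load("audio_ad.wav", mono=True)           # placeholder file
chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)    # harmony proxy
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)                   # rhythm proxy
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # timbre proxy

features = np.concatenate([chroma, np.atleast_1d(tempo), mfcc])
print(features.shape)  # a fixed-length vector for a downstream classifier
```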