Live coding machine learning and machine listening: a survey on the design of languages and environments for live coding
The MIMIC (Musically Intelligent Machines Interacting Creatively) project explores how the techniques of machine learning and machine listening can be communicated and implemented in simple terms for composers, instrument makers and performers. The potential for machine learning to support musical composition and performance is high, and with novel techniques in machine listening, we see emerging a technology that can shift from being instrumental to conversational and collaborative. By leveraging the internet as a live software ecosystem, the MIMIC project explores how such technology can best reach artists, and live up to its collaborative potential to fundamentally change creative practice in the field.
The project involves creating a high-level language that can be used for live coding, creative coding and quick prototyping. Implementing a language that interfaces with technically complex problems such as the design of machine learning neural networks or the temporal and spectral algorithms applied in machine listening is not a simple task, but we can build upon decades of research and practice in programming language design (Ko 2016), and computer music language design in particular, as well as a plethora of inventive new approaches in the design of live coding systems for music (Reina et al. 2019). The language and user interface design will build on recent research in creative coding and interactive machine learning, exemplified by the Rapid Mix project (Bernardo et al. 2016; Zbyszynski et al. 2017). Machine learning continues to be at the forefront of new innovations in computer music (e.g. new sound synthesis techniques in NSynth (Engel et al. 2017) and WaveNet (van den Oord et al. 2016)); the language will seek to integrate models based on these new techniques into live coding performance, and also explore the efficacy of live coding as an approach to training and exploiting these systems for analysing and generating sound.
Existing live coding systems and languages are often reported on, describing clever solutions as well as weaknesses, as given, for example, in accounts of the development of Tidal (McLean 2014), Extramuros (Ogborn et al. 2015) and Gibber (Roberts and Kuchera-Morin 2012). Researchers are typically reflective and openly critical of their own systems when analysing them, and often report on their design with wider implications in mind (Aaron 2011; Sorensen 2018). However, they rarely speculate freely and uninhibitedly about possible solutions or alternative paths; the focus is typically on the system described. Before defining the design of our own system, we were therefore interested in opening up a channel through which we could learn from other practitioners in language design, machine learning and machine listening. We created a survey that we sent out to relevant communities of practice - such as live coding, machine learning, machine listening, creative coding and deep learning - and asked open questions about how they might imagine a future system being implemented, given the knowledge we have today. Below we report on the questionnaire and its findings.
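To make the kind of workflow described above more concrete, the following is a minimal, hypothetical sketch of an interactive machine-learning mapping in the style popularised by Rapid Mix and Wekinator: a performer records a handful of example pairs (analysis features to synthesis parameters), trains a small regression model, and queries it live. It uses scikit-learn purely for illustration; the MIMIC language itself and its API are not shown here, and the feature and parameter names are invented.

```python
# Hypothetical sketch: interactive machine-learning mapping (Rapid Mix / Wekinator style).
# A few example pairs (feature vector -> synthesis parameters) are recorded by the performer,
# a small neural network is trained on them, and new inputs are mapped in real time.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented training pairs: inputs could be [spectral centroid, rms, onset density],
# outputs could be [grain size (s), filter cutoff (Hz)].
X = np.array([[0.10, 0.02, 0.1],
              [0.45, 0.30, 0.5],
              [0.90, 0.80, 0.9]])
y = np.array([[0.02, 200.0],
              [0.10, 1200.0],
              [0.35, 6000.0]])

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(X, y)

# At performance time, each incoming analysis frame is mapped to synthesis parameters.
incoming = np.array([[0.6, 0.5, 0.7]])
grain_size, cutoff = model.predict(incoming)[0]
print(f"grain_size={grain_size:.3f}s cutoff={cutoff:.0f}Hz")
```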
Towards musical interaction: 'Schismatics' for e-violin and computer.
This paper discusses the evolution of the Max/MSP patch used in schismatics (2007, rev. 2010) for electric violin (Violectra) and computer, by composer Sam Hayden in collaboration with violinist Mieko Kanno. schismatics involves a standard performance paradigm of a fixed notated part for the e-violin with sonically unfixed live computer processing. Hayden was unsatisfied with the early version of the piece: the use of attack detection on the live e-violin playing to trigger stochastic processes led to an essentially reactive behaviour in the computer, resulting in a somewhat predictable one-to-one sonic relationship between them. It demonstrated little internal relationship between the two beyond an initial e-violin ‘action’ causing a computer ‘event’. The revisions in 2010, enabled by an AHRC Practice-Led research award, aimed to achieve 1) a more interactive performance situation and 2) a subtler and more ‘musical’ relationship between live and processed sounds. This was realised through the introduction of sound analysis objects, in particular machine listening and learning techniques developed by Nick Collins. One aspect of the programming was the mapping of analysis data to synthesis parameters, enabling the computer transformations of the e-violin to be directly related to Kanno’s interpretation of the piece in performance.
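As an illustration of the analysis-to-synthesis mapping idea described above, here is a small hedged Python sketch: per-frame spectral features of a recording drive two synthesis parameters. This stands in for, and does not reproduce, the live Max/MSP machine-listening objects used in the piece; the file name, parameter names and ranges are invented.

```python
# Illustrative sketch only: map per-frame audio analysis data to synthesis parameters.
# The real piece uses live Max/MSP machine-listening objects, not offline Python analysis.
import numpy as np
import librosa

y, sr = librosa.load("eviolin_take.wav", sr=None, mono=True)  # hypothetical file name

centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]   # brightness (Hz) per frame
rms = librosa.feature.rms(y=y)[0]                             # loudness per frame

# Simple hand-designed mapping: brightness -> ring-mod frequency, loudness -> reverb mix.
ringmod_hz = np.interp(centroid, [200.0, 4000.0], [30.0, 800.0])
reverb_mix = np.interp(rms, [rms.min(), rms.max()], [0.1, 0.9])

for t, (f, mix) in enumerate(zip(ringmod_hz[:5], reverb_mix[:5])):
    print(f"frame {t}: ringmod={f:.1f} Hz, reverb mix={mix:.2f}")
```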
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use
The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references
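A hedged sketch of one way the article's recommendation could be acted on in practice: report MGR accuracy both on all of GTZAN and with excerpts flagged as faulty excluded. The fault IDs and the simulated predictions below are placeholders for illustration only; the real fault catalog is the one documented in the article.

```python
# Sketch: compare a classifier's GTZAN accuracy with and without known-faulty excerpts.
# `faulty` is a placeholder set of excerpt IDs; predictions are simulated, not real results.
import numpy as np

def accuracy(y_true, y_pred, exclude=frozenset()):
    keep = np.array([i not in exclude for i in range(len(y_true))])
    return float(np.mean(y_true[keep] == y_pred[keep]))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=1000)   # 1000 excerpts, 10 genre labels
y_pred = np.where(rng.random(1000) < 0.7, y_true, rng.integers(0, 10, size=1000))

faulty = {5, 42, 101, 250, 777}           # placeholder IDs of repeated/mislabeled clips

print("accuracy, all excerpts:   ", accuracy(y_true, y_pred))
print("accuracy, faults excluded:", accuracy(y_true, y_pred, exclude=faulty))
```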
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), which is defined as the amount of time a user engages with the follow-up display ad (that is shown while the audio ad is playing) divided by the impressions. We then focus on predicting the audio ad quality using only acoustic features such as harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.
Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
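To make the proxy metric concrete, here is a small hedged sketch of how Long Click Rate could be computed from impression logs, following the definition in the abstract (engagement time with the follow-up display ad divided by impressions). The field names and numbers are invented for illustration and are not from the paper.

```python
# Sketch: compute Long Click Rate (LCR) per audio ad from hypothetical impression logs.
# LCR = total time users engage with the follow-up display ad / number of impressions.
from collections import defaultdict

impressions = [
    # (ad_id, seconds the user spent engaging with the follow-up display ad)
    ("ad_A", 12.0), ("ad_A", 0.0), ("ad_A", 3.5),
    ("ad_B", 0.0),  ("ad_B", 1.0),
]

engage_time = defaultdict(float)
count = defaultdict(int)
for ad_id, seconds in impressions:
    engage_time[ad_id] += seconds
    count[ad_id] += 1

for ad_id in sorted(count):
    lcr = engage_time[ad_id] / count[ad_id]
    print(f"{ad_id}: LCR = {lcr:.2f} s engaged per impression")
```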