
    An Intelligent audio workstation in the browser

    Music production is a complex process requiring skill and time to undertake. The industry has undergone a digital revolution but, unlike other industries, the underlying process has changed little. However, intelligent systems, combining the semantic web with signal processing, can reduce this complexity by making certain decisions for the user with minimal interaction, saving both time and investment on the engineer’s part. This paper outlines an intelligent Digital Audio Workstation (DAW) designed for use in the browser. It describes the architecture of the DAW, with its audio engine built on the Web Audio API, using AngularJS for the user interface and a relational database

    Semantic Music Production: A Meta-Study

    This paper presents a systematic review of semantic music production, including a meta-analysis of three studies into how individuals use words to describe audio effects within music production. Each study followed different methodologies and stimuli. The SAFE project created audio effect plug-ins that allowed users to report suitable words to describe the perceived result. SocialFX crowdsourced a large data set of how non-professionals described the change that resulted from an effect applied to an audio sample. The Mix Evaluation Data Set performed a series of controlled studies in which students used natural language to comment extensively on the content of different mixes of the same groups of songs. The data sets provided 40,411 audio examples and 7,221 unique word descriptors from 1,646 participants. Analysis showed strong correlations between various audio features, effect parameter settings, and semantic descriptors. Meta-analysis not only revealed consistent use of descriptors among the data sets but also showed key differences that likely resulted from the different participant groups and tasks. To the authors' knowledge, this represents the first meta-study and the largest-ever analysis of music production semantics
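As a minimal illustration of the kind of feature-to-descriptor correlation analysis the meta-study describes, the sketch below computes a Pearson correlation between a toy "spectral centroid" feature and whether an example was described as "bright". All numbers are invented for illustration; this is not the study's data or code.

```python
import math

# Invented toy data: spectral centroid of processed audio examples,
# and whether participants described the result as "bright" (0/1),
# making this effectively a point-biserial correlation.
centroid_hz = [800, 1200, 2500, 3100, 900, 2800, 1500, 3300]
described_bright = [0, 0, 1, 1, 0, 1, 0, 1]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(centroid_hz, described_bright)
print(f"correlation(centroid, 'bright') = {r:.2f}")
```

With the toy data above the correlation is strongly positive, mirroring the intuition that "bright" tracks high spectral centroid; the meta-study's actual analysis covers thousands of examples and descriptors.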

    Automating the Production of the Balance Mix in Music Production

    Historically, the junior engineer is an individual who would assist the sound engineer to produce a mix by performing a number of mixing and pre-processing tasks ahead of the main session. With improvements in technology, these tasks can be done more efficiently, so many aspects of this role are now assigned to the lead engineer. Similarly, these technological advances mean amateur producers now have access to similar mixing tools at home, without the need for any studio time or record-label investment. As the junior engineer’s role is now embedded into the process, it creates a steeper learning curve for these amateur engineers and adds time to the mixing process. In order to build tools to help users overcome the hurdles associated with this increased workload, we first aim to quantify the role of a modern studio engineer. To do this, a production environment was built to collect session data, allowing subjects to construct a balance mix, which is the starting point of the mixing life-cycle. This balance mix is generally designed to ensure that all the recordings in a mix are audible, as well as to build routing structures and apply pre-processing. Improvements in web technologies allow this data-collection system to run in a browser, making remote data acquisition feasible in a short space of time. The data collected in this study was then used to develop a set of assistive tools, designed to be non-intrusive and to provide guidance, allowing the engineer to understand the process. From the data, grouping of the audio tracks proved to be one of the most important, yet overlooked, tasks in the production life-cycle. This step is often misunderstood by novice engineers, yet it can enhance the quality of the final product. The first assistive tool we present in this thesis takes multi-track audio sessions and uses semantic information to group and label them.
The system can work with any collection of audio tracks, and can be embedded into a production environment. It was also apparent from the data that the minimisation of masking is a primary task of the mixing stage. We therefore present a tool which can automatically balance a mix by minimising the masking between separate audio tracks. Using evolutionary computing as a solver, the mix space can be searched effectively without requiring complex models to be trained on production data. The evaluation of these systems shows they are capable of producing a session structure similar to that of a real engineer. This provides a balance mix which is routed and pre-processed before creative mixing can take place, completing several steps for the engineer, similar to the work of a junior engineer
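The masking-minimisation approach described above can be sketched, in heavily simplified form, as an evolutionary search over per-track gains. Everything here is illustrative: the band-energy profiles, the masking proxy (pairwise band overlap), and the audibility penalty are invented stand-ins for the thesis's actual models.

```python
import random

random.seed(1)

# Invented per-band energy profiles for four tracks (illustrative only);
# rows are tracks, columns are coarse frequency bands.
tracks = [
    [0.9, 0.6, 0.2, 0.1],  # kick: low-heavy
    [0.8, 0.5, 0.3, 0.1],  # bass: overlaps kick
    [0.1, 0.4, 0.8, 0.5],  # guitar
    [0.1, 0.3, 0.7, 0.9],  # vocal
]

def cost(gains):
    """Proxy for inter-track masking: summed pairwise band overlap,
    plus a penalty that keeps every track audible (gain not near zero)."""
    masking = 0.0
    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            masking += sum(min(a * gains[i], b * gains[j])
                           for a, b in zip(tracks[i], tracks[j]))
    audibility = sum(max(0.0, 0.3 - g) for g in gains)
    return masking + 5.0 * audibility

def evolve(pop_size=30, generations=60):
    pop = [[random.random() for _ in tracks] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]   # blend crossover
            k = random.randrange(len(child))              # point mutation
            child[k] = min(1.0, max(0.0, child[k] + random.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)

best = evolve()
print("gains:", [round(g, 2) for g in best], "cost:", round(cost(best), 3))
```

The appeal of this family of solvers, as the abstract notes, is that no model needs to be trained on production data: only a cost function over the mix space is required.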

    Towards a better understanding of mix engineering

    This thesis explores how the study of realistic mixes can expand current knowledge about multitrack music mixing. An essential component of music production, mixing remains an esoteric matter with few established best practices. Research on the topic is challenged by a lack of suitable datasets, and consists primarily of controlled studies focusing on a single type of signal processing. However, considering one of these processes in isolation neglects the multidimensional nature of mixing. For this reason, this work presents an analysis and evaluation of real-life mixes, demonstrating that it is a viable and even necessary approach to learn more about how mixes are created and perceived. Addressing the need for appropriate data, a database of 600 multitrack audio recordings is introduced, and mixes are produced by skilled engineers for a selection of songs. This corpus is subjectively evaluated by 33 expert listeners, using a new framework tailored to the requirements of comparison of musical signal processing. By studying the relationship between these assessments and objective audio features, previous results are confirmed or revised, new rules are unearthed, and descriptive terms can be defined. In particular, it is shown that examples of inadequate processing, combined with subjective evaluation, are essential in revealing the impact of mix processes on perception. As a case study, the percept 'reverberation amount' is expressed as a function of two objective measures, and a range of acceptable values can be delineated. To establish the generality of these findings, the experiments are repeated with an expanded set of 180 mixes, assessed by 150 subjects with varying levels of experience from seven different locations in five countries. This largely confirms initial findings, showing few distinguishable trends between groups.
Increasing experience of the listener results in a larger proportion of critical and specific statements, and in greater agreement with other experts.
Funding: Yamaha Corporation, the Audio Engineering Society, Harman International Industries, the Engineering and Physical Sciences Research Council, the Association of British Turkish Academics, and Queen Mary University of London's School of Electronic Engineering and Computer Science

    Improving User Involvement Through Live Collaborative Creation

    Creating an artifact - such as writing a book, developing software, or performing a piece of music - is often limited to those with domain-specific experience or training. As a consequence, effectively involving non-expert end users in such creative processes is challenging. This work explores how computational systems can facilitate collaboration, communication, and participation in the context of involving users in the process of creating artifacts while mitigating the challenges inherent to such processes. In particular, the interactive systems presented in this work support live collaborative creation, in which artifact users collaboratively participate in the artifact creation process with creators in real time. In the systems that I have created, I explored liveness, the extent to which the process of creating artifacts and the state of the artifacts are immediately and continuously perceptible, for applications such as programming, writing, music performance, and UI design. Liveness helps preserve natural expressivity, supports real-time communication, and facilitates participation in the creative process. Live collaboration is beneficial for users and creators alike: making the process of creation visible encourages users to engage in the process and better understand the final artifact. Additionally, creators can receive immediate feedback in a continuous, closed loop with users. Through these interactive systems, non-expert participants help create such artifacts as GUI prototypes, software, and musical performances. 
This dissertation explores three topics: (1) the challenges inherent to collaborative creation in live settings, and computational tools that address them; (2) methods for reducing the barriers of entry to live collaboration; and (3) approaches to preserving liveness in the creative process, affording creators more expressivity in making artifacts and affording users access to information traditionally only available in real-time processes. In this work, I showed that enabling collaborative, expressive, and live interactions in computational systems allows the broader population to take part in various creative practices.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145810/1/snaglee_1.pd

    Binaural virtual auditory display for music discovery and recommendation

    Emerging patterns in audio consumption present renewed opportunity for searching or navigating music via spatial audio interfaces. This thesis examines the potential benefits and considerations for using binaural audio as the sole or principal output interface in a music browsing system. Three areas of enquiry are addressed. Specific advantages and constraints in spatial display of music tracks are explored in preliminary work. A voice-led binaural music discovery prototype is shown to offer a contrasting interactive experience compared to a mono smartspeaker. Results suggest that touch or gestural interaction may be more conducive input modes in the former case. The limit of three binaurally spatialised streams is identified from separate data as a usability threshold for simultaneous presentation of tracks, with no evident advantages derived from visual prompts to aid source discrimination or localisation. The challenge of implementing personalised binaural rendering for end-users of a mobile system is addressed in detail. A custom framework for assessing head-related transfer function (HRTF) selection is applied to data from an approach using 2D rendering on a personal computer. That HRTF selection method is developed to encompass 3D rendering on a mobile device. Evaluation against the same criteria shows encouraging results in reliability, validity, usability and efficiency. Computational analysis of a novel approach for low-cost, real-time, head-tracked binaural rendering demonstrates measurable advantages compared to first order virtual Ambisonics. Further perceptual evaluation establishes working parameters for interactive auditory display use cases. In summation, the renderer and identified tolerances are deployed with a method for synthesised, parametric 3D reverberation (developed through related research) in a final prototype for mobile immersive playlist editing. 
Task-oriented comparison with a graphical interface reveals high levels of usability and engagement, plus some evidence of enhanced flow state when using the eyes-free binaural system
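For background on the binaural cues such a renderer must reproduce, the sketch below computes the interaural time difference (ITD) with Woodworth's classical spherical-head formula. This is textbook material, not the thesis's renderer; the head radius is an assumed population average.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 degrees C

def woodworth_itd(azimuth_deg):
    """Interaural time difference (seconds) for a distant source at the
    given azimuth (0 = front, 90 = fully lateral), Woodworth's formula:
    ITD = (a / c) * (sin(theta) + theta)."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (math.sin(theta) + theta)

for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> ITD = {woodworth_itd(az) * 1e6:.0f} us")
```

A fully lateral source yields an ITD of roughly 650 microseconds, the order of magnitude a head-tracked binaural renderer must update within to remain convincing.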

    Supporting pro-amateur composers using digital audio workstations

    This thesis investigates the activity of pro-amateur composers in order to identify possible design improvements to a category of composition software called digital audio workstations. Pro-amateur composers are composers who are not full-time professional musicians but who have a considerably greater level of expertise than amateurs. In contrast to the collaborative settings in which this group is normally studied, this thesis focuses on situations where pro-amateur composers work independently. Existing research on the use of composition software is reviewed, revealing that the composition process can involve a wide variety of component activities and overarching macro structures, and that other aids are often used in addition to composition software. Studies have also indicated that the design of composition software may constrain the creativity of composers. Four important considerations are identified for studying composers: triangulating multiple data capture methods, avoiding study designs that constrain what activities can be observed, capturing use of any external aids, and studying the use of a variety of composition software (or a prototype design) to mitigate any constraints that are due to the software's design. Four pro-amateur composers were observed composing in their usual environments using a methodology based on interaction analysis. Based on information recorded about the settings, artefacts used, and activities carried out, three major patterns are observed. Firstly, existing tools support different composition activities to varying degrees, with additional support needed for improvisation, reflection, and auditioning incompletely specified material; secondly, composers make coordinated use of multiple representations; and finally, composers make use of strategies that enable selective allocation of time and effort (habituation, limited exploration, and self-constraining).
Previous authors have used many different notions of external cognition when studying the use of composition software. A literature review of such studies identifies techniques that can be applied to improve the representations used in composition software. Seven techniques are described: selective representation, diverse media types, structured representations, incomplete specification, representing alternatives, task lists, and representing history. A detailed review of evidence from the literature and the observational study is used to identify implementation suggestions for each technique. The technique of task lists has been studied significantly less in the literature on composition software and appears to be a fruitful avenue for further exploration. A prototype to-do list website designed for coordinated use with Ableton Live is created to further investigate the task lists technique by studying how it is used by five pro-amateur composers. Using thematic analysis of interviews triangulated with video recordings and logs, four main themes are identified: using to-do lists to plan and focus, changing to-do list items over time, organising to-do lists, and applicability of to-do lists. Seven key patterns of activity that are enabled by task lists are also described: planning activity, journalling activity, interleaving activities, reflection, organising the to-do list, idea capture, and collaboration. Task lists appear to be useful because explicitly representing tasks, processes, and plans helps the composer to consider those subjects; and also because task lists ease many related activities, such as tracking incomplete work, monitoring deviation from a planned creative direction, or identifying and re-using useful strategies. Two important considerations for design of task lists in DAWs are identified: how task lists are integrated with the DAW, and how to increase the visibility of the composer's activity. 
For both considerations, specific suggestions are made for how these could be achieved

    DIVE on the internet

    This dissertation reports research and development of a platform for Collaborative Virtual Environments (CVEs). It has particularly focused on two major challenges: supporting the rapid development of scalable applications and easing their deployment on the Internet. This work employs a research method based on prototyping and refinement and promotes the use of this method for application development. A number of the solutions herein are in line with other CVE systems. One of the strengths of this work lies in its global approach to the issues raised by CVEs and in the recognition that such complex problems are best tackled using a multi-disciplinary approach that considers both user and system requirements. CVE application deployment is aided by an overlay network that is able to complement any IP multicast infrastructure in place. Apart from complementing a weakly deployed worldwide multicast, this infrastructure provides for a certain degree of introspection, remote control and visualisation. As such, it forms an important aid in assessing the scalability of running applications. This scalability is further facilitated by specialised object distribution algorithms and an open framework for the implementation of novel partitioning techniques. CVE application development is eased by a scripting language, which enables rapid development and favours experimentation. This scripting language interfaces many aspects of the system and enables the prototyping of distribution-related components as well as user interfaces. It is the key construct of a distributed environment to which components, written in different languages, connect and onto which they operate in a network-abstracted manner. The solutions proposed are exemplified and strengthened by three collaborative applications. The Dive room system is a virtual environment modelled after the room metaphor and supporting asynchronous and synchronous cooperative work.
WebPath is a companion application to a Web browser that seeks to make the current history of page visits more visible and usable. Finally, the London travel demonstrator supports travellers by providing an environment where they can explore the city, utilise group collaboration facilities, rehearse particular journeys and access tourist information data

    Deep Learning for Audio Segmentation and Intelligent Remixing

    Audio segmentation divides an audio signal into homogeneous sections such as music and speech. It is useful as a preprocessing step to index, store, and modify audio recordings, radio broadcasts and TV programmes. Machine learning models for audio segmentation are generally trained on copyrighted material, which cannot be shared across research groups. Furthermore, annotating these datasets is a time-consuming and expensive task. In this thesis, we present a novel approach that artificially synthesises data that resembles radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters like fade curves and audio ducking. Using this approach, we obtained state-of-the-art performance for music-speech detection on in-house and public datasets. After demonstrating the efficacy of training set synthesis, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Interestingly, we observed that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that of human listeners. Furthermore, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative. This project also proposes a novel deep learning system called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets. As YOHO predicts acoustic boundaries directly, inference and post-processing are six times faster than frame-based classification. Furthermore, we investigate domain generalisation methods such as transfer learning and adversarial training.
We demonstrated that these methods helped our algorithm perform better in unseen domains. In addition to audio segmentation, another objective of this project is to explore real-time radio remixing. This is a step towards building a customised radio and consequently, integrating it with the schedule of the listener. The system would remix music from the user’s personal playlist and play snippets of diary reminders at appropriate transition points. The intelligent remixing is governed by the underlying audio segmentation and other deep learning methods. We also explore how individuals can communicate with intelligent mixing systems through non-technical language. We demonstrated that word embeddings help in understanding representations of semantic descriptors
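The descriptor-embedding idea in the final sentence can be illustrated with a toy nearest-neighbour lookup by cosine similarity. The vectors below are hand-made for illustration only; a real system would use embeddings trained on large corpora (e.g. word2vec or GloVe).

```python
import math

# Toy descriptor "embeddings" (invented; dimensions loosely read as
# brightness, weight, space). Real embeddings are learned, not hand-set.
embeddings = {
    "bright":   [0.9, -0.2, 0.1],
    "harsh":    [0.8, -0.1, -0.3],
    "warm":     [-0.6, 0.7, 0.2],
    "muddy":    [-0.8, 0.6, -0.1],
    "spacious": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest(word):
    """Closest other descriptor by cosine similarity."""
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine(embeddings[word], embeddings[w]))

print("nearest to 'bright':", nearest("bright"))
print("nearest to 'muddy':", nearest("muddy"))
```

A lookup like this lets a mixing system map a user's non-technical request (say, "less muddy") onto descriptors, and ultimately parameters, it already understands.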