
    Gaussian Framework for Interference Reduction in Live Recordings

    Here, typical live full-length music recordings are considered. In this scenario, some instrumental voices are captured by microphones intended for other voices, leading to so-called “interferences”. Reducing this phenomenon is desirable because it opens new possibilities for sound engineers, and it has also been shown to improve the performance of music analysis and processing tools (e.g. pitch tracking). In this work we propose a fast NMF-based algorithm to solve this problem.
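    The paper's specific algorithm is not reproduced here, but the following minimal sketch illustrates the general idea of NMF-based interference reduction: factor a microphone's magnitude spectrogram with plain multiplicative-update NMF, then keep only the components assigned to the target voice via a soft, Wiener-style mask. Component counts, iteration counts, and the component-assignment step are illustrative assumptions only.

```python
# Generic NMF-based interference reduction sketch (NumPy only); NOT the paper's algorithm.
import numpy as np

def nmf(V, n_components=8, n_iter=200, eps=1e-10):
    """Factorise a non-negative magnitude spectrogram V ~= W @ H."""
    n_freq, n_time = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius cost; they keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def suppress_interference(V_mic, target_components, n_components=8):
    """Keep only the chosen components in one microphone's magnitude spectrogram."""
    W, H = nmf(V_mic, n_components)
    V_target = W[:, target_components] @ H[target_components, :]
    mask = V_target / (W @ H + 1e-10)   # soft, Wiener-style mask in [0, 1]
    return mask * V_mic                 # interference-reduced magnitude
```

    In practice the target components would still have to be identified, for example by comparing their activations against the close microphone of the intended voice; that assignment step is outside this sketch.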

    Iterative decoding and equalization for 2-D recording channels

    A networking approach to sharing music studio resources

    This thesis investigates the extent to which networking technology can be used to provide remote workstation access to a pool of shared music studio resources. A pilot system is described in which MIDI messages, studio control data, and audio signals flow between the workstations and a studio server. A booking and timing facility avoids contention and allows for accurate reports of studio usage. The operation of the system has been evaluated in terms of its ability to satisfy three fundamental goals, namely remote, shared and centralized access to studio resources. Three essential network configurations have been identified, incorporating a mix of star and bus topologies, and their relative potential for satisfying the fundamental goals has been highlighted.
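    As a rough illustration of the booking and timing facility described above (not the thesis's actual design; class and method names are hypothetical), a server can avoid contention by rejecting overlapping reservations and can accumulate per-workstation time for usage reports:

```python
# Hypothetical booking facility sketch: reject overlapping slots, report usage per workstation.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Booking:
    workstation: str
    start: datetime
    end: datetime

class StudioBookingServer:
    def __init__(self):
        self.bookings: list[Booking] = []

    def request(self, workstation: str, start: datetime, end: datetime) -> bool:
        """Grant the slot only if it does not overlap an existing booking."""
        for b in self.bookings:
            if start < b.end and b.start < end:   # intervals overlap
                return False
        self.bookings.append(Booking(workstation, start, end))
        return True

    def usage_report(self) -> dict[str, timedelta]:
        """Accumulate booked time per workstation for accurate usage reports."""
        report: dict[str, timedelta] = {}
        for b in self.bookings:
            report[b.workstation] = report.get(b.workstation, timedelta()) + (b.end - b.start)
        return report
```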

    hpDJ: An automated DJ with floorshow feedback

    Many radio stations and nightclubs employ Disk-Jockeys (DJs) to provide a continuous uninterrupted stream or “mix” of dance music, built from a sequence of individual song-tracks. In the last decade, commercial pre-recorded compilation CDs of DJ mixes have become a growth market. DJs exercise skill in deciding an appropriate sequence of tracks and in mixing 'seamlessly' from one track to the next. Online access to large-scale archives of digitized music via automated music information retrieval systems offers users the possibility of discovering many songs they like, but the majority of consumers are unlikely to want to learn the DJ skills of sequencing and mixing. This paper describes hpDJ, an automatic method by which compilations of dance-music can be sequenced and seamlessly mixed by computer, with minimal user involvement. The user may specify a selection of tracks, and may give a qualitative indication of the type of mix required. The resultant mix can be presented as a continuous single digital audio file, whether for burning to CD, or for play-out from a personal playback device such as an iPod, or for play-out to rooms full of dancers in a nightclub. Results from an early version of this system have been tested on an audience of patrons in a London nightclub, with very favourable results. Subsequent to that experiment, we designed technologies which allow the hpDJ system to monitor the responses of crowds of dancers/listeners, so that hpDJ can dynamically react to those responses from the crowd. The initial intention was that hpDJ would monitor the crowd’s reaction to the song-track currently being played, and use that response to guide its selection of subsequent song-tracks in the mix. In that version, it is assumed that all the song-tracks exist in some archive or library of pre-recorded files. However, once reliable crowd-monitoring technology is available, it becomes possible to use the crowd-response data to dynamically “remix” existing song-tracks (i.e., alter the track in some way, tailoring it to the response of the crowd) and even to dynamically “compose” new song-tracks suited to that crowd. Thus, the music played by hpDJ to any particular crowd of listeners on any particular night becomes a direct function of that particular crowd’s particular responses on that particular night. On a different night, the same crowd of people might react in a different way, leading hpDJ to create different music. Thus, the music composed and played by hpDJ could be viewed as an “emergent” property of the dynamic interaction between the computer system and the crowd, and the crowd could then be viewed as having collectively collaborated on composing the music that was played on that night. This en masse collective composition raises some interesting legal issues regarding the ownership of the composition (i.e., who, exactly, is the author of the work?), but revenue-generating businesses can nevertheless plausibly be built from such technologies.
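    The paper does not specify how crowd response maps to track selection; purely as a toy illustration of the feedback idea, the sketch below scores candidate tracks by their distance in tempo and energy from the currently playing track, moving towards similar tracks when the response is positive and away from them when it is negative. The feature names and scoring rule are assumptions for illustration only.

```python
# Toy crowd-feedback track selection; feature names and scoring rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Track:
    title: str
    tempo: float    # beats per minute
    energy: float   # normalised 0..1

def pick_next(current: Track, candidates: list[Track], crowd_response: float) -> Track:
    """crowd_response in [-1, 1]: +1 means the dance floor loved the current track."""
    def distance(t: Track) -> float:
        return abs(t.tempo - current.tempo) / 40.0 + abs(t.energy - current.energy)

    # Positive response: prefer small distance (keep the vibe).
    # Negative response: prefer large distance (change direction).
    return min(candidates, key=lambda t: crowd_response * distance(t))

# Example: a warm reception keeps the mix close to 126 BPM / high energy.
now = Track("A", 126.0, 0.8)
pool = [Track("B", 128.0, 0.85), Track("C", 100.0, 0.4)]
print(pick_next(now, pool, crowd_response=0.9).title)   # -> "B"
```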

    Systematic evaluation of perceived spatial quality

    The evaluation of perceived spatial quality calls for a method that is sensitive to changes in the constituent dimensions of that quality. In order to devise a method accounting for these changes, several processes have to be performed. This paper describes the development of scales by elicitation and structuring of verbal data, followed by validation of the resulting attribute scales.

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are now able to do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, purely based on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision 'teacher' method and a sound 'student' method -- the student method is trained to generate the same results as the teacher method. This way, the auditory system can be trained without using human annotations. We also propose two auxiliary tasks, namely a) a novel task on Spatial Sound Super-resolution to increase the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results on the dataset show that 1) our method achieves promising results for semantic prediction and the two auxiliary tasks; 2) the three tasks are mutually beneficial -- training them together achieves the best performance; and 3) the number and orientations of microphones are both important. The data and code will be released to facilitate research in this new direction. Project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.htm
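    As a minimal sketch of the cross-modal distillation objective described above, assuming the vision teacher outputs per-pixel class probabilities and the sound student outputs raw logits (the paper's architectures and loss weighting are not reproduced), the student can be trained to minimise a per-pixel KL divergence to the teacher:

```python
# Minimal cross-modal distillation loss sketch (NumPy); teacher probabilities supervise the student.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean KL(teacher || student) over all pixels.

    student_logits: (H, W, C) raw scores predicted from the binaural audio.
    teacher_probs:  (H, W, C) class probabilities from the vision teacher.
    """
    student_probs = softmax(student_logits)
    kl = teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps))
    return kl.sum(axis=-1).mean()

# Toy check: identical distributions give a (near-)zero loss.
t = softmax(np.random.randn(4, 4, 3))
print(distillation_loss(np.log(t), t))   # ~0.0
```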

    Iterative detection and decoding for separable two-dimensional intersymbol interference

    Music Visualization Using Source Separated Stereophonic Music

    This thesis introduces a music visualization system for stereophonic source-separated music. Music visualization systems are a popular way to represent information from audio signals through computer graphics. Visualization can help people better understand music and its complex, interacting elements. This music visualization system extracts pitch, panning, and loudness features from source-separated audio files to create the visuals. Most state-of-the-art visualization systems develop their visual representation of the music either from the fully mixed final song recording, where all of the instruments and vocals are combined into one file, or from the digital audio workstation (DAW) data containing multiple independent recordings of individual audio sources. Original source recordings are not always readily available to the public, so music source separation (MSS) can be used to obtain estimated versions of the audio source files. This thesis surveys different approaches to MSS and music visualization, and introduces a new music visualization system specifically for source-separated music.
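    A small sketch of the kind of per-stem feature extraction described above is given below: frame-wise loudness (RMS in dB) and a simple panning index computed from a stereo source-separated stem. The frame sizes and panning formula are illustrative assumptions rather than the thesis's exact pipeline, and pitch would additionally require a dedicated estimator.

```python
# Per-stem loudness and panning features from a stereo stem; parameters are illustrative assumptions.
import numpy as np

def frame(signal, frame_len=2048, hop=1024):
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def loudness_db(mono, frame_len=2048, hop=1024, eps=1e-12):
    """Frame-wise RMS loudness in decibels."""
    rms = np.sqrt((frame(mono, frame_len, hop) ** 2).mean(axis=1))
    return 20.0 * np.log10(rms + eps)

def panning_index(left, right, frame_len=2048, hop=1024, eps=1e-12):
    """-1 = fully left, 0 = centre, +1 = fully right, per frame."""
    el = (frame(left, frame_len, hop) ** 2).sum(axis=1)
    er = (frame(right, frame_len, hop) ** 2).sum(axis=1)
    return (er - el) / (er + el + eps)

# Example on a synthetic stereo stem panned towards the right channel.
t = np.linspace(0, 1, 44100, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t)
print(panning_index(0.3 * sine, 0.9 * sine).mean())   # ~ +0.8 (panned right)
```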