
    Parallel Online Time Warping for Real-Time Audio-to-Score Alignment in Multi-core Systems

    The Audio-to-Score framework consists of two separate stages: pre-processing and alignment. The alignment is commonly solved through offline Dynamic Time Warping (DTW), a method that finds the minimum-cost path over the distortion matrix to determine the relation between the performance time and the musical score time. In this work we propose a parallel online DTW solution based on a client-server architecture. The current version of the application has been implemented for multi-core architectures (x86, x64 and ARM), thus covering both powerful systems and mobile devices. Extensive experimentation has been conducted to validate the software. The experiments also show that our framework achieves a good score alignment within the real-time window by using parallel computing techniques. This work has been partially supported by the Spanish Ministry of Science and Innovation and FEDER under Projects TEC2012-38142-C04-01, TEC2012-38142-C04-03, TEC2012-38142-C04-04, TEC2015-67387-C4-1-R, TEC2015-67387-C4-3-R, TEC2015-67387-C4-4-R, the European Union FEDER (CAPAP-H5 network TIN2014-53522-REDT), and the Generalitat Valenciana under Grant PROMETEOII/2014/003.
    Alonso-Jordá, P.; Cortina, R.; Rodríguez-Serrano, F.; Vera-Candeas, P.; Alonso-González, M.; Ranilla, J. (2017). Parallel Online Time Warping for Real-Time Audio-to-Score Alignment in Multi-core Systems. The Journal of Supercomputing 73(1):126-138. https://doi.org/10.1007/s11227-016-1647-5
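    The minimum-cost path over the distortion matrix that the abstract describes is the standard DTW dynamic-programming recursion. The sketch below is a minimal textbook illustration of that recursion, not the paper's parallel online client-server implementation; the matrix `dist` and the three predecessor moves are assumptions of the usual formulation.

```python
import numpy as np

def dtw_cost(dist):
    """Accumulate the minimum-cost warping path over a distortion matrix.

    dist[i, j] is the distance between performance frame i and score
    frame j (e.g. a spectral distance). Each cell of the accumulated
    matrix takes the cheapest of its three admissible predecessors.
    """
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # performance advances, score waits
                acc[i, j - 1],      # score advances, performance waits
                acc[i - 1, j - 1],  # both advance together
            )
    return acc[n, m]
```

    An online variant evaluates this recursion incrementally, column by column as audio frames arrive, restricting computation to a band around the current path estimate so it fits in the real-time window.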

    Speaker-following Video Subtitles

    We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.
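    The abstract does not specify the optimization objective, but one common way to pose globally optimal placement is as a shortest path over time: each subtitle picks a candidate screen position, trading off closeness to the detected speaker against jumps between consecutive subtitles. The Viterbi-style sketch below is a hypothetical illustration of that idea; the cost terms and `jump_penalty` weight are assumptions, not the paper's model.

```python
import numpy as np

def place_subtitles(speaker_xy, candidates, jump_penalty=0.5):
    """Pick one candidate position per subtitle, minimising the total of
    (distance to speaker) + jump_penalty * (movement between subtitles),
    via dynamic programming over the whole sequence."""
    spk = np.asarray(speaker_xy, float)    # (T, 2) speaker positions
    cand = np.asarray(candidates, float)   # (K, 2) candidate positions
    T, K = len(spk), len(cand)
    # unary cost: distance from each candidate to the current speaker
    unary = np.linalg.norm(cand[None, :, :] - spk[:, None, :], axis=2)
    # pairwise cost: penalise the subtitle jumping across the screen
    jump = jump_penalty * np.linalg.norm(
        cand[:, None, :] - cand[None, :, :], axis=2)
    cost = unary[0].copy()
    back = np.zeros((T, K), int)
    for t in range(1, T):
        total = cost[:, None] + jump       # total[prev, cur]
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + unary[t]
    # backtrack the minimum-cost assignment
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

    With a larger `jump_penalty` the subtitle stays put even when the speaker moves slightly, which mirrors the stability a global solver buys over greedy per-frame placement.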

    Linking Sheet Music and Audio - Challenges and New Approaches

    Score and audio files are the two most important ways to represent, convey, record, store, and experience music. While a score describes a piece of music on an abstract level using symbols such as notes, keys, and measures, audio files allow for reproducing a specific acoustic realization of the piece. Each of these representations reflects different facets of music, yielding insights into aspects ranging from structural elements (e.g., motives, themes, musical form) to specific performance aspects (e.g., artistic shaping, sound). Therefore, simultaneous access to score and audio representations is of great importance. In this paper, we address the problem of automatically generating musically relevant linking structures between the various data sources that are available for a given piece of music. In particular, we discuss the task of sheet music-audio synchronization, with the aim of linking regions in images of scanned scores to musically corresponding sections in an audio recording of the same piece. Such linking structures form the basis for novel interfaces that allow users to access and explore multimodal sources of music within a single framework. As our main contributions, we give an overview of the state-of-the-art for this kind of synchronization task, present some novel approaches, and indicate future research directions. In particular, we address problems that arise in the presence of structural differences and discuss challenges when applying optical music recognition to complex orchestral scores. Finally, potential applications of the synchronization results are presented.

    Robust and Efficient Joint Alignment of Multiple Musical Performances


    Signal Processing Methods for Music Synchronization, Audio Matching, and Source Separation

    The field of music information retrieval (MIR) aims at developing techniques and tools for organizing, understanding, and searching multimodal information in large music collections in a robust, efficient and intelligent manner. In this context, this thesis presents novel, content-based methods for music synchronization, audio matching, and source separation. In general, music synchronization denotes a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. Here, the thesis presents three complementary synchronization approaches, which improve upon previous methods in terms of robustness, reliability, and accuracy. The first approach employs a late-fusion strategy based on multiple, conceptually different alignment techniques to identify those music passages that allow for reliable alignment results. The second approach is based on the idea of employing musical structure analysis methods in the context of synchronization to derive reliable synchronization results even in the presence of structural differences between the versions to be aligned. Finally, the third approach employs several complementary strategies for increasing the accuracy and time resolution of synchronization results. Given a short query audio clip, the goal of audio matching is to automatically retrieve all musically similar excerpts in different versions and arrangements of the same underlying piece of music. In this context, chroma-based audio features are a well-established tool as they possess a high degree of invariance to variations in timbre. This thesis describes a novel procedure for making chroma features even more robust to changes in timbre while keeping their discriminative power. Here, the idea is to identify and discard timbre-related information using techniques inspired by the well-known MFCC features, which are usually employed in speech processing. 
Given a monaural music recording, the goal of source separation is to extract musically meaningful sound sources corresponding, for example, to a melody, an instrument, or a drum track from the recording. To facilitate this complex task, one can exploit additional information provided by a musical score. Based on this idea, this thesis presents two novel, conceptually different approaches to source separation. Using score information provided by a given MIDI file, the first approach employs a parametric model to describe a given audio recording of a piece of music. The resulting model is then used to extract sound sources as specified by the score. As a computationally less demanding and easier-to-implement alternative, the second approach employs the additional score information to guide a decomposition based on non-negative matrix factorization (NMF).
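    A common way to realize the score-guided NMF decomposition mentioned above is through the standard multiplicative updates, with the score entering via the initialisation: activation entries set to zero where the score says a note is inactive remain zero under multiplicative updates. The sketch below illustrates that mechanism under the usual Euclidean-cost formulation; it is a generic illustration, not the thesis's specific method.

```python
import numpy as np

def score_informed_nmf(V, W, H, iters=100, eps=1e-9):
    """Multiplicative updates for V ~ W @ H (Euclidean cost).

    V: non-negative magnitude spectrogram (freq x time).
    W: spectral templates, one column per note/instrument source.
    H: activations over time. Score guidance: zeros placed in H
    where the score marks a note inactive stay zero, because each
    update multiplies the existing entry.
    """
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

    After convergence, a single source can be reconstructed from its column of `W` and row of `H` (typically via a soft mask applied to the mixture spectrogram), which is how the decomposition yields the separated tracks.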

    Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines

    The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains in signal processing and machine learning, including automatic speech recognition, image processing, and text information retrieval. In this contribution, we start with concrete examples of methodology transfer between speech and music processing, oriented on the building blocks of pattern recognition: preprocessing, feature extraction, and classification/decoding. We then assume a higher-level viewpoint when describing sources of mutual inspiration derived from text and image information retrieval. We conclude that dealing with the peculiarities of music in MIR research has contributed to advancing the state-of-the-art in other fields, and that many future challenges in MIR are strikingly similar to those that other research areas have been facing.