
    Deep Remix: Remixing Musical Mixtures Using a Convolutional Deep Neural Network

    Audio source separation is a difficult machine learning problem, and performance is measured by comparing extracted signals with the component source signals. However, if separation is motivated by the ultimate goal of re-mixing, then complete separation is not necessary; hence, separation difficulty and separation quality depend on the nature of the re-mix. Here, we use a convolutional deep neural network (DNN), trained to estimate 'ideal' binary masks for separating voice from music, to perform re-mixing of the vocal balance by operating directly on the individual magnitude components of the musical mixture spectrogram. Our results demonstrate that small changes in vocal gain may be applied with very little distortion to the ultimate re-mix. Our method may be useful for re-mixing existing mixes.
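
    The re-mixing operation described above can be illustrated with a small sketch (assumed names and inputs, not the authors' code): given a mixture spectrogram and a DNN-estimated binary mask for the voice, re-mixing amounts to scaling the voice-assigned bins by the desired vocal gain while leaving the remaining bins, and the phase, untouched.

        import numpy as np

        def remix_vocals(mix_stft, vocal_mask, gain_db):
            """Scale the voice-dominated bins of a mixture STFT by gain_db decibels."""
            gain = 10.0 ** (gain_db / 20.0)          # dB -> linear amplitude gain
            vocal = mix_stft * vocal_mask            # bins assigned to the voice
            backing = mix_stft * (1.0 - vocal_mask)  # bins assigned to the music
            # Scaling a complex bin by a real gain changes only its magnitude.
            return backing + gain * vocal            # re-mixed spectrogram

        # Toy usage: attenuate the vocals by 3 dB in a random 4x3 spectrogram.
        rng = np.random.default_rng(0)
        mix = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
        mask = (rng.random((4, 3)) > 0.5).astype(float)
        remixed = remix_vocals(mix, mask, gain_db=-3.0)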

    Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

    Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNNs) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
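
    As a rough illustration of the masking idea (an assumed formulation, not the paper's code), the 'ideal' binary mask assigns each time-frequency bin to the voice when the vocal magnitude dominates, and a 'karaoke' mix can be obtained by keeping only the bins the network judges as non-vocal:

        import numpy as np

        def ideal_binary_mask(vocal_mag, backing_mag):
            """1 where the voice dominates a time-frequency bin, 0 elsewhere."""
            return (vocal_mag > backing_mag).astype(float)

        def karaoke_mix(mix_stft, vocal_prob, threshold=0.5):
            """Suppress bins whose estimated vocal probability exceeds the threshold."""
            mask = (vocal_prob > threshold).astype(float)  # binarise the DNN output
            return mix_stft * (1.0 - mask)                 # keep only non-vocal bins

        # Toy usage with random magnitudes and probabilities.
        rng = np.random.default_rng(1)
        mix = rng.standard_normal((4, 3))
        print(ideal_binary_mask(rng.random((4, 3)), rng.random((4, 3))))
        print(karaoke_mix(mix, vocal_prob=rng.random((4, 3))))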

    An efficient musical accompaniment parallel system for mobile devices

    [EN] This work presents a software system designed to track the reproduction of a musical piece, with the aim of matching the score position to its symbolic representation on a digital sheet. In such a system, known as an automated musical accompaniment system, the score alignment process can be carried out in real time. Real-time score alignment, also known as score following, poses an important challenge due to the large amount of computation needed to process each digital frame and the very small time slot available to process it. Moreover, the challenge is even greater since we target handheld devices, i.e. devices characterized by both low power consumption and mobility. The results presented here show that it is possible to efficiently exploit several cores of an ARM® processor, or a GPU accelerator (present in some SoCs from NVIDIA), reducing the processing time per frame to under 10 ms in most cases.
    This work was supported by the Ministry of Economy and Competitiveness of Spain (FEDER) under projects TEC2015-67387-C4-1-R, TEC2015-67387-C4-2-R and TEC2015-67387-C4-3-R, the Andalusian Business, Science and Innovation Council under project P2010-TIC-6762 (FEDER), and the Generalitat Valenciana under project PROMETEOII/2014/003.
    Alonso-Jordá, P.; Vera-Candeas, P.; Cortina, R.; Ranilla, J. (2017). An efficient musical accompaniment parallel system for mobile devices. The Journal of Supercomputing 73(1):343-353. https://doi.org/10.1007/s11227-016-1865-x
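
    The real-time constraint can be made concrete with a toy sketch (the frame size, feature, and placeholder alignment step are assumptions, not the authors' implementation): each incoming frame must be transformed and matched against the score within a budget of roughly 10 ms.

        import time
        import numpy as np

        FRAME_SIZE = 4096   # samples per frame (assumed value)
        BUDGET_MS = 10.0    # per-frame budget targeted in the paper

        def process_frame(frame, score_position):
            """Toy per-frame step: compute a spectrum, then advance the score position."""
            spectrum = np.abs(np.fft.rfft(frame))
            # A real accompaniment system would match `spectrum` against score
            # templates here (e.g. with online DTW); advancing by one is a placeholder.
            return score_position + 1

        frame = np.random.randn(FRAME_SIZE)
        start = time.perf_counter()
        position = process_frame(frame, score_position=0)
        elapsed_ms = (time.perf_counter() - start) * 1e3
        print(f"frame handled in {elapsed_ms:.2f} ms (budget {BUDGET_MS:.0f} ms)")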

    HReMAS: Hybrid Real-time Musical Alignment System

    [EN] This paper presents a real-time audio-to-score alignment system for musical applications. The aim of such systems is to synchronize a live musical performance with its symbolic representation in a music sheet. We have used our previous real-time alignment system as a base, enhancing it with a traceback stage, a stage used in offline alignment to improve the accuracy of the aligned notes. This stage introduces some delay, which forces a trade-off between output delay and alignment accuracy that must be considered in the design of this type of hybrid technique. We have also made our former system execute faster in order to minimize this delay. Other improvements, such as the identification of silence frames, have also been incorporated into the proposed system.
    This work has been supported by the Ministerio de Economía y Competitividad of Spain and FEDER under projects TEC2015-67387-C4-{1,2,3}-R.
    Cabañas-Molero, P.; Cortina-Parajón, R.; Combarro, E.F.; Alonso-Jordá, P.; Bris-Peñalver, F.J. (2019). HReMAS: Hybrid Real-time Musical Alignment System. The Journal of Supercomputing 75(3):1001-1013. https://doi.org/10.1007/s11227-018-2265-1

    Online Symbolic Music Alignment with Offline Reinforcement Learning

    Symbolic Music Alignment is the process of matching performed MIDI notes to corresponding score notes. In this paper, we introduce a reinforcement learning (RL)-based online symbolic music alignment technique. The RL agent, an attention-based neural network, iteratively estimates the current score position from local score and performance contexts. For this symbolic alignment task, environment states can be sampled exhaustively and the reward is dense, rendering a formulation as a simplified offline RL problem straightforward. We evaluate the trained agent in three ways: first, in its capacity to identify correct score positions for sampled test contexts; second, as the core technique of a complete algorithm for symbolic online note-wise alignment; and finally, as a real-time symbolic score follower. We further investigate the pitch-based score and performance representations used as the agent's inputs. To this end, we develop a second model, a two-step Dynamic Time Warping (DTW)-based offline alignment algorithm leveraging the same input representation. The proposed model outperforms a state-of-the-art reference model of offline symbolic music alignment.
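
    For context, a minimal version of the kind of pitch-based DTW alignment mentioned above might look as follows (a generic sketch under assumed representations, not the paper's two-step algorithm): performed and score notes are reduced to MIDI pitch sequences and matched with a standard dynamic-programming recursion.

        import numpy as np

        def dtw_align(perf_pitches, score_pitches):
            """Align two MIDI pitch sequences and return the warping path."""
            n, m = len(perf_pitches), len(score_pitches)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = 0.0 if perf_pitches[i - 1] == score_pitches[j - 1] else 1.0
                    D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
            # Trace the minimal-cost path back to the origin.
            i, j, path = n, m, []
            while i > 0 and j > 0:
                path.append((i - 1, j - 1))
                step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
                if step == 0:
                    i, j = i - 1, j - 1
                elif step == 1:
                    i -= 1
                else:
                    j -= 1
            return path[::-1]

        # Toy usage: a performance with a repeated note aligned to a three-note score.
        print(dtw_align([60, 62, 62, 64], [60, 62, 64]))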

    Linking Sheet Music and Audio - Challenges and New Approaches

    Score and audio files are the two most important ways to represent, convey, record, store, and experience music. While the score describes a piece of music on an abstract level using symbols such as notes, keys, and measures, audio files allow for reproducing a specific acoustic realization of the piece. Each of these representations reflects different facets of music, yielding insights into aspects ranging from structural elements (e.g., motives, themes, musical form) to specific performance aspects (e.g., artistic shaping, sound). Therefore, simultaneous access to score and audio representations is of great importance. In this paper, we address the problem of automatically generating musically relevant linking structures between the various data sources that are available for a given piece of music. In particular, we discuss the task of sheet music-audio synchronization, with the aim of linking regions in images of scanned scores to musically corresponding sections in an audio recording of the same piece. Such linking structures form the basis for novel interfaces that allow users to access and explore multimodal sources of music within a single framework. As our main contributions, we give an overview of the state of the art for this kind of synchronization task, present some novel approaches, and indicate future research directions. In particular, we address problems that arise in the presence of structural differences and discuss challenges when applying optical music recognition to complex orchestral scores. Finally, potential applications of the synchronization results are presented.
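
    As a simple illustration of the kind of mid-level feature that such synchronization pipelines commonly rely on (a generic sketch, not necessarily the features used in the paper), a magnitude spectrogram can be folded into 12-dimensional chroma vectors that are comparable across a score rendition and an audio recording:

        import numpy as np

        def chroma_from_spectrogram(mag, sr, n_fft):
            """Map each frequency bin to its pitch class and sum magnitudes per class."""
            freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)   # bin centre frequencies
            chroma = np.zeros((12, mag.shape[1]))
            for k, f in enumerate(freqs):
                if f < 27.5:                              # skip DC / sub-audio bins
                    continue
                midi = 69 + 12 * np.log2(f / 440.0)       # frequency -> MIDI pitch
                chroma[int(np.round(midi)) % 12] += mag[k]
            norm = np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-12
            return chroma / norm                          # column-normalised chroma

        # Toy usage: random magnitude spectrogram with n_fft=2048 at 22050 Hz.
        mag = np.abs(np.random.randn(1025, 10))
        print(chroma_from_spectrogram(mag, sr=22050, n_fft=2048).shape)  # (12, 10)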