    Real-time Soundprism

    [EN] This paper presents a parallel real-time sound source separation system that decomposes an audio signal captured with a single microphone into as many audio signals as there are instruments actually playing. This approach is usually known as Soundprism. The application scenario is a concert hall in which users, instead of listening to the mixed audio, want to receive the audio of a single instrument in order to focus on a particular performance. The challenge is even greater because we are interested in a real-time system on handheld devices, i.e., devices characterized by both low power consumption and mobility. The results show that real-time operation is achievable in the tested scenarios using an ARM processor aided by a GPU, when one is present.
    This work has been supported by the "Ministerio de Economia y Competitividad" of Spain and FEDER under projects TEC2015-67387-C4-{1,2,3}-R.
    Muñoz-Montoro, AJ.; Ranilla, J.; Vera-Candeas, P.; Combarro, EF.; Alonso-Jordá, P. (2019). Real-time Soundprism. The Journal of Supercomputing. 75(3):1594-1609. https://doi.org/10.1007/s11227-018-2703-0

    HReMAS: Hybrid Real-time Musical Alignment System

    [EN] This paper presents a real-time audio-to-score alignment system for musical applications. The aim of such systems is to synchronize a live musical performance with its symbolic representation in a musical score. We build on our previous real-time alignment system, enhancing it with a traceback stage, a stage used in offline alignment to improve the accuracy of the aligned note. This stage introduces some delay, which forces a trade-off between output delay and alignment accuracy that must be considered when designing this type of hybrid technique. We have also made our former system execute faster in order to minimize this delay. Other improvements, such as the identification of silence frames, have also been incorporated into the proposed system.
    This work has been supported by the "Ministerio de Economia y Competitividad" of Spain and FEDER under Projects TEC2015-67387-C4-{1,2,3}-R.
    Cabañas-Molero, P.; Cortina-Parajón, R.; Combarro, EF.; Alonso-Jordá, P.; Bris-Peñalver, FJ. (2019). HReMAS: Hybrid Real-time Musical Alignment System. The Journal of Supercomputing. 75(3):1001-1013. https://doi.org/10.1007/s11227-018-2265-1
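
The traceback stage mentioned in the HReMAS abstract can be illustrated with plain offline dynamic time warping. The sketch below is not the authors' system: it assumes a precomputed cost matrix (rows are audio frames, columns are score positions), and the function name `dtw_align` is hypothetical. It shows only how an accumulated-cost matrix is filled and how the path is then recovered by walking backwards, which is the step that introduces delay in a real-time setting.

```python
def dtw_align(cost):
    """Offline DTW over a cost matrix (hypothetical helper, illustration only).

    cost[i][j] is the mismatch between audio frame i and score position j.
    Returns the total path cost and the warping path found by traceback.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest accumulated cost of any path ending at (i, j)
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_prev

    # Traceback: walk from the last cell to the first along cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path
```

Because the traceback needs cells that lie ahead of the frame being aligned, a hybrid system has to wait for some future frames before committing to an alignment, which is the delay versus accuracy trade-off the abstract describes.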

    Automatic Quality Estimation for ASR System Combination

    Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, depends heavily on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) to rank the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
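
As a rough illustration of the two quantities in the abstract above, the sketch below computes word error rate (WER) with the standard word-level edit distance and orders competing hypotheses for one segment by a predicted quality score. The QE model and the ROVER fusion step are not shown. The names `word_error_rate`, `rank_hypotheses`, and the `predicted_wer` list are assumptions introduced for this example, not part of the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


def rank_hypotheses(hypotheses, predicted_wer):
    """Order one segment's hypotheses from best to worst predicted quality.

    predicted_wer is a hypothetical list of QE outputs, one per hypothesis;
    lower means the QE model expects fewer errors.
    """
    order = sorted(range(len(hypotheses)), key=lambda k: predicted_wer[k])
    return [hypotheses[k] for k in order]
```

In a pipeline like the one described, the ranked list would be passed to the fusion step in place of a random or confidence-based ordering, and `word_error_rate` would be used afterwards to score the fused output against a reference transcript.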

    Implementation Science and Fidelity Measurement: A Test of the 3-5-7 Model™

    Children and youths engaged with the child welfare system can experience grief and loss as a result of trauma, broken relationships, and inadequate attachments. Interventionists are often challenged to implement effective strategies that help youths to reestablish trusting relationships and to promote overall psychological well-being. A 5-year federal demonstration project funded by the U.S. Department of Health and Human Services, Children’s Bureau, guided by an implementation science model, sought to increase well-being in youths age 12–21 who were involved in the child welfare system. The 3-5-7 Model™, a strengths-based approach that empowers children, youths, and families to engage in grieving and integrating significant relationships, was studied. A fidelity system was created in order to test the model. Important lessons about implementation science guided the work of the demonstration project. Although definitive conclusions could not be reached, several indicators of psychological well-being were found to be associated with high levels of fidelity to the 3-5-7 Model™. Suggestions for future research are offered.

    Multimodal music information processing and retrieval: survey and future challenges

    To improve performance in various music information processing tasks, recent studies exploit different modalities that capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.

    Deep speech inpainting of time-frequency masks

    Transient loud intrusions, often occurring in noisy environments, can completely overpower the speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of the time-frequency representation of speech. The framework is based on a convolutional U-Net trained via deep feature losses, obtained using speechVGG, a deep speech feature extractor pre-trained on an auxiliary word classification task. Our evaluation results demonstrate that the proposed framework can recover large portions of missing or distorted time-frequency representation of speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach provided a substantial increase in the STOI and PESQ objective metrics of the initially corrupted speech samples. Notably, using deep feature losses to train the framework led to the best results, as compared to conventional approaches.
    Comment: Accepted to InterSpeech202