126,719 research outputs found
Real-time Soundprism
[EN] This paper presents a parallel real-time sound source separation system that decomposes an audio signal captured with a single microphone into as many audio signals as there are instruments actually playing. This approach is commonly known as Soundprism. The intended application scenario is a concert hall in which users, instead of listening to the mixed audio, want to receive the audio of a single instrument, focusing on a particular performance. The challenge is even greater because we are interested in a real-time system on handheld devices, i.e., devices characterized by both low power consumption and mobility. The results show that real-time performance is achievable in the tested scenarios using an ARM processor, aided by a GPU when one is present. This work has been supported by the "Ministerio de Economía y Competitividad" of Spain and FEDER under projects TEC2015-67387-C4-{1,2,3}-R. Muñoz-Montoro, AJ.; Ranilla, J.; Vera-Candeas, P.; Combarro, EF.; Alonso-Jordá, P. (2019). Real-time Soundprism. The Journal of Supercomputing 75(3):1594-1609. https://doi.org/10.1007/s11227-018-2703-0
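Separation systems of this kind are typically built on non-negative matrix factorization (NMF) of the magnitude spectrogram. The sketch below is an illustration of that general technique, not the authors' implementation: it factorizes a spectrogram with KL-divergence multiplicative updates and rebuilds per-source spectrograms with Wiener-style soft masks. The function name, random initialization, and iteration count are all assumptions.

```python
import numpy as np

def nmf_separate(V, n_sources, n_iter=200, eps=1e-9, seed=0):
    """Decompose a magnitude spectrogram V (freq x time) into n_sources
    components with multiplicative-update NMF (KL divergence) and rebuild
    per-source spectrograms with Wiener-style soft masks."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_sources)) + eps   # spectral bases, one per source
    H = rng.random((n_sources, T)) + eps   # time-varying gains
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= W.T @ (V / WH) / (W.sum(axis=0)[:, None] + eps)
    WH = W @ H + eps
    # Soft (Wiener-like) masks: each source keeps its share of the mixture,
    # so the per-source spectrograms sum back (approximately) to V
    sources = [(np.outer(W[:, k], H[k]) / WH) * V for k in range(n_sources)]
    return W, H, sources
```

In a real score-informed system such as Soundprism, the bases `W` would be learnt instrument models rather than random, and the gains `H` would be constrained by the aligned score.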
HReMAS: Hybrid Real-time Musical Alignment System
[EN] This paper presents a real-time audio-to-score alignment system for musical applications. The aim of such systems is to synchronize a live musical performance with its symbolic representation in a music score. We build on our previous real-time alignment system by enhancing it with a traceback stage, a stage used in offline alignment to improve the accuracy of the aligned notes. This stage introduces some delay, which forces a trade-off between output delay and alignment accuracy that must be considered in the design of this type of hybrid technique. We have also made our former system execute faster in order to minimize this delay. Other improvements, such as the identification of silence frames, have also been incorporated into the proposed system. This work has been supported by the "Ministerio de Economía y Competitividad" of Spain and FEDER under projects TEC2015-67387-C4-{1,2,3}-R. Cabañas-Molero, P.; Cortina-Parajón, R.; Combarro, EF.; Alonso-Jordá, P.; Bris-Peñalver, FJ. (2019). HReMAS: Hybrid Real-time Musical Alignment System. The Journal of Supercomputing 75(3):1001-1013. https://doi.org/10.1007/s11227-018-2265-1
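The traceback stage borrowed from offline alignment is the path-recovery step of dynamic time warping (DTW). The following is a minimal offline sketch of that step, not the authors' hybrid system: the function name and the precomputed cost-matrix input are illustrative assumptions.

```python
import numpy as np

def dtw_align(cost):
    """Offline DTW over a precomputed frame-wise cost matrix
    (performance frames x score frames). The path is recovered by a
    traceback from the end of the accumulated-cost matrix."""
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i-1, j-1] + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    # Traceback: walk back from (n, m) along locally optimal predecessors
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

An online variant can only traceback over the frames received so far, which is exactly why the hybrid scheme trades output delay against alignment accuracy.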
Score-informed transcription for automatic piano tutoring
In this paper, a score-informed transcription method for automatic piano tutoring is proposed. The method takes as input a recording made by a student, which may contain mistakes, along with a reference score. The recording and the aligned synthesized score are automatically transcribed using a non-negative matrix factorization algorithm for multi-pitch estimation and hidden Markov models for note tracking. By comparing the two transcriptions, common errors occurring in transcription algorithms, such as extra octave notes, can be suppressed. The result is a piano-roll description which shows the mistakes made by the student along with the correctly played notes. Evaluation was performed on six pieces recorded on a Disklavier piano, using both manually aligned and automatically aligned scores as input. Results comparing the system output with ground-truth annotations of the original recording reach a weighted F-measure of 93%, indicating that the proposed method can successfully analyze the student's performance.
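The final comparison step can be illustrated with a toy piano-roll matcher. This is a hypothetical sketch, not the paper's method: the function name, the `(pitch, onset_frame)` event format, and the onset tolerance are all assumptions.

```python
def compare_rolls(student, reference, tol=1):
    """Compare two transcriptions given as sets of (pitch, onset_frame)
    events. An event matches if the same pitch occurs within `tol`
    frames. Returns (correct, missed, extra) event sets."""
    correct, extra = set(), set()
    for pitch, onset in student:
        if any(p == pitch and abs(o - onset) <= tol for p, o in reference):
            correct.add((pitch, onset))     # played as written
        else:
            extra.add((pitch, onset))       # mistake: note not in the score
    missed = {(p, o) for p, o in reference
              if not any(p == pitch and abs(o - onset) <= tol
                         for pitch, onset in student)}  # note left out
    return correct, missed, extra
```

In the proposed system both inputs come from automatic transcription, so this comparison also lets shared transcription artifacts (e.g. octave errors present in both rolls) cancel out.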
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
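The core ROVER fusion step is word-level voting over aligned hypotheses. The sketch below illustrates that step under strong assumptions: the hypotheses are already aligned position-by-position with null tokens marking insertions/deletions, and `rover_vote` is an illustrative name, not the NIST tool.

```python
from collections import Counter

def rover_vote(aligned_hyps, null="<eps>"):
    """Word-level majority voting over hypotheses that are already
    aligned position-by-position (null tokens mark gaps). Feeding the
    hypotheses in quality order lets ties fall to the better system,
    which is the role QE-based ranking plays in the proposed variant."""
    output = []
    for slot in zip(*aligned_hyps):
        # In CPython 3.7+, most_common breaks ties by first insertion,
        # so words from earlier (higher-ranked) hypotheses win ties
        word, _ = Counter(slot).most_common(1)[0]
        if word != null:
            output.append(word)
    return output
```

Real ROVER additionally builds the alignment itself (a word transition network) and can weight votes by confidence scores; the QE-based variant replaces those scores with a learned quality ranking.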
Implementation Science and Fidelity Measurement: A Test of the 3-5-7 Model™
Children and youths engaged with the child welfare system can experience grief and loss as a result of trauma, broken relationships, and inadequate attachments. Interventionists are often challenged to implement effective strategies that help youths to reestablish trusting relationships and to promote overall psychological well-being. A 5-year federal demonstration project funded by the U.S. Department of Health and Human Services, Children's Bureau, guided by an implementation science model, sought to increase well-being in youths ages 12–21 who were involved in the child welfare system. The 3-5-7 Model™, a strengths-based approach that empowers children, youths, and families to engage in grieving and integrating significant relationships, was studied. A fidelity system was created in order to test the model. Important lessons about implementation science guided the work of the demonstration project. Although definitive conclusions could not be reached, several indicators of psychological well-being were found to be associated with high levels of fidelity to the 3-5-7 Model™. Suggestions for future research are offered.
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics, and album cover art. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that the Music
Information Retrieval and Sound and Music Computing research communities
should focus on in the coming years.
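Of the information fusion approaches such a survey covers, late (decision-level) fusion is the simplest to illustrate. This is a generic, hypothetical sketch, not taken from the paper: the function name and the per-modality score format are assumptions.

```python
import numpy as np

def late_fusion(modality_scores, weights=None):
    """Late (decision-level) fusion: combine per-class scores produced
    independently from each modality (e.g. audio, lyrics, cover art)
    by a weighted average, where weights can encode per-modality
    reliability."""
    S = np.asarray(modality_scores, dtype=float)   # (n_modalities, n_classes)
    w = np.ones(len(S)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                                # normalize to sum to 1
    return w @ S                                   # fused class scores
```

Early fusion would instead concatenate modality features before a single classifier; the survey's point is that the right choice depends on the application being addressed.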
Deep speech inpainting of time-frequency masks
Transient loud intrusions, often occurring in noisy environments, can
completely overpower the speech signal and lead to an inevitable loss of
information. While existing algorithms for noise suppression can yield
impressive results, their efficacy remains limited for very low signal-to-noise
ratios or when parts of the signal are missing. To address these limitations,
here we propose an end-to-end framework for speech inpainting, the
context-based retrieval of missing or severely distorted parts of
time-frequency representation of speech. The framework is based on a
convolutional U-Net trained via deep feature losses, obtained using speechVGG,
a deep speech feature extractor pre-trained on an auxiliary word classification
task. Our evaluation results demonstrate that the proposed framework can
recover large portions of missing or distorted time-frequency representation of
speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach
provided a substantial increase in STOI & PESQ objective metrics of the
initially corrupted speech samples. Notably, using deep feature losses to train
the framework led to the best results, as compared to conventional approaches.
Comment: Accepted to InterSpeech202
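The inpainting targets quoted above (gaps up to 400 ms and 3.2 kHz) can be pictured as binary masks over an STFT grid. The following toy sketch only constructs such a mask under assumed STFT parameters (16 kHz sampling, 512-point FFT, 256-sample hop); it is not the paper's U-Net pipeline, and all names are illustrative.

```python
import numpy as np

def intrusion_mask(n_freq, n_frames, sr=16000, n_fft=512, hop=256,
                   t0=1.0, dur=0.4, f_hi=3200.0):
    """Binary mask over an STFT magnitude (freq x time): True marks
    observed bins, False marks a missing region `dur` seconds long
    below `f_hi` Hz, i.e. the kind of gap (up to 400 ms / 3.2 kHz)
    the inpainting framework is reported to recover."""
    mask = np.ones((n_freq, n_frames), dtype=bool)
    frame0 = int(t0 * sr / hop)          # first corrupted frame
    frames = int(dur * sr / hop)         # corrupted frames (0.4 s -> 25)
    bins = int(f_hi * n_fft / sr)        # frequency bins below f_hi
    mask[:bins, frame0:frame0 + frames] = False   # region to inpaint
    return mask
```

In the described framework, the masked spectrogram (and mask) would be fed to the convolutional U-Net, which fills in the False region from surrounding context.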