6,082 research outputs found
An efficient musical accompaniment parallel system for mobile devices
This work presents a software system designed to track the performance of a musical piece in order to match the current position to its symbolic representation on a digital score sheet. In this system, known as an automated musical accompaniment system, score alignment can be carried out in real time. Real-time score alignment, also known as score following, poses an important challenge because of the large amount of computation needed to process each digital frame and the very small time slot available to process it. The challenge is even greater because we are interested in handheld devices, i.e. devices characterized by both low power consumption and mobility. The results presented here show that it is possible to exploit efficiently several cores of an ARM® processor, or a GPU accelerator (present in some SoCs from NVIDIA), reducing the processing time per frame to under 10 ms in most cases.
This work was supported by the Ministry of Economy and Competitiveness from Spain (FEDER) under projects TEC2015-67387-C4-1-R, TEC2015-67387-C4-2-R and TEC2015-67387-C4-3-R, the Andalusian Business, Science and Innovation Council under project P2010-TIC-6762 (FEDER), and the Generalitat Valenciana under project PROMETEOII/2014/003.
Alonso-Jordá, P.; Vera-Candeas, P.; Cortina, R.; Ranilla, J. (2017). An efficient musical accompaniment parallel system for mobile devices. The Journal of Supercomputing. 73(1):343-353. https://doi.org/10.1007/s11227-016-1865-x
Clustering by compression
We present a new method for clustering based on compression. The method
doesn't use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of similarity metric, normal
compressor, and show that the NCD based on a normal compressor is a similarity
metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented and
available as public software, and is robust under choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block sorting compressors. In genomics we presented new
evidence for major questions in Mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta
hypothesis against the Theria hypothesis.
Comment: LaTeX, 27 pages, 20 figure
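As a rough illustration of the distance described in this abstract, the sketch below computes the normalized compression distance from compressed lengths, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The use of zlib as the compressor and the function names are assumptions made for this example; the authors' public software and their quartet-based clustering step are not reproduced here.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Uses the compressed lengths of x, y, and their concatenation x + y.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4  # data with a very different structure
    print("NCD(text, text): ", ncd(text, text))
    print("NCD(text, other):", ncd(text, other))
```

A pairwise matrix of such distances could then be passed to any hierarchical clustering routine; with a real compressor the values are only approximations, so NCD(x, x) is typically small but not exactly zero.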
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech.
Comment: Accepted to ICASSP 201
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring
relations between phonology and speech processing, and in broader terms, for
exploring relations between the abstract and physical structures of a speech
signal. Our goal is to make a step towards bridging phonology and speech
processing and to contribute to the program of Laboratory Phonology. We show
three application examples for laboratory phonology: compositional phonological
speech modelling, a comparison of phonological systems and an experimental
phonological parametric text-to-speech (TTS) system. The featural
representations of the following three phonological systems are considered in
this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English
(SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded
speech, we conclude that the latter achieves slightly better results than the
former. However, GP - the most compact phonological speech representation -
performs comparably to the systems with a higher number of phonological
features. The parametric TTS based on phonological speech representation, and
trained from an unlabelled audiobook in an unsupervised manner, achieves
intelligibility of 85% of the state-of-the-art parametric speech synthesis. We
envision that the presented approach paves the way for researchers in both
fields to form meaningful hypotheses that are explicitly testable using the
concepts developed and exemplified in this paper. On the one hand, laboratory
phonologists might test the applied concepts of their theoretical models, and
on the other hand, the speech processing community may utilize the concepts
developed for the theoretical phonological models for improvements of the
current state-of-the-art applications.
Quantification of audio quality loss after wireless transfer
The report describes a quality measurement for audio, covering both the theoretical background and the implementation. It begins by describing the unlicensed methods on which the implementation is based (Segmental SNR, Frequency Weighted Segmental SNR, Log-Likelihood Ratio, Cepstral Distance and Weighted Slope Spectral distance) and the commercial methods used as reference, PEAQ and PESQ. It also mentions the problems present in wireless transfer and the concept of sound quality assessment. It concludes by describing the suggested analysis method and the implemented software, together with the results when compared to PEAQ and PESQ.
When talking on the phone, how do you know if the sound quality is good or bad? How do you know if it is better or worse than your last phone call? Although the perception of sound varies from person to person, only humans can truly determine sound quality. However, companies want to ensure the quality of their product before releasing it, and therefore need an easier way to evaluate it without humans, since human testing is expensive, time-consuming and cannot be guaranteed to be consistent.
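Of the measures named in this abstract, Segmental SNR is the simplest to illustrate: the signal-to-noise ratio is computed per frame in dB and then averaged over frames. The sketch below is a minimal version under stated assumptions, not the report's implementation: the frame length of 256 samples, the clamping of each frame to the range [-10, 35] dB, and the function name are all choices made for this example.

```python
import math


def segmental_snr(reference, degraded, frame_len=256, eps=1e-10):
    """Average per-frame SNR in dB between a reference and a degraded signal.

    Assumptions for this sketch: non-overlapping frames of frame_len samples,
    each frame's SNR clamped to [-10, 35] dB before averaging.
    """
    n = min(len(reference), len(degraded))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
        # eps avoids division by zero and log of zero on silent or identical frames
        snr_db = 10.0 * math.log10((signal + eps) / (noise + eps))
        frame_snrs.append(min(max(snr_db, -10.0), 35.0))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    # 440 Hz tone sampled at 8 kHz, degraded by a constant offset
    clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(2048)]
    light = [v + 0.1 for v in clean]
    heavy = [v + 0.5 for v in clean]
    print("light degradation:", segmental_snr(clean, light))
    print("heavy degradation:", segmental_snr(clean, heavy))
```

A higher value indicates that the degraded signal is closer to the reference; perceptual measures such as PEAQ and PESQ go further by modelling human hearing, which this sketch does not attempt.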