On the Application of Generic Summarization Algorithms to Music
Several generic summarization algorithms were developed in the past and
successfully applied in fields such as text and speech summarization. In this
paper, we review and apply these algorithms to music. To evaluate this
summarization's performance, we adopt an extrinsic approach: we compare a Fado
Genre Classifier's performance using truncated contiguous clips against the
summaries extracted with those algorithms on 2 different datasets. We show that
Maximal Marginal Relevance (MMR), LexRank and Latent Semantic Analysis (LSA)
all improve classification performance in both datasets used for testing.
Comment: 12 pages, 1 table; Submitted to IEEE Signal Processing Letters
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application it addresses. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that the Music Information
Retrieval and Sound and Music Computing research communities should focus on in
the coming years.
Text segmentation on multilabel documents: A distant-supervised approach
Segmenting text into semantically coherent segments is an important task with
applications in information retrieval and text summarization. Developing
accurate topical segmentation requires the availability of training data with
ground truth information at the segment level. However, generating such labeled
datasets, especially for applications in which the meaning of the labels is
user-defined, is expensive and time-consuming. In this paper, we develop an
approach that, instead of using segment-level ground truth information, uses
the set of labels associated with a document, which are easier to obtain
because the training data essentially corresponds to a multilabel dataset.
Our method, which can be thought of as an instance of distant
supervision, improves upon the previous approaches by exploiting the fact that
consecutive sentences in a document tend to talk about the same topic, and
hence, probably belong to the same class. Experiments on the text segmentation
task on a variety of datasets show that the segmentation produced by our method
beats the competing approaches on four out of five datasets and performs at par
on the fifth dataset. On the multilabel text classification task, our method
performs at par with the competing approaches, while requiring significantly
less time to estimate than the competing approaches.
Comment: Accepted in 2018 IEEE International Conference on Data Mining (ICDM)
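The abstract above relies on the observation that consecutive sentences tend to share a topic. The following is a minimal sketch of that idea only, not the authors' method: it assumes hypothetical per-sentence label scores from some multilabel classifier, smooths them over neighbouring sentences, and starts a new segment wherever the smoothed label changes. The function name `segment_by_smoothed_labels` and the `window` parameter are illustrative assumptions.

```python
def segment_by_smoothed_labels(scores, window=1):
    """Group sentences into segments after smoothing label scores.

    scores: one list of label scores per sentence (hypothetical classifier
            output; every inner list has the same length).
    window: number of neighbouring sentences on each side to average over.
    Returns a list of (start, end, label) tuples with `end` exclusive.
    """
    n = len(scores)
    labels = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        num_labels = len(scores[i])
        # Average each label's score over the neighbourhood of sentence i.
        avg = [
            sum(scores[j][c] for j in range(lo, hi)) / (hi - lo)
            for c in range(num_labels)
        ]
        labels.append(max(range(num_labels), key=lambda c: avg[c]))

    # Merge runs of identical labels into segments.
    segments = []
    start = 0
    for i in range(1, n + 1):
        if i == n or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


example_scores = [
    [0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8],
]
example_segments = segment_by_smoothed_labels(example_scores, window=1)
```

In this toy input the second sentence on its own would be assigned label 1, but smoothing over its neighbours keeps it in the first segment, which is the effect the consecutive-sentence assumption is meant to capture.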
Using Generic Summarization to Improve Music Information Retrieval Tasks
In order to satisfy processing time constraints, many MIR tasks process only
a segment of the whole music signal. This practice may lead to decreasing
performance, since the most important information for the tasks may not be in
those processed segments. In this paper, we leverage generic summarization
algorithms, previously applied to text and speech summarization, to summarize
items in music datasets. These algorithms build summaries that are both
concise and diverse by selecting appropriate segments from the input signal,
which makes them good candidates to summarize music as well. We evaluate the
summarization process on binary and multiclass music genre classification
tasks, by comparing the performance obtained using summarized datasets against
the performances obtained using continuous segments (which is the traditional
method used for addressing the previously mentioned time constraints) and full
songs of the same original dataset. We show that GRASSHOPPER, LexRank, LSA,
MMR, and a Support Sets-based Centrality model improve classification
performance when compared to selected 30-second baselines. We also show that
summarized datasets lead to a classification performance whose difference is
not statistically significant from using full songs. Furthermore, we make an
argument stating the advantages of sharing summarized datasets for future MIR
research.
Comment: 24 pages, 10 tables; Submitted to IEEE/ACM Transactions on Audio,
Speech and Language Processing
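To illustrate how a generic summarizer can pick segments that are both relevant and diverse, here is a minimal sketch of Maximal Marginal Relevance (MMR) selection. It is a sketch under stated assumptions rather than the papers' implementation: each segment is assumed to be a fixed-length feature vector, the centroid of all segments stands in for the query, cosine similarity is the similarity measure, and the names `mmr_select` and `lam` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(segments, k, lam=0.5):
    """Pick up to k segment indices by Maximal Marginal Relevance.

    segments: list of feature vectors, one per segment (assumed input).
    lam: trade-off between relevance (lam) and redundancy (1 - lam).
    Returns the chosen indices sorted in their original (temporal) order.
    """
    if not segments:
        return []
    n, dim = len(segments), len(segments[0])
    # Assumption: the centroid of all segments plays the role of the query.
    centroid = [sum(s[d] for s in segments) / n for d in range(dim)]

    selected = []
    candidates = list(range(n))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(segments[i], centroid)
            redundancy = max(
                (cosine(segments[i], segments[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)


toy_segments = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
diverse_pick = mmr_select(toy_segments, k=2, lam=0.5)
relevance_only_pick = mmr_select(toy_segments, k=2, lam=1.0)
```

With `lam=0.5` the duplicate second segment is penalised for redundancy and the third segment is chosen instead; with `lam=1.0` only relevance counts and the duplicate is kept. Returning the indices in temporal order keeps the summary playable as a sequence of excerpts.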
Text segmentation techniques: A critical review
Text segmentation is widely used for processing text. It is a method of splitting a document into smaller parts, which are usually called segments. Each segment has its own relevant meaning. Segments are categorized as word, sentence, topic, phrase or any other information unit, depending on the task of the text analysis. This study presents various reasons for using text segmentation in different analysis approaches. We categorized the types of documents and languages used. The main contributions of this study include a summarization of 50 research papers and an illustration of the past decade (January 2007 to January 2017) of research that applied text segmentation as its main approach for analysing text. Results revealed the popularity of using text segmentation in different languages. Besides that, the “word” seems to be the most practical and usable segment, as it is a smaller unit than the phrase, sentence or line.
Multi-task Layout Analysis of Handwritten Musical Scores
Document Layout Analysis (DLA) is a process that must be performed before attempting to recognize the content of handwritten musical scores by a modern automatic or semiautomatic system. DLA should provide the segmentation of the document image into semantically useful region types such as staff, lyrics, etc. In this paper we extend our previous work on DLA of handwritten text documents to also address complex handwritten music scores. The system is able to perform region segmentation, region classification and baseline detection in an integrated manner. Several experiments were performed on two different datasets in order to validate this approach and assess it in different scenarios. Results show high accuracy on such complex manuscripts and very competitive computation time, which is a good indicator of the scalability of the method to very large collections.
This work was partially supported by the Universitat Politecnica de Valencia under grant FPI-420II/899, a 2017-2018 Digital Humanities research grant of the BBVA Foundation for the project Carabela, the History Of Medieval Europe (HOME) project (Ref.: PCI2018-093122) and through the EU project READ (Horizon-2020 program, grant Ref. 674943). NVIDIA Corporation kindly donated the Titan X GPU used for this research.
Quirós, L.; Toselli, AH.; Vidal, E. (2019). Multi-task Layout Analysis of Handwritten Musical Scores. Springer. 123-134. https://doi.org/10.1007/978-3-030-31321-0_11
DeepScores: a dataset for segmentation, detection and classification of tiny objects
We present the DeepScores dataset with the goal of advancing the state-of-the-art in small object recognition by placing the question of object recognition in the context of scene understanding. DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, and optical music recognition (OMR) research in particular. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets like PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, as well as with other OMR datasets. Finally, we provide baseline performances for object classification, intuition for the inherent difficulty that DeepScores poses to state-of-the-art object detectors like YOLO or R-CNN, and give pointers to future research based on this dataset.