96 research outputs found

    Avoiding staff removal stage in optical music recognition: application to scores written in white mensural notation

    Staff detection and removal is one of the most important issues in optical music recognition (OMR) tasks, since common approaches to symbol detection and classification are based on this process. Due to its complexity, staff detection and removal is often inaccurate, leading to a great number of errors in later stages. For this reason, this paper proposes a new approach that avoids this stage and is expected to overcome these drawbacks. Our approach is put into practice in a case study focused on scores written in white mensural notation. Symbol detection is performed by using the vertical projection of the staves. The cross-correlation operator for template matching is used at the classification stage. The merit of our proposal is shown in an experiment in which it attains an extraction rate of 96% and a classification rate of 92%, on average. The results reinforce the idea of pursuing a new research line in OMR systems that does not require the removal of staff lines. This work has been funded by the Ministerio de Educación, Cultura y Deporte of the Spanish Government under a FPU Fellowship No. AP20120939, by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-48152-C2-1-R and Project No. TIN2013-47276-C6-2-R, by the Consejería de Educación de la Comunidad Valenciana under Project No. PROMETEO/2012/017 and by the Junta de Andalucía under Project No. P11-TIC-7154.
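The two steps named in this abstract (symbol detection from a vertical projection, then classification by cross-correlation with templates) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binarized staff image stored as a list of rows with ink pixels equal to 1, a five-line staff that contributes about five ink pixels per column, and a normalized form of cross-correlation. The names `vertical_projection`, `segment_symbols`, `normalized_cross_correlation` and `make_demo_image` are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]


def vertical_projection(image: Image) -> List[int]:
    """Count ink pixels (value 1) in every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_symbols(image: Image, staff_line_count: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges whose ink count exceeds the staff baseline.

    Assumption: a column that holds only staff lines has about
    `staff_line_count` ink pixels, so any excess is treated as symbol ink.
    The staff lines themselves are never removed from the image.
    """
    regions = []
    start = None
    profile = vertical_projection(image)
    for x, count in enumerate(profile):
        if count > staff_line_count and start is None:
            start = x                      # a symbol region begins
        elif count <= staff_line_count and start is not None:
            regions.append((start, x))     # the region ends before column x
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


def normalized_cross_correlation(patch: Image, template: Image) -> float:
    """Similarity in [-1, 1] between two equally sized patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return numerator / denominator if denominator else 0.0


def make_demo_image() -> Image:
    """Synthetic 11x12 staff with two blob-like symbols (illustration only)."""
    height, width = 11, 12
    image = [[0] * width for _ in range(height)]
    for y in (1, 3, 5, 7, 9):              # five staff lines
        image[y] = [1] * width
    for x in (3, 4):                       # first symbol, rows 2..6
        for y in range(2, 7):
            image[y][x] = 1
    for x in (8, 9):                       # second symbol, rows 4..8
        for y in range(4, 9):
            image[y][x] = 1
    return image
```

For classification, each detected region would be cropped, resized to the template size, and assigned the label of the template with the highest correlation score. The cropping and resizing steps are omitted here.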

    Extensions to rank-based prototype selection in k-Nearest Neighbour classification

    The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because the classification requires computing a distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviate this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods order the prototypes according to their relevance in the classification task, and this ordering is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: (i) greater robustness against label noise by considering the parameter ‘k’ of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. The experiments performed in different scenarios and datasets demonstrate the merit of these extensions. It is also shown empirically that the new full approach is competitive with respect to existing PS algorithms. This work is supported by the Spanish Ministry HISPAMUS project TIN2017-86576-R, partially funded by the EU.

    Weekday and weekend days correlates of sedentary time and screen-Based behaviors in children

    The aim of this study was to compare weekday and weekend-day correlates of sedentary time, as well as some specific screen-based behaviors, in a sample of 213 Spanish six- to eleven-year-olds (8.68 ± 1.75 years), 76 boys (8.79 ± 1.75 years) and 137 girls (8.73 ± 1.75 years), who wore GT3X accelerometers for 7 days. Screen-based behaviors were reported by parents through questionnaires. Different potential correlates of sedentary time and screen-based behaviors were measured, and data were analyzed using general univariate linear models and multiple regression analysis. Results revealed high levels of screen-based behaviors, both during weekdays and weekend days. Among the significant correlates for each screen-based behavior analyzed, gender, age, hours of extracurricular PA, children's MVPA and having a TV in the bedroom were identified as the main correlates in most of the behaviors analyzed. The design of multicomponent intervention programs seems advisable.

    Improving kNN multi-label classification in Prototype Selection scenarios using class proposals

    Prototype Selection (PS) algorithms allow a faster Nearest Neighbor classification by keeping only the most profitable prototypes of the training set. In turn, these schemes typically lower classification accuracy. In this work, a new strategy for multi-label classification tasks is proposed to solve this accuracy drop without needing to use the entire training set. To that end, given a new instance, the PS algorithm is used as a fast recommender system that retrieves the most likely classes. Then, the actual classification is performed considering only the prototypes from the initial training set that belong to the suggested classes. Results show that this strategy provides a large set of trade-off solutions that fill the gap between PS-based classification efficiency and conventional kNN accuracy. Furthermore, this scheme is not only able to, at best, reach the performance of conventional kNN with barely a third of the distances computed, but also outperforms the latter in noisy scenarios, proving to be a much more robust approach. This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through FPU Fellowship (AP2012–0939), the Spanish Ministerio de Economía y Competitividad through Project TIMuL (TIN2013-48152-C2-1-R), Consejería de Educación de la Comunidad Valenciana through Project PROMETEO/2012/017 and Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through FPU Program (UAFPU2014–5883).
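The two-stage scheme described in this abstract can be sketched as follows. The sketch assumes that each instance receives one label out of several classes, that distances are Euclidean, and that the reduced set has already been produced by some PS algorithm. The names `knn_labels` and `classify_with_proposals` and the parameters `n_proposals` and `k` are hypothetical, and the toy data are invented for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_labels(query: Sequence[float], prototypes: List[Prototype], k: int) -> List[str]:
    """Labels of the k prototypes closest to `query` (Euclidean distance)."""
    ranked = sorted(prototypes, key=lambda p: math.dist(query, p[0]))
    return [label for _, label in ranked[:k]]


def classify_with_proposals(
    query: Sequence[float],
    full_set: List[Prototype],
    reduced_set: List[Prototype],
    n_proposals: int = 2,
    k: int = 1,
) -> str:
    # Step 1: the small, prototype-selected set acts as a fast recommender.
    # Walk its prototypes from nearest to farthest and keep the first
    # `n_proposals` distinct class labels.
    proposals: List[str] = []
    for label in knn_labels(query, reduced_set, len(reduced_set)):
        if label not in proposals:
            proposals.append(label)
        if len(proposals) == n_proposals:
            break

    # Step 2: run conventional kNN on the original training set, but only
    # over the prototypes that belong to the proposed classes.
    candidates = [p for p in full_set if p[1] in proposals]
    votes = Counter(knn_labels(query, candidates, k))
    return votes.most_common(1)[0][0]


# Toy data: three well-separated classes and one selected prototype per class.
FULL_SET: List[Prototype] = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
    ((10, 0), "c"), ((10, 1), "c"), ((11, 0), "c"),
]
REDUCED_SET: List[Prototype] = [
    ((0.5, 0.5), "a"), ((5.5, 5.5), "b"), ((10.5, 0.5), "c"),
]
```

With `n_proposals=2`, a query such as `(4.0, 4.0)` is compared against the three reduced prototypes and then against only the six prototypes of the two proposed classes, instead of all nine. Varying `n_proposals` moves the classifier between the speed of the reduced set and the accuracy of the full set.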

    Domain adaptation for staff-region retrieval of music score images

    Optical music recognition (OMR) is the field that studies how to automatically read music notation from score images. One of the relevant steps within the OMR workflow is staff-region retrieval. This process is a key step because any undetected staff will not be processed by the subsequent steps. This task has previously been addressed as a supervised learning problem in the literature; however, ground-truth data are not always available, so each new manuscript requires a preliminary manual annotation. This situation is one of the main bottlenecks in OMR because of the countless number of existing manuscripts and the associated manual labeling cost. With the aim of mitigating this issue, we propose the application of a domain adaptation technique, the so-called Domain-Adversarial Neural Network (DANN), based on a combination of a gradient reversal layer and a domain classifier in the inference neural architecture. The results from our experiments support the benefits of our proposed solution, obtaining improvements of approximately 29% in the F-score. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project funded by MCIN/AEI/10.13039/501100011033. The first author acknowledges support from the “Programa I+D+i de la Generalitat Valenciana” through grants ACIF/2019/042 and CIBEFP/2021/72. This work also draws on research supported by the Social Sciences and Humanities Research Council (895-2013-1012) and the Fonds de recherche du Québec-Société et Culture (2022-SE3-303927).

    Region-based layout analysis of music score images

    The Layout Analysis (LA) stage is of vital importance to the correct performance of an Optical Music Recognition (OMR) system. It identifies the regions of interest, such as staves or lyrics, which must then be processed in order to transcribe their content. Despite the existence of modern approaches based on deep learning, an exhaustive study of LA in OMR has not yet been carried out with regard to the performance of different models, their generalization to different domains or, more importantly, their impact on subsequent stages of the pipeline. This work focuses on filling this gap in the literature by means of an experimental study of different neural architectures, music document types, and evaluation scenarios. The need for training data has also led to a proposal for a new semi-synthetic data-generation technique that makes LA approaches efficiently applicable in real scenarios. Our results show that: (i) the choice of the model and its performance are crucial for the entire transcription process; (ii) the metrics commonly used to evaluate the LA stage do not always correlate with the final performance of the OMR system; and (iii) the proposed data-generation technique enables state-of-the-art results to be achieved with a limited set of labeled data. This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project funded by MCIN/AEI/10.13039/501100011033, Spain and the GV/2020/030, Spain project funded by the Generalitat Valenciana, Spain. The first and third authors acknowledge support from the “Programa I+D+i de la Generalitat Valenciana, Spain” through grants ACIF/2019/042 and ACIF/2021/356, respectively.

    Physical activity levels during unstructured recess in Spanish primary and secondary schools

    Introduction. The goals of this study were: a) to describe sedentary time and different physical activity (PA) intensities during school recess; b) to analyze sex and education level differences; c) to describe compliance with recommended guidelines for recess; and d) to determine the contribution of unstructured recess to PA guidelines. Material and Methods. Two subsamples from Spain participated: one of primary school students (114 girls, 8.77 ± 1.74 years and 59 boys, 8.47 ± 1.71 years), and one of secondary school students (100 girls, 12.16 ± 0.49 years and 116 boys, 12.15 ± 0.52 years). PA was quantified by accelerometers. Results and Discussion. A significant effect of sex and education level was found on the combination of different percentages of PA intensities. All PA intensities except sedentary and light showed higher values in primary education students. Boys showed higher values in MVPA in both primary and secondary education. A significant effect of sex and education level was also found on the contribution of recess to PA guidelines. Conclusions. Interventions should be carried out to encourage PA during recess, especially for girls and secondary school students.

    Late multimodal fusion for image and audio music transcription

    Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research according to the input: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of these input data has led the two fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, showed interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks. This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project, funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding programme (IDIFEDER/2020/003). The first and second authors are respectively supported by grants FPU19/04957 from the Spanish Ministerio de Universidades and APOSTD/2020/256 from Generalitat Valenciana.

    Few-Shot Symbol Classification via Self-Supervised Learning and Nearest Neighbor

    The recognition of symbols within document images is one of the most relevant steps involved in the Document Analysis field. While current state-of-the-art methods based on Deep Learning are capable of adequately performing this task, they generally require a vast amount of data that has to be manually labeled. In this paper, we propose a self-supervised learning-based method that addresses this task by training a neural-based feature extractor with a set of unlabeled documents and then performs the recognition task considering just a few reference samples. Experiments on different corpora comprising music, text, and symbol documents show that the proposal is capable of adequately tackling the task, with accuracy rates of up to 95% in few-shot settings. Moreover, results show that the presented strategy outperforms the base supervised learning approaches trained with the same amount of data, which in some cases even fail to converge. This approach hence stands as a lightweight alternative for symbol classification with few annotated data. This paper is part of the project I+D+i PID2020-118447RA-I00 (MultiScore), funded by MCIN/AEI/10.13039/501100011033. The first author is supported by grant FPU19/04957 from the Spanish Ministerio de Universidades. The second and third authors are respectively supported by grants ACIF/2021/356 and APOSTD/2020/256 from “Programa I+D+i de la Generalitat Valenciana”.

    Multimodal recognition of frustration during game-play with deep neural networks

    Frustration, which is one aspect of the field of emotional recognition, is of particular interest to the video game industry as it provides information concerning each individual player’s level of engagement. The use of non-invasive strategies to estimate this emotion is, therefore, a relevant line of research with a direct application to real-world scenarios. While several proposals for non-invasive frustration recognition can be found in the literature, they usually rely on hand-crafted features and rarely exploit the potential inherent to the combination of different sources of information. This work, therefore, presents a new approach that automatically extracts meaningful descriptors from individual audio and video sources of information using Deep Neural Networks (DNN) and then combines them, with the objective of detecting frustration in Game-Play scenarios. More precisely, two fusion modalities, namely decision-level and feature-level, are presented and compared with state-of-the-art methods, along with different DNN architectures optimized for each type of data. Experiments performed with a real-world audiovisual benchmarking corpus revealed that the multimodal proposals introduced herein are more suitable than those of a unimodal nature, and that their performance also surpasses that of other state-of-the-art approaches, with error rate improvements of between 40% and 90%. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. The first author acknowledges the support from the Spanish “Ministerio de Educación y Formación Profesional” through grant 20CO1/000966. The second and third authors acknowledge support from the “Programa I+D+i de la Generalitat Valenciana” through grants ACIF/2019/042 and APOSTD/2020/256, respectively.