14 research outputs found

    A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

    Full text link
    Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features are processed by a Conditional Random Field that decodes the final chord sequence. Both processing stages are trained automatically and do not require expert knowledge for optimising parameters. We show that the learned auditory system extracts musically interpretable features, and that the proposed chord recognition system achieves results on par with or better than state-of-the-art algorithms. Comment: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy
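
    To make the two-stage design concrete, here is a minimal Python sketch of the pipeline the abstract describes: a fully convolutional network maps a spectrogram to frame-wise chord scores, and a CRF-style Viterbi pass decodes a smooth chord sequence. This is not the authors' code; the layer sizes, the 25-chord vocabulary, and the hand-set transition matrix (standing in for the trained CRF) are all assumptions.

        import torch
        import torch.nn as nn

        N_CHORDS = 25  # assumed vocabulary: 12 majors + 12 minors + "no chord"

        class AuditoryModel(nn.Module):
            """Fully convolutional: no dense layers, so any input length works."""
            def __init__(self, n_bins=105, n_chords=N_CHORDS):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                    # collapse the frequency axis, keep the time axis
                    nn.Conv2d(32, n_chords, kernel_size=(n_bins, 1)),
                )

            def forward(self, spec):              # spec: (batch, 1, n_bins, n_frames)
                return self.net(spec).squeeze(2)  # -> (batch, n_chords, n_frames)

        def viterbi_decode(emissions, transitions):
            """emissions: (T, K) frame scores; transitions: (K, K) pairwise
            scores (prev -> next). Returns the highest-scoring label path."""
            T, K = emissions.shape
            score = emissions[0].clone()
            back = []
            for t in range(1, T):
                total = score.unsqueeze(1) + transitions  # (K, K)
                best, idx = total.max(dim=0)              # best previous state
                score = best + emissions[t]
                back.append(idx)
            path = [int(score.argmax())]
            for idx in reversed(back):
                path.append(int(idx[path[-1]]))
            return path[::-1]

        model = AuditoryModel()
        spec = torch.randn(1, 1, 105, 200)              # 200 spectrogram frames
        emissions = model(spec)[0].T                    # (T, K)
        trans = torch.full((N_CHORDS, N_CHORDS), -1.0)  # assumed: penalize switches
        trans.fill_diagonal_(0.0)
        print(viterbi_decode(emissions, trans)[:10])

    Because the model is fully convolutional, the same weights apply to a recording of any length, and the transition scores encourage the temporally smooth chord sequences the CRF stage is there to produce.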

    End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

    Full text link
    Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, where items from either modality are retrieved using nearest-neighbor search. In this work, we introduce a neural network layer based on Canonical Correlation Analysis (CCA) that learns better embedding spaces by analytically computing projections that maximize correlation. In contrast to previous approaches, the CCA Layer (CCAL) allows us to combine existing objectives for embedding space learning, such as pairwise ranking losses, with the optimal projections of CCA. We show the effectiveness of our approach for cross-modality retrieval in three different scenarios (text-to-image, audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a multi-view network using freely learned projections optimized by a pairwise ranking loss, especially when little training data is available (the code for all three methods is released at: https://github.com/CPJKU/cca_layer). Comment: Preliminary version of a paper published in the International Journal of Multimedia Information Retrieval
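
    The core idea, analytically computing the projections that maximize correlation between two views, can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the released CPJKU code; the regularizer r and the toy paired data are assumptions.

        import numpy as np

        def cca_projections(X, Y, r=1e-4):
            """X: (n, dx), Y: (n, dy), row-aligned pairs. Returns projections
            A (dx, k), B (dy, k) maximizing correlation of X @ A and Y @ B."""
            X = X - X.mean(0); Y = Y - Y.mean(0)
            n = X.shape[0]
            Sxx = X.T @ X / (n - 1) + r * np.eye(X.shape[1])
            Syy = Y.T @ Y / (n - 1) + r * np.eye(Y.shape[1])
            Sxy = X.T @ Y / (n - 1)
            def inv_sqrt(S):  # symmetric inverse square root via eigendecomposition
                w, V = np.linalg.eigh(S)
                return V @ np.diag(w ** -0.5) @ V.T
            Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
            # whiten each view, then SVD of the cross-covariance
            U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
            return Wx @ U, Wy @ Vt.T

        rng = np.random.default_rng(0)
        Z = rng.normal(size=(500, 8))  # shared latent behind both views
        X = Z @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(500, 32))
        Y = Z @ rng.normal(size=(8, 16)) + 0.1 * rng.normal(size=(500, 16))
        A, B = cca_projections(X, Y)
        qx, cy = X @ A, Y @ B  # project both modalities into the shared space
        # retrieve the Y item for query X[0] by cosine similarity
        sims = (cy @ qx[0]) / (np.linalg.norm(cy, axis=1) * np.linalg.norm(qx[0]))
        print("top match for pair 0:", int(sims.argmax()))  # ideally 0

    The analytic solve is what distinguishes the CCAL from a freely learned projection: the projection matrices are computed from the batch statistics rather than left entirely to gradient descent, while a ranking loss can still shape the embeddings feeding into it.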

    Supervised and Unsupervised Learning of Audio Representations for Music Understanding

    Full text link
    In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of the pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affect the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done efficiently with models containing fewer than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art results in unsupervised learning -- and in some cases, supervised learning -- for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.
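
    The evaluation protocol implied here (a frozen pre-trained model as embedding extractor, plus a shallow per-task probe with no fine-tuning) can be sketched as follows. extract_embedding is a hypothetical placeholder for any pre-trained model, and the random clips and labels are stand-ins.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        def extract_embedding(audio_batch):
            """Placeholder for a frozen pre-trained model (supervised or
            unsupervised); returns one fixed-size vector per clip."""
            return rng.normal(size=(len(audio_batch), 512))

        clips = list(range(1000))                # stand-in audio clips
        labels = rng.integers(0, 10, size=1000)  # e.g. 10 genre classes
        emb = extract_embedding(clips)           # embeddings stay frozen

        X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # probe only
        print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")    # ~chance here

    Since only the lightweight probe is trained per task, one embedding pass over a catalog serves every downstream labelling task, which is what makes the approach practical at industry scale.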

    Burnout, neurotic symptoms and coping strategies in medical students

    Get PDF
    Burnout symptoms can appear at the early stages of a medical career - as early as in medical college. Medical studies are considered one of the most stressful majors, leading to early burnout and other related symptoms such as neurotic symptoms. Our aim was to examine this topic by assessing burnout, neurotic symptoms and strategies of coping with stress experienced during each year of studies. Method: We used a web-based questionnaire consisting of the Maslach Burnout Inventory-Student Survey (MBI-SS), the Coping Inventory for Stressful Situations (CISS) and the Symptom Checklist S-III, and invited medical students at various stages of a 6-year medical course to complete it online. In total, 781 students filled in the questionnaire. Results: Statistical analysis revealed an interesting pattern of symptom severity, with the highest scores at the beginning and at the end of the medical course and the lowest scores during the 3rd year of studies. This pattern was clearly visible for MBI-SS Exhaustion, and somewhat less pronounced for MBI-SS Cynicism and S-III scores, where only the decrease in symptoms was significant. Coping strategies were similar across all medical students, apart from a higher score on the Distraction scale among 3rd-year students compared with 2nd-year students. Discussion: These results, however unexpected, seem consistent with the available literature, which emphasizes the higher levels of stress experienced during major changes in expectations, both in students at the beginning of their course and in soon-to-be doctors. Conclusions: The results prompt reflection on ways of countering emerging symptoms of burnout not only in experienced students, but also among those starting medical college.

    ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit

    Full text link
    Dance and music are two highly correlated artistic forms, and synthesizing dance motions has attracted much attention recently. Most previous works conduct music-to-dance synthesis by directly mapping music to human skeleton keypoints. In contrast, human choreographers design dance motions from music in a two-stage manner: they first devise multiple choreographic action units (CAUs), each with a series of dance motions, and then arrange the CAU sequence according to the rhythm, melody and emotion of the music. Inspired by this, we systematically study this two-stage choreography approach and construct a dataset that incorporates such choreography knowledge. Based on the constructed dataset, we design ChoreoNet, a two-stage music-to-dance synthesis framework that imitates the human choreography procedure. Our framework first devises a CAU prediction model to learn the mapping between music and CAU sequences. Afterwards, we devise a spatial-temporal inpainting model to convert the CAU sequence into continuous dance motions. Experimental results demonstrate that the proposed ChoreoNet outperforms baseline methods (0.622 in terms of CAU BLEU score and 1.59 in terms of user study score). Comment: 10 pages, 5 figures, accepted by ACM MM 2020
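
    A rough sketch of the two-stage idea, with heavy assumptions throughout (a GRU as the CAU predictor, a random CAU motion library, and linear cross-fading as a crude stand-in for the paper's spatial-temporal inpainting model):

        import torch
        import torch.nn as nn

        N_CAUS, N_JOINTS, SNIPPET_LEN = 50, 24, 32

        class CAUPredictor(nn.Module):
            """Stage 1: music features -> per-step CAU logits."""
            def __init__(self, n_music_feats=40):
                super().__init__()
                self.rnn = nn.GRU(n_music_feats, 128, batch_first=True)
                self.head = nn.Linear(128, N_CAUS)

            def forward(self, music):  # (batch, steps, n_music_feats)
                h, _ = self.rnn(music)
                return self.head(h)    # (batch, steps, N_CAUS)

        # Assumed CAU library: one motion snippet (frames x joints x 3) per id.
        cau_library = torch.randn(N_CAUS, SNIPPET_LEN, N_JOINTS, 3)

        def expand_and_blend(cau_ids, blend=8):
            """Stage 2 stand-in: concatenate snippets, cross-fading `blend`
            frames at each boundary so the motion stays continuous."""
            motion = cau_library[cau_ids[0]].clone()
            for cid in cau_ids[1:]:
                nxt = cau_library[cid].clone()
                w = torch.linspace(0, 1, blend).view(-1, 1, 1)
                nxt[:blend] = (1 - w) * motion[-blend:] + w * nxt[:blend]
                motion = torch.cat([motion[:-blend], nxt], dim=0)
            return motion              # (total_frames, joints, 3)

        music = torch.randn(1, 16, 40)  # 16 music steps
        cau_ids = CAUPredictor()(music).argmax(-1)[0].tolist()
        print(expand_and_blend(cau_ids).shape)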