A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
Chord recognition systems depend on robust feature extraction pipelines.
While these pipelines are traditionally hand-crafted, recent advances in
end-to-end machine learning have begun to inspire researchers to explore
data-driven methods for such tasks. In this paper, we present a chord
recognition system that uses a fully convolutional deep auditory model for
feature extraction. The extracted features are processed by a Conditional
Random Field that decodes the final chord sequence. Both processing stages are
trained automatically and do not require expert knowledge for optimising
parameters. We show that the learned auditory system extracts musically
interpretable features, and that the proposed chord recognition system achieves
results on par with or better than state-of-the-art algorithms.
Comment: In Proceedings of the 2016 IEEE 26th International Workshop on
Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy
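As a rough illustration of the pipeline this abstract describes, the sketch below pairs a fully convolutional feature extractor with linear-chain CRF (Viterbi) decoding. It is a minimal sketch, not the authors' implementation: the layer sizes, the 105-bin spectrogram input and the 25-class chord vocabulary (12 major, 12 minor, no-chord) are illustrative assumptions.

import torch
import torch.nn as nn

NUM_CHORDS = 25  # assumed vocabulary: 12 major + 12 minor + no-chord

class AuditoryModel(nn.Module):
    """Fully convolutional: no dense layers, so it accepts any input length."""
    def __init__(self, n_bins=105):  # 105 log-frequency bins is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool over frequency only, preserve time
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, NUM_CHORDS, (1, n_bins // 2)),  # frame-wise chord logits
        )

    def forward(self, spec):  # spec: (batch, 1, time, freq)
        return self.net(spec).squeeze(-1).transpose(1, 2)  # (batch, time, classes)

def viterbi_decode(unaries, transitions):
    """CRF decoding: unaries (T, C) from the CNN, transitions (C, C)."""
    T, _ = unaries.shape
    score = unaries[0].clone()
    backpointers = []
    for t in range(1, T):
        # total[i, j] = score of being in state i at t-1 and moving to j
        total = score.unsqueeze(1) + transitions + unaries[t].unsqueeze(0)
        score, idx = total.max(dim=0)
        backpointers.append(idx)
    path = [int(score.argmax())]
    for idx in reversed(backpointers):
        path.append(int(idx[path[-1]]))
    return list(reversed(path))  # one chord label per frame

spec = torch.randn(1, 1, 200, 105)                  # 200 frames, 105 bins
unaries = AuditoryModel()(spec)[0].detach()         # (200, 25)
transitions = torch.zeros(NUM_CHORDS, NUM_CHORDS)   # a real system learns these
print(viterbi_decode(unaries, transitions)[:10])

In the paper's setup both stages are trained from data, which is the point of the abstract: no hand-crafted feature pipeline is required.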
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).
Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieval
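For intuition about what such a layer computes, below is a minimal sketch of the classical, analytical CCA solution applied to a mini-batch of paired embeddings: center and whiten each view, take the SVD of the whitened cross-covariance, and project onto the resulting canonical directions. The regularization constant eps and the dimensionality k are illustrative choices; the authors' actual differentiable layer is released at https://github.com/CPJKU/cca_layer.

import torch

def cca_projections(X, Y, k=32, eps=1e-4):
    """X: (n, dx), Y: (n, dy) paired views; returns k-dim projections."""
    n = X.shape[0]
    X = X - X.mean(0, keepdim=True)          # center both views
    Y = Y - Y.mean(0, keepdim=True)
    Sxx = X.T @ X / (n - 1) + eps * torch.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + eps * torch.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # S^(-1/2) via symmetric eigendecomposition
        w, V = torch.linalg.eigh(S)
        return V @ torch.diag(w.clamp_min(eps).rsqrt()) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # SVD of the whitened cross-covariance gives the canonical directions
    U, _, Vt = torch.linalg.svd(Wx @ Sxy @ Wy)
    A, B = (Wx @ U)[:, :k], (Wy @ Vt.T)[:, :k]
    return X @ A, Y @ B                      # maximally correlated embeddings

X = torch.randn(128, 256)                    # e.g. image-branch activations
Y = torch.randn(128, 128)                    # e.g. text-branch activations
zx, zy = cca_projections(X, Y, k=32)

A pairwise ranking loss can then be applied to zx and zy, which is the combination of analytical projections and learned retrieval objectives that the abstract highlights.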
Supervised and Unsupervised Learning of Audio Representations for Music Understanding
In this work, we provide a broad comparative analysis of strategies for
pre-training audio understanding models for several tasks in the music domain,
including labelling of genre, era, origin, mood, instrumentation, key, pitch,
vocal characteristics, tempo and sonority. Specifically, we explore how the
domain of pre-training datasets (music or generic audio) and the pre-training
methodology (supervised or unsupervised) affects the adequacy of the resulting
audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale
expert-annotated music datasets achieve state-of-the-art performance in a wide
range of music labelling tasks, each with novel content and vocabularies. This
can be done in an efficient manner with models containing less than 100 million
parameters that require no fine-tuning or reparameterization for downstream
tasks, making this approach practical for industry-scale audio catalogs.
Within the class of unsupervised learning strategies, we show that the domain
of the training dataset can significantly impact the performance of
representations learned by the model. We find that restricting the domain of
the pre-training dataset to music allows for training with smaller batch sizes
while achieving state-of-the-art performance in unsupervised learning -- and in some cases,
supervised learning -- for music understanding.
We also corroborate that, while achieving state-of-the-art performance on
many tasks, supervised learning can cause models to specialize to the
supervised information provided, somewhat compromising a model's generality.
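The "no fine-tuning or reparameterization" claim corresponds to a frozen-embedding workflow: embeddings are computed once by the pre-trained model, and each downstream labelling task gets only a lightweight probe. The sketch below illustrates that workflow; the embeddings and labels here are random stand-ins, not outputs of the authors' models.

import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_task(train_emb, train_labels, test_emb, test_labels):
    """Train a linear probe on frozen embeddings for one labelling task."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_emb, train_labels)
    return clf.score(test_emb, test_labels)

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 128))      # stand-in for precomputed embeddings
genre = rng.integers(0, 10, size=200)  # stand-in genre labels
print(probe_task(emb[:150], genre[:150], emb[150:], genre[150:]))

The same frozen embeddings can serve genre, mood, key, tempo and the other tasks listed above, which is what makes the approach practical for industry-scale audio catalogs.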
Burnout, neurotic symptoms and coping strategies in medical students
Burnout can appear at the early stages of a medical career - as early as in medical college. Medical studies are considered one of the most stressful majors, leading to early burnout and other related symptoms such as neurotic symptoms. Our aim was to examine this topic by assessing burnout and neurotic symptoms, as well as strategies of coping with stress, during each year of studies.
Method: We used a web-based questionnaire consisting of the Maslach Burnout Inventory-Student Survey (MBI-SS), the Coping Inventory for Stressful Situations (CISS) and the Symptom Checklist S-III, and invited medical students at various stages of a 6-year medical course to fill it in online. The questionnaire was completed by 781 students in total.
Results: Statistical analysis revealed an interesting pattern of symptom severity, with the highest scores at the beginning and at the end of the medical course and the lowest scores during the 3rd year of studies. This pattern was clearly visible for MBI-SS Exhaustion, and somewhat less pronounced for MBI-SS Cynicism and S-III scores, where only the decrease in symptoms was significant. Coping strategies were similar across all medical students, with a higher score on the Distraction scale among 3rd-year students compared with 2nd-year students.
Discussion: These results, however unexpected, seem consistent with the available literature, which emphasizes the higher levels of stress experienced during major transitions: by students at the beginning of their course and by soon-to-be doctors.
Conclusions: The results prompt reflection on ways of countering emerging symptoms of burnout not only in experienced students, but also among those starting medical college.
ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit
Dance and music are two highly correlated artistic forms. Synthesizing dance
motions has attracted much attention recently. Most previous works approach
music-to-dance synthesis by directly mapping music to human skeleton
keypoints. Human choreographers, in contrast, design dance motions from music
in a two-stage manner: they first devise multiple choreographic action units
(CAUs), each with a series of dance motions, and then arrange the CAU sequence
according to the rhythm, melody and emotion of the music. Inspired by this, we
systematically study this two-stage choreography approach and construct a
dataset to incorporate such choreography knowledge. Based on the constructed
dataset, we design a two-stage music-to-dance synthesis framework ChoreoNet to
imitate the human choreography procedure. Our framework first applies a CAU
prediction model to learn the mapping relationship between music and CAU
sequences. Afterwards, we devise a spatial-temporal inpainting model to convert
the CAU sequence into continuous dance motions. Experimental results
demonstrate that the proposed ChoreoNet outperforms baseline methods (0.622 in
terms of CAU BLEU score and 1.59 in terms of user study score).
Comment: 10 pages, 5 figures, Accepted by ACM MM 2020
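To make the two-stage interface concrete, here is a minimal sketch with placeholder modules: stage one maps a music feature sequence to CAU ids, stage two expands the discrete CAU sequence into continuous pose frames. The vocabulary size, feature dimensions and network internals are assumptions for illustration, not ChoreoNet's actual architecture.

import torch
import torch.nn as nn

NUM_CAUS = 60      # assumed CAU vocabulary size
MUSIC_DIM = 128    # assumed per-frame music feature size
POSE_DIM = 17 * 3  # assumed skeleton: 17 joints, 3 coordinates each

class CAUPredictor(nn.Module):
    """Stage 1: music feature sequence -> per-step CAU logits."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(MUSIC_DIM, 256, batch_first=True)
        self.head = nn.Linear(256, NUM_CAUS)

    def forward(self, music):              # (batch, steps, MUSIC_DIM)
        h, _ = self.rnn(music)
        return self.head(h)                # (batch, steps, NUM_CAUS)

class MotionInpainter(nn.Module):
    """Stage 2: CAU ids -> continuous pose frames (placeholder decoder)."""
    def __init__(self, frames_per_cau=32):
        super().__init__()
        self.frames_per_cau = frames_per_cau
        self.embed = nn.Embedding(NUM_CAUS, 256)
        self.decode = nn.Linear(256, frames_per_cau * POSE_DIM)

    def forward(self, caus):               # (batch, steps) integer ids
        x = self.decode(self.embed(caus))  # (batch, steps, frames * POSE_DIM)
        b, t, _ = x.shape
        return x.view(b, t * self.frames_per_cau, POSE_DIM)

music = torch.randn(1, 40, MUSIC_DIM)      # 40 steps of music features
caus = CAUPredictor()(music).argmax(-1)    # pick one CAU per step
poses = MotionInpainter()(caus)            # (1, 1280, POSE_DIM) motion frames

In the paper's framing, the inpainting stage is what smooths the transitions between consecutive CAUs into continuous dance motion.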