4,617 research outputs found
Genome-wide association study of musical beat synchronization demonstrates high polygenicity
Moving in synchrony to the beat is a fundamental component of musicality. Here we conducted a genome-wide association study to identify common genetic variants associated with beat synchronization in 606,825 individuals. Beat synchronization exhibited a highly polygenic architecture, with 69 loci reaching genome-wide significance (P < 5 × 10⁻⁸) and single-nucleotide-polymorphism-based heritability (on the liability scale) of 13%–16%. Heritability was enriched for genes expressed in brain tissues and for fetal and adult brain-specific gene regulatory elements, underscoring the role of central-nervous-system-expressed genes in the genetic basis of the trait. We validated the self-report phenotype (through separate experiments) and the genome-wide association study (polygenic scores for beat synchronization were associated with patients algorithmically classified as musicians in medical records of a separate biobank). Genetic correlations with breathing function, motor function, processing speed and chronotype suggest shared genetic architecture with beat synchronization and provide avenues for new phenotypic and genetic explorations.
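The polygenic-score validation mentioned above can be illustrated with a minimal sketch: a polygenic score is a weighted sum of an individual's allele dosages, with per-variant weights taken from GWAS effect sizes. All numbers below are made up for illustration, not taken from the study:

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of per-variant allele dosages (0, 1, or 2 copies).

    dosages: (n_individuals, n_variants) array of allele counts
    weights: (n_variants,) GWAS effect sizes (e.g., regression betas)
    """
    return dosages @ weights

# Toy example: 3 individuals, 4 variants (illustrative numbers only)
dosages = np.array([[0, 1, 2, 1],
                    [2, 0, 1, 0],
                    [1, 1, 1, 1]])
weights = np.array([0.02, -0.01, 0.03, 0.01])
scores = polygenic_score(dosages, weights)  # one score per individual
```

In practice such scores are computed over thousands to millions of variants and then tested for association with an outcome (here, musician status in a separate biobank).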
Got rhythm? Better inhibitory control is linked with more consistent drumming and enhanced neural tracking of the musical beat in adult percussionists and nonpercussionists
Musical rhythm engages motor and reward circuitry that is important for cognitive control, and there is evidence for enhanced inhibitory control in musicians. We recently revealed an inhibitory control advantage in percussionists compared with vocalists, highlighting the potential importance of rhythmic expertise in mediating this advantage. Previous research has shown that better inhibitory control is associated with less variable performance in simple sensorimotor synchronization tasks; however, this relationship has not been examined through the lens of rhythmic expertise. We hypothesize that the development of rhythm skills strengthens inhibitory control in two ways: by fine-tuning motor networks through the precise coordination of movements "in time" and by activating reward-based mechanisms, such as predictive processing and conflict monitoring, which are involved in tracking temporal structure in music. Here, we assess adult percussionists and nonpercussionists on inhibitory control, selective attention, basic drumming skills (self-paced, paced, and continuation drumming), and cortical evoked responses to an auditory stimulus presented on versus off the beat of music. Consistent with our hypotheses, we find that better inhibitory control is correlated with more consistent drumming and enhanced neural tracking of the musical beat. Drumming variability and the neural index of beat alignment each contribute unique predictive power to a regression model, explaining 57% of the variance in inhibitory control. These outcomes present the first evidence that enhanced inhibitory control in musicians may be mediated by rhythmic expertise and provide a foundation for future research investigating the potential for rhythm-based training to strengthen cognitive function.
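The two-predictor regression reported above (drumming variability and the neural beat-alignment index jointly predicting inhibitory control) can be sketched on synthetic data. The variable names, effect sizes, and resulting R² below are invented for illustration and are not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Synthetic stand-ins (not the study's data): two predictors
drum_var = rng.normal(size=n)      # drumming variability
beat_track = rng.normal(size=n)    # neural beat-tracking index
# Outcome generated with arbitrary coefficients plus noise
inhib = 0.5 * drum_var + 0.4 * beat_track + rng.normal(scale=0.8, size=n)

# Ordinary least squares with an intercept column
X = np.column_stack([np.ones(n), drum_var, beat_track])
beta, *_ = np.linalg.lstsq(X, inhib, rcond=None)
resid = inhib - X @ beta
r2 = 1 - resid.var() / inhib.var()  # proportion of variance explained
```

The study's "57% of variance" claim corresponds to an R² of 0.57 from an analogous model fit on their empirical measures.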
"What" and "when" predictions modulate auditory processing in a mutually congruent manner
Introduction: Extracting regularities from ongoing stimulus streams to form predictions is crucial for adaptive behavior. Such regularities exist in terms of the content of the stimuli and their timing, both of which are known to interactively modulate sensory processing. In real-world stimulus streams such as music, regularities can occur at multiple levels, both in terms of contents (e.g., predictions relating to individual notes vs. their more complex groups) and timing (e.g., pertaining to timing between intervals vs. the overall beat of a musical phrase). However, it is unknown whether the brain integrates predictions in a manner that is mutually congruent (e.g., whether "beat" timing predictions selectively interact with "what" predictions falling on pulses which define the beat), and whether integrating predictions in different timing conditions relies on dissociable neural correlates.
Methods: To address these questions, our study manipulated "what" and "when" predictions at two levels, (local) interval-defining and (global) beat-defining, within the same stimulus stream, while neural activity was recorded using electroencephalography (EEG) in participants (N = 20) performing a repetition detection task.
Results: Our results reveal that temporal predictions based on beat or interval timing modulated mismatch responses to violations of "what" predictions happening at the predicted time points, and that these modulations were shared between types of temporal predictions in terms of the spatiotemporal distribution of EEG signals. Effective connectivity analysis using dynamic causal modeling showed that the integration of "what" and "when" predictions selectively increased connectivity at relatively late cortical processing stages, between the superior temporal gyrus and the fronto-parietal network.
Discussion: Taken together, these results suggest that the brain integrates different predictions with a high degree of mutual congruence, but in a shared and distributed cortical network. This finding contrasts with recent studies indicating separable mechanisms for beat-based and memory-based predictive processing.
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style of acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on Residual Vector Quantization - Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness. Furthermore, we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises and performs well on 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are available at https://github.com/yizhilll/MERT
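The MLM-style pre-training objective described above can be sketched in miniature: frames are masked, a teacher (e.g., an RVQ-VAE codebook) supplies discrete pseudo labels, and the student is penalized by cross-entropy only at masked positions. This is a toy NumPy illustration of the objective, not MERT's actual implementation; the label values and logits below are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 8, 4                                  # frames, pseudo-label vocabulary
teacher_labels = rng.integers(0, V, size=T)  # e.g., RVQ-VAE codebook indices

mask = np.zeros(T, dtype=bool)               # frames hidden from the student
mask[[1, 3, 4, 6]] = True

logits = rng.normal(size=(T, V))             # stand-in for student outputs
# log-softmax over the vocabulary dimension
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# MLM-style objective: cross-entropy against teacher labels, masked frames only
loss = -log_probs[mask, teacher_labels[mask]].mean()
```

MERT additionally combines such a discrete-target loss with a CQT-based musical teacher; the sketch shows only the masked pseudo-label idea.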
Brain networks for temporal adaptation, anticipation, and sensory-motor integration in rhythmic human behavior
Human interaction often requires the precise yet flexible interpersonal coordination of rhythmic behavior, as in group music making. The present fMRI study investigates the functional brain networks that may facilitate such behavior by enabling temporal adaptation (error correction), prediction, and the monitoring and integration of information about "self" and the external environment. Participants were required to synchronize finger taps with computer-controlled auditory sequences that were presented either at a globally steady tempo with local adaptations to the participants' tap timing (Virtual Partner task) or with gradual tempo accelerations and decelerations but without adaptation (Tempo Change task). Connectome-based predictive modelling was used to examine patterns of brain functional connectivity related to individual differences in behavioral performance and in parameter estimates from the adaptation and anticipation model (ADAM) of sensorimotor synchronization for these two tasks under conditions of varying cognitive load. Results revealed distinct but overlapping brain networks associated with ADAM-derived estimates of temporal adaptation, anticipation, and the integration of self-controlled and externally controlled processes across task conditions. The partial overlap between ADAM networks suggests common hub regions that modulate functional connectivity within and between the brain's resting-state networks and additional sensory-motor regions and subcortical structures in a manner reflecting coordination skill. Such network reconfiguration might facilitate sensorimotor synchronization by enabling shifts in focus between internal and external information. In social contexts requiring interpersonal coordination, it might also support varying the degree of simultaneous integration and segregation of these information sources in internal models that underpin self, other, and joint action planning and prediction.
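The adaptation (error-correction) component that ADAM estimates can be illustrated with a minimal, noise-free simulation: on each tap, the tapper cancels a fraction alpha of the previous asynchrony, so asynchronies decay geometrically. This is a simplified sketch of linear phase correction, not the full ADAM model (which also includes anticipation and noise terms):

```python
import numpy as np

def simulate_phase_correction(alpha, a0, n_taps):
    """Noise-free error-correction loop: each tap cancels a fraction
    alpha of the previous tap's asynchrony with the pacing sequence."""
    asyn = np.empty(n_taps)
    asyn[0] = a0
    for n in range(1, n_taps):
        asyn[n] = (1 - alpha) * asyn[n - 1]
    return asyn

# Start 40 ms ahead of the beat with a correction gain of 0.5
asyn = simulate_phase_correction(alpha=0.5, a0=40.0, n_taps=6)
# asynchrony halves on every tap: 40, 20, 10, 5, 2.5, 1.25 ms
```

Fitting alpha (and the anticipation parameters) to observed tap times is what yields the per-participant estimates that the connectome-based modelling relates to brain connectivity.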
Deep Neural Network-Based Automatic Music Lead Sheet Transcription and Melody Similarity Assessment
Ph.D. dissertation, Seoul National University, College of Engineering, Department of Industrial Engineering, February 2023.
Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly released music recordings keeps increasing. Recently, with platform environments established whereby anyone can become a creator, user-created music such as original songs, cover songs, and remixes is being distributed through YouTube and TikTok. With such a large volume of musical recordings, the demand to transcribe music into sheet music has always existed for musicians.
However, transcription requires musical knowledge and is time-consuming.
This thesis studies automatic lead sheet transcription using deep neural networks. The development of transcription artificial intelligence (AI) can greatly reduce the time and cost for people in the music industry to find or transcribe sheet music. In addition, since audio recordings can be converted into a digital sheet-music form, applications such as music plagiarism detection and AI music composition become possible.
The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. Through an attention-map analysis, we visualize how attention is applied and find that the model is able to segment chords by utilizing the adaptive receptive field of the attention mechanism.
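The "adaptive receptive field" of self-attention mentioned above comes from the fact that every frame attends to every other frame, with weights determined by the input itself. A stripped-down NumPy sketch of single-head scaled dot-product attention (no learned projections, for illustration only, not the thesis's model):

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over frames, using the input
    itself as queries, keys, and values (illustration only)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # frame-to-frame similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ x, weights

# Toy chroma-like frames: the first two are similar, the third is different
x = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
out, attn = self_attention(x)
```

Frames with similar content attend strongly to each other, which is the mechanism the attention-map analysis visualizes when the model groups frames belonging to the same chord.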
This thesis then proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to solve the problem of broken context between decoded segments. Applying pitch augmentation and adding a noisy dataset with data cleansing prove effective in preventing overfitting and generalizing model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription on the MIR-ST500 dataset for all the metrics considered. Finally, a subjective human evaluation demonstrates that the results of the proposed model are perceived as more accurate than those of a previous study.
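The idea behind overlapping decoding can be sketched abstractly: audio is split into overlapping segments, each segment is decoded independently, and when merging only each segment's central region is trusted, so notes decoded with broken context near segment boundaries are discarded. The helper below is a hypothetical illustration of that merge rule, not the thesis's code:

```python
def merge_overlapping_segments(segment_notes, seg_len, margin):
    """Keep only notes whose onset falls in a segment's trusted region.

    segment_notes: list of (segment_start_time, [note_onset_times]) pairs,
    ordered by start time; interior boundaries get a `margin` trimmed off.
    """
    merged = []
    for i, (start, onsets) in enumerate(segment_notes):
        lo = start if i == 0 else start + margin
        hi = (start + seg_len if i == len(segment_notes) - 1
              else start + seg_len - margin)
        merged.extend(t for t in onsets if lo <= t < hi)
    return sorted(set(merged))

# Two 4-second segments hopped by 2 s; trim 1 s at the internal boundary
segs = [(0.0, [0.5, 1.5, 2.5, 3.5]),
        (2.0, [2.5, 3.5, 4.5, 5.5])]
notes = merge_overlapping_segments(segs, seg_len=4.0, margin=1.0)
```

With this choice of hop and margin the trusted regions tile the timeline exactly, so each note is kept by exactly one segment.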
Utilizing the above research results, we present the entire process of automatic music lead sheet transcription. By combining various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that capture the core of popular music. Furthermore, we compare the results with lead sheets transcribed by musicians.
Finally, we propose a melody similarity assessment method based on self-supervised learning, applying the automatic lead sheet transcription. We present convolutional neural networks that embed the melodies of lead sheet transcription results in an embedding space. To apply self-supervised learning, we introduce methods of generating training data with musical data augmentation techniques, and we present a loss function to exploit this training data. Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music in plagiarism and cover-song cases.
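A standard loss for this kind of self-supervised melody embedding is the triplet loss (the thesis's table of contents lists deep metric learning with triplet loss): an anchor melody is pulled toward a positive (e.g., an augmented variant of the same melody) and pushed away from a negative by a margin. A minimal sketch with made-up two-dimensional embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative
    distances: zero once the negative is `margin` farther than the positive."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])
p = np.array([0.1, 1.0])   # e.g., a pitch-shifted variant of the same melody
n = np.array([1.0, 0.0])   # an unrelated melody
easy = triplet_loss(a, p, n)  # well separated: loss is zero
hard = triplet_loss(a, n, p)  # roles swapped: positive is far, loss is large
```

Training on many such triplets, with positives generated by the musical data augmentations, shapes the embedding space so that nearby points correspond to similar melodies.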
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Objectives
1.3 Thesis Outline
Chapter 2 Literature Review
2.1 Attention Mechanism and Transformers
2.1.1 Attention-based Models
2.1.2 Transformers with Musical Event Sequence
2.2 Chord Recognition
2.3 Note-level Singing Melody Transcription
2.4 Musical Key Estimation
2.5 Beat Tracking
2.6 Music Plagiarism Detection and Cover Song Identification
2.7 Deep Metric Learning and Triplet Loss
Chapter 3 Problem Definition
3.1 Lead Sheet Transcription
3.1.1 Chord Recognition
3.1.2 Singing Melody Transcription
3.1.3 Post-processing for Lead Sheet Representation
3.2 Melody Similarity Assessment
Chapter 4 A Bi-directional Transformer for Musical Chord Recognition
4.1 Methodology
4.1.1 Model Architecture
4.1.2 Self-attention in Chord Recognition
4.2 Experiments
4.2.1 Datasets
4.2.2 Preprocessing
4.2.3 Evaluation Metrics
4.2.4 Training
4.3 Results
4.3.1 Quantitative Evaluation
4.3.2 Attention Map Analysis
Chapter 5 Note-level Singing Melody Transcription
5.1 Methodology
5.1.1 Monophonic Note Event Sequence
5.1.2 Audio Features
5.1.3 Model Architecture
5.1.4 Autoregressive Decoding and Monophonic Masking
5.1.5 Overlapping Decoding
5.1.6 Pitch Augmentation
5.1.7 Adding Noisy Dataset with Data Cleansing
5.2 Experiments
5.2.1 Dataset
5.2.2 Experiment Configurations
5.2.3 Evaluation Metrics
5.2.4 Comparison Models
5.2.5 Human Evaluation
5.3 Results
5.3.1 Ablation Study
5.3.2 Note-level Transcription Model Comparison
5.3.3 Transcription Performance Distribution Analysis
5.3.4 Fundamental Frequency (F0) Metric Evaluation
5.4 Qualitative Analysis
5.4.1 Visualization of Ablation Study
5.4.2 Spectrogram Analysis
5.4.3 Human Evaluation
Chapter 6 Automatic Music Lead Sheet Transcription
6.1 Post-processing for Lead Sheet Representation
6.2 Lead Sheet Transcription Results
Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks
7.1 Methodology
7.1.1 Input Data Representation
7.1.2 Data Augmentation
7.1.3 Model Architecture
7.1.4 Loss Function
7.1.5 Definition of Distance between Songs
7.2 Experiments
7.2.1 Dataset
7.2.2 Training
7.2.3 Evaluation Metrics
7.3 Results
7.3.1 Quantitative Evaluation
7.3.2 Qualitative Evaluation
Chapter 8 Conclusion
8.1 Summary and Contributions
8.2 Limitations and Future Research
Bibliography
Abstract (in Korean)
The relation between rhythm processing and cognitive abilities during child development: The role of prediction
Rhythm and meter are central elements of music. From the very beginning, children are responsive to rhythms and acquire increasingly complex rhythmic skills over the course of development. Previous research has shown that the processing of musical rhythm is related not only to children's music-specific responses but also to their cognitive abilities outside the domain of music. However, despite extensive research on the topic, the connections and underlying mechanisms involved in this relation remain unclear in some respects. In this article, we aim to analyze the relation between rhythmic and cognitive-motor abilities during childhood and to propose a new hypothesis about this relation. We consider whether predictive processing may be involved in the relation between rhythmic and various cognitive abilities and hypothesize that prediction, as a cross-domain process, is a central mechanism bridging rhythm processing and cognitive-motor abilities. Further empirical studies focusing on rhythm processing and cognitive-motor abilities are needed to precisely investigate the links between rhythmic, predictive, and cognitive processes.
Evidence for multiple rhythmic skills
Rhythms, or patterns in time, play a vital role in both speech and music. Proficiency in a number of rhythm skills has been linked to language ability, suggesting that certain rhythmic processes in music and language rely on overlapping resources. However, a lack of understanding about how rhythm skills relate to each other has impeded progress in understanding how language relies on rhythm processing. In particular, it is unknown whether all rhythm skills are linked together, forming a single broad rhythmic competence, or whether there are multiple dissociable rhythm skills. We hypothesized that beat tapping and rhythm memory/sequencing form two separate clusters of rhythm skills. This hypothesis was tested with a battery of two beat tapping and two rhythm memory tests. Here we show that tapping to a metronome and the ability to adjust to a changing tempo while tapping to a metronome are related skills. The ability to remember rhythms and to drum along to repeating rhythmic sequences are also related. However, we found no relationship between beat tapping skills and rhythm memory skills. Thus, beat tapping and rhythm memory are dissociable rhythmic aptitudes. This discovery may inform future research disambiguating how distinct rhythm competencies track with specific language functions.
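The two-cluster structure reported here can be illustrated with simulated scores: two tests loading on a latent beat-tapping skill and two loading on an independent latent rhythm-memory skill produce high within-cluster and near-zero between-cluster correlations. All data below are synthetic, not the study's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
beat = rng.normal(size=n)      # latent beat-tapping ability
memory = rng.normal(size=n)    # latent rhythm-memory ability (independent)

# Four synthetic test scores: two load on each latent skill, plus noise
metronome = beat + 0.5 * rng.normal(size=n)
tempo_adapt = beat + 0.5 * rng.normal(size=n)
rhythm_mem = memory + 0.5 * rng.normal(size=n)
drum_along = memory + 0.5 * rng.normal(size=n)

r = np.corrcoef(np.vstack([metronome, tempo_adapt, rhythm_mem, drum_along]))
within = r[0, 1]   # same cluster: substantial correlation
between = r[0, 2]  # across clusters: near zero
```

This is the pattern the study's test battery revealed empirically: correlations within each pair of tests, but none across the beat-tapping and rhythm-memory pairs.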
- …