7,302 research outputs found
Pitch ability as an aptitude for tone learning
Tone languages such as Mandarin use voice pitch to signal lexical contrasts, presenting a challenge for second/foreign language (L2) learners whose native languages do not use pitch in this manner. The present study examined components of an aptitude for mastering L2 lexical tone. Native English speakers with no previous tone language experience completed a Mandarin word learning task, as well as tests of pitch ability, musicality, L2 aptitude, and general cognitive ability. Pitch ability measures improved predictions of learning performance beyond musicality, L2 aptitude, and general cognitive ability and also predicted transfer of learning to new talkers. In sum, although certain nontonal measures help predict successful tone learning, the central components of tonal aptitude are pitch-specific perceptual measures.
Perceptual Musical Features for Interpretable Audio Tagging
In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque, making it challenging to explain their output for a given input. While the issue of interpretability has been emphasized in other fields such as medicine, it has received little attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We constructed a workflow that incorporates three different information extraction techniques: a) leveraging symbolic knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal processing to extract perceptual features from audio files. These features were subsequently used to train an interpretable machine-learning model for tag prediction. We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset. Our method surpassed the performance of baseline models in both tasks and, in certain instances, demonstrated competitiveness with the current state of the art. We conclude that there are use cases where the deterioration in performance is outweighed by the value of interpretability.
Comment: GitHub repository: https://github.com/vaslyb/perceptible-music-taggin
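The signal-processing branch of this workflow (technique c) feeding an interpretable tagger can be sketched as follows. The hand-rolled features, the toy sine-vs-noise "tags", and the plain gradient-descent logistic regression are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
SR, N = 22050, 1024  # sample rate and analysis window (hypothetical settings)

def perceptual_features(x):
    """Three simple signal-processing features with a perceptual reading."""
    spec = np.abs(np.fft.rfft(x[:N]))
    freqs = np.fft.rfftfreq(N, d=1.0 / SR)
    centroid = float(np.sum(freqs * spec) / (np.sum(spec) + 1e-9))  # brightness
    zcr = float(np.mean(np.abs(np.diff(np.sign(x)))) / 2)           # noisiness
    rms = float(np.sqrt(np.mean(x ** 2)))                           # loudness
    return np.array([centroid / (SR / 2), zcr, rms])                # scale to ~[0, 1]

# Toy corpus: sine tones (tag 0, "tonal") vs white noise (tag 1, "noisy").
def make_clip(noisy):
    if noisy:
        return rng.normal(0.0, 0.3, N)
    f = rng.uniform(200, 2000)
    return np.sin(2 * np.pi * f * np.arange(N) / SR)

X = np.array([perceptual_features(make_clip(i % 2)) for i in range(200)])
y = np.array([i % 2 for i in range(200)])

# Interpretable tagger: logistic regression trained by plain gradient descent,
# so each learned weight maps directly onto one named perceptual feature.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.5 * (X.T @ g) / len(y)
    b -= 0.5 * g.mean()

acc = float(np.mean((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y))
for name, weight in zip(["centroid", "zcr", "rms"], w):
    print(f"{name}: {weight:+.2f}")
print(f"train accuracy: {acc:.2f}")
```

Inspecting the signed weights then explains a tag decision in perceptual terms (e.g. a positive zero-crossing-rate weight pushes a clip toward the "noisy" tag), which is the kind of transparency a deep tagger does not offer.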
Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future
Clark offers a powerful description of the brain as a prediction machine, one that makes progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models).
Expressivity-aware Music Performance Retrieval using Mid-level Perceptual Features and Emotion Word Embeddings
This paper explores a specific sub-task of cross-modal music retrieval. We consider the delicate task of retrieving a performance or rendition of a musical piece, based on a description of its style, expressive character, or emotion, from a set of different performances of the same piece. We observe that a general-purpose cross-modal system trained to learn a common text-audio embedding space does not yield optimal results for this task. By introducing two changes -- one each to the text encoder and the audio encoder -- we demonstrate improved performance on a dataset of piano performances and associated free-text descriptions. On the text side, we use emotion-enriched word embeddings (EWE), and on the audio side, we extract mid-level perceptual features instead of generic audio embeddings. Our results highlight the effectiveness of mid-level perceptual features learnt from music and emotion-enriched word embeddings learnt from emotion-labelled text in capturing musical expression in a cross-modal setting. Additionally, our interpretable mid-level features provide a route for introducing explainability into the retrieval and downstream recommendation processes.
Comment: Presented at the FIRE 2023 (Forum for Information Retrieval Evaluation) conference, Goa, India
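Once both modalities are projected into a common space, the retrieval step reduces to ranking performances by similarity to the query embedding. A minimal sketch with made-up 4-d vectors (in the paper, the audio side would be mid-level perceptual features and the text side EWE embeddings, each passed through its own encoder):

```python
import numpy as np

# Hypothetical shared embedding space; the vectors and dimension are illustrative.
performances = {
    "rendition_a": np.array([0.9, 0.1, 0.0, 0.2]),   # e.g. an agitated, tense reading
    "rendition_b": np.array([0.1, 0.8, 0.6, 0.1]),   # e.g. a calm, tender reading
    "rendition_c": np.array([0.5, 0.5, 0.2, 0.9]),
}
# Embedding of a free-text query such as "gentle and serene".
query = np.array([0.2, 0.9, 0.5, 0.0])

def cosine(u, v):
    """Cosine similarity, the usual ranking score in a shared embedding space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

ranked = sorted(performances, key=lambda k: cosine(query, performances[k]), reverse=True)
print(ranked)  # best-matching rendition first
```

Because the audio vectors here stand in for interpretable mid-level features, each match can also be explained by pointing at the feature dimensions that drove the similarity.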
A computational framework for sound segregation in music signals
Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200
OK Computer Analysis: An Audio Corpus Study of Radiohead
The application of music information retrieval techniques in popular music studies has great promise. In the present work, a corpus of Radiohead songs spanning their career from 1992 to 2017 is subjected to automated audio analysis. We examine findings at a number of granularities and perspectives, including within-song and between-song examination of both timbral-rhythmic and harmonic features. Chronological changes include possible career-spanning effects for a band's releases, such as slowing tempi and reduced brightness, and the timbral markers of Radiohead's expanding approach to instrumental resources most identified with the Kid A and Amnesiac era. We conclude with a discussion highlighting some challenges for this approach, and the potential for a field of audio-file-based career analysis.
DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks
Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre by modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of deep learning, data-driven processing of audio has emerged as an alternative to traditional signal processing. This new paradigm allows controlling the synthesis process through learned high-level features or by conditioning a model on musically relevant information. In this paper, we apply a Generative Adversarial Network to the task of audio synthesis of drum sounds. By conditioning the model on perceptual features computed with a publicly available feature extractor, intuitive control is gained over the generation process. The experiments are carried out on a large collection of kick, snare, and cymbal sounds. We show that, compared to a specific prior work based on a U-Net architecture, our approach considerably improves the quality of the generated drum samples, and that the conditional input indeed shapes the perceptual characteristics of the sounds. We also provide audio examples and release the code used in our experiments.
Comment: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020)
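The conditioning mechanism itself is compact: the latent noise vector is concatenated with the perceptual-feature vector before entering the generator, so changing a feature score steers the output. In this sketch the dimensions, the single linear layer, and the random stand-in weights are placeholders, not the DrumGAN architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: latent noise, conditioning features (e.g. brightness,
# boominess, depth scores from a feature extractor), and output frame length.
Z_DIM, COND_DIM, OUT_DIM = 8, 3, 64
W = rng.normal(0, 0.1, (Z_DIM + COND_DIM, OUT_DIM))  # stand-in for trained generator weights

def generate(cond, z=None):
    """One generator pass: concatenate noise with the condition, map to audio."""
    z = rng.normal(size=Z_DIM) if z is None else z
    return np.tanh(np.concatenate([z, cond]) @ W)  # tanh keeps samples in [-1, 1]

bright = generate(np.array([0.9, 0.1, 0.1]))  # request a bright-sounding hit
dark = generate(np.array([0.1, 0.9, 0.9]))    # request a boomy, deep hit
print(bright.shape, dark.shape)
```

In the trained model the discriminator (or an auxiliary loss) is what forces the generator to honor the condition; here the point is only the input layout that makes such control possible.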