
    Beat histogram features for rhythm-based musical genre classification using multiple novelty functions

    In this paper we present beat histogram features for multiple level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested with Support Vector Machines on five genre datasets, comparing classification accuracy against a baseline feature set. Results show that the presented features provide comparable classification accuracy with respect to other genre classification approaches using periodicity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames has provided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components on the general rhythmic content encourage the further use of the method in rhythm description tasks.
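
    To make the idea of a beat histogram built from a novelty function concrete, here is a minimal sketch. It assumes a simple amplitude envelope as input and uses autocorrelation to measure periodicity, which is one common way to build such a histogram and not necessarily the procedure used in the paper. The function names `novelty` and `beat_histogram`, the BPM range, and the frame rate are hypothetical choices made for illustration.

```python
def novelty(envelope):
    """Half-wave rectified first difference: keeps only increases in the feature."""
    return [max(b - a, 0.0) for a, b in zip(envelope, envelope[1:])]


def beat_histogram(nov, frame_rate, bpm_min=40, bpm_max=200):
    """Autocorrelation of a novelty function, indexed by tempo in BPM.

    Assumption: periodicity strength is the (non-negative) autocorrelation
    at the lag matching each tempo. Nearby tempi can share the same lag.
    """
    mean = sum(nov) / len(nov)
    x = [v - mean for v in nov]  # remove the mean so silence scores zero
    hist = {}
    for bpm in range(bpm_min, bpm_max + 1):
        lag = round(60.0 * frame_rate / bpm)  # beat period in frames
        if 0 < lag < len(x):
            acf = sum(a * b for a, b in zip(x, x[lag:]))
            hist[bpm] = max(acf, 0.0)
    return hist


# Usage: a pulse every 50 frames at 100 frames per second is a 120 BPM pattern.
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
hist = beat_histogram(novelty(envelope), frame_rate=100)
```

    The same `beat_histogram` call could be applied to novelty functions derived from other features (tonal or spectral, for example), giving one histogram per feature.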

    Rapid neural processing of grammatical tone in second language learners

    The present dissertation investigates how beginner learners process grammatical tone in a second language and whether their processing is influenced by phonological transfer. Paper I focuses on the acquisition of Swedish grammatical tone by beginner learners from a non-tonal language, German. Results show that non-tonal beginner learners do not process the grammatical regularities of the tones but rather treat them akin to piano tones. A rightwards-going spread of activity in response to pitch difference in Swedish tones possibly indicates a process of tone sensitisation. Papers II to IV investigate how artificial grammatical tone, taught in a word-picture association paradigm, is acquired by German and Swedish learners. The results of paper II show that interspersed mismatches between grammatical tone and picture referents evoke an N400 only for the Swedish learners. Both learner groups produce N400 responses to picture mismatches related to grammatically meaningful vowel changes. While mismatch detection quickly reaches high accuracy rates, tone mismatches are least accurately and most slowly detected in both learner groups. For processing of the grammatical L2 words outside of mismatch contexts, the results of paper III reveal early, preconscious and late, conscious processing in the Swedish learner group within 20 minutes of acquisition (word recognition component, ELAN, LAN, P600). German learners only produce late responses: a P600 within 20 minutes and a LAN after sleep consolidation. The surprisingly rapid emergence of early grammatical ERP components (ELAN, LAN) is attributed to less resource-heavy processing outside of violation contexts. Results of paper IV, finally, indicate that memory trace formation, as visible in the word recognition component at ~50 ms, is only possible at the highest level of formal and functional similarity, that is, for words with falling tone in Swedish participants. 
Together, the findings emphasise the importance of phonological transfer in the initial stages of second language acquisition and suggest that the earlier the processing, the more important the impact of phonological transfer.

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version
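
    The compression-rate descriptor described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses uniform quantisation, block averaging for the coarser time scales, and `zlib` as the compressor, none of which the abstract specifies. The function names and the default `factors` and `levels` are hypothetical.

```python
import zlib


def quantise(values, levels):
    """Map each value to an integer bin in [0, levels - 1] using uniform bins."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant input: avoid division by zero
    return [min(int((v - lo) / span * levels), levels - 1) for v in values]


def downsample(values, factor):
    """Average consecutive groups of `factor` values (a coarser time scale)."""
    groups = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(g) / len(g) for g in groups]


def compression_rate(symbols):
    """Compressed size divided by raw size, storing one byte per symbol."""
    raw = bytes(symbols)  # requires symbols in 0..255
    return len(zlib.compress(raw, 9)) / len(raw)


def complexity_descriptor(values, factors=(1, 2, 4), levels=(4, 8)):
    """One compression rate per (temporal resolution, quantisation) pair."""
    return {
        (f, q): compression_rate(quantise(downsample(values, f), q))
        for f in factors
        for q in levels
    }
```

    A highly repetitive feature sequence should yield lower rates than an irregular one at every resolution, which is the property that makes the rate usable as a rough measure of sequential complexity.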

    Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition

    This thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with the use of Neural Networks.

    Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

    Mandarin Chinese is characterized by being a tonal language; the pitch (or $F_0$) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample $F_0$ contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation. Comment: 49 pages, 13 figures, small changes to discussion

    Software agents in music and sound art research/creative work: Current state and a possible direction

    Composers, musicians and computer scientists have begun to use software-based agents to create music and sound art in both linear and non-linear (non-predetermined form and/or content) idioms, with some robust approaches now drawing on various disciplines. This paper surveys recent work: agent technology is first introduced, a theoretical framework for its use in creating music/sound art works put forward, and an overview of common approaches then given. Identifying areas of neglect in recent research, a possible direction for further work is then briefly explored. Finally, a vision for a new hybrid model that integrates non-linear, generative, conversational and affective perspectives on interactivity is proposed.