29 research outputs found

    Automatic music genre classification

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science. 2014. No abstract provided.

    Content-based music structure analysis

    Ph.D. (Doctor of Philosophy) dissertation.

    Music Information Retrieval using Machine Learning and Convolutional Neural Networks

    In this thesis an attempt is made to analyze and retrieve musical information using Machine Learning algorithms and Convolutional Neural Networks. The goal is to recognize and classify musical tracks based on the emotions they evoke, their genre, and their similarity to other songs in a collection. To accomplish this goal, Convolutional Neural Network models were built for the valence, energy, danceability, and genre of a song. The genre classes were created using clustering methods. All models were trained on a large volume of musical data using a Convolutional Neural Network from the Deep Audio Features python library.
    Finally, the evaluation of the models was based on a dataset created through user interaction: users compared triplets of songs and decided which song, in their opinion, was the least compatible with the other two. A music content-based application was also created that contributed to the evaluation of the CNN models, allowing songs from a small collection to be visualized and compared with each other according to some of their characteristics.
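    The triplet-based evaluation described above can be sketched in a few lines: given a feature vector per song, the "odd one out" is the song whose summed distance to the other two is largest, which can then be compared with the users' choice. This is a minimal illustrative sketch, not the thesis's actual pipeline; the distance function, feature dimensions, and song vectors below are hypothetical.

    ```python
    import math

    def cosine_distance(a, b):
        """1 - cosine similarity between two feature vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    def odd_one_out(triplet):
        """Index of the song least compatible with the other two:
        the one whose summed distance to the others is largest."""
        scores = []
        for i in range(3):
            others = [triplet[j] for j in range(3) if j != i]
            scores.append(sum(cosine_distance(triplet[i], o) for o in others))
        return max(range(3), key=lambda i: scores[i])

    # Hypothetical (valence, energy, danceability) vectors for three songs.
    ballad_a   = [0.20, 0.30, 0.10]
    ballad_b   = [0.25, 0.35, 0.15]
    club_track = [0.90, 0.95, 0.90]

    assert odd_one_out([ballad_a, ballad_b, club_track]) == 2  # the club track
    ```

    Model quality would then be measured as agreement between this prediction and the majority human choice over many triplets.
    
    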

    Singing Voice Recognition for Music Information Retrieval

    This thesis proposes signal processing methods for the analysis of singing voice audio signals, with the objectives of obtaining information about the identity and lyrics content of the singing. Two main topics are presented: singer identification in monophonic and polyphonic music, and lyrics transcription and alignment. The information automatically extracted from the singing voice is meant to be used for applications such as music classification, sorting and organizing music databases, and music information retrieval. For singer identification, the thesis introduces methods from general audio classification as well as specific methods for dealing with the presence of accompaniment. The emphasis is on singer identification in polyphonic audio, where the singing voice is present along with musical accompaniment. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The study of singer identification is centered on the degradation of classification performance in the presence of instruments, and on separation of the vocal line to improve performance. For the study, monophonic singing was mixed with instrumental accompaniment at different signal-to-noise (singing-to-accompaniment) ratios, and the classification process was performed both on the polyphonic mixture and on the vocal line separated from the polyphonic mixture. Including a vocal-separation step before classification significantly improves performance compared to classifying the polyphonic mixtures directly, though it does not reach the performance obtained on the monophonic singing itself. Nevertheless, the results show that classification of singing voices can be done robustly in polyphonic music when using source separation.
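    The SNR-controlled mixing described above amounts to scaling the accompaniment so that the singing-to-accompaniment power ratio, 10·log10(P_voice / P_accompaniment), hits a target value before summing the two signals. A minimal sketch, assuming plain sample lists and toy sinusoids in place of real recordings (the function name is illustrative, not from the thesis):

    ```python
    import math

    def mix_at_snr(voice, accompaniment, snr_db):
        """Scale the accompaniment so the singing-to-accompaniment
        ratio equals snr_db, then return the sample-wise mixture.
        SNR (dB) = 10 * log10(P_voice / P_accompaniment)."""
        p_voice = sum(s * s for s in voice) / len(voice)
        p_acc = sum(s * s for s in accompaniment) / len(accompaniment)
        # Gain that brings accompaniment power to p_voice / 10^(snr/10).
        target_p_acc = p_voice / (10.0 ** (snr_db / 10.0))
        gain = math.sqrt(target_p_acc / p_acc)
        return [v + gain * a for v, a in zip(voice, accompaniment)]

    # Toy signals standing in for singing and accompaniment.
    n = 1000
    voice = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    acc = [0.1 * math.sin(2 * math.pi * 13 * t / n) for t in range(n)]

    mixture = mix_at_snr(voice, acc, 0.0)  # 0 dB: equal powers
    p_voice = sum(s * s for s in voice) / n
    p_res = sum((m - v) ** 2 for m, v in zip(mixture, voice)) / n
    assert abs(10 * math.log10(p_voice / p_res)) < 1e-6  # achieved SNR is 0 dB
    ```

    Repeating this at several SNR values yields the test conditions under which the mixture and the separated vocal line are classified.
    
    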
    In the problem of lyrics transcription, the thesis introduces the general speech recognition framework and the various adjustments that can be made before applying these methods to singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The thesis proposes using phoneme models trained on speech data and adapted to singing voice characteristics for the recognition of phonemes and words from a singing voice signal. Language models and adaptation techniques are an important aspect of the recognition process. There are two different ways of recognizing the phonemes in the audio: one is alignment, where the true transcription is known and the phonemes only have to be located; the other is recognition, where both the transcription and the locations of the phonemes have to be found. Alignment is thus a simplified form of the recognition task. Alignment of textual lyrics to music audio is performed by aligning the phonetic transcription of the lyrics with the vocal line separated from the polyphonic mixture, using a collection of commercial songs. Word recognition is tested for transcription of lyrics from monophonic singing. The performance of the proposed system for automatic alignment of lyrics and audio is sufficient for facilitating applications such as automatic karaoke annotation or song browsing. The word recognition accuracy of the lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application, for performing a textual search based on the words recognized from the query. When some key words in the query are recognized, the song can be reliably identified.
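    The distinction between alignment and recognition above can be illustrated with a simplified dynamic-programming sketch. Real lyrics-to-audio alignment operates with HMM phoneme models over acoustic frames; here, as a stand-in under that simplification, a known phoneme transcription is matched against a noisy recognized phoneme sequence by minimum edit distance. The phoneme symbols and sequences are illustrative, not from the thesis.

    ```python
    def align(reference, recognized):
        """Minimum-edit-distance alignment between the known phoneme
        transcription and a noisy recognized sequence. Returns the
        distance and (ref_index, rec_index) pairs (None = gap)."""
        n, m = len(reference), len(recognized)
        # dp[i][j] = edit distance between reference[:i] and recognized[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = dp[i - 1][j - 1] + (reference[i - 1] != recognized[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        # Trace back to recover which recognized symbol each phoneme maps to.
        pairs, i, j = [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                reference[i - 1] != recognized[j - 1]
            ):
                pairs.append((i - 1, j - 1)); i -= 1; j -= 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                pairs.append((i - 1, None)); i -= 1   # deletion: phoneme missed
            else:
                pairs.append((None, j - 1)); j -= 1   # insertion: spurious phoneme
        return dp[n][m], pairs[::-1]

    # Hypothetical ARPAbet-like phonemes for one sung word.
    reference  = ["HH", "AH", "L", "OW"]       # known lyrics transcription
    recognized = ["HH", "AA", "L", "L", "OW"]  # noisy recognizer output

    dist, pairs = align(reference, recognized)
    assert dist == 2  # one substitution (AH->AA) plus one inserted L
    ```

    In the recognition setting no reference sequence is available, which is why word accuracy there is much lower than alignment accuracy.
    
    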