12 research outputs found

    Finding Clusters of Similar Artists -Analysis of DBSCAN and K-means Clustering

    Abstract: We applied k-means clustering and DBSCAN to the problem of finding sets of similar artists, given a large number of artists and their genres. For our experiments we used data from the Million Song Dataset, a freely available collection of metadata for a million popular music tracks, created specifically for research. We ran the algorithms with varying parameter values and studied the effects. The resulting clusters were analyzed, and for k-means we found three different types of clusters. Although the k-means results were quite noisy, many of the clusters could be used to gain some insight into the similarity between artists. This implies that using distances as a representation of similarity between artists is viable. DBSCAN did not prove as useful: its clustering method is density-based, and the density of the clusters in the input data differed far too much for DBSCAN to handle. We found that more features in the input data, such as genre per track, would be desirable and would probably improve the results of the algorithms. Further study of other clustering algorithms applied to the same data would shed light on the actual effectiveness of the algorithms studied here.
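As a rough illustration of the distance-based view described in the abstract above, the sketch below runs a plain k-means (Lloyd's algorithm) over binary genre vectors. The artist names, the genre list, and the initial centroids are invented for illustration; they are not taken from the thesis or from the Million Song Dataset, and the thesis may have encoded genres differently.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical artists encoded as binary vectors over (rock, metal, jazz, blues).
artists = {
    "artist_a": (1, 1, 0, 0),
    "artist_b": (1, 0, 0, 0),
    "artist_c": (0, 0, 1, 1),
    "artist_d": (0, 0, 1, 0),
}
names = list(artists)
labels, _ = kmeans(
    [artists[n] for n in names],
    centroids=[artists["artist_a"], artists["artist_c"]],  # assumed starting points
)

# Group artist names by the cluster label they received.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
```

With this toy input the two rock-leaning artists and the two jazz-leaning artists end up in separate clusters. Real genre data is far sparser and noisier, which is consistent with the noisy clusters the abstract reports.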

    Evaluating Collaborative Filtering Algorithms for Music Recommendations on Chinese Music Data

    In this thesis, I explored Collaborative Filtering (CF) algorithms used for music recommendation tasks in the field of Music Information Retrieval. To find out whether these CF algorithms work on Chinese music data, I built a new dataset from the mainstream Chinese music streaming platform NetEase Cloud Music and compared the performance of a series of memory-based and model-based collaborative filtering algorithms on it. The experimental results show that these CF algorithms, which rely on user information, are effective on the dataset and have predictive ability for music recommendation tasks on Chinese music data. In general, model-based algorithms perform better than memory-based algorithms. Among them, the SVD++ algorithm, a matrix-factorization method, reaches the best overall accuracy. Bachelor of Science
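To make the memory-based family mentioned in the abstract above more concrete, here is a minimal user-based neighbourhood sketch: a missing rating is predicted as the similarity-weighted average of other users' ratings. The user names, song names, and ratings are hypothetical, and the cosine similarity over co-rated items is one common choice among several; this is not the thesis's implementation, and it does not cover SVD++.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical 1-5 ratings; u1 has not rated song_c yet.
ratings = {
    "u1": {"song_a": 5, "song_b": 4},
    "u2": {"song_a": 5, "song_b": 4, "song_c": 5},
    "u3": {"song_a": 1, "song_b": 5, "song_c": 1},
}
prediction = predict(ratings, "u1", "song_c")
```

Here u2 agrees with u1 on every co-rated song, so u2's rating of song_c carries more weight in the prediction than u3's.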

    A content-based music recommender system

    Music recommenders have become increasingly relevant due to the increased accessibility provided by various music streaming services. Some of these streaming services, such as Spotify, include a recommender system of their own. Despite many advances in recommendation techniques, recommender systems still often fail to provide accurate recommendations. This thesis provides an overview of the history and development of music information retrieval from a content-based perspective. It also describes recommendation as a problem and the methods used for music recommendation, with a special focus on content-based recommendation: it gives detailed descriptions of the audio content features and content-based similarity measures used in content-based music recommender systems. Some of the presented features are used in our own content-based music recommender. Both objective and subjective evaluation of the implemented recommender system further confirm the findings of many researchers that music recommendation based solely on audio content does not provide very accurate recommendations.

    Context based multimedia information retrieval

    Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP)

    Attribute-Oriented Induction of High-level Emerging Pattern (AOI-HEP) is a combination of Attribute-Oriented Induction (AOI) and Emerging Patterns (EP). AOI is a summarisation algorithm that compacts a given dataset into small conceptual descriptions, where each attribute has a defined concept hierarchy. The resulting patterns are easily readable and understandable. Emerging patterns are patterns discovered between two datasets, or between two time periods, such that patterns found in the first dataset have grown (or shrunk) in size, disappeared entirely, or newly emerged. AOI-HEP is not influenced by a border-based algorithm, as EP mining algorithms are. It is therefore desirable to obtain summarised emerging patterns between two datasets, and we propose the High-level Emerging Pattern (HEP) algorithm. The main purpose of combining AOI and EP is to use the strengths of both to extract important high-level emerging patterns from data. The AOI characteristic rule algorithm is run twice, once on each of two input datasets, to create two rulesets, which are then processed with the HEP algorithm. First, the HEP algorithm takes the Cartesian product of the two rulesets and eliminates rules by computing a similarity metric (a categorisation of attribute comparisons). Second, the rule pairs that pass the similarity metric are discriminated by computing a growth rate, the ratio of the supports of the rules from the two rulesets. The categorisation of attribute comparisons is based on the similarity hierarchy level. Attributes were found to subsume each other in three ways: Total Subsumption HEP (TSHEP), Subsumption Overlapping HEP (SOHEP) and Total Overlapping HEP (TOHEP) patterns. Meanwhile, from certain similarity hierarchy levels and values, we can mine frequent and similar patterns that create discriminant rules. We used four large real datasets from the UCI Machine Learning Repository and discovered valuable HEP patterns, including strong discriminant rules and frequent and similar patterns. Moreover, the experiments showed that most datasets contain SOHEP but not TSHEP or TOHEP, and that TOHEP was the most rarely found. Since AOI-HEP can strongly discriminate high-level data, it can be applied to discriminate datasets, for example to separate good from bad customers in banking loan systems or among credit card applicants. AOI-HEP can also be applied to mine similar patterns, for instance similar customer loan patterns.
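The growth-rate step mentioned in the abstract above can be sketched as follows. The abstract only says that growth rate is a ratio of supports between rules from the two rulesets; the handling of zero supports below follows the convention commonly used in the emerging-patterns literature and is an assumption about AOI-HEP, as are the function names and the tuple counts.

```python
def support(matching_tuples, total_tuples):
    """Fraction of a dataset's tuples covered by a rule."""
    return matching_tuples / total_tuples


def growth_rate(support_1, support_2):
    """Growth rate of a pattern from dataset 1 to dataset 2.

    Assumed convention: 0 when the pattern is absent from both datasets,
    infinity when it appears only in dataset 2, else the ratio of supports.
    """
    if support_1 == 0:
        return 0.0 if support_2 == 0 else float("inf")
    return support_2 / support_1


# Hypothetical rule covering 25 of 100 tuples in dataset 1 and 50 of 100 in dataset 2.
rate = growth_rate(support(25, 100), support(50, 100))
```

A rate well above 1 marks a rule that is much more frequent in the second dataset, which is what makes it useful as a discriminant rule.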

    Linking Music Metadata.

    PhD thesis. The internet has facilitated music metadata production and distribution on an unprecedented scale. A contributing factor to this data deluge is a change in the authorship of the data, from the expert few to the untrained crowd. The resulting unordered flood of imperfect annotations provides challenges and opportunities in identifying accurate metadata and linking it to the music audio in order to provide a richer listening experience. We advocate novel adaptations of Dynamic Programming for music metadata synchronisation, ranking and comparison. This thesis introduces Windowed Time Warping, Greedy, Constrained On-Line Time Warping for synchronisation and the Concurrence Factor for automatically ranking metadata. We begin by examining the availability of various music metadata on the web. We then review Dynamic Programming methods for aligning and comparing two source sequences, and present novel, specialised adaptations for efficient, real-time synchronisation of music and metadata that improve on existing algorithms in speed and accuracy. The Concurrence Factor, which measures the degree to which an annotation of a song agrees with its peers, is proposed in order to use the wisdom of the crowds to establish a ranking system. This attribute uses a combination of the standard Dynamic Programming methods Levenshtein Edit Distance, Dynamic Time Warping, and Longest Common Subsequence to compare annotations. We present a synchronisation application for applying these methods, as well as a tablature-parsing application for mining and analysing guitar tablatures from the web. We evaluate the Concurrence Factor as a ranking system on a large-scale collection of guitar tablatures and lyrics, and show a correlation with accuracy that is superior to existing methods currently used in internet search engines, which are based on popularity and human ratings. Funding: Engineering and Physical Sciences Research Council; travel grant from the Royal Engineering Society.
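The three standard Dynamic Programming methods named in the abstract above can each be sketched in a few lines. These are textbook versions, not the thesis's specialised adaptations, and they do not implement the Concurrence Factor itself; the absolute-difference cost used for Dynamic Time Warping is an assumption chosen for numeric sequences.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (item_a != item_b),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def dtw(x, y):
    """Dynamic Time Warping cost between two numeric sequences.

    Uses |x_i - y_j| as the local cost (an assumed choice).
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)
    for x_i in x:
        curr = [inf]
        for j, y_j in enumerate(y, start=1):
            cost = abs(x_i - y_j)
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


edit_distance = levenshtein("kitten", "sitting")
common_length = lcs_length("ABCBDAB", "BDCABA")
warp_cost = dtw([1, 2, 3], [1, 2, 2, 3])
```

All three functions accept any sequence, so the same code can compare lists of chord labels or lyric tokens as well as character strings.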

    Probabilistic models for music

    This thesis proposes to analyse symbolic musical data from a statistical viewpoint, using state-of-the-art machine learning techniques. Our main argument is that it is possible to design generative models that can predict and generate music given arbitrary contexts, in a genre similar to a training corpus, using a minimal amount of data. For instance, a carefully designed generative model could guess what would be a good accompaniment for a given melody. Conversely, we propose generative models in this thesis that can be sampled to generate realistic melodies given a harmonic context. Most computer music research has so far been devoted to the direct modeling of audio data. However, most music models today do not consider musical structure at all. We argue that reliable symbolic music models such as the ones presented in this thesis could dramatically improve the performance of audio algorithms applied in more general contexts. Hence, our main contributions in this thesis are three-fold: (1) We have shown empirically that long-term dependencies are present in music data, and we provide quantitative measures of such dependencies; (2) We have shown empirically that using domain knowledge makes it possible to capture long-term dependencies in music signals better than standard statistical models for temporal data do. We describe many probabilistic models aimed at capturing various aspects of symbolic polyphonic music. Such models can be used for music prediction, and they can be sampled to generate realistic music sequences; (3) We designed various representations for music that could be used as observations by the proposed probabilistic models.

    Content-based music classification, summarization and retrieval

    Ph.D., Doctor of Philosophy

    Correlation-based methods for data cleaning, with application to biological databases

    Ph.D., Doctor of Philosophy

    Analyse de structures répétitives dans les séquences musicales

    The work presented in this thesis deals with the inference of repetitive structures from audio signals using string matching techniques. It aims to propose and evaluate inference algorithms based on a formal study of the notions of similarity and repetition in music. We first present a method for representing audio signals as symbolic strings. We introduce alignment tools that enable similarity estimation between such musical strings, and evaluate the application of these tools to automatic cover song identification. We further adapt a bioinformatics indexing technique to allow efficient assessment of music similarity in large-scale datasets. We then introduce several specific repetitive structures and use alignment tools to analyse these repetitions. A first structure, the repetition of a chosen segment, is retrieved and evaluated in the context of automatic assignment of missing audio data. A second structure, the major repetition, is defined, retrieved and evaluated against expert annotations, and as an alternative indexing method for cover song identification. We finally present the problem of repetitive structure inference as addressed in the literature, and propose our own problem statement. We further describe our model and propose an algorithm enabling the identification of a hierarchical music structure. We demonstrate the relevance of our method through several examples and by comparing it to the state of the art. BORDEAUX1-Bib.electronique (335229901) / Sudoc, France