Music recommendation and discovery in the long tail
Music consumption is biased towards a few popular artists. For instance, in 2007 only 1% of all digital tracks accounted for 80% of all sales. Similarly, 1,000 albums accounted for 50% of all album sales, and 80% of all albums sold were purchased fewer than 100 times. There is a need to assist people to filter, discover, personalise and recommend music from the huge amount of content available along the Long Tail. Current music recommendation algorithms try to accurately predict what people want to listen to. However, quite often these algorithms tend to recommend popular music, or music already well known to the user, which decreases the effectiveness of the recommendations. These approaches focus on improving the accuracy of the recommendations; that is, they try to make accurate predictions about what a user could listen to or buy next, regardless of how useful the recommendations are to the user. In this Thesis we stress the importance of the user's perceived quality of the recommendations. We model the Long Tail curve of artist popularity to predict potentially interesting and unknown music, hidden in the tail of the popularity curve. Effective recommendation systems should promote novel and relevant material (non-obvious recommendations), taken primarily from the tail of the popularity distribution. The main contributions of this Thesis are: (i) a novel network-based approach for recommender systems, based on the analysis of the item (or user) similarity graph and the popularity of the items, (ii) a user-centric evaluation that measures the relevance and novelty of the recommendations as perceived by the user, and (iii) two prototype systems that implement the ideas derived from the theoretical work. Our findings have significant implications for recommender systems that assist users to explore the Long Tail, digging for content they might like.
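As a rough illustration of the kind of popularity modelling the abstract describes, the following minimal sketch ranks artists by play count and splits the resulting curve into head, mid and tail sections by cumulative share of plays. The data, thresholds and function name are illustrative assumptions, not taken from the thesis.

# Hypothetical sketch: rank artists by total play count and split the
# popularity curve into head / mid / tail by cumulative share of plays.
# Artist names and thresholds are illustrative only.

def split_long_tail(play_counts, head_share=0.5, mid_share=0.8):
    """play_counts: dict artist -> total plays. Returns three lists."""
    ranked = sorted(play_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    head, mid, tail = [], [], []
    cumulative = 0.0
    for artist, count in ranked:
        cumulative += count / total
        if cumulative <= head_share:
            head.append(artist)
        elif cumulative <= mid_share:
            mid.append(artist)
        else:
            tail.append(artist)
    return head, mid, tail

if __name__ == "__main__":
    plays = {"artist_a": 120000, "artist_b": 45000, "artist_c": 3000,
             "artist_d": 800, "artist_e": 150, "artist_f": 40}
    head, mid, tail = split_long_tail(plays)
    print("head:", head)
    print("mid:", mid)
    print("tail:", tail)  # candidates for novel, non-obvious recommendations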
Bridging the Music Semantic Gap
In this paper we present the music information plane and the different levels of information extraction that exist in the musical domain. Based on this approach we propose a way to overcome the existing semantic gap in the music field. Our approach is twofold: we propose a set of music descriptors that can automatically be extracted from the audio signals, and a top-down approach that adds explicit and formal semantics to these annotations. These music descriptors are generated in two ways: as derivations and combinations of lower-level descriptors, and as generalizations induced from manually annotated databases by the intensive application of machine learning. We believe that merging both approaches (bottom-up and top-down) can overcome the existing semantic gap in the musical domain.
The reported research has been funded by the EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents).
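A minimal sketch of the "generalization" path the abstract mentions: inducing a higher-level semantic descriptor (here a hypothetical mood label) from low-level audio descriptors with a standard classifier. It assumes scikit-learn and pre-computed features; the feature set and labels are illustrative, not the paper's actual pipeline.

# Hypothetical sketch: induce a semantic descriptor (a mood label) from
# low-level audio descriptors using a standard classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Pretend each row holds pre-computed low-level descriptors for one track:
# [mean MFCC1, mean MFCC2, spectral centroid, tempo]
X_train = np.array([
    [1.2, -0.3, 2100.0, 128.0],
    [0.8, -0.1, 1800.0, 132.0],
    [2.5,  0.4,  900.0,  70.0],
    [2.1,  0.6,  950.0,  65.0],
])
y_train = ["happy", "happy", "sad", "sad"]  # manually annotated ground truth

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# A new, unannotated track gets a higher-level semantic descriptor.
new_track = np.array([[1.0, -0.2, 2000.0, 125.0]])
print(clf.predict(new_track))  # e.g. ['happy']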
Inferring semantic facets of a music folksonomy with Wikipedia
Music folksonomies include both general and detailed descriptions of music, and are usually continuously updated. These are significant advantages over music taxonomies, which tend to be incomplete and inconsistent. However, music folksonomies have inherently loose and open semantics, which hampers their use in many applications, such as structured music browsing and recommendation. In this paper, we present a system that can (1) automatically obtain a set of semantic facets underlying the folksonomy of the social music website Last.fm, and (2) categorize Last.fm tags with respect to the obtained facets. The semantic facets are anchored upon the structure of Wikipedia, a dynamic repository of universal knowledge.
Fabien Gouyon is supported by the Media Arts and Technologies project (MAT), NORTE-07-0124-FEDER-000061, co-financed by the North Portugal Regional Operational Programme (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds, through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT).
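A minimal sketch of the tag-to-facet idea, assuming the public MediaWiki API and an illustrative, hand-picked set of facet anchor keywords; the paper anchors facets on Wikipedia's category structure in a more principled way, so treat this only as an approximation of the lookup step.

# Hypothetical sketch: map a Last.fm tag to a coarse facet by inspecting the
# Wikipedia categories of the page matching the tag. The facet anchor
# keywords are illustrative, not the paper's actual facet set.
import requests

FACET_ANCHORS = {          # keyword found in a category title -> facet
    "genres": "Genre",
    "musical instruments": "Instrument",
    "countries": "Location",
    "emotions": "Mood",
}

def facet_for_tag(tag):
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "titles": tag, "prop": "categories",
                "cllimit": "max", "format": "json"},
        timeout=10,
    )
    pages = resp.json()["query"]["pages"]
    for page in pages.values():
        for cat in page.get("categories", []):
            title = cat["title"].lower()
            for keyword, facet in FACET_ANCHORS.items():
                if keyword in title:
                    return facet
    return "Unknown"

print(facet_for_tag("Heavy metal"))  # "Genre" if a category title mentions "genres"
print(facet_for_tag("Accordion"))    # "Instrument" if one mentions "musical instruments"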
Extending the folksonomies of freesound.org using content-based audio analysis
Paper presented at the 6th Sound and Music Computing Conference, held 23-25 July 2009 in Porto, Portugal.
This paper presents an in-depth study of the social tagging mechanisms used in Freesound.org, an online community where users share and browse audio files by means of tags and content-based audio similarity search. We performed two analyses of the sound collection. The first one concerns how users tag the sounds; we detected some well-known problems that occur in collaborative tagging systems (i.e. polysemy, synonymy, and the scarcity of the existing annotations). Moreover, we show that more than 10% of the collection was scarcely annotated, with only one or two tags per sound, thus frustrating the retrieval task. The second analysis therefore focuses on enhancing the semantic annotations of these sounds by means of content-based audio similarity (autotagging). In order to "autotag" the sounds, we use a k-NN classifier that selects the available tags from the most similar sounds. Human assessment is performed in order to evaluate the perceived quality of the candidate tags. The results show that, for 77% of the sounds used, the annotations have been correctly extended with the proposed tags derived from audio similarity.
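A minimal sketch of k-NN autotagging as described in the abstract: candidate tags for a sparsely annotated sound are collected from the tags of its nearest neighbours in a content-based feature space. Feature vectors, tags and parameter values are illustrative assumptions, not Freesound data.

# Hypothetical autotagging sketch: propose tags for a sound by voting over
# the tags of its k nearest neighbours in an audio feature space.
from collections import Counter
import numpy as np

def knn_autotag(query_features, sounds, k=5, n_tags=3):
    """sounds: list of (feature_vector, tag_list). Returns candidate tags."""
    distances = [
        (np.linalg.norm(np.asarray(query_features) - np.asarray(feats)), tags)
        for feats, tags in sounds
    ]
    distances.sort(key=lambda d: d[0])
    votes = Counter(tag for _, tags in distances[:k] for tag in tags)
    return [tag for tag, _ in votes.most_common(n_tags)]

if __name__ == "__main__":
    collection = [
        ([0.1, 0.9, 0.3], ["rain", "field-recording", "nature"]),
        ([0.2, 0.8, 0.4], ["rain", "storm"]),
        ([0.9, 0.1, 0.7], ["synth", "drone"]),
    ]
    # Candidate tags for a new sound, to be reviewed by a human assessor.
    print(knn_autotag([0.15, 0.85, 0.35], collection, k=2))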
MUCOSA: a music content semantic annotator
Paper presented at ISMIR 2005, the 6th International Conference on Music Information Retrieval, held 11-15 September 2005 in London, United Kingdom.
MUCOSA (Music Content Semantic Annotator) is an environment for the annotation and generation of music metadata at different levels of abstraction. It is composed of three tiers: an annotation client that deals with micro-annotations (i.e. within-file annotations), a collection tagger, which deals with macro-annotations (i.e. across-file annotations), and a collaborative annotation subsystem, which manages large-scale annotation tasks that can be shared among different research centres. The annotation client is an enhanced version of WaveSurfer, a speech annotation tool. The collection tagger includes tools for the automatic generation of unary descriptors, the invention of new descriptors, and the propagation of descriptors across sub-collections or playlists. Finally, the collaborative annotation subsystem, based on Plone, makes it possible to share the annotation chores and results between several research institutions. A collection of annotated songs is available as a "starter pack" to all individuals or institutions that are eager to join this initiative.
The research and development reported here was partially funded by the EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents). The authors would like to thank Edgar Barroso, and the Audioclas and CLAM teams for their support to the project.
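A minimal sketch of the macro-annotation propagation the abstract mentions: a descriptor assigned at the playlist (sub-collection) level is pushed down to every track it contains unless the track already has an explicit value. The data structures and function name are illustrative assumptions, not MUCOSA's actual model.

def propagate_descriptors(playlist_descriptors, track_descriptors, playlists):
    """playlists: dict playlist_id -> list of track_ids."""
    for playlist_id, tracks in playlists.items():
        inherited = playlist_descriptors.get(playlist_id, {})
        for track_id in tracks:
            track = track_descriptors.setdefault(track_id, {})
            for name, value in inherited.items():
                track.setdefault(name, value)  # keep per-track overrides
    return track_descriptors

playlist_descriptors = {"flamenco_set": {"genre": "flamenco"}}
track_descriptors = {"track_2": {"genre": "rumba"}}  # explicit value, kept as-is
playlists = {"flamenco_set": ["track_1", "track_2"]}
print(propagate_descriptors(playlist_descriptors, track_descriptors, playlists))
# {'track_2': {'genre': 'rumba'}, 'track_1': {'genre': 'flamenco'}}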
Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models
This paper presents an approach to modeling the singing voice, with particular emphasis on the naturalness of the resulting synthetic voice. The underlying analysis/synthesis technique is based on Spectral Modeling Synthesis (SMS) and a newly developed Excitation plus Resonance (EpR) model. With this approach, a complete singing voice synthesizer is developed that generates a vocal melody from the score and the phonetic transcription of a song.
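A minimal sketch of a single-frame sinusoidal-plus-residual decomposition, the general idea behind SMS; it is not the paper's EpR/SMS implementation. Sinusoids are taken as the strongest spectral peaks of a windowed frame, and the residual is what remains after subtracting the re-synthesized sinusoids; frame length and peak count are illustrative choices.

import numpy as np

def sinusoidal_plus_residual(frame, n_peaks=10):
    n = len(frame)
    window = np.hanning(n)
    spectrum = np.fft.rfft(frame * window)
    magnitudes = np.abs(spectrum)

    # Pick the n_peaks strongest local maxima as sinusoidal partials.
    is_peak = (magnitudes[1:-1] > magnitudes[:-2]) & (magnitudes[1:-1] > magnitudes[2:])
    peak_bins = np.where(is_peak)[0] + 1
    peak_bins = peak_bins[np.argsort(magnitudes[peak_bins])[::-1][:n_peaks]]

    # Re-synthesize the sinusoidal part from the selected bins only;
    # the residual is the windowed frame minus that reconstruction.
    sinusoid_spectrum = np.zeros_like(spectrum)
    sinusoid_spectrum[peak_bins] = spectrum[peak_bins]
    sinusoidal = np.fft.irfft(sinusoid_spectrum, n)
    residual = frame * window - sinusoidal
    return sinusoidal, residual

sr = 44100
t = np.arange(1024) / sr
test_frame = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(1024)
sines, residual = sinusoidal_plus_residual(test_frame)
print(residual.shape, float(np.abs(residual).mean()))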