    Music feature extraction and analysis through Python

    In the digital era, platforms like Spotify have become the primary channels of music consumption, broadening the possibilities for analyzing and understanding music through data. This project focuses on a comprehensive examination of a dataset sourced from Spotify, with Python as the tool for data extraction and analysis. The primary objective is the creation of this dataset, emphasizing a diverse range of songs from various subgenres. The intention is to represent both mainstream and niche musical landscapes, in line with the Long Tail distribution concept, which highlights the market potential of less popular niche products. 
Through analysis, patterns in the evolution of musical features over past decades become evident. Shifts in features such as energy, loudness, danceability, and valence, and their correlation with popularity, emerge from the dataset. Parallel to this analysis, a music recommendation system based on the content of the dataset is conceived. The aim is to connect tracks, especially lesser-known ones, with potential listeners. This project provides insights beneficial for music enthusiasts, data scientists, and industry professionals. The methodologies and analyses present a convergence of data science and the music industry in today's digital context.
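
The abstract does not say how the content-based recommender is built. As a minimal sketch of one common approach, the Python example below ranks tracks by cosine similarity over the four audio features named above. The track names, the feature values, the assumption that every feature is scaled to the range 0 to 1, and the helper names `cosine_similarity` and `recommend` are all hypothetical and are not taken from the project.

```python
import math

# Feature names mentioned in the abstract; the order fixes the vector layout.
FEATURES = ("energy", "loudness", "danceability", "valence")

# Hypothetical tracks with made-up values, assumed to be scaled to [0, 1].
TRACKS = {
    "track_a": {"energy": 0.80, "loudness": 0.70, "danceability": 0.75, "valence": 0.60},
    "track_b": {"energy": 0.78, "loudness": 0.72, "danceability": 0.70, "valence": 0.65},
    "track_c": {"energy": 0.20, "loudness": 0.30, "danceability": 0.25, "valence": 0.10},
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def recommend(seed, tracks, k=2):
    """Return the names of the k tracks most similar to the seed track."""
    seed_vec = [tracks[seed][f] for f in FEATURES]
    scored = [
        (cosine_similarity(seed_vec, [feats[f] for f in FEATURES]), name)
        for name, feats in tracks.items()
        if name != seed
    ]
    scored.sort(reverse=True)  # Highest similarity first.
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(recommend("track_a", TRACKS, k=1))
```

Because similarity here depends only on audio features and not on play counts, a little-known track can rank as high as a popular one, which fits the stated aim of surfacing lesser-known songs. In practice the features would likely need standardizing first, since cosine similarity discriminates weakly between vectors whose components are all positive.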

    Workshop proceedings: CBRecSys 2014. Workshop on New Trends in Content-based Recommender Systems

    Tag based Bayesian latent class models for movies: economic theory reaches out to big data science

    For the past 50 years, cultural economics has developed as an independent research specialism. At its core are the creative industries and the peculiar economics associated with them, central to which is a tension that arises from the notion that creative goods need to be experienced before an assessment can be made about the utility they deliver to the consumer. In this they differ from the standard private good that forms the basis of demand theory in economic textbooks, in which utility is known ex ante. Furthermore, creative goods are typically complex in composition and subject to heterogeneous and shifting consumer preferences. In response to this, models of linear optimization, rational addiction and Bayesian learning have been applied to better understand consumer decision-making, belief formation and revision. While valuable, these approaches do not lend themselves to forming verifiable hypotheses for the critical reason that they bypass an essential aspect of creative products: namely, that of novelty. In contrast, computer sciences, and more specifically recommender theory, embrace creative products as a study object. Being items of online transactions, users of creative products share opinions on a massive scale and in doing so generate a flow of data-driven research. Not limited by the multiple assumptions made in economic theory, data analysts deal with this type of commodity in a less constrained way, incorporating the variety of item characteristics, as well as their co-use by agents. They apply statistical techniques supporting big data, such as clustering, latent class analysis or singular value decomposition. This thesis is drawn from both disciplines, comparing models, methods and data sets. Based upon movie consumption, the work contrasts bottom-up versus top-down approaches, individual versus collective data, and distance measures versus utility-based comparisons. 
Rooted in Bayesian latent class models, a synthesis is formed, supported by random utility theory and recommender algorithm methods. The Bayesian approach makes explicit the experience good nature of creative goods by formulating the prior uncertainty of users towards both movie features and preferences. The latent class method thus infers the heterogeneous aspect of preferences, while its dynamic variant, the latent Markov model, gets around one of the main paradoxes in studying creative products: how to analyse taste dynamics when confronted with a good that is novel at each decision point. This study of preference pattern formation for creative goods draws on individual-level data, mainly movie-user-rating and movie-user-tag triplets collected from the MovieLens recommender system and made available as open data for research by the GroupLens research team.

    Information of social media platforms: the case of Last.fm

    Social media has become a global phenomenon. Currently, there are 2 billion active users on Facebook. However, much of the research on social media concerns its consumption side rather than its production or operational aspects. Although research on the production side is still relatively small, it is growing, indicating that it is a fruitful area to study. This thesis attempts to contribute to this area of research by unravelling the inner operations of social media with one key research question: How does a social media platform organize information? The theory of digital objects of Kallinikos et al. (2013) is used to investigate this question. The information display that users of a social media platform interact with is a digital object constructed from two key components: a database and algorithms. The database and the algorithms shape how information is organized on information displays, and these influence user behaviors, which are then captured as social data in the database. This thesis also critically examines recommender system technology by importing engineering literature on information filtering and retrieval. While newsfeed algorithms such as Facebook's EdgeRank have already been critically examined, information systems and media scholars have yet to investigate recommendation algorithms, despite the fact that they have been widely deployed all over the Internet. It is found that the key weakness of recommendation algorithms is their inability to recommend novel items. This is because the main tenet of any recommender system is to “recommend similar items to those that users already like”. Fortunately, this problem can be alleviated when a recommender system is deployed in the digital information environment of social media platforms. In turn, seven theoretical conjectures can be postulated. 
These are: (1) navigation of the information display as assembled by social media is highly interactive, (2) information organization of social media is highly unstable, which would also render user behaviors unstable, (3) the quality of data aggregation has significant implications for user behaviors, (4) the amount of data captured by social media platforms limits the usefulness of their information displays, (5) the output of the recommendation algorithm (the recommendation list) has real implications for user behaviors, (6) the circle of friends on a social network can influence user behaviors, and (7) metadata attached to displayed items influences user behaviors. Data from Last.fm, a social media platform for music discovery, is used to evaluate these conjectures. The analysis supported most of the conjectures, except the instability of the information display and the importance of metadata attached to displayed items. Some kinds of information organization are more stable than initially expected, and some kinds of user-generated content are not so important for user behaviors.

    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly). 
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.

    Social contextuality and conversational recommender systems

    As people continue to become more involved in both creating and consuming information, new interactive methods of retrieval are being developed. In this thesis we examine conversational approaches to recommendation, that is, the act of suggesting items to users based on the system’s understanding of them. Conversational recommendation is a recent contribution to the task of information discovery. We propose a novel approach to conversation around recommendation, examining how it can be improved to work with collaborative filtering, a common recommendation algorithm. In developing new ways to recommend information to people we also examine their methods of information seeking, exploring the role of conversational recommendation, using both interviews and sensed brain signals. We also look at the implications of the wealth of social and sensed information now available and how it improves the task of accurate recommendation. By allowing systems to better understand the connections between users and how their social impact can be tracked, we show improved recommendation accuracy. We look at the social information around recommendations, proposing a directed influence approach between socially connected individuals, for the purpose of weighting recommendations with the wisdom of influencers. We then look at the semantic relationships that might seem to indicate wisdom (e.g. authors on a book-ranking site) to see if the "wisdom of the few" can be traced back to those conventionally considered wise in the area. Finally we look at "contextuality" (the ability of sets of contextual sensors to accurately recommend items across groups of people) in recommendation, showing that different users have very different uses for context within recommendation. 
This thesis shows that conversational recommendation can be generalised to work well with collaborative filtering, that social influence contributes to recommendation accuracy, and that contextual factors should not be treated the same for each user.

    Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data

    Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even as intangible heritage such as biographies of people, performing arts, cultural customs and rites. The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis. These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods. Such computational methods are the focus of Computational and Digital Humanities projects and research. For the longest time, the digital humanities community has focused on textual corpora, including text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations. In recent years, the digital humanities community has started to shift the focus to include other modalities, such as audio-visual data. In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora. Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand. This includes both methods and systems that support classical Close Reading of the material and Distant Reading methods that give an overview of larger collections, as well as methods in between, such as Meso Reading. Furthermore, a wider application of machine learning methods can be observed on cultural heritage collections, but they are rarely applied together with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting. Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest. 
However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth. One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images. Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details. A problem that arises with collections curated in different institutions is that a specific standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis. This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data. First, we define cultural data with regard to heritage and contemporary data, then we look at the state of the art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections. After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse. Next, we engage in a more complex case of text reuse, the collation of medieval vernacular text editions. For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages, supported by labeling of the relations between lines and the relations between words. We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks. 
With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts and so combine some of the machine learning methods and visualizations that were used for textual data with computer vision methods. Finally, we give reflections on the interdisciplinary projects and the lessons learned, before we discuss existing challenges when working with cultural heritage data from the computer science perspective to outline potential research directions for machine learning and visual analytics of cultural heritage data.

    Industrial Symbiosis Recommender Systems

    For a long time, humanity has lived upon the paradigm that the amounts of natural resources are unlimited and that the environment has ample regenerative capacity. However, the need to shift towards sustainability has resulted in a worldwide adoption of policies addressing resource efficiency and preservation of natural resources. One of the key environmentally and economically sustainable operations currently promoted and enacted in European Union policy is Industrial Symbiosis. In industrial symbiosis, firms aim to reduce the total material and energy footprint by circulating traditional secondary production process outputs of firms to become part of an input for the production process of other firms. This thesis directs attention to the design considerations for recommender systems in the highly dynamic domain of industrial symbiosis. Recommender systems are a promising technology that may facilitate multiple facets of industrial symbiosis creation, as they reduce the complexity of decision making. This typical strength of recommender systems has been responsible for improved sales and a higher return on investment. That provides the prospect for industrial symbiosis recommenders to increase the number of synergistic transactions that reduce the total environmental impact of the process industry in particular.