7,277 research outputs found
Music 2025 : The Music Data Dilemma: issues facing the music industry in improving data management
© Crown Copyright 2019. 'Music 2025' investigates the infrastructure issues around the management of digital data in an increasingly stream-driven industry. The findings are the culmination of over 50 interviews with high-profile music industry representatives across the sector and reflect key issues as well as areas of consensus and contrasting views. They reveal that, while there are strong examples of data initiatives across the value chain, there remain opportunities to improve efficiency and interoperability.
Current Challenges and Visions in Music Recommender Systems Research
Music recommender systems (MRS) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make almost all of the world's music available at the user's fingertips. While today's MRS considerably help users find interesting music in these huge catalogs, MRS research still faces substantial challenges. In particular, when it comes to building, incorporating, and evaluating recommendation strategies that integrate information beyond simple user-item interactions or content-based descriptors and dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a major endeavor and related publications remain quite sparse.
The purpose of this trends-and-survey article is twofold. First, we identify and shed light on what we believe are the most pressing challenges MRS research faces, from both academic and industry perspectives; we review the state of the art toward solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research, and providing guidance for young researchers by identifying interesting yet under-researched directions in the field.
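One family of strategies the article surveys, combining collaborative signals with content-based descriptors, can be sketched as a simple weighted hybrid. Everything below (the toy play counts, the two content features, the `alpha` blend weight) is illustrative and not taken from the article:

```python
import numpy as np

# Toy data: 3 users x 4 tracks of implicit play counts (collaborative signal),
# and a 4 x 2 matrix of made-up content descriptors (e.g. tempo, energy).
plays = np.array([[5, 0, 2, 0],
                  [0, 3, 0, 1],
                  [4, 0, 0, 2]], dtype=float)
content = np.array([[0.9, 0.1],
                    [0.2, 0.8],
                    [0.8, 0.3],
                    [0.1, 0.9]])

def hybrid_scores(user, alpha=0.7):
    """Blend a collaborative score with a content score for one user."""
    # Collaborative part: propagate the user's history through the
    # item-item co-play matrix (one step of neighborhood scoring).
    collab = plays.T @ plays @ plays[user]
    collab = collab / collab.max()
    # Content part: similarity of each track to the user's "taste centroid".
    listened = plays[user] > 0
    centroid = content[listened].mean(axis=0)
    cont = content @ centroid
    cont = cont / cont.max()
    score = alpha * collab + (1 - alpha) * cont
    score[listened] = -np.inf          # never re-recommend the history
    return score

print(hybrid_scores(0))                # scores for user 0's unheard tracks
```

The `alpha` weight is exactly the kind of hand-tuned glue the article argues should give way to models of listener needs and intent.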
A Transformer-Based Recommendation System for Music Streaming Sessions
Master's thesis -- Graduate School of Data Science, Department of Data Science, Seoul National University, August 2022. Advisor: Hyopil Shin.

Recommendation systems have grown in popularity over the last few years with the rise of big data and the development of computing resources. Compared to the simple rule-based or content-based filtering methods used in the early stages of recommender systems, recent methodologies implement far more complex models. Latent factor models and collaborative filtering methods were developed to find similarities between users and items without explicit knowledge of their characteristics, and gained popularity. Various item domains, chiefly movies and retail, have used these recommendation algorithms extensively.

With the development of deep learning architectures, various deep-learning-based recommendation systems have emerged in recent years. While many of them focused on predicting item ratings from large datasets of user IDs, item IDs, and ratings, there have also been efforts toward next-item recommendation. Next-item recommenders receive a session, or sequence of actions by a user, and try to predict the user's next action. NVIDIA recently used Transformers, a deep learning architecture from the field of Natural Language Processing (NLP), to build a session-based recommendation system called Transformers4Rec, which showed state-of-the-art performance in the usual movie and retail domains.

In the music domain, unfortunately, advanced models for session-based recommendation have been explored only to a limited extent. This thesis therefore applies Transformer-based architectures to session-based recommendation for music streaming, using a dataset from Spotify and the framework from NVIDIA. It explores the unique characteristics of music data that motivate this research, demonstrates the effectiveness of Transformer architectures on music data through next-item prediction on real user streaming sessions, and investigates the feature engineering and data preprocessing methods that yield the best prediction results. An empirical analysis comparing various Transformer architectures is also provided, with models further analyzed using additional feature information.

Transformer-based recommender systems have recently shown strong performance in many domains, but they had not yet been applied to music streaming; this thesis explores how a Transformer-based session recommender performs in that domain. Through data preprocessing, the sessions were filtered to those in which users plausibly listened to tracks they actually liked, and the data were refined for session-based recommendation. Various music-related features were converted to categorical form so they could be reflected in model training, and training itself used the incremental training scheme common in session-based recommender systems. The final experiments overcame the noisiness and sparsity of the data and achieved results competitive with similar datasets. This work demonstrates Transformer-based models as a new possibility for music streaming session recommendation and provides a starting point for future researchers.

1 Introduction
1.1 Research Topic
1.2 Purpose of Research
1.3 Need for Research
1.3.1 Recent Trends
1.3.2 Dataset Characteristics
2 Related Works
2.1 Overview of NLP and RecSys
2.2 Past Works on Incorporating Features
3 Methodology
3.1 Music Streaming Sessions Dataset
3.2 Music Recommendation Model
3.2.1 NVTabular
3.2.2 Transformers4Rec
3.3 Feature Embeddings
3.4 Session Information
3.5 Transformer Architectures
3.6 Metrics
4 Experiments
4.1 Data Preprocessing
4.2 Embedding
4.2.1 No features
4.2.2 Session features
4.2.3 Song features
4.3 Hyperparameters
4.4 Training
4.4.1 Problem Statement
4.4.2 Pipeline
4.4.3 Incremental Training, Evaluation
4.5 Results
4.5.1 Simple item IDs
4.5.2 Item IDs + Session Information
4.5.3 Item IDs + Session Information + Track Metadata
5 Conclusion and Future Works
Bibliography
Abstract (in Korean)
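The incremental train/evaluate protocol the thesis uses (predict on each new session before absorbing it into the model) can be illustrated with a deliberately simple stand-in model, a first-order transition-count baseline rather than a Transformer; the toy sessions and the hit@2 metric below are invented for illustration:

```python
from collections import defaultdict

# Toy streaming sessions (ordered track IDs), stand-ins for the
# Spotify Music Streaming Sessions Dataset used in the thesis.
sessions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "a"],
    ["a", "b", "c"],
]

# Transition counts: how often track y directly follows track x.
nxt = defaultdict(lambda: defaultdict(int))

def update(session):
    """Incrementally absorb one finished session into the model."""
    for x, y in zip(session, session[1:]):
        nxt[x][y] += 1

def predict(prefix, k=2):
    """Rank candidate next tracks after the last track of the prefix."""
    cands = nxt[prefix[-1]]
    return [t for t, _ in sorted(cands.items(), key=lambda kv: -kv[1])][:k]

# Incremental loop: evaluate on each session first, then train on it,
# so every prediction only uses sessions seen earlier in the stream.
hits, total = 0, 0
for s in sessions:
    for i in range(1, len(s)):
        if s[i] in predict(s[:i]):
            hits += 1
        total += 1
    update(s)

print(f"hit@2 = {hits}/{total}")     # prints "hit@2 = 4/8"
```

Transformers4Rec follows the same evaluate-then-train windowing, just with a learned sequence model in place of the transition counts.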
Analysis of the music industry today
Bachelor's thesis (Treball Final de Grau) in Business Administration. Code: AE1049. Academic year 2020/2021.

The music industry has undergone many changes in just 20 years, going from physical formats such as vinyl to streaming platforms such as Spotify.
The industry is characterized by being very dynamic and in constant movement, and it has gone through several key revolutions: the arrival of the Internet, which drove the consumption of pirated music through platforms like Napster and peer-to-peer networks; the birth of iTunes, the first to make selling music in digital format an easy and simple process; and finally the consumption of music through streaming platforms.
Along the way there have been several changes in business models, changes in record labels, new roles and agents in the value chain, new ways of monetizing music, and new habits of consuming music. All of these challenges have forced the music industry to adapt and innovate into the industry as we know it today.
This project describes the music industry from live music to recorded music, explaining the changes the industry has had to face and its main characteristics. It also goes into more detail about recorded music and its evolution in Spain, and makes clear both the processes that form the value chain and the agents involved.
Explainability in Music Recommender Systems
The most common way to listen to recorded music nowadays is via streaming
platforms which provide access to tens of millions of tracks. To assist users
in effectively browsing these large catalogs, the integration of Music
Recommender Systems (MRSs) has become essential. Current real-world MRSs are
often quite complex and optimized for recommendation accuracy. They combine
several building blocks based on collaborative filtering and content-based
recommendation. This complexity can hinder the ability to explain
recommendations to end users, which is particularly important for
recommendations perceived as unexpected or inappropriate. While pure
recommendation performance often correlates with user satisfaction,
explainability has a positive impact on other factors such as trust and
forgiveness, which are ultimately essential to maintain user loyalty.
In this article, we discuss how explainability can be addressed in the
context of MRSs. We provide perspectives on how explainability could improve
music recommendation algorithms and enhance user experience. First, we review
common dimensions and goals of recommenders' explainability and in general of
eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which
these apply -- or need to be adapted -- to the specific characteristics of
music consumption and recommendation. Then, we show how explainability
components can be integrated within an MRS and in what form explanations can be
provided. Since the evaluation of explanation quality is decoupled from pure
accuracy-based evaluation criteria, we also discuss requirements and strategies
for evaluating explanations of music recommendations. Finally, we describe the
current challenges for introducing explainability within a large-scale
industrial music recommender system and provide research perspectives.

Comment: To appear in AI Magazine, Special Topic on Recommender Systems 202
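One common form of explanation the article's setting admits, attributing a recommendation to the most similar track in the user's own history ("because you listened to ..."), can be sketched over item embeddings. The track names, vectors, and the nearest-neighbor attribution rule are all illustrative assumptions, not the article's method:

```python
import numpy as np

# Toy track embeddings (as a CF model might learn); names and values are made up.
tracks = {"Track A": [0.9, 0.1], "Track B": [0.2, 0.8],
          "Track C": [0.85, 0.2], "Track D": [0.1, 0.9]}
names = list(tracks)
vecs = np.array([tracks[n] for n in names])
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)   # unit-normalize

def explain(recommended, history):
    """Attribute a recommendation to the most cosine-similar history track."""
    r = vecs[names.index(recommended)]
    sims = {h: float(vecs[names.index(h)] @ r) for h in history}
    seed = max(sims, key=sims.get)
    return f"Recommended because you listened to {seed}"

print(explain("Track C", ["Track A", "Track B"]))
# -> "Recommended because you listened to Track A"
```

Such post-hoc attributions are cheap to compute, but, as the article discusses, their quality must be evaluated separately from recommendation accuracy.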
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have been an effective strategy to overcome information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications and their potential to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields, such as computer vision and natural language processing, owing not only to its stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, as recently demonstrated by its effectiveness when applied to information retrieval and recommender systems research. Evidently, the field of deep learning in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep-learning-based recommender systems. More concretely, we devise a taxonomy of deep-learning-based recommendation models, along with a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this exciting new development of the field.

Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
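The common ancestor of the models this survey taxonomizes is matrix factorization with embeddings learned by gradient descent; a minimal sketch in plain NumPy, with toy ratings and hyperparameters chosen only for illustration, looks like this:

```python
import numpy as np

# Toy explicit ratings as (user, item, rating) triples; values are invented.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 5.0), (2, 2, 1.0)]
n_users, n_items, dim = 3, 3, 4

rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, dim))    # user embeddings
V = 0.1 * rng.standard_normal((n_items, dim))    # item embeddings

lr, reg = 0.05, 0.01
for epoch in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                    # prediction error
        U[u] += lr * (err * V[i] - reg * U[u])   # SGD step on both factors
        V[i] += lr * (err * U[u] - reg * V[i])

mse = np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings])
print(f"train MSE after 200 epochs: {mse:.3f}")
```

The deep models the survey covers replace the inner product `U[u] @ V[i]` with learned nonlinear functions of the same embeddings, which is what makes a shared taxonomy possible.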
Applied Speech Emotion Recognition on a Serverless Cloud Architecture
Final degree thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática, Universidad Complutense de Madrid, Departamento de Arquitectura de Computadores y Automática, academic year 2021-2022.
The source code of this project can be found both in GitHub and Google Drive:
https://github.com/RobertFarzan/Speech-Emotion-Recognition-system
https://drive.google.com/file/d/1XobYLxcARE73EFwZ3VUr6Po7vum42ajh/view?usp=sharing

The purpose of this final degree thesis, "Applied speech emotion recognition on a serverless Cloud architecture", is to research emotion recognition in the human voice through several techniques, including audio signal processing and deep learning, in order to classify the emotion detected in a piece of audio, as well as to find ways to deploy this functionality in the Cloud (serverless). From there we can derive a brief implementation of a streaming, nearly real-time system in which an end user records audio and continuously retrieves responses describing the detected emotions.
The idea is an "emotion tracking system" that couples the technologies mentioned above with a simple end-user GUI app that anyone could use to deliberately track their own voice in different situations - during a call, a meeting, etc. - and get a brief summary visualization of their emotions over time at a glance. This prototype appears to be one of the first software products of its kind: there is a lot of literature on the Internet on Speech Emotion Recognition (SER) and tools for software engineers to facilitate the task, but an easy end-user product or solution for real-time SER appears to be non-existent. As a short summary of the project road map and the technologies involved, the process is as follows: development of a CNN model in TensorFlow 2.0 (with Python) that takes a short chunk of audio as input and outputs emotion labels; deployment of a Python script that uses this CNN model to return emotion predictions in AWS Lambda (the Amazon service for serverless Cloud); and finally the design of a Python app with an integrated GUI that sends requests to the Lambda service and retrieves the responses with emotion predictions, presenting them with clear visualizations.
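The Lambda leg of the pipeline described above can be sketched as a handler that accepts base64-encoded audio and returns emotion scores. The CNN is stubbed out here (the thesis's actual TensorFlow model is not reproduced), and the event shape is an assumption about how the GUI client would POST audio:

```python
import base64
import json

def predict_emotion(audio_bytes):
    """Stand-in for the thesis's TensorFlow 2.0 CNN; returns a fixed distribution.

    A real implementation would compute a spectrogram from audio_bytes and
    call model.predict() on it.
    """
    return {"neutral": 0.6, "happy": 0.3, "angry": 0.1}

def handler(event, context):
    """AWS-Lambda-style entry point: base64 audio in, emotion scores out."""
    audio = base64.b64decode(event["body"])      # client sends base64-encoded audio
    scores = predict_emotion(audio)
    top = max(scores, key=scores.get)
    return {"statusCode": 200,
            "body": json.dumps({"emotion": top, "scores": scores})}

# Local smoke test with a fake event (no AWS account needed).
fake_event = {"body": base64.b64encode(b"\x00\x01fake-pcm").decode()}
print(handler(fake_event, None))
```

Keeping the handler free of GUI and AWS-specific imports is what lets the same function be tested locally and deployed to Lambda unchanged.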
Using HeidiSongs Music as an Instructional Tool in the Elementary School Classroom: A Case Study
The purpose of this qualitative multiple case study is to understand how teachers use HeidiSongs music as an instructional tool in the elementary school classroom. HeidiSongs uses multisensory structured language education to teach by engaging multiple senses simultaneously to increase retention. The theories guiding this study include Gardner’s theory of multiple intelligences, which involves kinesthetic intelligences among other types of intelligences, and Krashen’s theory of second language acquisition. HeidiSongs uses both musical and kinesthetic activities to enhance literacy. The central research question focused on how teachers use HeidiSongs music as an instructional tool in the elementary school classroom. The sub-questions explored the different instructional settings where this literacy instruction could take place: whole group, small group, and individual instruction. Eleven participants were current or former users of HeidiSongs music, and data was collected virtually through documentation, individual interviews, and a single focus group interview. Data was analyzed through cross-case synthesis, searching for patterns, forming naturalistic generalizations, and explanation building. Findings indicated HeidiSongs is most applicable in the whole group setting in the elementary school classroom, with teachers and students using recall of the songs in small group and individual worktime to enhance memory. Teachers enjoyed the combination of multisensory music and movement in HeidiSongs and reported an overall positive effect on student engagement, even in diverse populations. Further research on instructional data distinguishing between audio, visual, or animated versions of the songs could help teachers determine which version of the songs is most ideal for each classroom