Search CORE

7 research outputs found

Comparing Automated Methods to Detect Explicit Content in Song Lyrics

Author: Cabrio Elena
Corazza Michele
Fell Michael
Gandon Fabien
Publication venue: HAL CCSD
Publication date: 02/09/2019

The Parental Advisory Label (PAL) is a warning label that is placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents of material potentially unsuitable for children. Since 2015, digital providers – such as iTunes, Spotify, Amazon Music and Deezer – also follow PAL guidelines and tag such tracks as “explicit”. Nowadays, such labelling is carried out mainly manually on a voluntary basis, with the drawbacks of being time consuming and therefore costly, error prone and partly a subjective task. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit contents in English lyrics. We show that more complex models perform only slightly better on this task, and relying on a qualitative analysis of the data, we discuss the inherent hardness and subjectivity of the task.

Crossref

INRIA a CCSD electronic archive server

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

HAL-Rennes 1
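
The abstract above names dictionary-based lookup as the simplest of the compared methods. A minimal Python sketch of that idea follows, assuming a tiny placeholder lexicon; the names `EXPLICIT_TERMS`, `is_explicit`, and `min_hits` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical placeholder lexicon; a real system would load a curated word list.
EXPLICIT_TERMS = frozenset({"damn", "hell"})


def is_explicit(lyrics, lexicon=EXPLICIT_TERMS, min_hits=1):
    """Flag lyrics as explicit when at least `min_hits` tokens appear in the lexicon."""
    # Lowercase and split into word tokens so matching is case-insensitive
    # and whole-word only ("hello" does not match "hell").
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    hits = sum(1 for token in tokens if token in lexicon)
    return hits >= min_hits
```

Whole-word matching keeps the baseline from firing on substrings, but it cannot account for context, which is one reason a lookup of this kind is only a starting point.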

Comparing Automated Methods to Detect Explicit Content in Song Lyrics

Author: Cabrio Elena
Corazza Michele
Fell Michael
Gandon Fabien
Publication venue: HAL CCSD
Publication date: 02/09/2019

INRIA a CCSD electronic archive server

Protecting Children from Harmful Audio Content: Automated Profanity Detection From English Audio in Songs and Social-Media

Author: Kalyan V Sai Pavan
Murugan T Senthil
Publication venue: Auricle Global Society of Education and Research
Publication date: 10/07/2023

This paper presents a novel approach for the automated detection of profanity in English audio songs using machine learning techniques. One of the primary drawbacks of existing systems is that they are confined to textual data. The proposed method utilizes a combination of feature extraction techniques and machine learning algorithms to identify profanity in audio songs. Specifically, the approach employs the popular feature extraction techniques of term frequency–inverse document frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec to extract relevant features from the audio songs. TF-IDF is used to capture the frequency and importance of each word in the song, while BERT is utilized to extract contextualized representations of words that can capture more nuanced meanings. To capture the semantic meaning of words in audio songs, the study also explores the Doc2Vec model, a neural network-based approach to feature extraction. The study utilizes Open Whisper, an open-source machine learning library, to develop and implement the approach. A dataset of English audio songs was used to evaluate the performance of the proposed method. The results showed that both the TF-IDF and BERT models outperformed the Doc2Vec model in terms of accuracy in identifying profanity in English audio songs. The proposed approach has potential applications in identifying profanity in various forms of audio content, including songs, audio clips, social media, reels, and shorts.

International Journal on Recent and Innovation Trends in Computing and Communication

Assessing Fine-Grained Explicitness of Song Lyrics

Author: Marco Rospocher
Samaneh Eksir
Publication venue: MDPI AG
Publication date: 01/01/2023

Music plays a crucial role in our lives, with growing consumption and engagement through streaming services and social media platforms. However, caution is needed for children, who may be exposed to explicit content through songs. Initiatives such as the Parental Advisory Label (PAL) and similar labelling from streaming content providers aim to protect children from harmful content. However, so far, the labelling has been limited to tagging the song as explicit (if so), without providing any additional information on the reasons for the explicitness (e.g., strong language, sexual reference). This paper addresses this issue by developing a system capable of detecting explicit song lyrics and assessing the kind of explicit content detected. The novel contributions of the work include (i) a new dataset of 4000 song lyrics annotated with five possible reasons for content explicitness and (ii) experiments with machine learning classifiers to predict explicitness and the reasons for it. The results demonstrated the feasibility of automatically detecting explicit content and the reasons for explicitness in song lyrics. This work is the first to address explicitness at this level of detail and provides a valuable contribution to the music industry, helping to protect children from exposure to inappropriate content

Catalogo dei prodotti della ricerca

Detecting explicit lyrics: a case study in Italian music

Author: Marco Rospocher
Publication date: 01/01/2023

Preventing the reproduction of songs whose textual content is offensive or inappropriate for kids is an important issue in the music industry. In this paper, we investigate the problem of assessing whether music lyrics contain content unsuitable for children (a.k.a. explicit content). Previous works that have computationally tackled this problem have dealt with English or Korean songs, comparing the performance of various machine learning approaches. We investigate the automatic detection of explicit lyrics for Italian songs, complementing previous analyses performed on different languages. We assess the performance of many classifiers, including those (not fully exploited so far for this task) that leverage neural language models, i.e., rich language representations built from textual corpora in an unsupervised way that can be fine-tuned on various natural language processing tasks, including text classification. For the comparison of the different systems, we exploit a novel dataset we contribute, consisting of approximately 34K songs annotated with labels indicating explicit content. The evaluation shows that, on this dataset, most of the classifiers built on top of neural language models perform substantially better than non-neural approaches. We also provide further analyses, including a qualitative assessment of the predictions produced by the classifiers, an assessment of the performance of the best performing classifier in a few-shot learning scenario, and the impact of dataset balancing.

Catalogo dei prodotti della ricerca

Classification of Explicit Songs Based on Lyrics Using Random Forest Algorithm

Author: I Made Agus Dwi Suarjaya
Luh Kade Devi Dwiyani
Ni Kadek Dwi Rusjayanthi
Publication venue: Asosiasi Perguruan Tinggi Informatika dan Komputer (APTIKOM) Sumsel
Publication date: 01/05/2023

This study focuses on the potential negative impact of explicit songs on children and adolescents. Although an explicit song labeling program is currently in place, its coverage is limited to songs released by artists affiliated with the Recording Industry Association of America (RIAA). Consequently, songs falling outside the program's scope remain inadequately labeled. To address this issue, a machine learning model was developed to classify explicit songs and mitigate mislabeling. A dataset of song lyrics was collected using web scraping techniques for the purpose of constructing the classification model. The model was trained using the TF-IDF vectorization method and the random forest algorithm. Different training-testing data split ratios were compared to determine the optimal model. The best model used a training-testing split of 90:10 and achieved an accuracy of 96.3%, precision of 99.3%, recall of 93.5%, and an F1-score of 96.3%. The classification results revealed that explicit songs accounted for 39.22% of the dataset, and the visual representation highlighted the fluctuating prevalence of explicit songs over time. Additionally, the hip-hop/rap genre exhibited the highest proportion of explicit songs, reaching 92%.

Directory of Open Access Journals
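
The F1-score quoted in the abstract above can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The following is a small sketch of that arithmetic only; the helper name `f1_score` is an assumption for illustration and is unrelated to the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 99.3% and recall 93.5%, as reported in the abstract.
f1 = f1_score(0.993, 0.935)
print(f"{f1 * 100:.1f}%")  # prints 96.3%
```

The result agrees with the 96.3% F1-score stated in the abstract.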

Comparing Automated Methods to Detect Explicit Content in Song Lyrics

Author: Michael Fell, Elena Cabrio, Michele Corazza, Fabien Gandon
Publication date: 01/01/2019

The Parental Advisory Label (PAL) is a warning label that is placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents of material potentially unsuitable for children. Since 2015, digital providers – such as iTunes, Spotify, Amazon Music and Deezer – also follow PAL guidelines and tag such tracks as “explicit”. Nowadays, such labelling is carried out mainly manually on a voluntary basis, with the drawbacks of being time consuming and therefore costly, error prone and partly a subjective task. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit contents in English lyrics. We show that more complex models perform only slightly better on this task, and relying on a qualitative analysis of the data, we discuss the inherent hardness and subjectivity of the task.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna