University of Zagreb. Faculty of Electrical Engineering and Computing.
Abstract
In this thesis we discuss the problem of generating song lyrics with a given structure. Because no song lyrics dataset with annotated structure is available, we define a simple algorithm that marks the structure of a song's lyrics once they have been segmented into paragraphs. The algorithm is then used to annotate the 55000+ Song Lyrics dataset so that it can be used for training and evaluating models. The following machine learning models were used: an N-gram language model, RNN and LSTM recurrent neural networks, and the generative adversarial network SeqGAN. The models were evaluated using three measures: perplexity, a measure of song lyrics structure quality defined in this thesis, and Levenshtein distance, which measures the distance between selected paragraphs. We conclude by reviewing the results the models achieved on the annotated lyrics dataset. The RNN and LSTM networks achieved the best results, followed by SeqGAN, while the N-gram language model performed worst.
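The abstract uses Levenshtein distance to compare paragraphs but does not state the edit granularity or any normalization. The following is a minimal Python sketch that fills those gaps with assumptions. It treats edits as character-level, uses blank lines as paragraph separators, and divides the raw distance by the length of the longer paragraph. The helper names `normalized_distance` and `pairwise_paragraph_distances` are hypothetical and do not come from the thesis. This is not the thesis's structure-marking algorithm, only the distance measure it mentions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Dynamic programming that keeps only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string's length.
    The normalization is an assumption, not taken from the thesis."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def pairwise_paragraph_distances(lyrics: str) -> dict[tuple[int, int], float]:
    """Hypothetical helper: split lyrics on blank lines (assumed paragraph
    delimiter) and return the normalized distance for every paragraph pair."""
    paragraphs = [p.strip() for p in lyrics.split("\n\n") if p.strip()]
    return {
        (i, j): normalized_distance(paragraphs[i], paragraphs[j])
        for i in range(len(paragraphs))
        for j in range(i + 1, len(paragraphs))
    }
```

Two verbatim repetitions of a chorus give a normalized distance of 0.0. Two verses that differ in a single character score close to 0. Paragraphs with unrelated text score close to 1.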