Recommender system for Slovene electronic books

Abstract

The thesis describes a prototype recommender system for lending electronic books from a Slovenian e-library. The data comprise a collection of books in the ePub format, an anonymized list of users, and a set of transactions between books and users. The book texts were tokenized, part-of-speech tagged and lemmatized. We then extracted numerical stylometric features of the books and ranked them by importance using SPEC and Laplacian scores. We reduced the dimensionality of the feature vectors of both books and users and clustered them for use by the recommendation algorithms. To classify the books, we used a one-class SVM as well as the method proposed by Elkan and Noto. We constructed several recommender variants based on content-based and collaborative filtering and evaluated them in two ways. The results show that a recommender system based solely on stylometric features is feasible; however, collaborative filtering gives better overall performance.
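The abstract only names the Laplacian score as a feature-ranking criterion. As an illustration, here is a minimal pure-Python sketch of the standard Laplacian score of He, Cai and Niyogi, assuming a symmetric k-nearest-neighbour graph over books with heat-kernel weights. The function name `laplacian_scores`, the parameters `k` and `t`, and the toy data are hypothetical; this is not the thesis's implementation, and SPEC is not sketched here.

```python
import math


def laplacian_scores(X, k=5, t=1.0):
    """Laplacian score of every feature (column) of X.

    X is a list of rows (one row per book, one column per stylometric
    feature). A lower score means the feature varies less between
    neighbouring books, i.e. it better preserves the local structure.
    k and t are illustrative defaults (assumptions), not thesis values.
    """
    n, m = len(X), len(X[0])
    # Pairwise squared Euclidean distances between books.
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # Symmetric k-nearest-neighbour graph with heat-kernel edge weights S_ij.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    d = [sum(row) for row in S]  # vertex degrees, the diagonal of D
    total = sum(d)
    scores = []
    for r in range(m):
        f = [X[i][r] for i in range(n)]
        if max(f) == min(f):
            # A constant feature carries no information; rank it last.
            scores.append(float("inf"))
            continue
        # Subtract the D-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, d)) / total
        g = [fi - mean for fi in f]
        # Numerator g^T L g with L = D - S, i.e. 1/2 * sum_ij S_ij (g_i - g_j)^2.
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2
                        for i in range(n) for j in range(n))
        # Denominator g^T D g.
        den = sum(di * gi ** 2 for di, gi in zip(d, g))
        scores.append(num / den if den > 0 else float("inf"))
    return scores


if __name__ == "__main__":
    # Made-up toy data: feature 0 separates two groups of books,
    # feature 1 varies within each group.
    X = [[0.0, 0.0], [0.1, 1.0], [0.2, 2.0],
         [10.0, 0.0], [10.1, 1.0], [10.2, 2.0]]
    scores = laplacian_scores(X, k=2, t=1.0)
    ranking = sorted(range(len(scores)), key=lambda r: scores[r])
    print(ranking)  # [0, 1]
```

Features are ranked by ascending score, so the feature that separates the two groups of books comes first. The numerator is computed through the edge-sum identity rather than by forming the matrix L, which keeps the sketch free of external dependencies.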

Related work