8 research outputs found

    A Hybrid Music Recommendation System Based On Different Features Of The Music And Users

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007
    Music has become an important part of people's lives. Music players are widely used, and many devices integrate music content into their applications. The cellular phone is one such device. Two applications that more and more people choose are the Colored Ring-Back Tone, in which a caller hears a selected song instead of the standard ring-back tone while waiting for an answer, and playing a song instead of the classical ring tone when the phone rings. As music becomes this widespread, music preferences become important. Music recommendation systems study methods of recommending music to users based on their past selections and other information about them. Many such systems, both academic and commercial, are available on the Internet. In this thesis, we study a music recommendation system that can work with the Ring-Back-Tone system or with any system in which users choose songs from a collection. Our system represents music pieces with basic audio features such as beat and timbre and groups them according to a distance metric in this representation. By observing a user's past choices, it tries to recommend songs that the user is likely to choose next. While doing so, it takes into account the songs other users listened to in similar time periods. It judges the similarity of two music pieces from the similarity of the pieces themselves and of their performers. Using these similarities, it groups (clusters) users who made similar choices in the past. Finally, it uses the song and user clusters to recommend music pieces that a user is likely to select.
    We study 6 different methods of recommending music pieces:
    a) Distances between the music pieces that users listened to are calculated first. The pieces with the smallest average distance to the songs the user has already listened to are then recommended. (Euclid/Cosine Distance Based Music recommendation)
    b) Music pieces are recommended using the features of the pieces the user listened to, together with entropy and popularity. (Content Based Recommendation Using Entropy and Popularity Metrics)
    c) All music pieces in the system are divided into two groups, those listened to in the recent period and those listened to over the longer term, and a specified number of pieces from each group is recommended. (STA)
    d) All music pieces in the system are clustered by different features (timbre, beat, and pitch). The importance each user gives to each feature is determined from the pieces that user listened to in the past, and a different number of pieces is recommended from the clusters of each feature. (Simple Adaptive Method, Adaptive Recommendation Method)
    e) Users are clustered with other users who have similar preferences, and music pieces are recommended using metrics such as popularity and entropy. (Learning Approach on an Adaptive Music Recommendation System with Popularity Data and Using User Grouping)
    A graphical user interface was written for the music recommendation system, which supports all of the methods above. The tests use a user session dataset provided by a company that produces music content applications for a cellular phone operator. The different algorithms were run on this same dataset and their performance compared. According to the test results, using only the similarity of music pieces gives successful recommendations at a rate of 2-5%, using the features that matter to a particular user gives 5-10%, and adding popularity and user clustering raises the success rate to 75%.
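    Method (a) above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis implementation: the function names, the two-dimensional (tempo, timbre) feature vectors, and the song ids are all hypothetical, and only the Euclidean variant is shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_by_average_distance(features, listened, k=2):
    """Return the k unheard song ids with the smallest mean distance
    to the songs the user has already listened to."""
    candidates = [song for song in features if song not in listened]

    def mean_distance(song):
        total = sum(euclidean(features[song], features[h]) for h in listened)
        return total / len(listened)

    return sorted(candidates, key=mean_distance)[:k]


# Hypothetical (tempo, timbre) features, invented for this example.
songs = {
    "a": (0.9, 0.1),
    "b": (0.8, 0.2),
    "c": (0.1, 0.9),
    "d": (0.85, 0.15),
    "e": (0.2, 0.8),
}

print(recommend_by_average_distance(songs, listened={"a", "b"}, k=2))
```

    With these made-up vectors, song "d" sits between the two listened songs and is ranked first. Swapping `euclidean` for a cosine distance would give the other variant named in the abstract.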

    From past to present: spam detection and identifying opinion leaders in social networks

    On microblogging sites, which gain more users every day, a wide range of ideas quickly emerge, spread, and create interactive environments. In some cases, in Turkey as well as in the rest of the world, events were reported on microblogging sites before they appeared in visual, audio, and printed news sources. Because information flows so rapidly through social networks, it can reach millions of people in seconds. In this context, social media can be seen as one of the most important sources of information affecting public opinion. Once the information in social networks became accessible, researchers began to use it in their studies. As studies on spam detection and on the identification of opinion leaders gained popularity, surveys of these topics began to be published. Data collected from social platforms, especially in recent years, has fed many state-of-the-art applications. Existing surveys cover either filtering spam content or detecting influencers on social networks, each independently. This survey analyzes both spam detection studies and opinion leader identification studies and categorizes them by methodology. As far as we know, no other survey covers approaches to both spam detection and opinion leader identification in social networks. It gives an overview of past and recent advances in both areas and shows why both matter. Readers can also gain a general understanding of the different studies on spam detection and opinion leader identification while seeing their key points and how they compare.
    This work is supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) through grant number 118E315 and grant number 120E187. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of TUBITAK.
    Publisher's Version; Emerging Sources Citation Index (ESCI); Q4; WOS:00080858480001

    A corpus-based semantic kernel for text classification by using meaning values of terms

    Text categorization plays a crucial role on both academic and commercial platforms because of the growing demand for automatic organization of documents. Kernel-based classification algorithms such as Support Vector Machines (SVM) have become highly popular in text mining. This is mainly due to their relatively high classification accuracy in several application domains and their ability to handle high-dimensional, sparse data, which is the prohibitive characteristic of textual data representation. Recently, interest has grown in exploiting background knowledge, such as ontologies and corpus-based statistical knowledge, in text categorization. It has been shown that replacing standard kernel functions such as the linear kernel with customized kernel functions that take advantage of this background knowledge can increase the performance of SVM in text classification. Based on this, we propose a novel semantic smoothing kernel for SVM. The approach is based on a meaning measure, which calculates the meaningfulness of terms in the context of classes. Document vectors are smoothed using these class-specific meaning values of the terms. Because the smoothing process makes direct use of class information, the kernel can be considered a supervised smoothing kernel. The meaning measure is based on the Helmholtz principle from Gestalt theory and has previously been applied to several text mining tasks such as document summarization and feature extraction. However, to the best of our knowledge, ours is the first study to use the meaning measure in a supervised setting to build a semantic kernel for SVM. We evaluated the proposed approach in a large number of experiments on well-known textual datasets and present results under different experimental conditions. We compare our results with traditional SVM kernels such as the linear kernel and with several corpus-based semantic kernels. Our results show that the proposed approach outperforms the other kernels in classification performance.

    Metinsel veri madenciliği için anlamsal yarı-eğitimli algoritmaların geliştirilmesi (Development of semantic semi-supervised algorithms for textual data mining)

    Ganiz, Murat Can (Dogus Author) -- Zeynep Hilal, Kilimci (Dogus Author)
    Textual data mining is the process of extracting useful knowledge from large amounts of textual data or of organizing such data automatically. Text classification algorithms play an important role in automatically organizing large document collections. In this field, classification algorithms are called supervised and clustering algorithms unsupervised. Between the two are semi-supervised algorithms, which can improve classification accuracy by using abundant unlabeled data alongside labeled data. Textual data mining algorithms traditionally use the bag-of-words model, which treats the words of a text as independent of each other and of their positions in the text. Traditional algorithms also assume that texts are independent and identically distributed. As a result, this approach ignores the semantic relationships between words and between texts. Interest has grown recently in work that exploits semantic relationships, especially those between words. Using semantic knowledge improves the performance of traditional machine learning algorithms, particularly when the available data are scarce, sparse, or noisy. In real-world applications, the training data are usually limited and noisy, so algorithms that can use semantic knowledge have great potential. In the first phase of this project, we developed semantic algorithms for supervised textual data mining; these algorithms improve performance in text classification and feature selection. In the second phase, building on these methods, we worked on semi-supervised text classification algorithms that use both labeled and unlabeled data. During the project, 5 master's theses were completed, 1 PhD thesis reached the defense stage, 2 articles were published in SCI-indexed journals, and 8 papers were presented and published at national and international conferences and symposia. Two further journal articles have been submitted and are under review, and 1 conference paper and 2 journal articles containing the findings of the last phase are in preparation. In addition, a university spin-off company was founded in connection with the project.
    TÜBİTA

    A novel semantic smoothing kernel for text classification with class-based weighting

    Altınel, Berna (Dogus Author), Diri, Banu (Dogus Author), Ganiz, Murat Can (Dogus Author) -- Article in press
    In this study, we propose a novel methodology for building a semantic smoothing kernel to use with Support Vector Machines (SVM) for text classification. The approach is based on two key concepts: class-based term weighting and changing the orthogonality of the vector space. A class-based term weighting methodology is used to transform documents from the original space to the feature space. This weighting groups terms by their importance for each class and consequently smooths the representation of documents. It does so by changing the orthogonality of the Vector Space Model (VSM), introducing class-based dependencies between terms. As a result, in the extreme case, two documents can be seen as similar even if they share no terms, provided their terms are similarly weighted for a particular class. The resulting semantic kernel makes direct use of class information when extracting semantic information between terms, so it can be considered a supervised kernel. For our experimental evaluation, we analyze the performance of the kernel in a large number of experiments on benchmark textual datasets and present results under varying experimental conditions. To the best of our knowledge, this is the first study to use class-based term weighting to build a supervised semantic kernel for SVM. We compare our results with kernels commonly used in SVM, such as the linear kernel, the polynomial kernel, and the Radial Basis Function (RBF) kernel, and with several corpus-based semantic kernels. According to our experimental results, the proposed method improves on the linear kernel and several corpus-based semantic kernels in both accuracy and speed.
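    The claim that two documents sharing no terms can still be similar is easier to see with a small example. The sketch below assumes a generic smoothing kernel of the form k(d1, d2) = (d1 S) · (d2 S), where S is a term-by-class weight matrix; the weights, the vocabulary, and the function names are invented for illustration, and the paper's actual class-based weighting scheme is not reproduced here.

```python
def linear_kernel(d1, d2):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(d1, d2))


def smooth(doc, weights):
    """Project a term-count vector onto class space: doc (1 x T) times weights (T x C)."""
    n_classes = len(weights[0])
    return [
        sum(doc[t] * weights[t][c] for t in range(len(doc)))
        for c in range(n_classes)
    ]


def semantic_kernel(d1, d2, weights):
    """Dot product of the two smoothed documents, i.e. d1 S S^T d2^T."""
    return linear_kernel(smooth(d1, weights), smooth(d2, weights))


# Hypothetical vocabulary: ["goal", "striker", "election", "senate"]
# Hypothetical classes:    [sports, politics]
weights = [
    [0.9, 0.1],  # goal
    [0.8, 0.0],  # striker
    [0.1, 0.9],  # election
    [0.0, 0.7],  # senate
]

doc_a = [2, 0, 0, 0]  # only "goal"
doc_b = [0, 1, 0, 0]  # only "striker"
doc_c = [0, 0, 1, 1]  # "election" and "senate"

print(linear_kernel(doc_a, doc_b))             # no shared terms
print(semantic_kernel(doc_a, doc_b, weights))  # both weighted toward sports
print(semantic_kernel(doc_a, doc_c, weights))  # weighted toward different classes
```

    Under these assumed weights the linear kernel sees `doc_a` and `doc_b` as unrelated, while the smoothed kernel scores them higher than the pair `doc_a` and `doc_c`. Because the kernel matrix has the form (D S)(D S)^T, it is positive semi-definite and so can be used as an SVM kernel.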

    A simple semantic kernel approach for SVM using higher-order paths

    Ganiz, Murat Can (Dogus Author) -- Conference full title: 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2014) : Alberobello, Italy, 23-25 June 2014.
    The bag of words (BOW) representation of documents is very common in text classification systems. However, the BOW approach ignores the position of words in the document and, more importantly, the semantic relations between words. In this study, we present a simple semantic kernel for the Support Vector Machines (SVM) algorithm. The kernel uses higher-order relations between terms to incorporate semantic information into the SVM. The algorithm is easy to implement and forms a basis for future improvements. We perform a series of experiments on different well-known textual datasets. The results show that classification performance improves over traditional SVM kernels such as the linear kernel, which is commonly used in text classification.
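    One common way to picture a higher-order relation between terms is a second-order path: term A co-occurs with B in one document, and B co-occurs with C in another, so A and C become linked even though they never appear together. The sketch below illustrates that idea by squaring a term co-occurrence matrix. It is an assumption-laden illustration rather than the kernel defined in the paper, and the tiny document matrix is invented.

```python
def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


# Hypothetical document-by-term matrix over the terms [A, B, C].
docs = [
    [1, 1, 0],  # document 0 contains A and B
    [0, 1, 1],  # document 1 contains B and C
]

# First-order term co-occurrence: entry (i, j) counts documents
# in which terms i and j appear together.
first_order = matmul(transpose(docs), docs)

# Second-order relations: entry (i, j) is non-zero when terms i and j
# are connected through some intermediate term.
second_order = matmul(first_order, first_order)

print(first_order[0][2])   # A and C never co-occur directly
print(second_order[0][2])  # but they are linked through B
```

    A matrix like `second_order` could then play the role of the smoothing matrix in a kernel of the form D S D^T; whether that matches the paper's exact construction is not stated in the abstract.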

    A novel higher-order semantic kernel for text classification

    Ganiz, Murat Can (Dogus Author) -- Conference full title: 2013 10th International Conference on Electronics, Computer and Computation, ICECCO 2013; Ankara; Turkey; 7 November 2013 through 8 November 2013.
    In conventional text categorization algorithms, documents are represented as a "bag of words" (BOW) and are assumed to be independent of each other. While this approach simplifies the models, it ignores the semantic information between the terms of each document. In this study, we develop a novel method to measure semantic similarity based on higher-order dependencies between documents. Using these dependencies, we propose a kernel for the Support Vector Machines (SVM) algorithm called the Higher-Order Semantic Kernel. To compare its performance, we ran many experiments with our algorithm and with existing traditional first-order kernels such as the Polynomial Kernel, the Radial Basis Function Kernel, and the Linear Kernel. The experiments with the Higher-Order Semantic Kernel on several well-known datasets show that classification performance improves significantly over the first-order methods.