9,905 research outputs found

    Polychotomiser for case-based reasoning beyond the traditional Bayesian classification approach

    This work implements an enhanced Bayesian classifier that performs better than the ordinary naïve Bayes classifier on domains and datasets of varying characteristics. Text classification is an active research field within Artificial Intelligence (AI). It is defined as the task of learning methods for categorising collections of electronic text documents into their annotated classes based on their contents. A growing number of statistical approaches have been developed for text classification, including k-nearest neighbor classification, naïve Bayes classification, decision trees, rule induction, and the support vector machine (SVM), an algorithm that implements structural risk minimisation theory. Among these approaches, naïve Bayes classifiers have been widely used because of their simplicity. However, this generative method has been reported to be less accurate than discriminative methods such as SVM. Some studies have shown that the naïve Bayes classifier performs surprisingly well in many other domains with certain specialised characteristics. The main aim of this work is to quantify the weaknesses of traditional naïve Bayes classification and to introduce an enhanced Bayesian classification approach, with additional techniques, that performs better than the traditional naïve Bayes classifier. Our research goal is to develop an enhanced Bayesian probabilistic classifier by introducing different tournament-structure ranking algorithms, together with a high-relevance keyword extraction facility and an accurately calculated weighting-factor facility. These were introduced to improve classification performance on specific datasets with different characteristics. Other studies have used general datasets, such as Reuters-21578 and 20_newsgroups, to validate the performance of their classifiers. Our approach is easily adapted to datasets with different characteristics in terms of the degree of similarity between classes, multi-categorised documents, and different dataset organisations. As mentioned above, we introduce several techniques: tournament-structure ranking algorithms, higher-relevance keyword extraction, and automatically computed document-dependent (ACDD) weighting factors. Each technique responds differently when applied to datasets with different characteristics, but each has been shown to give outstanding performance in most cases. Based on our experimental results, we have successfully optimised our techniques for individual datasets with different characteristics.

    Sentiment analysis of YouTube comments on Anies Baswedan as a 2024 presidential candidate using the naive Bayes classifier method

    One prospective presidential candidate is Anies Baswedan, the former governor of DKI Jakarta, who has received many awards and whose work-programme policies have been effective in addressing problems in the DKI Jakarta area. Many comments about Anies Baswedan as a 2024 presidential candidate can be found on YouTube. YouTube lets users comment on videos, and these comments can serve as input for sentiment analysis to identify positive and negative comments. The algorithm used in this research is the naïve Bayes classifier. The research has five main processes: data collection, text preprocessing, word weighting (TF-IDF), classification (naïve Bayes classifier), and testing. The data consist of 1009 Indonesian-language YouTube comments on videos about Anies Baswedan as a 2024 presidential candidate. Based on the analysis, there are 610 positive comments and 399 negative comments. The naïve Bayes classifier achieves an accuracy of 78%, obtained with a split of 90% training data and 10% test data.
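The classification step in the process described above can be sketched as a small multinomial naïve Bayes classifier. This is only an illustration under stated assumptions: it uses raw word counts with Laplace smoothing rather than the TF-IDF weights the study reports, the function names `train_nb` and `predict_nb` are hypothetical, and the toy comments are invented rather than taken from the study's data.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect per-class document counts and word counts from tokenised docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy comments (assumption), already tokenised
train_docs = [["good", "leader"], ["great", "program"], ["bad", "policy"], ["poor", "result"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["good", "program"]))
```

In a 90%/10% setup like the one reported, `train_nb` would see the training portion and `predict_nb` would be applied to each held-out comment.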

    Sports News Classification Using the Naïve Bayes Classifier (NBC) Algorithm

    Online news is one of the most sought-after kinds of information in the information age. Because so much news is published, it is easier to find when the articles have already been classified. Sports news is one such case. This final project therefore designs a classification system that groups sports news articles using text mining and the naïve Bayes classifier (NBC). Text mining applies data mining concepts and techniques to find information in text. The naïve Bayes classifier is a classification method based on probability. It was chosen because it best suits the data and its characteristics. Sports news is classified into 8 channels: Football, Basketball, Tennis, Badminton, Moto GP, F1, Volleyball, and Golf. This study also shows that classifying sports news with the naïve Bayes classifier algorithm performs very well, with accuracy reaching 99.82%.

    Implementation of a Particle Swarm Optimization (PSO)-Based Naïve Bayes Classifier Algorithm for Classifying Indonesian-Language Digital News Content

    Abstract - News documents store a great deal of important information, each on its own topic, and text classification is one way to manage this rapidly growing and abundant information. Many agencies that distribute information or news have already begun using web-based systems to deliver up-to-date news. However, dividing the news into categories is currently still done manually, which is troublesome and can take a long time. This study combines a feature selection method with a classifier, namely a Particle Swarm Optimization based Naïve Bayes classifier, and examines the accuracy of the method. The research produces a text classification of digital news content into the categories gossip, culinary, and travel. Naïve Bayes classifier accuracy is measured before and after adding the feature selection method. Evaluation uses 10-fold cross-validation, and accuracy is measured with a confusion matrix. The Naïve Bayes classifier algorithm achieved an accuracy of 94.17%. Keywords: Particle Swarm Optimization, Naïve Bayes classifier, news content classification, text mining
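The evaluation procedure named in this abstract, 10-fold cross-validation with accuracy read from a confusion matrix, can be sketched in plain Python. The helper names below (`k_fold_indices`, `confusion_matrix`, `accuracy`) are hypothetical, the folds are taken in order without shuffling for simplicity, and the example labels are invented for illustration.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pos = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    return matrix

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Invented example labels (assumption), using the three categories from the abstract
labels = ["gossip", "culinary", "travel"]
y_true = ["gossip", "culinary", "travel", "gossip"]
y_pred = ["gossip", "culinary", "gossip", "gossip"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm, accuracy(cm))
```

In a full run, a classifier would be trained on each `train_idx` and scored on the matching `test_idx`, with the ten accuracies averaged.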


    Abstract. A song is a form of entertainment in human activity that involves organised sounds. A song is a set of notes arranged into a beautiful and harmonious sound. The emotion of a song describes the emotional meaning attached to a song clip. This final project classifies emotion based on song lyrics, as a medium for classifying a person's expression and emotion. To classify emotion from song lyrics in the way listeners want, an appropriate method is needed; the method used by the author is the Naïve Bayes Classifier. The Naïve Bayes Classifier is a machine learning method that uses probability calculations. Its underlying concept is Bayes' theorem, a theorem used in statistics to calculate a probability. Keywords: Emotion, Text Mining, Naïve Bayes Classifier
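For reference, the standard textbook form of Bayes' theorem and the naïve Bayes decision rule it leads to can be written as follows. The notation is an assumption, not taken from the abstract: $c$ is an emotion class, $d$ is a lyric, and $w_1, \dots, w_n$ are the words of $d$.

```latex
\begin{align}
  P(c \mid d) &= \frac{P(d \mid c)\, P(c)}{P(d)} \\
  P(d \mid c) &\approx \prod_{i=1}^{n} P(w_i \mid c)
    \quad \text{(words assumed conditionally independent given } c\text{)} \\
  \hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
\end{align}
```

The denominator $P(d)$ is the same for every class, so it can be dropped when only the most probable class is needed.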

    Sentiment analysis of YouTube comments on Anies Baswedan as a 2024 presidential candidate using the naive Bayes classifier method

    Abstract. One prospective presidential candidate is Anies Baswedan, the former governor of DKI Jakarta, who has received many awards and whose work-programme policies have been effective in addressing problems in the DKI Jakarta area. Many comments about Anies Baswedan as a 2024 presidential candidate can be found on YouTube. YouTube lets users comment on videos, and these comments can serve as input for sentiment analysis to identify positive and negative comments. The algorithm used in this research is the naïve Bayes classifier. The research has five main processes: data collection, word weighting (TF-IDF), text preprocessing, classification (naïve Bayes classifier), and testing. The data consist of 1009 Indonesian-language YouTube comments on videos about Anies Baswedan as a 2024 presidential candidate. Based on the analysis, there are 610 positive comments and 399 negative comments. The naïve Bayes classifier achieves an accuracy of 78%, obtained with a split of 10% test data and 90% training data. Keywords: Anies Baswedan, naïve Bayes classifier, sentiment analysis, YouTube

    Improving the Prediction Accuracy of Text Data and Attribute Data Mining with Data Preprocessing

    Data mining is the extraction of valuable information from patterns in data and its conversion into useful knowledge. Data preprocessing is an important step in the data mining process because the quality of the data affects the accuracy of the results, which makes preprocessing one of the critical steps. In text mining research, document classification is a growing field. Although many classification approaches exist, the Naïve Bayes classifier is good at classification because of its simplicity and effectiveness. The aim of this paper is to identify the impact of preprocessing the dataset on the performance of a Naïve Bayes classifier. The Naïve Bayes classifier is suggested as the best method for identifying spam emails. The impact of the preprocessing phase on its performance is analysed by comparing the results on the preprocessed dataset with those on the non-preprocessed dataset. The test results show that combining Naïve Bayes classification with proper data preprocessing can improve prediction accuracy. In attribute data mining research, the decision tree is an important classification technique. Decision trees have proved to be valuable tools for the classification, description, and generalization of data. J48 is a decision tree algorithm used to create a classification model; it is an open-source Java implementation of the C4.5 algorithm in the Weka data mining tool. In this paper, we present a method for improving the accuracy of decision tree mining with data preprocessing. We applied the supervised discretization filter before running the J48 algorithm to construct a decision tree, and compared the results with J48 without discretization. The experimental results show that the accuracy of J48 after discretization is better than before discretization.
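The abstract does not say which preprocessing steps were applied to the email text. As a minimal sketch, assuming the common steps of lowercasing, tokenising on letters, and stop-word removal, a preprocessing function might look like this. The function name `preprocess` and the tiny stop-word list are illustrative assumptions.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The offer is FREE, click to win!"))
```

Comparing a classifier trained on `preprocess(text)` against one trained on the raw text would mirror the before-and-after comparison the paper describes.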

    E-mail Spam Filtering by A New Hybrid Feature Selection Method Using Chi2 as Filter and Random Tree as Wrapper

    The purpose of this research is to present a machine learning approach for improving the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid architecture for feature selection is used. The features used in these systems are taken from the body text of messages. The proposed system combines two feature selection models, filter and wrapper, using a Chi-Squared (Chi2) filter and a Random Tree wrapper as feature selectors. Multinomial Naïve Bayes (MNB), Discriminative Multinomial Naïve Bayes (DMNB), Support Vector Machine (SVM), and Random Forest classifiers are used for classification. Finally, the outputs of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar work across different parameters. The best accuracy of the proposed system is evaluated at 99%.
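The filter half of the hybrid scheme, Chi2 scoring of terms against the spam class, can be sketched with the standard chi-squared statistic for a 2x2 contingency table. This is an illustration only: the function names and the example counts are invented, and the Random Tree wrapper stage is not shown.

```python
def chi2_score(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 term/class contingency table.

    n11: spam messages containing the term
    n10: non-spam messages containing the term
    n01: spam messages without the term
    n00: non-spam messages without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_top_terms(tables, k):
    """Keep the k terms with the highest chi-squared scores."""
    scores = {term: chi2_score(*cells) for term, cells in tables.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented counts (assumption) for 20 spam and 20 legitimate messages
tables = {
    "free": (20, 0, 0, 20),
    "hello": (10, 10, 10, 10),
    "offer": (15, 5, 5, 15),
}
print(select_top_terms(tables, 2))
```

A term that appears equally often in both classes, like `hello` here, scores zero and is filtered out before any wrapper-based search over the remaining features.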