
    Implementation of Deep Learning to Detect Indonesian Hoax News with Convolutional Neural Network Method

    This study aims to build and test a model for distinguishing valid news from hoax news. The method used is a Convolutional Neural Network (CNN) with Word2Vec embeddings. The research stages consist of data collection, pre-processing, word embedding, model construction, and testing of the results. The dataset consists of 958 news articles. After testing with an 80%/20% split into training and test data and five training epochs, the resulting model distinguishes valid news from hoax news well. The best model uses a vector dimension of 400 as input and multiple filter sizes of 3, 4, and 5. The resulting accuracy, precision, and recall are all 0.91. These results are influenced by the choice of vector dimension for the Word2Vec output, the choice of filter sizes in the convolution layer, and the addition of the Indonesian Wikipedia corpus to the corpus used.
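
    The abstract does not include code, but the multi-filter architecture it describes can be sketched roughly as follows, assuming tokenized and padded news sequences plus a prepared 400-dimensional Word2Vec embedding matrix; the vocabulary size, sequence length, and filter count are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a multi-filter-size CNN text classifier, assuming the news
# articles are already tokenized and padded into integer sequences and a
# Word2Vec embedding matrix of dimension 400 has been prepared.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumption: size of the fitted tokenizer vocabulary
EMBED_DIM = 400      # vector dimension reported as the best configuration
MAX_LEN = 300        # assumption: padded article length

def build_cnn(embedding_matrix):
    inputs = layers.Input(shape=(MAX_LEN,))
    # Word2Vec weights loaded as a frozen embedding layer.
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                         weights=[embedding_matrix], trainable=False)(inputs)
    # Parallel convolutions with filter sizes 3, 4 and 5, as in the abstract.
    pooled = []
    for size in (3, 4, 5):
        conv = layers.Conv1D(filters=100, kernel_size=size, activation="relu")(x)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    merged = layers.concatenate(pooled)
    outputs = layers.Dense(1, activation="sigmoid")(merged)  # valid vs hoax
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage with placeholder weights:
# model = build_cnn(np.random.rand(VOCAB_SIZE, EMBED_DIM))
# model.fit(x_train, y_train, epochs=5, validation_split=0.2)
```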

    Classification of Baby Cry Sounds Based on Prosodic Features Using the Moments of Distribution Method and K-Nearest Neighbours

    For some people, the sound of a baby crying is very disturbing, especially when the crying is prolonged, and the meaning of a baby's cry is difficult to understand. In the era of information technology, baby cry recognition can be performed automatically by computer, which can help parents recognize the baby's needs so that the baby can be calmed quickly. Baby cry sounds can be identified using a classification algorithm from the field of Machine Learning, one of which is the K-Nearest Neighbour algorithm. The first step in classifying baby cries is to convert the audio data into numerical data through a feature extraction process that produces prosodic features. After feature extraction, pattern identification is carried out to capture the differences between one baby cry recording and another using the Moments of Distribution method. Baby cry recognition is then performed by applying the K-Nearest Neighbour classification algorithm. The best accuracy in the classification process using Percentage Rate sampling is 76% with K = 9, while the best accuracy using Leave One Out sampling is 42% with K = 5.
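
    As a rough illustration of the pipeline described above (prosodic feature extraction, moments of distribution, then K-Nearest Neighbours), the sketch below assumes the prosodic feature is an F0 contour extracted with librosa and that the moments are mean, variance, skewness, and kurtosis; the file paths and labels are hypothetical.

```python
# Sketch of the feature pipeline, under the assumption that the prosodic
# feature is a pitch (F0) contour and the "moments of distribution" are the
# first four statistical moments of that contour.
import librosa
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def cry_features(path):
    y, sr = librosa.load(path, sr=None)
    f0 = librosa.yin(y, fmin=65, fmax=600, sr=sr)   # rough infant F0 range
    f0 = f0[np.isfinite(f0)]
    return [np.mean(f0), np.var(f0), skew(f0), kurtosis(f0)]

# `paths` and `labels` are hypothetical lists of audio files and cry categories.
# X = np.array([cry_features(p) for p in paths])
# knn = KNeighborsClassifier(n_neighbors=9)
# Evaluate with a percentage split or leave-one-out, as in the study:
# print(cross_val_score(knn, X, labels, cv=LeaveOneOut()).mean())
```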

    Lyric Text Mining Of Dangdut: Visualizing The Selected Words And Word Pairs Of The Legendary Rhoma Irama’s Dangdut Song In The 1970s Era

    Dangdut is a genre of music introduced by Rhoma Irama, a popular Indonesian musician who has been the legendary dangdut singer from the 1970s until now. Rhoma Irama's lyrics cover themes of the human condition, the way of life, love, law and human rights, tradition, social equality, and Islamic messages. Interestingly, however, the song lyrics Rhoma Irama wrote in the 1970s were mostly on love themes. To examine this, it is necessary to analyze the songs through several approaches that explore the selected words and the relationships between word pairs. When each of Rhoma Irama's lyrics is examined with text mining, the extracted lyric text reveals interesting knowledge patterns. We collected the lyrics from the web as datasets and then extracted the components of each lyric, including the parts and lines of the song. We visualized the most frequent words using bar charts, word clouds, term frequency-inverse document frequency, and a network graph. As a result, the word pairs Rhoma Irama used most often in his songwriting include heart-love (19 lines), heart-longing (13 lines), heart-beloved (12 lines), love-beloved (12 lines), and love-longing (11 lines).
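
    The word-pair counts reported above can be reproduced in principle with a simple per-line co-occurrence count; the sketch below uses made-up lyric lines and counts each unordered within-line word pair once, which is one plausible reading of the "lines" metric in the abstract.

```python
# Illustrative sketch of the word-frequency and word-pair counting step,
# assuming each lyric is available as a list of lines (the lines here are made up).
from collections import Counter
from itertools import combinations

lines = [
    "hati ini rindu kekasih",       # hypothetical lyric lines
    "cinta di hati untuk kekasih",
]

word_counts = Counter()
pair_counts = Counter()
for line in lines:
    words = set(line.lower().split())
    word_counts.update(words)
    # Count each unordered word pair at most once per line.
    pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

print(word_counts.most_common(5))
print(pair_counts.most_common(5))
```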

    Data Mining Approach for Breast Cancer Patient Recovery

    Breast cancer is the second most common cancer type attacking Indonesian women. Several factors are known to increase the risk of breast cancer, but in Indonesia outcomes often depend on whether patients receive treatment routinely. This research examines the determinant factors of breast cancer and analyses breast cancer patient data to build a useful classification model using a data mining approach. The dataset was taken from an oncology hospital in East Java, Indonesia, and consists of 1097 samples, 21 attributes, and 2 classes. We used three different feature selection algorithms, Information Gain, Fisher's Discriminant Ratio, and Chi-square, to select the attributes that contribute most to the data, and applied Hierarchical K-means Clustering to remove the attributes with the lowest contribution. Our experiments showed that only 14 of the 21 original attributes have the highest contribution to the breast cancer data. The clustering algorithm decreased the error ratio from 44.48% (using the 21 original attributes) to 18.32% (using the 14 most important attributes). We also applied classification algorithms to build the classification model and measure the precision on the breast cancer patient data. Under leave-one-out cross-validation, Naïve Bayes and Decision Tree reached precisions of 92.76% and 92.99%, respectively. Based on our data, breast cancer patients in Indonesia, especially in East Java, need routine hospital treatment to improve their chances of early recovery, which is closely related to patient adherence.
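
    A minimal sketch of the classification stage follows, combining one of the named feature selectors (Chi-square) with Naïve Bayes and Decision Tree under leave-one-out cross-validation; the hospital dataset is not public, so random placeholder data stands in for it, and scikit-learn's default accuracy scoring is used rather than the precision reported in the abstract.

```python
# Hedged sketch: chi-square feature selection (keep 14 of 21 attributes)
# followed by Naive Bayes and Decision Tree with leave-one-out cross-validation.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

X = np.random.rand(100, 21)         # placeholder: 21 non-negative attributes
y = np.random.randint(0, 2, 100)    # placeholder: 2 classes

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0))]:
    model = make_pipeline(SelectKBest(chi2, k=14), clf)   # keep 14 attributes
    acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
    print(name, round(acc, 4))
```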

    GIS implementation and clusterization of potential blood donors using the agglomerative hierarchical clustering method

    The blood needs of PMI (the Indonesian Red Cross) in the Surabaya City area are sometimes erratic; the problem occurs because the demand for blood continues to increase while the blood supply runs low. As the main objective of this research, data mining was applied to cluster the blood donor data at the UTD-PMI Surabaya City Center, both to distinguish potential from non-potential donors and to visualize the pattern of donor distribution in a Geographic Information System (GIS). Agglomerative Hierarchical Clustering was applied to cluster the existing 8757 donor records. The experimental results show that the cluster quality was quite good, reaching a Silhouette Coefficient of 0.6065410. One interesting finding is that private male employees with blood type O who live in the eastern part of Surabaya City are the most potential donors.
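
    A minimal sketch of the clustering and validation step, assuming the donor attributes have already been encoded numerically; the data below is a random placeholder, and the number of clusters and linkage are assumptions rather than settings reported by the authors.

```python
# Agglomerative hierarchical clustering with silhouette validation.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

X = np.random.rand(500, 5)          # placeholder for the encoded donor records

model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print("silhouette:", silhouette_score(X, labels))   # ~0.61 reported in the paper
```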

    Implementation of a Comparison of the Apriori and FP-Growth Algorithms to Determine Consumer Purchase Patterns for Panel Products at PT Surya Multi Perkasa Movinko

    Some companies have not yet made much use of consumer purchase transaction data as part of their sales strategy. This transaction data records which items are often bought together by consumers in a single purchase transaction, across different receipts and times. If the transaction data is analyzed and explored in more depth, the company gains insight in the form of consumer purchase pattern analysis that can be profitable for the company. In this research, consumer purchase transaction data was analyzed by comparing the Apriori and FP-Growth algorithms, both of which belong to the association rule family of methods and aim to determine consumer purchasing patterns. The data used in this study were obtained from panel product purchase transactions at PT Surya Multi Perkasa Movinko and consist of 23 types of product items and 492 transactions. The experimental results show that the best performance of the Apriori algorithm, with a support factor of 0.0054 and a confidence factor of 0.30, generated 12 association rules, while the best performance of the FP-Growth algorithm, with a support factor of 2 and a confidence factor of 0.7, generated 9 association rules.
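
    The comparison can be sketched with the mlxtend implementations of both algorithms; the transactions below are invented stand-ins for the panel products, and the thresholds are illustrative (the paper's "support factor of 2" for FP-Growth appears to be an absolute count, whereas mlxtend expects a relative support).

```python
# Sketch of an Apriori vs FP-Growth comparison with mlxtend; data is made up.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules

transactions = [["panel_a", "panel_b"], ["panel_a", "panel_c"],
                ["panel_b", "panel_c"], ["panel_a", "panel_b", "panel_c"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

for name, algo in [("Apriori", apriori), ("FP-Growth", fpgrowth)]:
    frequent = algo(df, min_support=0.5, use_colnames=True)
    # Note: some mlxtend releases also expect a num_itemsets argument here.
    rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
    print(name, "->", len(rules), "rules")
```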

    Implementation of Web Scraping on Google Search Engine for Text Collection Into Structured 2D List

    Purpose: This research proposes an implementation of web scraping on the Google Search Engine to collect text into a structured 2D list. Design/methodology/approach: Two important stages of data collection through web scraping are implemented, namely HTML parsing to extract links (URLs) from Google Search Engine result pages, and HTML parsing to extract the body text from the website page behind each collected link. Findings/result: The input queries were adjusted to recent issues and news in Indonesia, for example important presidential figures, the month of Ramadan and Idul Fitri, riot tragedies (stadium) and natural disasters, rising prices of basic commodities, oil and gold, and other news. The smallest number of links obtained for a query was 56 and the largest was 151, while the processing time to obtain the links ranged from 1 minute 6.3 seconds for the fastest query to 2 minutes 49.1 seconds for the slowest. The scraped links came from Wikipedia, Detik, Kompas, the Election Supervisory Body (Bawaslu), CNN Indonesia, the General Election Commission (KPU), Pikiran Rakyat, and others. Originality/value/state of the art: Compared with previous research, this study provides an alternative for producing an optimal collection of links and text from web scraping results in the form of a 2D list structure. Lists in the Python programming language can store character sequences as strings, can be accessed by index, and allow text to be manipulated efficiently.
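
    The two parsing stages can be sketched with requests and BeautifulSoup; Google actively rate-limits automated queries and its result markup changes, so the URL, headers, and link filter below are assumptions rather than the authors' exact implementation, and the output is a 2D list of [link, body text] rows as described in the abstract.

```python
# Rough sketch of the two scraping stages: link extraction from a search
# results page, then body-text extraction from each collected link.
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}          # assumption: desktop user agent

def collect_links(query, max_links=10):
    resp = requests.get("https://www.google.com/search",
                        params={"q": query}, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    links = [a["href"] for a in soup.find_all("a", href=True)
             if a["href"].startswith("http")]
    return links[:max_links]

def collect_texts(links):
    rows = []                                    # the structured 2D list: [link, text]
    for url in links:
        try:
            page = BeautifulSoup(
                requests.get(url, headers=HEADERS, timeout=10).text, "html.parser")
            body = " ".join(p.get_text(" ", strip=True) for p in page.find_all("p"))
            rows.append([url, body])
        except requests.RequestException:
            continue
    return rows

# Example: data = collect_texts(collect_links("harga minyak dan emas"))
```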

    Analysis and Development of KEBI 1.0 Checker Framework as an Application of Indonesian Spelling Error Detection

    In educational institutions, especially universities, writing scientific papers is a skill that academics such as lecturers and students must possess. However, writing scientific papers is not easy; there are many conventions and rules that need to be followed. Several studies show that many academics still make mistakes in writing their scientific papers, including punctuation errors, typographical errors, and the use of non-standard Indonesian words. Researchers in Indonesia have developed various applications for detecting spelling errors in Indonesian-language scientific papers. This study analyzes the development of application frameworks for detecting Indonesian spelling errors against several assessment indicators, and compares the spelling error detection frameworks from other studies with the proposed application, named KEBI 1.0 Checker. KEBI 1.0 Checker has three main features: detecting errors in the use of punctuation marks, in typography, and in the use of non-standard words, based on the standards of the Big Indonesian Dictionary and the General Guidelines for Indonesian Spelling. In addition, this study objectively examines the complexity of the features, the advantages and disadvantages, the methods, and the accuracy of each application. The results of the analysis show that KEBI 1.0 Checker has complete features, fast computation time, easy access, and an attractive user interface. However, its precision in correcting spelling errors in typographical words still needs improvement.
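
    Although the study evaluates an existing application rather than publishing code, the core idea of non-standard-word detection can be sketched as a lookup against a reference word list; the tiny dictionaries below are hypothetical stand-ins for the standard word list and a slang-to-standard mapping, not resources from the study.

```python
# Simplified sketch of non-standard-word detection against a reference list.
import re

STANDARD_WORDS = {"tidak", "sudah", "bagaimana", "saja"}    # hypothetical subset
NON_STANDARD = {"gak": "tidak", "udah": "sudah",
                "gimana": "bagaimana", "aja": "saja"}       # slang -> standard form

def check_spelling(text):
    findings = []
    for token in re.findall(r"[a-zA-Z]+", text.lower()):
        if token in NON_STANDARD:
            findings.append((token, NON_STANDARD[token]))   # suggest standard form
        elif token not in STANDARD_WORDS:
            findings.append((token, None))                  # unknown / possible typo
    return findings

print(check_spelling("Gak tahu gimana caranya."))
```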