41 research outputs found

    Dimensionality Reduction with the Linear Discriminant Analysis (LDA) Method [Pengurangan Dimensi dengan Metode Linear Discriminant Analist (LDA)]

    The purpose of this study is to reduce the dimensionality of a dataset used to predict breast cancer. The data used in the research is high-dimensional, that is, it contains a very large number of variables. Classification algorithms perform poorly on high-dimensional data, so an appropriate method is needed to reduce the number of dimensions or variables. Several methods are available for this; this study uses linear discriminant analysis (LDA). LDA is a supervised machine learning algorithm used to classify data into several classes; it finds the linear combination of variables that best separates the classes. Here LDA reduces the number of dataset variables while retaining the information that matters for classification. The data are first processed with LDA and then classified with a logistic regression model. The study concludes that LDA can handle the multiclass classification problem. The model misclassified 16 of 455 cases, a misclassification rate of about 3.5% (0.035)
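
    The projection step described above can be sketched for the simplest case, two classes and two features, in plain Python. This is only an illustration of Fisher's linear discriminant: the toy points and the function names (`fisher_direction`, `project`) are assumptions, not the study's data or code, and the logistic regression stage is not reproduced.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scatter(vectors, mu):
    """2x2 scatter matrix of 2-D vectors around their mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Return w = Sw^-1 (m1 - m0), the two-class Fisher LDA direction."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(v, w):
    """Reduce a 2-D point to one dimension along direction w."""
    return v[0] * w[0] + v[1] * w[1]


# Hypothetical toy data: two features per sample, two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 5.0)]

w = fisher_direction(class0, class1)
z0 = [project(v, w) for v in class0]  # 1-D scores for class 0
z1 = [project(v, w) for v in class1]  # 1-D scores for class 1
```

    After projection each sample is a single number, and a downstream classifier such as logistic regression would be trained on those reduced values.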

    Application of Random Forest and AdaBoost for DDoS Attack Classification [Penerapan Random Forest dan Adaboost untuk Klasifikasi Serangan DDoS]

    Among the different types of attacks in the field of Information Technology, DDoS attacks are one of the biggest threats to internet sites and pose a devastating risk to the security of computer systems, mainly because of their potential impact. Research in this area is therefore growing rapidly, with researchers focusing on new ways to address intrusion detection and prevention. Machine learning and artificial intelligence are among the latest technologies studied for intrusion detection classification. This study explores the behavior and application of DDoS datasets for machine learning in the context of intrusion detection. The workflow begins by collecting raw DDoS datasets from reputable sources. Once the data is obtained, the final dataset is prepared for modeling. Data management involves data cleansing, data type transformation, and data exchange on the collected data. The selection process is carried out together with modeling. Two separate algorithms, Random Forest and AdaBoost, are used to train a model on the dataset. The model is validated and retrained with k-fold cross-validation and finally evaluated on unseen data. The results are assessed using several output measures. The experiment uses the CICDDoS_2019 dataset, and intrusion detection performance on this dataset is analyzed using the two machine learning models. The dataset is divided in an 80:20 ratio for model training, validation, and testing. The machine learning models are selected systematically and carefully to ensure that the experiments are conducted properly. The results are analyzed using a set of performance metrics, including accuracy, precision, recall, F-measure, and compute time
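
    The 80:20 hold-out followed by k-fold cross-validation can be sketched as index bookkeeping in plain Python. The sample count, the seed, the choice of k = 5, and the function names are assumptions for illustration; the abstract does not state the value of k or how the split was drawn.

```python
import random


def train_test_split_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and hold out test_ratio of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train, validation) position lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Hypothetical dataset of 100 rows: 80 for training/validation, 20 unseen.
train_idx, test_idx = train_test_split_indices(100)

# Each fold gives positions into train_idx; the test rows are never touched.
folds = list(k_fold_indices(len(train_idx), k=5))
```

    Each model would be fitted on the training positions of a fold and scored on its validation positions, with the held-out 20% reserved for the final evaluation.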

    COMPARISON OF PORTERS STEMMING ALGORITHM AND NAZIEF & ADRIANI'S STEMMING ALGORITHM IN DETERMINING INDONESIAN LANGUAGE LEARNING MODULES

    One way to improve text summarization, and so obtain complete information from a learning module, is to transform the words in the module into their base forms, in other words, through a stemming process. Stemming Indonesian text is more complex because affixes must be removed to obtain the root word, so this research compares two stemming algorithms, Porter and Nazief & Adriani, on learning modules at Mataram University of Technology. In testing, the Nazief & Adriani stemming algorithm had an average processing time of 51.8 seconds and an average accuracy of 74.175%. Porter's algorithm had an average processing time of 16.875 seconds and an accuracy of 73.225%

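    Triple Exponential Smoothing Algorithm for Predicting Tourist Trends at Jatim Park Batu During the Covid-19 Pandemic [Algoritma Triple Exponential Smoothing Untuk Prediksi Trend Turis Pariwisata Jatim Park Batu Saat Pandemi Covid-19]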

    Tourist visits to Indonesia in 2021, both domestic and foreign, decreased drastically. The COVID-19 pandemic is one of the causes of this loss. Over the past year, tourism has dropped dramatically because of the pandemic. The impact on a country is economic recession; Singapore, a country that also depends in part on tourism, experienced a severe recession of up to -40%. Jatim Park Batu is an educational theme park and family recreation area in Batu, East Java, and is a well-known tourist attraction in the province. Uncertainty in the number of tourists each month affects Jatim Park's operational management in every decision, both technical and strategic. The researcher proposes using the Triple Exponential Smoothing algorithm (the Holt-Winters model), a prediction algorithm that can take trend and seasonal factors into account. Accuracy is measured using MAPE (Mean Absolute Percentage Error). Tests were run with 30 initializations of the alpha, beta, and gamma parameters and yielded an average MAPE of 9%
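
    The forecasting and scoring steps named above can be sketched in plain Python. This is a minimal additive Holt-Winters variant with a simple initialization from the first two seasons; the abstract does not say whether the additive or multiplicative form was used, and the series, season length, and parameter values below are hypothetical.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive triple exponential smoothing."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first_mean = sum(series[:season_len]) / season_len
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    level = first_mean
    trend = (second_mean - first_mean) / season_len
    seasonals = [series[i] - first_mean for i in range(season_len)]

    for t in range(season_len, len(series)):
        idx = t % season_len
        prev_level = level
        # Level: deseasonalized observation blended with the previous projection.
        level = alpha * (series[t] - seasonals[idx]) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal: deviation from the level blended with the old seasonal term.
        seasonals[idx] = gamma * (series[t] - level) + (1 - gamma) * seasonals[idx]

    n = len(series)
    return [level + h * trend + seasonals[(n + h - 1) % season_len]
            for h in range(1, horizon + 1)]


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical noise-free series with a season of length 4, not real visitor data.
history = [10, 20, 30, 40] * 3
actual_next = [10, 20, 30, 40]

forecast = holt_winters_additive(history, season_len=4,
                                 alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
error = mape(actual_next, forecast)
```

    On this perfectly periodic series the forecast reproduces the seasonal pattern, so the error is essentially zero; real visitor counts would not behave this way. Repeating the run over several alpha, beta, and gamma settings and averaging the MAPE would mirror the evaluation the abstract describes.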

    COMPARISON OF ACCURACY LEVELS OF RANDOM FOREST AND K-NEAREST NEIGHBOR (KNN) ALGORITHMS FOR CLASSIFYING SMOOTH BANK CREDIT PAYMENTS

    Credit is one of the products banks offer to customers, but extending credit to unsuitable customers can cause problems: customers may fail to pay installments on time or delay payments for several months until the credit goes bad, which harms the bank. This study therefore compares methods to find out which is best at classifying the smoothness of bank credit payments. The results are intended to help the bank select credit customers. The study uses a dataset from the UCI Machine Learning Repository containing 29,998 credit payment records. The dataset is split into 70% training data and 30% test data, reported as 24000 training records and 6000 test records. The labels used are Eligible and Ineligible. The data mining process follows the CRISP-DM framework and is implemented in the Python programming language. Evaluated with a confusion matrix, the random forest algorithm obtained the best results: accuracy of 82.22%, precision of 80.44%, recall of 82.22%, and F1-score of 80.0%. The KNN algorithm obtained an accuracy of 81.55%, precision of 79.5%, recall of 81.55%, and F1-score of 79.11%. Based on this evaluation, Random Forest is more accurate than KNN at classifying bank credit payments
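
    The four reported metrics can be derived from true and predicted labels as in the following sketch. It uses support-weighted averaging across classes, which is an assumption: the abstract does not state the averaging method, although weighted recall always equals accuracy, which matches the reported figures. The labels and predictions below are hypothetical.

```python
def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with support-weighted averaging."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        p_label = tp / (tp + fp) if tp + fp else 0.0
        r_label = tp / (tp + fn) if tp + fn else 0.0
        f_label = (2 * p_label * r_label / (p_label + r_label)
                   if p_label + r_label else 0.0)
        weight = (tp + fn) / n  # share of samples whose true class is `label`
        precision += weight * p_label
        recall += weight * r_label
        f1 += weight * f_label
    return accuracy, precision, recall, f1


# Hypothetical labels for ten customers, not taken from the study's dataset.
y_true = ["Eligible"] * 6 + ["Ineligible"] * 4
y_pred = ["Eligible"] * 5 + ["Ineligible"] + ["Ineligible"] * 3 + ["Eligible"]

accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
```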

    Comparison of Naïve Bayes Algorithm and XGBoost on Local Product Review Text Classification

    Online reviews are critical in supporting purchasing decisions, but with the development of e-commerce there are more and more fake reviews, so more consumers worry about being deceived when shopping online. Sentiment analysis can be applied to marketplace product reviews. This study compares two classifiers, Naïve Bayes and XGBoost, using two vector space representations, Word2vec and TF-IDF. The research steps are data collection, data cleaning, data labelling, data pre-processing, classification, and evaluation. Data scraping produced 25,581 records, which were divided into 80% training data and 20% test data. The data fall into two classes, good sentiment and bad sentiment. The combination Word2vec + XGBoost achieved the highest F1 score, 0.941, followed by TF-IDF + XGBoost at 0.940. Naïve Bayes achieved an F1 score of 0.915 with TF-IDF and 0.900 with Word2vec. XGBoost proved able to classify unbalanced data better than Naïve Bayes
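
    As an illustration of one of the two vector representations, the sketch below computes unsmoothed TF-IDF weights in plain Python. The formula (term frequency times log of inverse document frequency), the whitespace tokenizer, and the example reviews are assumptions; library implementations typically apply smoothing and normalization, and the study's exact settings are not given.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical product reviews, invented for illustration.
reviews = [
    "product quality great",
    "product arrived broken",
    "great seller fast delivery product",
]

vectors = tf_idf(reviews)
```

    A term that appears in every review, such as "product" here, receives zero weight, while terms specific to one review receive the largest weights. The resulting vectors would then be passed to a classifier.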

    Stock Price Time Series Data Forecasting Using the Light Gradient Boosting Machine (LightGBM) Model

    In stock investment, price fluctuations, the ups and downs of stock prices, are common. Because of these fluctuations, many novice investors are afraid to trade stocks. On the other hand, stocks are a type of investment that can be relied upon during disasters or economic turmoil, such as the Covid-19 pandemic that began in 2019. For investors to estimate stock price fluctuations, forecasting is needed. This study builds a stock price forecasting model using the Light Gradient Boosting Machine (LightGBM) algorithm, which offers high accuracy and efficiency. The stock price time series is forecast with a LightGBM ensemble, and the hyperparameters are tuned using Grid Search Cross Validation (GSCV). The study also compares LightGBM with other algorithms to see which model is best at forecasting stock price data. Performance is measured with the RMSE metric by comparing the original (testing) data with the predicted results. The experimental results show that the LightGBM model can compete with and outperform other boosting-based forecasting models such as XGBoost, AdaBoost, and CatBoost. The same dataset is used for all models so that the comparison is fair. Future research should pay attention to the data during pre-processing because it contains many outliers. It should also include exogenous and external variables, whose determination involves many parties
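
    The RMSE comparison described above can be sketched as follows. The model names come from the abstract, but every price and prediction below is a made-up placeholder chosen only to show the mechanics, not a result from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical test-set prices and hypothetical model predictions.
actual_prices = [100.0, 102.0, 101.0, 105.0]
predictions = {
    "LightGBM": [100.5, 101.5, 101.5, 104.5],
    "XGBoost": [101.0, 103.0, 100.0, 106.0],
    "AdaBoost": [98.0, 104.0, 103.0, 103.0],
}

# Score every model on the same test data, then pick the lowest RMSE.
scores = {name: rmse(actual_prices, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)
```

    Scoring every model against the same test series, as here, is what makes the RMSE values directly comparable.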

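    Comparison of Word Matching and Naive Bayes Algorithms for Sentiment Analysis Classification of Instagram Comments [Perbandingan Algoritma Word Matching dan Naive Bayes untuk Klasifikasi Sentimen Analisis Komentar Instagram]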

    Sentiment analysis has shown that automating and computationally recognizing sentiment is possible and continues to develop over time, driven by the emergence of new technology trends and the increasingly dynamic state of human language as a form of communication. With social media, there is also a growing volume of text in the form of informal data, which makes extracting and parsing relevant information a problem. Therefore, in this study the authors propose two classification methods and then compare the results of the two methods

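    A Literature Study on Cat Image Classification Using Deep Learning: Convolutional Neural Network (CNN) [Studi Literatur Mengenai Klasifikasi Citra Kucing Dengan Menggunakan Deep Learning: Convolutional Neural Network (CNN)]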

    Deep learning is a part of machine learning that can recognize patterns in images, sound, text, and other complex data and so produce accurate predictions. One capability of deep learning is classifying images of objects. CNN is one of the machine learning methods used to classify object images. The Convolutional Neural Network (CNN) algorithm is a type of deep learning network, an artificial neural network that is now widely used for image recognition. In this study, CNN is the algorithm used because its accuracy is fairly good. Deep learning with CNNs is widely used for detection, classification, and prediction on images. The image objects in this study are cats of various types. The aim of this study is to classify cat images according to their type. This paper is a literature review intended to add valuable knowledge about recent research on cat image classification using CNN. It discusses the literature on input variables, the methods used, and the results of previous studies. The most widely used method in previous studies is CNN

    Ontological Mapping of COBIT 2019 to Bank Health Assessment in Indonesia [Ontological Mapping Cobit 2019 Pada Penilaian Kesehatan Bank Di Indonesia]

    This study aims to map COBIT 2019 to the Bank Health Level Assessment Policy in Indonesia using an ontological mapping technique. Adopting the ArchiMate language, the study analyzes and depicts the relationships between concepts in COBIT 2019 and the factors used to assess bank health level (Tingkat Kesehatan Bank, TKB). The analysis covers identifying the objectives in the COBIT domains, mapping the TKB assessment factors, and building a model of COBIT 2019 with the TKB assessment factors. The results show that several COBIT 2019 domains are related to the TKB assessment factors, including risk profile, GCG, earnings, and capital. The conclusion of this study strengthens the understanding of the link between the IT governance framework (COBIT 2019) and bank health assessment practice in Indonesia