8 research outputs found

    Comparison of Classification with Machine Learning Approaches for Identifying Hoax Tweets on the Social Media Platform Twitter

    Technological progress is not free of negative effects, one of which is hoaxes. Twitter is one of the social media platforms most actively used for exchanging information, communication, and entertainment, so its users can easily spread news or hoaxes. This study aims to identify tweets that contain hoax or valid information using machine learning. The algorithms used are Stochastic Gradient Descent, Naïve Bayes, Random Forest, and Rocchio. The four algorithms are compared to find the one that performs best at automatically identifying and verifying whether tweets on Twitter contain hoaxes or valid information. The keywords used are Corona, Mutasi Corona (Corona mutation), PSBB (large-scale social restrictions), Dana Bansos (social assistance funds), Dana Otsus (special autonomy funds), Utang Pemerintah (government debt), and Sekolah Tatap Muka (in-person schooling), yielding 898 tweets. The data are grouped into hoax and valid classes and then processed into a dataset through preprocessing steps up to term weighting with TF-IDF. The test results show that Stochastic Gradient Descent is the best algorithm, with an average accuracy of 84.92%. Further testing computes precision, recall, and F1. The best precision, 82.95%, is achieved by Naïve Bayes, while the best recall and F1, 85.05% and 82.42%, are achieved by Stochastic Gradient Descent.
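The TF-IDF term weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the formula below (relative term frequency times the natural log of inverse document frequency) is an assumption, and the toy tweets are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy tweets, already lowercased and tokenized.
tweets = [
    ["corona", "vaccine", "hoax"],
    ["corona", "school", "reopening"],
    ["government", "debt", "hoax"],
]
weights = tf_idf(tweets)
```

In this sketch a term that appears in fewer tweets (such as "vaccine") receives a higher weight than one shared by several tweets (such as "corona"), which is the property that makes the weights useful as classifier inputs.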

    Performance Analysis of a new Filter and Wrapper Sequence for the Survivability Prediction of Breast Cancer Patients

    Feature selection is an essential preprocessing step for removing redundant or irrelevant features from multidimensional data to improve predictive performance. Clinical datasets are increasingly large and multidimensional, and not every feature helps with the required predictions. Feature selection techniques are therefore used to determine a relevant feature set that can improve the performance of a learning algorithm. This study presents a performance analysis of a new filter and wrapper sequence: the intersection of two filter methods, Mutual Information and Chi-Square, followed by one of two wrapper methods, Sequential Forward Selection or Sequential Backward Selection. The aim is to obtain a more informative feature set for improved prediction of the survivability of breast cancer patients from the clinical breast cancer dataset SEER. The improvement in performance due to this filter and wrapper sequence, in terms of Accuracy, False Positive Rate, False Negative Rate and Area under the Receiver Operating Characteristic curve, is tested using the machine learning algorithms Logistic Regression, K-Nearest Neighbour, Decision Tree, Random Forest, Support Vector Machine and Multilayer Perceptron. The performance analysis favours Sequential Backward Selection over Sequential Forward Selection in the new filter and wrapper sequence for the SEER dataset.
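The filter and wrapper sequence described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature names, the filter scores, the top-k cutoff, and the `evaluate` function (a stand-in for a cross-validated model score) are all hypothetical, and only the forward wrapper is shown.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features under one filter method."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def sequential_forward_selection(candidates, evaluate, n_select):
    """Greedily add the feature that most improves evaluate(subset)."""
    selected = []
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical filter scores for four made-up features.
mutual_info = {"age": 0.30, "tumor_size": 0.25, "grade": 0.20, "race": 0.02}
chi_square = {"age": 12.0, "tumor_size": 3.0, "grade": 9.0, "race": 8.0}

# Filter stage: keep only features ranked in the top 3 by both filters.
shared = top_k(mutual_info, 3) & top_k(chi_square, 3)

# Hypothetical stand-in for a cross-validated accuracy estimate.
usefulness = {"age": 0.10, "tumor_size": 0.08, "grade": 0.05, "race": 0.01}
def evaluate(subset):
    return 0.60 + sum(usefulness[f] for f in subset)

# Wrapper stage: forward selection over the features that survived the filters.
selected = sequential_forward_selection(shared, evaluate, n_select=2)
```

Sequential Backward Selection would start from the full filtered set and greedily remove the feature whose removal hurts the score least; the same `evaluate` stand-in could be reused for that variant.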

    Traffic Classification using Deep Learning Approach for End-to-End Slice Management in 5G/B5G

    Network slicing plays a key role in future networks. 5G networks are intended to meet the different service demands of applications offered to users. The 5G architecture is used to meet Quality of Service (QoS) requirements by addressing different scenarios in terms of latency, scalability and throughput with different service types. Using machine learning with network slicing allows network operators to create multiple virtual networks, or slices, on the same physical infrastructure. These slices are independent and customized. More precisely, they are managed dynamically according to the requirements agreed between the network operators and the users. In this research, multiple machine learning algorithms are used to train our model, classify network traffic and predict the correct slice type for each user. After the traffic classification, we compare and analyse the performance of the machine learning algorithms in terms of learning percentage, accuracy, precision and F1 score.

    Performance Analysis of the Decision Tree, Naive Bayes, and K-Nearest Neighbor Algorithms for Classifying Covid-19 Risk Zones in Indonesia

    The Covid-19 pandemic occurred in Indonesia. The government has worked to manage Covid-19, in part by producing a Covid-19 risk map. The map divides zones by regency/city, and the risk zones serve as the government's reference when setting policy for each region. The government determines each zone by weighting 15 indicators. Changes to the Covid-19 risk zones on the website have been delayed several times. Classification can be an alternative way to determine Covid-19 risk zones so that zone changes can be made quickly and efficiently. Many classification algorithms exist, each with strengths and weaknesses. Decision Tree, Naïve Bayes, and K-Nearest Neighbor are classification algorithms that offer good accuracy with relatively short run times. This study aims to measure the performance of each algorithm, identify the best algorithm, and obtain the classification pattern of the best algorithm. The research method uses 10-fold cross validation to split the data and a confusion matrix to assess performance. The software used is Rapidminer and WEKA. The results show that all algorithms perform well, with performance values above 70%, and none requires a long time to build a model. The best performance is obtained with the decision tree algorithm in WEKA, with a performance value of 88% and a time of 0.32 seconds. The classification pattern of the best algorithm yields 77 rules that divide three classification zones: low, medium, and high. The attributes that influence the classification of Covid-19 risk zones are active, CR, CFR, incidence rate, positive, and death.
    Abstract: The Covid-19 pandemic occurred in Indonesia. The government is trying to handle Covid-19, in part by making a Covid-19 risk map. The Covid-19 risk map divides zones based on Regency/City. The Covid-19 risk zone is the government's benchmark for policy in each region. The government uses a weighting of 15 indicators to determine the zone. Several times, changes to the Covid-19 risk zone on the website have been delayed. Classification can be an alternative for determining the Covid-19 risk zone so that zone changes can be made quickly and efficiently. Many algorithms can be used for classification. Classification algorithms that have good accuracy with relatively short run times include Decision Tree, K-Nearest Neighbor, and Naïve Bayes. The purpose of this study is to calculate the performance of each algorithm, identify the best algorithm, and obtain the classification pattern from the best algorithm. The research method uses 10-fold cross validation for data splitting and a confusion matrix to assess performance. The software used is Rapidminer. The results show that all algorithms have good performance values, which are above 70%. None of the algorithms requires a long time for modeling. The best performance value is obtained using the Decision Tree algorithm. The classification pattern of the best algorithm produces 20 rules that divide 3 classification zones, namely low, medium, and high. Attributes that influence the classification of the Covid-19 risk zone are active, CR, CFR, incidence rate, positive, and death.
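The evaluation procedure named in this abstract, 10-fold cross validation with a confusion matrix, can be sketched in plain Python. This is a minimal illustration, assuming contiguous unshuffled folds and a simple accuracy score; the study itself used Rapidminer and WEKA, whose exact fold assignment is not described, and the labels below are hypothetical.

```python
from collections import Counter

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def confusion_matrix(actual, predicted):
    """Count each (actual label, predicted label) pair."""
    return Counter(zip(actual, predicted))

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / sum(matrix.values())

# Hypothetical zone labels for five regions.
actual = ["low", "medium", "high", "high", "low"]
predicted = ["low", "high", "high", "high", "medium"]
matrix = confusion_matrix(actual, predicted)
score = accuracy(matrix)

# Every sample lands in exactly one test fold.
folds = list(k_fold_indices(25, k=10))
```

In a full run, a classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores would be averaged.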

    Speech Mode Classification using the Fusion of CNNs and LSTM Networks

    Speech mode classification is an area that has not been as widely explored in the field of sound classification as others such as environmental sounds, music genre, and speaker identification. But what is speech mode? While mode is defined as the way or the manner in which something occurs or is expressed or done, speech mode is defined as the style in which the speech is delivered by a person. There are some reports on speech mode classification using conventional methods, such as whispering and talking using a normal phonetic sound. However, to the best of our knowledge, deep learning-based methods have not been reported in the open literature for the aforementioned classification scenario. Specifically, in this work we assess the performance of image-based classification algorithms on this challenging speech mode classification problem, including the usage of pre-trained deep neural networks, namely AlexNet, ResNet18 and SqueezeNet. Thus, we compare the classification efficiency of a set of deep learning-based classifiers, while we also assess the impact of different 2D image representations (spectrograms, mel-spectrograms, and their image-based fusion) on classification accuracy. These representations are used as input to the networks after being generated from the original audio signals. Next, we compare the accuracy of the DL-based classifiers to a set of machine learning (ML) ones that use Mel-Frequency Cepstral Coefficient (MFCC) features as their inputs. Then, after determining the most efficient sampling rate for our classification problem (i.e. 32kHz), we study the performance of our proposed method of combining CNN with LSTM (Long Short-Term Memory) networks. For this purpose, we use the features extracted from the deep networks of the previous step. We conclude our study by evaluating the role of sampling rates on classification accuracy by generating two sets of 2D image representations, one with 32kHz and the other with 16kHz sampling. 
Experimental results show that after cross validation the accuracy of DL-based approaches is 15% higher than ML ones, with SqueezeNet yielding an accuracy of more than 91% at 32kHz, whether we use transfer learning, feature-level fusion or score-level fusion (92.5%). Our proposed method using LSTMs further increased that accuracy by more than 3%, resulting in an average accuracy of 95.7%

    Machine Learning for Water Quality Assessment Based on Macrophyte Presence

    This is the final version. Available on open access from MDPI via the DOI in this record. The ecological state of the Danube River, as the world’s most international river basin, will always be the focus of scientists in the field of ecology and environmental engineering. The concentration of orthophosphate anions in the river is one of the main indicators of the ecological state, i.e., water quality and level of eutrophication. The sedentary nature and ability to survive in river sections, combined with the presence of high levels of orthophosphate anions, make macrophytes an appropriate biological parameter for in situ prediction of in-river monitoring processes. However, a preliminary literature review identified a lack of comprehensive analysis that can enable the prediction of the ecological state of rivers using biological parameters as the input to machine learning (ML) techniques. This work focuses on comparing eight state-of-the-art ML classification models developed for this task. The data were collected at 68 sampling sites on both sides of the river. The predictive models use macrophyte presence scores as input variables, and classes of the ecological state of the Danube River based on orthophosphate anions, converted into a binary scale, as outputs. The results of the predictive model comparisons show that support vector machines and tree-based models provided the best prediction capabilities. They are also a low-cost and sustainable solution to assess the ecological state of rivers.

    A Comparative Study on Agent Based Decision Making Models: A Proof of Concept Focused on Farmers’ Decisions Regarding Best Management Practices

    In recent times, with the increasing availability of large datasets, applications of machine learning techniques have grown rapidly. However, due to the black-box nature of these tools, it can be hard for model builders to understand the detailed structure of the system that a machine learning model simulates. Agent-based modelling (ABM) is a popular approach to studying complex systems. One of the challenges for this technique is designing the decision making processes of the agents in the model. Machine learning tools have a strong ability to transform the information in raw data into a functional model, which can serve as the decision making process for agents in ABMs. Because an ABM can in turn provide a detailed structure for the system that the machine learning model simulates, it is reasonable to combine the two kinds of models. However, although some researchers have combined the two in previous studies, most of them use one model as a validation tool for the other rather than integrating the machine learning model into the decision making processes of agents in ABMs. Therefore, this thesis focuses on integrating a machine learning model into the ABM and contrasting it with ABMs that use two traditional decision making models: an optimal model and a stochastic model. To compare the three decision making models, we use the case of farmers’ BMP adoption in the Upper Medway subwatershed and contrast the models through three metrics: the percentage of BMP adoption, the size of agricultural land under BMP adoption, and the correlation between BMP adoption and land use types. As a result, the ABM with the machine learning model presents a high level of accuracy compared with the two traditional models, but its adaptability to other cases and its robustness to uncertainties still require further study.