4 research outputs found

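Rancang Bangun Aplikasi untuk Klasifikasi Komentar Netizen pada Media Sosial Pemerintah Daerah di Indonesia Menggunakan Algoritma Random Forest (Design and Development of an Application for Classifying Netizen Comments on Indonesian Local Government Social Media Using the Random Forest Algorithm)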

Author: Hazmi Muhammad Fikry
Publication date: 01/07/2018

The development of information and communication technology (ICT) has encouraged governments to adopt e-government, which is believed to have a broad impact when implemented well. Social media was chosen as the communication bridge between society and government; according to the Digital Global Statistic, its active users in Indonesia amount to almost 50% of the total population. One way to increase public participation is to build effective two-way communication with society through social media. Comments on social media are one form of public engagement with government. This motivates a platform that can classify, and present visual information about, the topics people discuss on posts in government social media accounts, in real time. Random Forest was chosen as the classification method. The data were taken from the Facebook, Twitter, and YouTube accounts of local governments. After the data were collected and analysed with Random Forest, the comment categories were visualised. The platform uses Kafka as its data pipeline and Spark to support the machine learning process. During model development, the best accuracy obtained was 74.25%, with the number of trees and the maximum depth set to 100 and 30, respectively. In the streaming application, the average Processing Time was 20.385 seconds and the average Total Delay was 20.854 seconds per batch.
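The two streaming figures in this abstract can be related with a small calculation. In Spark Streaming's monitoring metrics, the total delay of a batch is its scheduling delay plus its processing time, so the reported averages imply an average scheduling delay. The sketch below assumes both averages were taken over the same set of batches; the function and variable names are illustrative and do not come from the thesis.

```python
# Averages reported in the abstract, in seconds per batch.
AVG_PROCESSING_TIME = 20.385
AVG_TOTAL_DELAY = 20.854


def implied_scheduling_delay(total_delay: float, processing_time: float) -> float:
    """Return total delay minus processing time.

    Spark Streaming reports total delay = scheduling delay + processing time,
    so the difference is the time a batch waited before it started running.
    """
    if total_delay < processing_time:
        raise ValueError("total delay cannot be smaller than processing time")
    return total_delay - processing_time


delay = implied_scheduling_delay(AVG_TOTAL_DELAY, AVG_PROCESSING_TIME)
print(f"Implied average scheduling delay: {delay:.3f} s")
```

Under that assumption, batches waited about 0.469 seconds on average before processing began.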

ITS Repository

Detection of suspicious URLs in online social networks using supervised machine learning algorithms

Author: Al-Janabi Mohammed Fadhil Zamil
Publication date: 01/12/2018

This thesis proposes several supervised machine learning classification models built to detect the distribution of malicious content in online social networks (OSNs). The main focus is on ensemble learning algorithms such as Random Forest, gradient boosting trees, extra trees, and XGBoost. Social network posts containing malicious URLs were identified using features derived from several sources, such as domain WHOIS records, web page content, URL lexical and redirection data, and Twitter metadata. The thesis describes a systematic analysis of the hyper-parameters of tree-based models. The impact of key parameters, such as the number of trees, the depth of trees, and the minimum size of leaf nodes, on classification performance was assessed. The results show that controlling the complexity of Random Forest classifiers applied to social media spam is essential to avoid overfitting and optimise performance. Model complexity can be reduced by removing uninformative features, since the complexity they add outweighs their contribution to the model's decisions. Moreover, two model-combining methods, voting and stacking, were tested. Both show advantages and disadvantages; in general, however, they appear to provide a statistically significant improvement over the best single model. The critical benefit of applying the stacking method to automate model selection is that it gives more weight to top-performing models and is less affected by weak ones. Finally, 'SuspectRate', an online malicious URL detection system, was built to provide a suspiciousness probability for tweets with attached URLs. A key feature of this system is that it can dynamically retrain and expand its current models.
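As a rough illustration of the voting method mentioned in this abstract, the sketch below combines the label predictions of several classifiers by hard (majority) voting. The abstract does not say whether hard or soft voting was used, so this is an assumption; the function name and the example predictions are hypothetical and are not results from the thesis.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by hard (majority) voting.

    predictions[m][i] is the label that model m assigns to sample i.
    On a tie, the label voted for first (in model order) wins.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models (1 = malicious, 0 = benign).
random_forest = [1, 0, 1, 1]
gradient_boosting = [1, 1, 0, 1]
extra_trees = [0, 0, 1, 1]

ensemble = majority_vote([random_forest, gradient_boosting, extra_trees])
print(ensemble)  # [1, 0, 1, 1]
```

Stacking differs in that a second-level model learns how much to trust each base model, which is consistent with the abstract's remark that it gives more weight to top-performing models.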

Keele Research Repository

A systematic analysis of random forest based social media spam classification

Author: A Liaw, C Yang, JP Bradford, L Breiman, M McCord, M Pal, RE Banfield, V Lempitsky, Z Chu
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 25/07/2017

Random forest classification has recently become a popular choice for machine learning applications that aim to detect spam content in online social networks. In this paper, we report a systematic analysis of random forest classification for this purpose. We assessed the impact of key parameters, such as the number of trees, the depth of trees, and the minimum size of leaf nodes, on classification performance. Our results show that controlling the complexity of random forest classifiers applied to social media spam is important in order to avoid overfitting and optimize performance. We also conclude that, to support the reproducibility of experimental results, it is important to report the key parameters of random forest classifiers.

Keele Research Repository

Crossref