
Gender Detection on Social Networks using Ensemble Deep Learning

Authors: CC Aggarwal
D Cireşan
D Liben-Nowell
D Scherer
G Salton
HP Luhn
J Pennington
K Kowsari
LE Krueger
M Jaderberg
MK Dalal
T Verma
V Gupta
Y LeCun
Publication venue: Springer Science and Business Media LLC
Publication date: 09/09/2020

Analyzing the ever-increasing volume of posts on social media sites such as Facebook and Twitter requires improved information processing methods for profiling authorship. Document classification is central to this task, but the performance of traditional supervised classifiers has degraded as the volume of social media has increased. This paper addresses this problem in the context of gender detection through ensemble classification that employs multi-model deep learning architectures to generate specialized understanding from different feature spaces.

arXiv.org e-Print Archive

Crossref
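
The abstract describes an ensemble of several deep learning models, each working from a different feature space, but it does not say here how their outputs are combined. Below is a minimal sketch that assumes a plain majority vote over each model's predicted label. The paper's actual combination rule may differ. The function `majority_vote` and the three prediction lists are hypothetical stand-ins for the outputs of models trained on different feature spaces.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-model label predictions by simple majority vote (assumed rule).

    predictions[m][i] is the label that model m assigns to sample i.
    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to whichever tied label appears first in model order.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict every sample")
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three models, each trained on a different feature space.
model_a = ["female", "male", "male", "female"]
model_b = ["female", "female", "male", "male"]
model_c = ["male", "female", "male", "female"]

print(majority_vote([model_a, model_b, model_c]))
# → ['female', 'female', 'male', 'female']
```

With two labels, as in gender detection, an odd number of models can never tie. With an even number, the sketch falls back to the label seen first in model order, which is an arbitrary but deterministic choice.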

Multilabel Text Classification of News Articles Using Long Short-Term Memory with Word2Vec (Klasifikasi Teks Multilabel pada Artikel Berita Menggunakan Long Short-Term Memory dengan Word2Vec)

Authors: Iman Saladin B. Azhar
Reza Firsandaya Malik
Rini Dian Palupi
Winda Kurnia Sari
Publication venue: Ikatan Ahli Informatika Indonesia (IAII)
Publication date: 19/04/2020

Multilabel text classification is the task of assigning a text to one or more categories. As with other machine learning tasks, multilabel classification performance is limited when little labeled data is available, which makes semantic relationships difficult to capture. This study requires a multilabel text classification technique that can assign four labels to news articles. Deep learning is proposed to solve problems in multilabel text classification. Deep learning methods used for text classification include Convolutional Neural Networks, Autoencoders, Deep Belief Networks, and Recurrent Neural Networks (RNN). The RNN is one of the most popular architectures in natural language processing (NLP) because its recurrent structure suits variable-length text. The deep learning method proposed in this study is an RNN using the Long Short-Term Memory (LSTM) architecture. To obtain an optimal text classification model, trial-and-error experiments were run using LSTM with 300-dimensional Word2Vec word embeddings as features. By tuning hyperparameters and comparing the eight proposed LSTM models on a large-scale dataset, the study shows that LSTM with Word2Vec features can achieve good performance in text classification. The fifth model achieved the highest accuracy, 95.38%, with average precision, recall, and F1-score of 95%. In addition, the seventh and eighth models produced graphs close to a good fit.

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
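
The abstract does not specify how the LSTM's output is turned into one or more of the four labels, or how the average precision, recall and F1-score are computed. A common setup, assumed here and not confirmed by the abstract, uses one independent sigmoid per label thresholded at 0.5, with scores macro-averaged over labels. The sketch covers only that output-and-evaluation step. The LSTM and Word2Vec layers are not shown. The logits are made-up values standing in for the network's final layer, and `predict_labels` and `macro_prf` are hypothetical helper names.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Turn one sample's per-label logits into a 0/1 label vector.

    Each label gets its own sigmoid, so a sample can receive zero, one or
    several labels (unlike a softmax over mutually exclusive classes).
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def macro_prf(y_true: Sequence[Sequence[int]],
              y_pred: Sequence[Sequence[int]]) -> tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 (each label weighted equally).

    A per-label score whose denominator is zero is counted as 0.0.
    """
    n_labels = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        p_j = tp / (tp + fp) if tp + fp else 0.0
        r_j = tp / (tp + fn) if tp + fn else 0.0
        f_j = 2 * p_j * r_j / (p_j + r_j) if p_j + r_j else 0.0
        precisions.append(p_j)
        recalls.append(r_j)
        f1s.append(f_j)
    return (sum(precisions) / n_labels,
            sum(recalls) / n_labels,
            sum(f1s) / n_labels)


# Made-up final-layer logits for three articles and four labels.
logits = [[2.0, -1.0, 0.5, -3.0],
          [-2.0, 1.5, -0.5, 0.1],
          [0.3, -0.2, 3.0, -1.0]]
y_pred = [predict_labels(row) for row in logits]
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 1]]

print(y_pred)                     # → [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(macro_prf(y_true, y_pred))  # roughly (0.625, 0.75, 0.667)
```

Macro averaging treats a rare label and a frequent one equally. Micro averaging (pooling true positives, false positives and false negatives across labels) is an equally plausible reading of "average" in the abstract and can give different numbers on imbalanced data.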