    A Multimodal Feature Selection Method for Remote Sensing Data Analysis Based on Double Graph Laplacian Diagonalization

    When dealing with multivariate remotely sensed records collected by multiple sensors, an accurate selection of information at the data, feature, or decision level is instrumental in improving the scenes’ characterization. This will also enhance the system’s efficiency and provide more details on modeling the physical phenomena occurring on the Earth’s surface. In this article, we introduce a flexible and efficient method based on graph Laplacians for information selection at different levels of data fusion. The proposed approach combines data structure and information content to address the limitations of existing graph-Laplacian-based methods in dealing with heterogeneous datasets. Moreover, it adapts the selection to each homogeneous area of the considered images according to their underlying properties. Experimental tests carried out on several multivariate remote sensing datasets show the consistency of the proposed approach.

    Ensemble of Feature Selection Techniques for High Dimensional Data

    Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships from large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step of data mining that helps increase the predictive performance of a model. The main aim of feature selection is to choose a subset of features with high predictive information and eliminate irrelevant features with little or no predictive information. Using a single feature selection technique may generate local optima. In this thesis we propose an ensemble approach for feature selection, where multiple feature selection techniques are combined to yield more robust and stable results. The ensemble of multiple feature ranking techniques is built in two steps. The first step creates a set of different feature selectors, each providing its sorted order of features, while the second step aggregates the results of all feature ranking techniques. The ensemble method used in our study is frequency count, with the mean used to resolve any frequency count collision. The experiments in this work are performed on datasets collected from the Kent Ridge bio-medical data repository. The Lung Cancer and Lymphoma datasets are selected from the repository. The Lung Cancer dataset consists of 57 attributes and 32 instances, and the Lymphoma dataset consists of 4027 attributes and 96 instances. Experiments are performed on the reduced datasets obtained from feature ranking. These datasets are used to build the classification models. Model performance is evaluated in terms of the AUC (Area under Receiver Operating Characteristic Curve) performance metric. ANOVA tests are also performed on the AUC performance metric. Experimental results suggest that an ensemble of multiple feature selection techniques is more effective than an individual feature selection technique.


    Twitter is one of the social media platforms that Indonesian users frequently use to spread hate speech and abusive language. To detect the label of a tweet, text classification can be performed using the Naïve Bayes Classifier. Three classification tasks are carried out in this study: hate speech, abusive language, and hate speech level. Feature selection is then applied to find the model with the best accuracy. The results show that the highest accuracy is 82.992% for hate speech classification, 86.484% for abusive language classification, and 71.875% for hate speech level classification. Given these high accuracies, we conclude that text classification with the Naïve Bayes Classifier is successful for classifying hate speech and abusive language on Indonesian-language Twitter. Keywords: accuracy, abusive language, hate speech, classification, naïve Bayes classifier, Twitter
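
    As a rough illustration of the technique named in this abstract, the following is a minimal multinomial Naïve Bayes text classifier with add-one smoothing. It is a sketch under assumptions, not the study's pipeline: the abstract does not specify the tokenization, the smoothing, or the feature selection method, so whitespace tokenization is assumed and feature selection is represented only by an optional, hypothetical `vocab` argument that restricts the words the model may use. The toy documents and the labels `flagged` and `ok` are invented placeholders.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, vocab=None):
    """Fit a multinomial Naive Bayes model; `vocab` optionally restricts features."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        if vocab is not None:
            tokens = [t for t in tokens if t in vocab]
        word_counts[label].update(tokens)
    if vocab is None:
        # No selection given: use every word seen in training.
        vocab = {t for counts in word_counts.values() for t in counts}
    return class_counts, word_counts, set(vocab)


def predict_nb(model, text):
    """Return the label with the highest log posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in text.lower().split() if t in vocab]
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Add-one (Laplace) smoothed log likelihood of each token.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data; the labels are placeholders, not the study's classes.
docs = ["you are awful terrible", "awful terrible people",
        "have a nice day", "nice friendly people"]
labels = ["flagged", "flagged", "ok", "ok"]

model = train_nb(docs, labels)
small_model = train_nb(docs, labels, vocab={"awful", "nice"})
print(predict_nb(model, "awful day terrible"), predict_nb(small_model, "nice day"))
```

    Comparing the accuracy of `model` and `small_model` on held-out tweets is one way to mimic, in miniature, the search for the best-performing feature subset.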


    Hate speech and abusive language are negative behaviors that occur frequently around us. With increasingly advanced technology, anyone can spread hate speech or abusive language to anyone they please. Such incidents are generally common during elections for president, regent, and so on, when disputes often arise between the parties involved. One channel for this is the social media platform Twitter, where a hate speech tweet is posted and then retweeted by other groups. However, it is almost impossible to tell whether a given tweet is hate speech or abusive language. This study is intended to help distinguish between the two and to encourage wiser use of social media. It uses a dataset of 13,167 tweets and the Random Forest method as the classifier. In testing, Random Forest with the best features and best parameters achieved an accuracy of 75.96%. Keywords: hate speech, abusive, tweet, Twitter, classification, Random Forest