    Word2Vec model for sentiment analysis of product reviews in Indonesian language

    Online product reviews have become a highly valuable source of information, helping consumers make purchase decisions and producers improve their products and marketing strategies. However, as the number of available reviews grows, it becomes increasingly difficult to gauge the general opinion about a particular product manually. Hence, an automatic approach is preferred. One of the most popular techniques is a machine learning approach such as the Support Vector Machine (SVM). In this study, we explore the use of the Word2Vec model as features in SVM-based sentiment analysis of product reviews in the Indonesian language. The experimental results show that SVM performs well on the sentiment classification task with every feature model tested. However, the Word2Vec model has the lowest accuracy (only 0.70) compared with the baseline methods, namely Bag of Words models using Binary TF, Raw TF, and TF.IDF. This is because only a small dataset was used to train the Word2Vec model. Word2Vec needs a large number of examples to learn word representations and place similar words close together.
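
    The three Bag of Words baselines named above differ only in how a term's count is turned into a feature value. The following is a minimal sketch, assuming a plain `tf * log(N / df)` form for TF.IDF (the study's exact variant is not stated); the function name `bow_features` and the toy reviews are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def bow_features(docs, scheme="tfidf"):
    """Return (vocabulary, vectors) for tokenized docs under one weighting scheme.

    scheme: "binary" (term present or not), "raw" (term count),
            or "tfidf" (count * log(N / df)); the TF.IDF form is an assumption.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term]
            if scheme == "binary":
                row.append(1.0 if tf > 0 else 0.0)
            elif scheme == "raw":
                row.append(float(tf))
            elif scheme == "tfidf":
                row.append(tf * math.log(n_docs / df[term]))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(row)
    return vocab, vectors

# Hypothetical toy reviews (not from the study's dataset).
reviews = [
    ["barang", "bagus", "bagus"],
    ["barang", "jelek"],
]
vocab, binary_vectors = bow_features(reviews, "binary")
```

    Any of the resulting vectors could then be passed to an SVM; a term that occurs in every document, such as "barang" here, receives a TF.IDF weight of zero under this form.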

    Automatic Complaint Classification System Using Classifier Ensembles

    Sambat Online is an online complaint system run by the city government of Malang, Indonesia. Because most citizens do not know to which work units (Satuan Kerja Pemerintah Daerah [SKPDs]) their complaints should be sent, the system administrator must manually sort and classify all incoming complaints by the appropriate SKPD. This study empirically evaluated the application of an automated system to replace the manual classification process. The experiments, which used Sambat Online data, involved five individual classification algorithms (Naïve Bayes, Maximum Entropy, K-Nearest Neighbors, Random Forest, and Support Vector Machines) and two ensemble strategies (hard voting and soft voting). The results show that, of the five individual classifiers, the Multinomial Naïve Bayes classifier achieved the best performance, with an accuracy of 80.7%. The results also indicate that the ensemble methods generally performed better than the individual classifiers. Almost all of them had the same accuracy level of 81.2%. In addition, the soft voting strategy had slightly higher accuracy than the hard voting strategy when all five classifiers were used. However, when the three best classifier combinations were used, both had the same level of accuracy.
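
    The difference between the two ensemble strategies can be illustrated with a short sketch. Hard voting takes the majority of predicted labels, while soft voting averages the predicted class probabilities. The class names `SKPD_A` and `SKPD_B` and the probability values are hypothetical placeholders, not results from the study.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Majority vote over the labels predicted by each classifier."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average per-class probabilities across classifiers and pick the best class."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get)

# Hypothetical outputs of three classifiers for one complaint.
probs = [
    {"SKPD_A": 0.55, "SKPD_B": 0.45},
    {"SKPD_A": 0.60, "SKPD_B": 0.40},
    {"SKPD_A": 0.10, "SKPD_B": 0.90},
]
labels = [max(p, key=p.get) for p in probs]

hard_result = hard_vote(labels)   # two of three classifiers favor SKPD_A
soft_result = soft_vote(probs)    # one very confident classifier tips the average
```

    In this toy case the two strategies disagree, because soft voting lets a single confident classifier outweigh two marginal ones.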

    Arabic Book Retrieval using Class and Book Index Based Term Weighting

    One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents ordered by relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. In this study, we address automatic retrieval from a collection of Islamic Fiqh (Law) books. This collection contains many books, each of which has tens to hundreds of pages. Each page of a book is treated as a document that is ranked against the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then combined with the existing method to form TF.IDF.ICF.IBF. The term weighting method was also used for feature selection because of the high dimensionality of the feature space. This novel method was tested on a dataset of 13 Arabic Fiqh e-books. The experimental results showed that the proposed method achieved higher precision, recall, and F-Measure than the other three methods across the feature selection settings. The best performance was obtained with the top 1000 features: a precision of 76%, a recall of 74%, and an F-Measure of 75%.
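
    The combined weighting can be sketched as a product of four factors. The abstract does not give the formulas, so the sketch below assumes each inverse frequency takes the form `1 + log(total / count)`; the function name, the page structure, and the toy collection are all hypothetical.

```python
import math
from collections import Counter

def tf_idf_icf_ibf(pages):
    """Weight each term of each page by TF * IDF * ICF * IBF.

    pages: list of dicts with keys "tokens", "book", and "cls".
    Each inverse frequency is computed as 1 + log(total / count),
    which is an assumed form rather than the paper's exact definition.
    """
    n_pages = len(pages)
    n_books = len({p["book"] for p in pages})
    n_classes = len({p["cls"] for p in pages})

    # For each term: how many pages, which books, and which classes contain it.
    pages_with, books_with, classes_with = Counter(), {}, {}
    for p in pages:
        for term in set(p["tokens"]):
            pages_with[term] += 1
            books_with.setdefault(term, set()).add(p["book"])
            classes_with.setdefault(term, set()).add(p["cls"])

    def inv(total, count):
        return 1.0 + math.log(total / count)

    weighted = []
    for p in pages:
        tf = Counter(p["tokens"])
        weighted.append({
            term: count
            * inv(n_pages, pages_with[term])
            * inv(n_classes, len(classes_with[term]))
            * inv(n_books, len(books_with[term]))
            for term, count in tf.items()
        })
    return weighted

# Hypothetical toy collection: three pages from two books in two classes.
pages = [
    {"tokens": ["salat", "wudu"], "book": "b1", "cls": "ibadah"},
    {"tokens": ["salat", "zakat"], "book": "b1", "cls": "ibadah"},
    {"tokens": ["salat", "bai"], "book": "b2", "cls": "muamalat"},
]
weights = tf_idf_icf_ibf(pages)
```

    Under this assumed form, a term spread across every page, book, and class ("salat") keeps only its raw frequency, while a term confined to one page, book, and class ("wudu") is boosted by all three inverse factors. Sorting terms by weight and keeping the top entries would give one possible feature selection step.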

    Improving Sentiment Analysis of Short Informal Indonesian Product Reviews using Synonym Based Feature Expansion

    Sentiment analysis of short informal texts such as product reviews is particularly challenging. Short texts are sparse, noisy, and lacking in context information. Given these difficulties, traditional text classification methods may not be suitable for analyzing the sentiment of short texts. A common approach to overcoming these problems is to enrich the original texts with additional semantics so that they resemble longer documents. Traditional classification methods can then be applied. In this study, we developed an automatic sentiment analysis system for short informal Indonesian texts using Naïve Bayes and Synonym Based Feature Expansion. The system consists of three main stages: preprocessing and normalization, feature expansion, and classification. After preprocessing and normalization, we use Kateglo to find synonyms of every word in the original texts and append them. Finally, the text is classified using Naïve Bayes. The experiments show that the proposed method can improve the performance of sentiment analysis of short informal Indonesian product reviews. The best sentiment classification performance with the proposed feature expansion is an accuracy of 98%. The experiments also show that feature expansion yields a larger improvement with a small amount of training data than with a large amount.
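
    The feature expansion stage can be sketched as appending the synonyms of each token to the token list before classification. The synonym table below is a hypothetical stand-in for a thesaurus lookup such as Kateglo; no real Kateglo interface is shown, and the function name is illustrative.

```python
def expand_with_synonyms(tokens, synonyms):
    """Return the original tokens followed by the known synonyms of each token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded

# Hypothetical synonym table standing in for a thesaurus lookup.
SYNONYMS = {
    "bagus": ["baik", "elok"],
    "cepat": ["lekas"],
}

review = ["barang", "bagus", "cepat"]
expanded = expand_with_synonyms(review, SYNONYMS)
```

    The expanded token list gives the classifier more chances to match words seen during training, which is one plausible reason the gain is larger when training data is scarce.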

    Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor

    The 2013 curriculum is a new curriculum in the Indonesian education system, enacted by the government to replace the KTSP curriculum. Its implementation over the last few years has sparked various opinions among students, teachers, and the general public, especially on the social media platform Twitter. In this study, sentiment analysis of the 2013 curriculum is conducted. An ensemble of several feature sets was used for sentiment classification with the K-Nearest Neighbor method: Twitter-specific features, textual features, Parts of Speech (POS) features, lexicon-based features, and Bag of Words (BOW) features. The experimental results showed that the ensemble features performed better in sentiment classification than any individual feature set alone. The best accuracy using ensemble features is 96%, obtained with k=5.
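
    One simple way to read "ensemble features" is as a concatenation of the individual feature vectors, which is then classified by K-Nearest Neighbor. The sketch below assumes that reading and a Euclidean distance; the function names and the toy vectors are hypothetical and not taken from the study.

```python
import math
from collections import Counter

def concat_features(*feature_sets):
    """Concatenate several feature vectors into one ensemble vector."""
    return [value for features in feature_sets for value in features]

def knn_predict(train, query, k=5):
    """Classify query by majority vote among its k nearest training vectors.

    train: list of (vector, label) pairs; distance is Euclidean (an assumption).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical tweets: a 2-value lexicon feature set plus a 1-value textual feature set.
train = [
    (concat_features([1, 0], [0.9]), "pos"),
    (concat_features([1, 0], [0.8]), "pos"),
    (concat_features([1, 1], [0.7]), "pos"),
    (concat_features([0, 1], [0.1]), "neg"),
    (concat_features([0, 1], [0.2]), "neg"),
    (concat_features([0, 0], [0.3]), "neg"),
]
query = concat_features([1, 0], [0.85])
prediction = knn_predict(train, query, k=5)
```

    With k=5, the five closest training tweets include three positive and two negative examples, so the query is labeled positive.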

    An academic perspective of assessment questions bank

    There are several electronic assessment systems being used in institutions of higher education (HE), especially in Open and Distance Learning (ODL) institutions. Some of these institutions built their assessment system into their institution’s Virtual Learning Environments (VLE). Most of these assessment systems are general-purpose, with assessment questions in the form of simple multiple-choice questions (MCQ) or short-answer questions. In practice, these types of assessment questions do not match many of the current learning requirements and learning outcomes. The concept of an assessment question bank that can be used by academics to share assessment content within or across an institution is not new, but advances in technology have now made such a repository more realizable than ever before. A question bank is now a specialized repository that can be accessed via a web interface for platform independence. The use of technology in developing the question bank provides much relief from the chores associated with preparing assessments, which in turn enhances the quality of the questions and improves the quality of the assessments. This paper presents the experience of Open University Malaysia (OUM) in developing its own Question Bank (QBank). This QBank system is designed to help the Subject Matter Experts (SMEs) who need to develop, classify, and store their assessment items, such as MCQ and essay-type exam questions. This software is integrated with OUM’s Virtual Learning Environment (myVLE) in order to allow easier and wider access for the SMEs and faculty. (Abstract by author)

    Jurnal Arkeologi Siddhayatra Vol. 20 No. 2, 2015

    This November issue of the journal consists of six articles which, based on the chronology of the data used, range from the prehistoric period to the colonial period. The topics also vary, covering settlement, gender studies, technology, and archaeological research methods. The articles are: the archaeology of the Tomb of Sultan Muhammad Ali of Ternate in North Maluku; women and the tradition of tomb pilgrimage; the use of the total station in recording archaeological data in Indonesia; painting and engraving art on the Pasemah megaliths, South Sumatra Province; an engraved stone (Batu Gong) on the bank of the Mesumai River, Jambi, as a preliminary study of rock art; and megaliths in a contemporary context, the legend behind Batu Larung (an ethnographic study of the relationship between myth and megalithic artifacts).

    Fast Obstacle Distance Estimation using Laser Line Imaging Technique for Smart Wheelchair

    This paper presents an approach to obstacle distance estimation for a smart wheelchair. The smart wheelchair was equipped with a camera and a laser line projector. The camera captured images of the environment to sense the pathway condition. The laser line was used in combination with the camera to recognize an obstacle in the pathway based on the shape of the laser line image at a certain angle. A blob detection method was then applied to the laser line image to separate and recognize the pattern of the detected obstacles. Because the laser line projector and camera were mounted in fixed positions, the relation between the blob gap and the obstacle-to-wheelchair distance was fixed. A simple linear regression fitted to 16 data points was used to represent this relation and estimate the obstacle distance. The average error between the estimated and actual distances was 1.25 cm over 7 test experiments. The experimental results therefore show that the proposed method is able to estimate the distance between the wheelchair and an obstacle.
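
    The regression step can be sketched as an ordinary least squares fit of distance against blob gap, followed by a mean absolute error check. The calibration pairs below are synthetic values chosen to lie on a straight line; they are not the paper's 16 measurements, and the units (pixels, centimeters) are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Synthetic calibration pairs (blob gap in pixels, distance in cm); not the paper's data.
gaps = [10.0, 20.0, 30.0, 40.0]
distances = [120.0, 100.0, 80.0, 60.0]

slope, intercept = fit_line(gaps, distances)
estimate = slope * 25.0 + intercept  # estimated distance for a 25-pixel gap
error = mean_abs_error([slope * g + intercept for g in gaps], distances)
```

    Once the line is fitted, estimating a distance costs one multiplication and one addition per frame, which is consistent with the "fast" estimation the title describes.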

    Sentiment Analysis of Mobile Application Reviews Using Naive Bayes and Levenshtein Distance Based Word Normalization (Case Study: BCA Mobile Application)

    The rapid development of mobile applications has led to many applications being created with various uses to meet user needs. Each application allows users to leave reviews about it. The purpose of these reviews is to evaluate and improve the quality of the product going forward. To this end, sentiment analysis can be used to classify reviews as positive or negative. Application reviews often contain misspellings that make them difficult to understand. Misspelled words need to be normalized into their standard forms, so word normalization is required to resolve the misspelling problem. This study uses word normalization based on Levenshtein distance. In testing, the highest accuracy was obtained with a split of 70% training data and 30% test data. The highest accuracy in the tests, 100%, was obtained with an edit value of <=2; the second highest, 96.4%, was obtained with an edit value of <=1; and the lowest, 66.6%, was obtained with edit values of <=4 and <=5. The Naive Bayes-Levenshtein Distance test achieved the highest accuracy, 96.9%, compared with 94.4% for Naive Bayes without Levenshtein Distance.
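
    Levenshtein-based normalization can be sketched as replacing a misspelled word with its closest standard word, provided the edit distance stays within a threshold such as <=2. The word list and function names below are hypothetical; the dictionary used in the study is not given in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalize(word, dictionary, max_edit=2):
    """Replace word with its closest dictionary entry within max_edit edits."""
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_edit else word

# Hypothetical standard-word list (not the study's dictionary).
STANDARD_WORDS = ["bagus", "lambat", "aplikasi"]

fixed = [normalize(w, STANDARD_WORDS) for w in ["bgus", "aplikasih", "lambat"]]
```

    A larger threshold lets more distant words be replaced, which can map unrelated words onto the wrong standard form; this may be why accuracy drops at thresholds of <=4 and <=5.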

    Word Sense Disambiguation (WSD) for Indonesian Homograph Word Meaning Determination by LESK Algorithm Application

    Indonesian has a number of words commonly regarded as ambiguous, which can obscure the meaning of a sentence or statement so that it is poorly understood or not conveyed at all. Humans, by contrast, have the linguistic ability to determine the intended meaning of a word that has more than one sense. Word Sense Disambiguation (WSD) is a topic in natural language processing (NLP) that deals with handling ambiguity. It is a computational linguistic process that aims to identify the proper meaning of a word based on its context. This study designs a system to handle ambiguous words. It looks up and determines the meaning of ambiguous words using the LESK algorithm. Functional testing of the system showed that its results are in line with the test data from KBBI. The results show an accuracy of 78.6% for one ambiguous word and 62.5% for two ambiguous words in determining the appropriate meaning.
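
    A simplified variant of the Lesk algorithm picks the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below illustrates that idea for the Indonesian homograph "apel"; the glosses are short hypothetical paraphrases, not actual KBBI entries, and the system described above may differ in its details.

```python
def simplified_lesk(word, context, senses, stopwords=frozenset()):
    """Pick the sense of word whose gloss shares the most words with the context.

    senses: dict mapping a sense name to a gloss string for the ambiguous word.
    """
    context_words = {w for w in context.lower().split() if w != word} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for the homograph "apel" (fruit vs. roll-call assembly).
APEL_SENSES = {
    "apel_buah": "buah dari pohon yang dimakan segar",
    "apel_upacara": "upacara berkumpul pagi di lapangan",
}

sense_1 = simplified_lesk("apel", "kami ikut apel pagi di lapangan sekolah", APEL_SENSES)
sense_2 = simplified_lesk("apel", "saya makan buah apel segar", APEL_SENSES)
```

    Because the method depends entirely on word overlap, short glosses or short contexts leave little evidence to count, which is a known weakness of Lesk-style disambiguation.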