17 research outputs found

    The Effect of Stemming and Removal of Stopwords on the Accuracy of Sentiment Analysis on Indonesian-language Texts

    Preprocessing is an essential step in sentiment analysis because textual data is noisy and unstructured. Stemming and stopword removal are both popular preprocessing techniques for text classification. However, prior research reports conflicting results on how the two methods influence sentiment classification accuracy. This paper therefore investigates the effect of stemming and stopword removal on Indonesian-language sentiment analysis. We compare four preprocessing conditions: using both stemming and stopword removal, without stemming, without stopword removal, and without either. A Support Vector Machine was used as the classification algorithm, with TF-IDF as the weighting scheme. The results were evaluated using a confusion matrix and k-fold cross-validation. The experimental results show that accuracy did not improve, and tended to decrease, when stemming or stopword removal was applied. This work concludes that applying stemming and stopword removal does not significantly affect the accuracy of sentiment analysis on Indonesian text documents.
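
The four preprocessing conditions described in this abstract can be sketched as two independent switches. This is only an illustration: the stopword set, the suffix list, and the `naive_stem` helper are hypothetical toy stand-ins, not the resources or the stemmer used in the paper.

```python
from itertools import product

# Hypothetical toy resources; a real study would use a full Indonesian
# stopword list and a proper stemmer.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
SUFFIXES = ("nya", "kan", "an", "i")


def naive_stem(token: str) -> str:
    """Strip at most one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, stem: bool, remove_stopwords: bool) -> list:
    """Lowercase and whitespace-tokenize, then apply the selected steps."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def all_conditions(text: str) -> dict:
    """Return the token list for each (stem, remove_stopwords) combination."""
    return {
        (stem, remove): preprocess(text, stem, remove)
        for stem, remove in product([True, False], repeat=2)
    }
```

Each of the four token lists would then be weighted (for example with TF-IDF) and passed to the classifier, so that accuracy can be compared across conditions.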

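    Implementation of Plagiarism Software and Google Classroom to Support the Assessment of Student Assignments at SMK Nasional Berbah-Seleman [Implementasi Software Plagiasi dan Google Classroom Untuk Membantu Penilaian Tugas Siswa Pada SMK Nasional Berbah-Seleman]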

    Student assignments are currently completed and graded manually, so assessment cannot be objective when many submissions are similar or even identical to those of other students. Copying other people's work is unlawful, yet students often lack a clear understanding of what constitutes plagiarism. Education on this topic should therefore begin early, especially in the world of education, which places a high value on the work of others. Plagiarism software is needed to address these challenges. This community service activity trained teachers to manage student assignments online and to check assignment documents using plagiarism software. The activity makes it easier for teachers to grade assignments and helps students understand the originality of a work.

    JARO WINKLER ALGORITHM FOR MEASURING SIMILARITY ONLINE NEWS

    Online news is a source of information for the public; this affects journalists, who as news writers can find news information quickly and accurately every day. Journalists can plagiarise other journalists, or take news material from other news media sites and publish it without citing the source. An algorithm is therefore needed to measure the similarity of online news. This work proposes the Jaro-Winkler algorithm, with the computed value normalised so that 0 means no resemblance and 1 means an exact match. The data come from 20 online news media sites in the Central Kalimantan area. The scraping process used the Custom Search JSON API with keywords to retrieve news on the same topic. The Jaro-Winkler calculation yielded an average online news similarity of 74.49%, with 43 news items at a severe plagiarism level and 12 at a moderate plagiarism level. The Jaro-Winkler algorithm showed weaknesses in calculating similarity on the collected data: some items that should have been assigned a heavy plagiarism level were not detected as such, and vice versa.
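
For readers unfamiliar with the measure, here is a minimal sketch of the standard Jaro-Winkler similarity, which already falls in the range 0 to 1. The prefix scaling factor `p=0.1` and the prefix cap of 4 are the commonly used defaults and are assumptions here; the abstract does not state which parameters or which text granularity (characters or words) the authors used.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and not too far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)
```

With these defaults, `jaro_winkler("MARTHA", "MARHTA")` evaluates to about 0.961 and `jaro_winkler("DIXON", "DICKSONX")` to about 0.813.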

    Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification

    Data labeling is an essential stage in the sentiment analysis process. It involves assigning labels to text data to indicate the sentiment expressed in the text. This is typically done manually: a human annotator reads the text and assigns a label based on their interpretation of the sentiment expressed. However, this process can be time-consuming and costly, especially for large volumes of text. One way to automate data labeling is to use lexicon resources, which are dictionaries or databases of words and phrases pre-labeled with sentiment information. With a lexicon resource, labels can be assigned to text automatically based on the sentiment of the words and phrases it contains. However, the effectiveness of this approach depends heavily on the quality of the lexicon resource. In this study, we compared the performance of a Long Short-Term Memory (LSTM) model trained using manual labeling with models trained using several lexicon resources. The LSTM model was trained on a dataset of text that had been manually labeled with sentiment information and was then tested on data that had also been manually labeled. The model trained with manual labeling outperformed the models trained using the lexicon resources, with a testing accuracy of 0.80; the highest-performing lexicon resource achieved a testing accuracy of only 0.56. These results suggest that lexicon resources cannot replace manual labeling for document labeling, because the effectiveness of a lexicon depends heavily on the quality of the dictionaries it uses. Manual labeling may be more time-consuming and costly, but it is likely to produce more accurate results for sentiment analysis tasks. Achieving high accuracy in sentiment analysis may therefore require human annotators rather than automated lexicon resources.
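
To make the lexicon-based labeling idea concrete, here is a minimal sketch that sums word scores and maps the total to a label. The `LEXICON` dictionary, its scores, and the three-way label set are hypothetical; the abstract does not name the lexicon resources or the labeling rule used in the study.

```python
import re

# Hypothetical toy lexicon mapping a word to a sentiment score.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def lexicon_label(text: str, lexicon: dict = LEXICON) -> str:
    """Label a text by the sign of the summed scores of its words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

This naive rule labels "not good" as positive because it ignores negation, which illustrates why the quality of automatic labels depends so heavily on the lexicon and the rule built on top of it.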

    Implementation of a Search Engine Optimization (SEO)-Based Website as a Promotional Medium [Implementasi Website Berbasis Search Engine Optimization (SEO) Sebagai Media Promosi]

    The advancement of information technology is growing rapidly in various fields of life. The internet, as one part of information and communication technology, has a very large effect and influence. A website serves not only as an information medium but also supports a company's business processes; however, sales through a website are not effective enough without a good promotional strategy. SEO (Search Engine Optimization) is a promotional technique that optimizes a website for search engines so that it ranks at the top, or on the first page, of search engine results. This research was conducted on a website to which no SEO techniques had yet been applied. On-page SEO methods were then applied, such as keyword optimization in the title tag, content, meta keywords, and meta description, along with sharing to social media; several tests were also carried out at this stage as benchmarks for the success of the SEO techniques. Applying the SEO techniques improved the website's SERP (Search Engine Results Page) position: the site was indexed by Google on the second page in the second month and on the first page of Google search in less than 3 months. Keywords: Website, SEO, SEO on Page, SERP

    Random and Synthetic Over-Sampling Approach to Resolve Data Imbalance in Classification

    High accuracy is one of the measures of a classifier's success in predicting classes: the higher the value, the more correct the class predictions. One way to improve accuracy is to ensure that the dataset has a balanced class composition. Ensuring balanced classes is difficult, especially for rare cases. This study used a blood donor dataset in which the classifier predicts whether a donor is feasible or not feasible; in this case, the imbalance ratio is quite high. This work aims to increase the amount of minority-class data randomly and synthetically so that both classes contain the same amount of data. Applying Synthetic Over-Sampling (SOS) and Random Over-Sampling (ROS) increased the recognition accuracy for the not-feasible class from 12% to 100% with the KNN algorithm. In contrast, the naïve Bayes algorithm showed no improvement: its accuracy was 89% both before and after balancing.
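
The two balancing strategies named in this abstract can be sketched as follows. The random variant duplicates existing minority rows; the synthetic variant here interpolates between two randomly chosen minority rows, which is a simplifying assumption (SMOTE-style methods interpolate toward nearest neighbours, and the abstract does not say which synthetic method was used). Function and parameter names are illustrative.

```python
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def synthetic_oversample(minority, target_size, rng):
    """Add new rows on the segment between two random minority rows.

    Requires at least two minority rows with numeric features.
    """
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        a, b = rng.sample(minority, 2)
        gap = rng.random()  # position along the segment from a to b
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return minority + synthetic
```

A typical use would set `target_size` to the size of the majority class and pass a seeded `random.Random` instance so that the resampling is reproducible. Only the training split should be resampled, so that test data stay untouched.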

    Facial Images Improvement in the LBPH Algorithm Using the Histogram Equalization Method

    In face recognition research, detecting several parts of the face is a necessary part of the study. Lighting is the main factor in this work: face detection is hampered when light intensity is low because of conditions such as weather, season, and sunlight. This study focuses on detecting faces in dim lighting using the Local Binary Pattern Histogram (LBPH) algorithm, assisted by the Haar Cascade Classifier, a classifier commonly used in face detection. It also employs the image enhancement method Histogram Equalization (HE) to improve the source image from the webcam. In the evaluation, different light intensities and various head poses affected the accuracy of the method. The research reached 88% accuracy for successful face detection. Factors such as head accessories, hair covering the face, and invisible parts of the face such as the eyes, mouth, and nose should not be extreme.
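
As an illustration of the enhancement step, here is a minimal sketch of global histogram equalization on a flat list of grayscale values, using the standard cumulative-distribution mapping. It is not the authors' implementation; the function name and the 256-level default are assumptions.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels)."""
    if not pixels:
        return []
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Every pixel has the same value, so there is nothing to stretch.
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]
```

For a dim image whose values cluster in a narrow band, the mapping spreads them over the full range: the input `[10, 10, 20, 30, 30, 40]` becomes `[0, 0, 64, 191, 191, 255]`.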

    IoT-based electricity usage control and monitoring system using Wemos and the Blynk application [Sistem kendali dan pemantauan penggunaan listrik berbasis IoT menggunakan Wemos dan aplikasi Blynk]

    This study examines the application of Internet of Things (IoT) technology to control electronic devices and monitor their electrical power usage remotely via the Internet. The system was implemented using a Wemos D1, an ACS712 current sensor, a relay, and the Blynk application as the system interface on a smartphone. The system took an average of 0.4-3.3 seconds to respond to commands from the Blynk application over a Wi-Fi connection at distances of 50-1000 meters, and the device control and power monitoring functions worked properly. The system response time was not affected by distance. This Wi-Fi-based system offers an alternative for controlling devices and monitoring their power usage, alongside SMS access, which is slower, and Bluetooth, which has a shorter range.

    Abstractive text summarization using Pre-Trained Language Model "Text-to-Text Transfer Transformer (T5)"

    Automatic Text Summarization (ATS) applies text-processing technology to help humans produce summaries or key points from large volumes of documents. We focus on the Indonesian language because few NLP research resources exist for it. This paper used a PLTM (Pre-Trained Language Model) based on the transformer architecture, namely T5 (Text-to-Text Transfer Transformer), which had previously been pre-trained on a larger dataset. Evaluation was based on ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores comparing the reference summary with the model summary. Experiments that fine-tuned the pre-trained t5-base model (220M parameters) on an Indonesian news dataset yielded relatively high ROUGE values, namely ROUGE-1 = 0.68, ROUGE-2 = 0.61, and ROUGE-L = 0.65. Although the evaluation scores were good, the resulting model is not yet satisfactory because it did not perform optimally in terms of abstraction. We also found several errors in the reference summaries of the dataset used.
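
For context on the metric, here is a minimal sketch of ROUGE-N as clipped n-gram overlap between a reference summary and a model summary. Whitespace tokenization and lowercasing are simplifying assumptions, and the abstract does not state whether its reported values are recall, precision, or F1, so the sketch returns all three.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Because ROUGE rewards word overlap, a summary that copies sentences from the source can score well without being abstractive, which is consistent with the gap between scores and abstraction quality noted above.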

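    Implementation of the Rabin-Karp Algorithm for Plagiarism Detection in Student Assignment Documents [Implementasi Algoritma Rabin-Karp untuk Pendeteksi Plagiarisme pada Dokumen Tugas Mahasiswa]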

    Developments in information technology have led universities to reduce paper use, so many student assignments are now submitted in digital form. Digital submission makes it easier for students to plagiarize. A system is therefore needed to check for plagiarism between students' assignment documents quickly and accurately. One applicable method is the Rabin-Karp algorithm, which has the advantage of searching for strings with long patterns. In this system, the Rabin-Karp algorithm includes text preprocessing steps consisting of case folding, tokenizing, punctuation removal, stopword removal, and stemming. The output of this preprocessing is then processed with the Rabin-Karp algorithm. The method produces a similarity score between student assignments, calculated using the Dice coefficient. Accuracy was measured with a confusion matrix over 20 comparisons between the plagiarism detection system and the Plagiarisme Checker X software, yielding an accuracy of 90%.
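
The combination of Rabin-Karp hashing and the Dice coefficient mentioned in this abstract can be sketched as follows: every k-character window of a document is hashed with a rolling hash, and two documents are compared by the overlap of their hash sets. The window length `k=5`, the base, the modulus, and the function names are assumptions for illustration; the input is assumed to be already preprocessed text.

```python
def kgram_hashes(text: str, k: int = 5, base: int = 256, mod: int = 1_000_003) -> set:
    """Rolling (Rabin-Karp) hashes of every k-character window of text."""
    if len(text) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(text)):
        # Drop the oldest character, shift, and append the newest one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.add(h)
    return hashes


def dice_similarity(a: str, b: str, k: int = 5) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the two hash sets."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha and not hb:
        return 0.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))
```

Identical documents score 1.0 and documents with no shared window score 0.0. Because hashes are compared rather than the windows themselves, a rare hash collision can count two different windows as equal; a stricter version would verify the underlying text on each hash match.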