Search CORE

8 research outputs found

Application for Selection of Student Final Project Supervisors Based on the Selected Category and Expertise of Lecturers Using the Naive Bayes Classifier Method

Author: Ikhsanudin Muhamad
Irawan Yuda
Publication venue: 'Universitas Muhammadiyah Yogyakarta'
Publication date: 01/07/2021

In the final stage of study, the supervisor plays an important role in a student's success and graduation, so students need well-suited supervisors. At STMIK Hang Tuah Pekanbaru, several problems arise in the thesis title submission process: supervisor assignment still relies on a conventional method based on the personal knowledge of the Head of the Study Program, and it is difficult to follow the progress of students' final project submissions and to check final project supervisor assignments. An application for selecting students' final project supervisors is proposed as a solution to these problems. The supervisor recommendation system uses the Naive Bayes classifier algorithm to determine the probability of each lecturer, and students can choose from the results. Naive Bayes is a prediction technique based on simple probabilities, applying Bayes' theorem (Bayes' rule) with a strong assumption of independence. The selection is based on the criteria of the final project and the expertise of the lecturer. The application produces supervisor recommendations that match the concept of the student's final project. Using reference data, training data, and Bayes' rule, the results were sufficient to satisfy students in getting a supervisor who matches the topic of their final project.

Leading & Enlightening Journal UMY
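
As a rough illustration of the Naive Bayes rule mentioned in the abstract above, the following Python sketch ranks lecturers for a project described by categorical features. The feature names, the training rows, the lecturer labels, and the use of Laplace (add-one) smoothing are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter, defaultdict
import math

def train(rows):
    """Count labels and per-label feature values from (features, label) pairs."""
    label_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for features, label in rows:
        for name, value in features.items():
            value_counts[label][name][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values

def posterior(model, features):
    """Return P(label | features) under the feature-independence assumption."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for name, value in features.items():
            count = value_counts[label][name][value]
            k = len(feature_values[name])  # distinct values seen for this feature
            score += math.log((count + 1) / (n + k))  # add-one smoothing (assumed)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnormalized = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {lab: v / z for lab, v in unnormalized.items()}

# Hypothetical history of supervised projects (made-up data).
rows = [
    ({"category": "data mining", "method": "classification"}, "Lecturer A"),
    ({"category": "data mining", "method": "clustering"}, "Lecturer A"),
    ({"category": "networking", "method": "simulation"}, "Lecturer B"),
    ({"category": "networking", "method": "classification"}, "Lecturer B"),
]
model = train(rows)
probs = posterior(model, {"category": "data mining", "method": "classification"})
best = max(probs, key=probs.get)
```

Sorting `probs` by value would give a ranked list of candidate supervisors; a real system would need far more training data than this toy table.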

Film Rating Prediction Using the Naive Bayes Method (original title: PREDIKSI RATING FILM MENGGUNAKAN METODE NAIVE BAYES)

Author: Riszki Wijayatun Pratiwi
Yusuf Sulistyo Nugroho
Publication venue: 'Universitas Negeri Semarang - Department of Guidance and Counseling'
Publication date: 01/12/2016

The film industry is currently developing very rapidly, as shown by the large number of films released one after another. Film enthusiasts also need films with good picture quality, sound, storyline, and positive values so that they remain enthusiastic about following the latest releases. However, not every film can be enjoyed, and not every audience likes every film. For a film to keep developing, it needs ratings from film enthusiasts in order to learn which films suit their tastes. An analysis is therefore needed to understand viewers' interests, by collecting assessments that are then used to determine a film's rating with the Naive Bayes method, a fundamental statistical approach in pattern recognition. This approach is based on quantifying the trade-off between various classification decisions using probabilities and the risks those decisions incur. The method is one of the methods of data mining and uses predetermined attributes, including film genre, actors, language, color, duration, country, and others, which directors can use as a benchmark when making films.

Directory of Open Access Journals

Topic Modeling on Online News Portal Using Latent Dirichlet Allocation (LDA)

Author: Fahlevvi Mohammad Rezza
SN Azhari
Publication venue: 'Universitas Gadjah Mada'
Publication date: 01/10/2022

The large amount of news displayed on online news portals often does not indicate the topic being discussed, but the news can be read and analyzed to find the main issues and trends under discussion. A quick and efficient way to find trending topics in the news is needed. One method that can be used to solve this problem is topic modeling. Topic modeling allows users to easily and quickly understand how current topics develop. One of the algorithms in topic modeling is Latent Dirichlet Allocation (LDA). The research stages are data collection, preprocessing, n-gram formation, dictionary representation, weighting, topic model validation, topic model formation, and topic modeling results. Based on the results of the topic evaluation, the best value of topic modeling using coherence was related to the number of passes. The number of topics produced 20 keys, five cases with a 0.53 coherence value. It can be said to be relatively stable based on the standard coherence value.

Directory of Open Access Journals

IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
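
The early stages listed in the abstract above (preprocessing, n-gram formation, dictionary representation, weighting) can be sketched in plain Python as follows. This is only an assumed, minimal version: the stopword list, the example headlines, and all function names are made up for illustration, raw term counts stand in for the weighting step, and the sketch stops before any LDA model is trained or evaluated.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "on", "and", "to", "for"}  # assumed toy list

def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def add_bigrams(tokens):
    """Append adjacent-token bigrams (joined with '_') to the unigrams."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

def build_dictionary(docs):
    """Map every distinct token to an integer id."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def to_bow(doc, dictionary):
    """Bag-of-words: sorted (token_id, count) pairs for one document."""
    counts = Counter(dictionary[tok] for tok in doc)
    return sorted(counts.items())

# Made-up headlines standing in for collected news data.
headlines = [
    "Fuel prices rise in the capital",
    "Government reviews fuel prices",
]
docs = [add_bigrams(preprocess(h)) for h in headlines]
dictionary = build_dictionary(docs)
corpus = [to_bow(d, dictionary) for d in docs]
```

A topic-modeling library would typically take `corpus` and `dictionary` as input, and a coherence score would then be computed per candidate number of topics or passes.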

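Sentiment Analysis of Mypertamina Use for Purchasing Subsidized Fuel Using the Naive Bayes Algorithm (original title: Analisis Sentimen penggunaan Mypertamina untuk Pembelian BBM Bersubsidi menggunakan Algoritma Naive Bayes)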

Author: Zahra Denada Fatimah
Publication venue: 'Universitas Bandar Lampung Publication Center'
Publication date: 03/07/2023

This study aims to analyze sentiment toward the use of the Mypertamina application for purchasing subsidized fuel (BBM) using the Naive Bayes algorithm. The study involves data pre-processing stages, such as full preprocessing and stopword removal, as well as accuracy testing with varying splits of training and test data. The results show that with full preprocessing and 70 percent training data, the classification model reached 85 percent accuracy. Using 80 percent training data raised accuracy to 87 percent, while 90 percent training data gave 89 percent. This indicates that the more training data is used, the better the classification model performs. Stopword removal also had a significant effect on model accuracy. Without stopword removal, accuracy for the 70, 80, and 90 percent splits was 80, 82, and 84 percent, respectively. Although this accuracy is lower than with full preprocessing, the model still gives reasonably good predictions. Based on these tests, it can be concluded that full preprocessing with more training data tends to yield better model performance. However, stopword removal also contributes significantly to improving accuracy. Therefore, when developing text classification models, comprehensive pre-processing and appropriate stopword removal should be considered according to the characteristics of the data and the needs of the analysis. In classification tests using the Naïve Bayes Classifier, the split between training and test data also has an effect. Using 70 percent training data gave 85 percent accuracy, while 80 percent and 90 percent training data gave 87 percent and 89 percent, respectively.
The more training data is used, the better the Naïve Bayes Classifier performs. As a final conclusion, the 90 percent training proportion gave the best performance in classifying the test data, with the highest accuracy. However, a smaller test set can lead to higher variance in the results. Therefore, cross-validation or testing with more folds can give more comprehensive information about the performance of the classification model.

Jurnal - Universitas Bandar Lampung (UBL)
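
The abstract above closes by recommending cross-validation over a single small test set. A minimal sketch of how k-fold cross-validation partitions the data is shown below; `k_fold_indices` is a hypothetical helper written for this illustration, the sample count is made up, and no classifier is trained, so the accuracies reported in the abstract are not reproduced.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With k=10, each round trains on 90 percent and tests on 10 percent,
# but every sample is used for testing exactly once across the rounds.
splits = list(k_fold_indices(n_samples=20, k=10))
```

Averaging the per-fold accuracy over all rounds gives an estimate that depends less on which samples happened to land in one small test set.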


Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline

Author: Biswas Sumon
Rajan Hridesh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2021

In recent years, many incidents have been reported where machine learning models exhibited discrimination among people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is a common practice to build a pipeline that includes an ordered set of data preprocessing stages followed by a classifier. However, most of the research on fairness has considered a single classifier based prediction task. What are the fairness impacts of the preprocessing stages in a machine learning pipeline? Furthermore, studies showed that often the root cause of unfairness is ingrained in the data itself, rather than the model. But no research has been conducted to measure the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduced the causal method of fairness to reason about the fairness impact of data preprocessing stages in an ML pipeline. We leveraged existing metrics to define the fairness measures of the stages. Then we conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers are causing the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes in the global fairness of the pipeline. We used the fairness composition to choose an appropriate downstream transformer that mitigates unfairness in the machine learning pipeline. Comment: ESEC/FSE'2021: The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 202

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)
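
One way to picture the idea of measuring a single preprocessing stage's fairness impact is to compare a group-fairness metric on the predictions of a pipeline with and without that stage. The sketch below uses statistical parity difference as the metric; the abstract does not say which metrics or exact definitions the authors use, so the metric choice, the group labels, and the prediction vectors here are all assumptions for illustration.

```python
def statistical_parity_difference(predictions, groups, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged); 0 means parity."""
    priv = [p for p, g in zip(predictions, groups) if g == privileged]
    unpriv = [p for p, g in zip(predictions, groups) if g != privileged]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)

# Made-up protected-attribute values and binary predictions from two
# hypothetical pipeline variants that differ only in one preprocessing stage.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds_with_stage = [1, 1, 1, 0, 1, 0, 0, 0]
preds_without_stage = [1, 1, 0, 0, 1, 1, 0, 0]

spd_with = statistical_parity_difference(preds_with_stage, groups, privileged="a")
spd_without = statistical_parity_difference(preds_without_stage, groups, privileged="a")

# Assumed summary: how much the stage shifts the metric.
stage_effect = spd_with - spd_without
```

In this toy data the variant without the stage is at parity, while the variant with the stage favors group "a", so the stage would be flagged as introducing bias under this assumed measure.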

Modeling and Implementation of Intelligent Opinion Analysis Algorithms (original title: Modelado e implementación de algoritmos inteligentes de análisis de opinión)

Author: Tessore Juan Pablo
Publication date: 03/10/2023

Alongside the widespread adoption of social media, the generation of content on these platforms, particularly in text, has also grown. The proliferation of this type of content has provided the necessary raw material to apply text-mining techniques to extract valuable information from the data. Numerous studies attempting to categorize texts from social media using machine learning classifiers rely on manual content labeling or on pre-labeled public datasets. These approaches have their drawbacks, including the time-consuming process of manually classifying the training data. Another problem is that classifiers are often built using data from different sources than those they analyze. This poses a challenge because if the classifier hasn't been exposed to similar data during the training phase, it will have difficulty categorizing it correctly. Additionally, the availability of resources such as labeled datasets, corpora, or affective dictionaries is limited for languages other than English, restricting the possibilities of constructing the aforementioned text classifiers for other languages, including Spanish. As a result, the collection and validation of resources in the target language become necessary for building supervised machine learning-based text classifiers. However, these tasks are extremely time-consuming and resource-intensive. This problem is exacerbated in cases where the classification criterion is not objective, such as emotion classification in text.
In these situations, multiple judges are required to classify the same content to validate the accuracy of the assigned label. To expedite the development of supervised machine learning-based emotion classifiers for the Spanish language, reducing or eliminating the need for manual labeling of the datasets used for training is necessary. In this thesis, unlike other studies, the labels denoting the emotion of each comment are automatically obtained from the users who write the content rather than manually classifying them. Subsequently, a procedure is defined to validate the collected labels, which only requires manual labeling and validation of a small sample of them, followed by the calculation of metrics to establish the level of consensus. Furthermore, during the document collection process, contextual information related to the documents is also obtained and used to measure the changes, whether improvements or not, in the performance of different machine learning-based classifiers. The process presented in this thesis allows for streamlining the construction of text-based emotion classifiers using machine learning and enhancing their performance using contextual information. These classifiers can be used for a wide variety of potential purposes, such as detecting the sentiment arising from the opinions of large groups of people about specific products, services, or even public policies. They could also be used to identify unmet demands or complaints from citizens or, in security, to automatically detect risk factors in social networks, such as threats, harassment, or bullying. The classifiers built using the mentioned process perform similarly to others trained with manually labeled datasets. It should be emphasized that in the presented work, the need for manual labeling in the collection and classification process is significantly reduced. The constructed dataset can be used for various research purposes involving Sentiment Analysis in Spanish. 
Furthermore, the collection and validation process presented in this thesis can be easily adapted to generate new resources for specific domains or languages.

Doctor en Ciencias Informáticas, Universidad Nacional de La Plata, Facultad de Informátic

Servicio de Difusión de la Creación Intelectual

Computational intelligence in extra low voltage direct current pico-grids

Author: Quek Yang Thee
Publication venue: Newcastle University
Publication date: 01/01/2019

Ph. D. Thesis. The modern power system has gone through a lot of changes over the past few years. It is no longer about providing one-way power from sources to various loads. Power monitoring and management have become an increasingly essential task with the growing trend to provide users more information about the status of the loads within their energy consumption so that they can make an informed decision to reduce usage and cost or request desired maintenance. Computational intelligence has been successfully implemented in electrical power systems to aid the user, but these studies are generally conducted on conventional alternating current (AC) macro-grids. Until now, little work has been done on direct current (DC) grids, and the focus on smaller DC grids has been even less. In recent years, the evolution of the electrical power system has seen the proliferation of DC appliances and equipment such as building, household and office loads. This number keeps increasing with the advancement in technology and changes in consumer lifestyles. Given that DC power supplies are getting more popular in the form of photovoltaic panels and batteries, it is possible for Extra Low Voltage (ELV) DC household or office pico-grids to come into use soon. This research recognises and addresses this research gap in the monitoring and managing of DC pico-grids. It recommends and applies the bottom-up monitoring and management approach in smaller scale grids and in larger scale grids. It innovatively categorises the loads in the grids into dumb loads that do not have intelligence and communication features and smart loads that have these features. While targeting these ELV DC pico-grids, this research presents solutions that provide users useful information on load classification, load disaggregation, anomaly warning and early fault detection.
It provides local and remote sensing with the alternative use of hardware to lessen the computational burden from the main computer. The inclusion of remote monitoring has opened a window of opportunities for Internet of Things (IoT) implementation. These solutions involve the blending of computational intelligence techniques with enhanced algorithms, such as K-Means algorithm, k-Nearest Neighbours (kNN) classification, Naïve Bayes Classification (NBC) Theorem, Statistical Process Control (SPC) and Long Short-Term Memory Recurrent Neural Network (LSTM RNN). As demonstrated in this research, these solutions produce high accuracy results in load classification and early anomaly detection in both AC and DC pico-grids. In addition to the load side, this research features a short-term PV energy forecasting technique that is easily comprehensible to users. This research contributes to the implementation of the Smart Grid with possible IoT features in DC pico-grids

Newcastle University eTheses