8 research outputs found

    Application for Selection of Student Final Project Supervisors Based on the Selected Category and Expertise of Lecturers Using the Naive Bayes Classifier Method

    The supervisor plays an important role in the success and graduation of students completing their final project, so each student needs a well-matched supervisor. At STMIK Hang Tuah Pekanbaru, several problems arise in the thesis title submission process: supervision is still assigned by conventional means, based on the personal knowledge of the Head of the Study Program, and it is difficult to follow the progress of a student's submission and to check final supervisor assignments. An application for selecting final project supervisors is the proposed solution to these problems. It is a supervisor recommendation system that uses the naïve Bayes classifier algorithm to determine the probability of each lecturer, and students can choose from the results. Naive Bayes is a prediction technique based on simple probabilities that applies Bayes' theorem (Bayes' rule) with a strong independence assumption. Selection is based on the category of the final project and the expertise of the lecturer. The application produces supervisor recommendations that match the concept of the student's final project. With reference data, training, and Bayes' rule, the results were sufficient to satisfy students in getting a supervisor suited to the topic of their final project.
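The recommendation step described in this abstract can be sketched as a small categorical naive Bayes ranker. This is only an illustration under stated assumptions: the feature names (`category`, `method`), the lecturer labels, the toy training rows, and the use of Laplace smoothing are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, alpha=1.0):
    """Count label and feature-value frequencies from (features, label) rows."""
    label_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for features, label in rows:
        for name, value in features.items():
            value_counts[label][name][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values, alpha


def rank_lecturers(model, features):
    """Return (lecturer, posterior) pairs, most probable first."""
    label_counts, value_counts, feature_values, alpha = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        # Prior P(label) times the product of P(value | label),
        # assuming features are independent given the label.
        log_p = math.log(count / total)
        for name, value in features.items():
            seen = value_counts[label][name][value]
            n_values = len(feature_values[name])
            # Laplace smoothing (an assumption) avoids zero probabilities.
            log_p += math.log((seen + alpha) / (count + alpha * n_values))
        log_scores[label] = log_p
    # Normalize the scores so they sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return sorted(((label, w / norm) for label, w in weights.items()),
                  key=lambda pair: -pair[1])


# Hypothetical past assignments: project attributes -> supervising lecturer.
history = [
    ({"category": "data mining", "method": "classification"}, "Lecturer A"),
    ({"category": "data mining", "method": "clustering"}, "Lecturer A"),
    ({"category": "networking", "method": "simulation"}, "Lecturer B"),
    ({"category": "networking", "method": "classification"}, "Lecturer B"),
    ({"category": "information systems", "method": "web application"}, "Lecturer C"),
]
model = train_naive_bayes(history)
ranking = rank_lecturers(
    model, {"category": "data mining", "method": "classification"}
)
for lecturer, probability in ranking:
    print(f"{lecturer}: {probability:.2f}")
```

Returning a ranked list rather than a single label matches the abstract's description of students choosing among the recommended lecturers.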

    PREDIKSI RATING FILM MENGGUNAKAN METODE NAIVE BAYES (Film Rating Prediction Using the Naive Bayes Method)

    The film industry is currently developing very rapidly, as shown by the many films released one after another. Film audiences also want films with good picture quality, sound, storyline, and positive values, so that they stay enthusiastic about following the latest releases. However, not every film can be enjoyed, and not every audience likes every film. For a film to keep developing, it needs ratings from viewers that reveal which films suit their tastes. An analysis is therefore needed to understand viewer interest: assessments are collected and then used to determine a film's rating with the Naive Bayes method, a fundamental statistical approach to pattern recognition. This approach is based on quantifying the trade-offs between various classification decisions using probability and the risks those decisions incur. It is one of the methods of data mining and uses predefined attributes, including film genre, actors, language, color, duration, country, and others, which directors can use as a benchmark when making films.

    Topic Modeling on Online News Portal Using Latent Dirichlet Allocation (LDA)

    The large volume of news displayed on online news portals often does not indicate the topic being discussed, but the news can be read and analyzed to find the main issues and trends under discussion. A quick and efficient way to find trending topics in the news is therefore needed. One method that can be used to solve this problem is topic modeling, which allows users to understand easily and quickly how current themes develop. One of the algorithms in topic modeling is Latent Dirichlet Allocation (LDA). The research proceeds through data collection, preprocessing, n-gram formation, dictionary representation, weighting, topic model validation, topic model formation, and topic modeling results. Based on the topic evaluation, the best coherence value of the topic model was related to the number of passes. The number of topics produced 20 keys, five cases with a 0.53 coherence value. It can be said to be relatively stable based on the standard coherence value.

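    Analisis Sentimen penggunaan Mypertamina untuk Pembelian BBM Bersubsidi menggunakan Algoritma Naive Bayes (Sentiment Analysis of Mypertamina Use for Purchasing Subsidized Fuel Using the Naive Bayes Algorithm)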

    This study aims to analyze sentiment toward the use of the Mypertamina application for purchasing subsidized fuel (BBM) using the Naive Bayes algorithm. The study involves data pre-processing stages, such as full preprocessing and stopword removal, and accuracy testing with varying splits of training and test data. The results show that with full preprocessing and 70 percent training data, the classification model reaches an accuracy of 85 percent. Using 80 percent training data raises accuracy to 87 percent, and using 90 percent training data yields 89 percent. This indicates that the more training data is used, the better the classification model performs. Stopword removal also has a significant effect on model accuracy. Without stopword removal, accuracy for the 70, 80, and 90 percent splits is 80, 82, and 84 percent respectively. Although this is lower than with full preprocessing, the model still gives reasonably good predictions. Based on these tests, it can be concluded that full preprocessing with more training data tends to produce better model performance, and that stopword removal contributes significantly to improving accuracy. Therefore, when developing text classification models, comprehensive pre-processing and appropriate stopword removal should be considered according to the characteristics of the data and the needs of the analysis. In classification testing with the Naïve Bayes Classifier, the split between training and test data also matters: 70 percent training data yields 85 percent accuracy, while 80 and 90 percent yield 87 and 89 percent respectively.
The more training data is used, the better the Naïve Bayes Classifier performs. In the final conclusion, the 90 percent training proportion gives the best performance in classifying the test data, with the highest accuracy. However, a smaller test set can lead to higher variance in the results. Cross-validation, or testing with more folds, can therefore give more comprehensive information about the performance of the classification model.
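The closing remark about cross-validation can be illustrated with a minimal k-fold sketch. Everything here is an assumption made for illustration: the toy labels, the choice of five folds, and the majority-class baseline that stands in for the Naive Bayes classifier. It does not reproduce the study's data or protocol.

```python
from collections import Counter
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, k, fit_and_score):
    """Return the mean and standard deviation of the per-fold scores."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical sentiment labels for ten reviews.
labels = ["pos"] * 6 + ["neg"] * 4


def majority_baseline(train_idx, test_idx):
    """Stand-in classifier: always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


mean_acc, std_acc = cross_validate(len(labels), 5, majority_baseline)
print(f"accuracy: {mean_acc:.2f} +/- {std_acc:.2f}")
```

Reporting the spread across folds alongside the mean addresses the concern that a single small test set, such as the 10 percent split, can give an unstable accuracy estimate.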

    Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline

    In recent years, many incidents have been reported where machine learning models exhibited discrimination among people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is a common practice to build a pipeline that includes an ordered set of data preprocessing stages followed by a classifier. However, most of the research on fairness has considered a single classifier based prediction task. What are the fairness impacts of the preprocessing stages in machine learning pipeline? Furthermore, studies showed that often the root cause of unfairness is ingrained in the data itself, rather than the model. But no research has been conducted to measure the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduced the causal method of fairness to reason about the fairness impact of data preprocessing stages in ML pipeline. We leveraged existing metrics to define the fairness measures of the stages. Then we conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers are causing the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes in the global fairness of the pipeline. We used the fairness composition to choose appropriate downstream transformer that mitigates unfairness in the machine learning pipeline. Comment: ESEC/FSE'2021: The 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 202
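One way to picture the fairness impact of a single preprocessing stage is to compare a group fairness metric on predictions from the pipeline with and without that stage. The sketch below is a hedged illustration, not the paper's definition: the choice of statistical parity difference, the function names, and the toy predictions are all assumptions.

```python
def statistical_parity_difference(predictions, groups, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged)."""
    priv = [p for p, g in zip(predictions, groups) if g == privileged]
    unpriv = [p for p, g in zip(predictions, groups) if g != privileged]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def stage_fairness_impact(preds_with, preds_without, groups, privileged):
    """Change in absolute disparity attributable to including the stage.

    A positive value means the stage increases the disparity between
    groups; a negative value means it reduces it.
    """
    with_stage = statistical_parity_difference(preds_with, groups, privileged)
    without_stage = statistical_parity_difference(
        preds_without, groups, privileged
    )
    return abs(with_stage) - abs(without_stage)


# Hypothetical binary predictions for eight people in two groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds_without_stage = [1, 1, 0, 0, 1, 1, 0, 0]  # both groups: 50% positive
preds_with_stage = [1, 1, 1, 0, 1, 0, 0, 0]     # group a: 75%, group b: 25%

impact = stage_fairness_impact(
    preds_with_stage, preds_without_stage, groups, privileged="a"
)
print(f"fairness impact of the stage: {impact:+.2f}")
```

Holding the rest of the pipeline fixed and toggling one stage keeps the comparison attributable to that stage alone.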

    Modelado e implementación de algoritmos inteligentes de análisis de opinión (Modeling and Implementation of Intelligent Opinion Analysis Algorithms)

    Alongside the widespread adoption of social media, the generation of content on these platforms, particularly in text, has also grown. The proliferation of this type of content has provided the necessary raw material to apply text-mining techniques to extract valuable information from the data. Numerous studies attempting to categorize texts from social media using machine learning classifiers rely on manual content labeling or on pre-labeled public datasets. These approaches have their drawbacks, including the time-consuming process of manually classifying the training data. Another problem is that classifiers are often built using data from different sources than those they analyze. This poses a challenge because if the classifier has not been exposed to similar data during the training phase, it will have difficulty categorizing it correctly. Additionally, the availability of resources such as labeled datasets, corpora, or affective dictionaries is limited for languages other than English, restricting the possibilities of constructing the aforementioned text classifiers for other languages, including Spanish. As a result, the collection and validation of resources in the target language become necessary for building supervised machine learning-based text classifiers. However, these tasks are extremely time-consuming and resource-intensive. This problem is exacerbated in cases where the classification criterion is not objective, such as emotion classification in text. In these situations, multiple judges are required to classify the same content to validate the accuracy of the assigned label. To expedite the development of supervised machine learning-based emotion classifiers for the Spanish language, it is necessary to reduce or eliminate the need for manual labeling of the datasets used for training. In this thesis, unlike other studies, the labels denoting the emotion of each comment are obtained automatically from the users who write the content rather than assigned by manual classification. Subsequently, a procedure is defined to validate the collected labels, which only requires manual labeling and validation of a small sample of them, followed by the calculation of metrics to establish the level of consensus. Furthermore, during the document collection process, contextual information related to the documents is also obtained and used to measure the changes, whether improvements or not, in the performance of different machine learning-based classifiers. The process presented in this thesis streamlines the construction of text-based emotion classifiers using machine learning and enhances their performance using contextual information. These classifiers can be used for a wide variety of potential purposes, such as detecting the sentiment arising from the opinions of large groups of people about specific products, services, or even public policies. They could also be used to identify unmet demands or complaints from citizens or, in security, to automatically detect risk factors in social networks, such as threats, harassment, or bullying. The classifiers built using the mentioned process perform similarly to others trained with manually labeled datasets. It should be emphasized that in the presented work, the need for manual labeling in the collection and classification process is significantly reduced. The constructed dataset can be used for various research purposes involving Sentiment Analysis in Spanish. Furthermore, the collection and validation process presented in this thesis can be easily adapted to generate new resources for specific domains or languages. Doctor en Ciencias Informáticas. Universidad Nacional de La Plata. Facultad de Informática.

    Computational intelligence in extra low voltage direct current pico-grids

    Ph. D. Thesis. The modern power system has gone through many changes over the past few years. It is no longer about providing one-way power from sources to various loads. Power monitoring and management have become increasingly essential with the growing trend to give users more information about the status of the loads within their energy consumption, so that they can make informed decisions to reduce usage and cost or request maintenance. Computational intelligence has been successfully implemented in electrical power systems to aid the user, but these studies are generally conducted on conventional alternating current (AC) macro-grids. Until now, little work has been done on direct current (DC), and the focus on smaller DC grids has been even less. In recent years, the evolution of the electrical power system has seen the proliferation of direct current (DC) appliances and equipment such as building, household and office loads. This number keeps increasing with advances in technology and changes in consumer lifestyles. Given that DC power supplies are becoming more popular in the form of photovoltaic panels and batteries, it is possible for Extra Low Voltage (ELV) DC household or office pico-grids to come into use soon. This research recognises and addresses this gap in the monitoring and management of DC pico-grids. It recommends and applies the bottom-up monitoring and management approach in smaller scale grids and in larger scale grids. It innovatively categorises the loads in the grids into dumb loads that do not have intelligence and communication features and smart loads that have these features. Targeting these ELV DC pico-grids, this research presents solutions that provide users with useful information on load classification, load disaggregation, anomaly warning and early fault detection.
It provides local and remote sensing with the alternative use of hardware to lessen the computational burden from the main computer. The inclusion of remote monitoring has opened a window of opportunities for Internet of Things (IoT) implementation. These solutions involve the blending of computational intelligence techniques with enhanced algorithms, such as K-Means algorithm, k-Nearest Neighbours (kNN) classification, Naïve Bayes Classification (NBC) Theorem, Statistical Process Control (SPC) and Long Short-Term Memory Recurrent Neural Network (LSTM RNN). As demonstrated in this research, these solutions produce high accuracy results in load classification and early anomaly detection in both AC and DC pico-grids. In addition to the load side, this research features a short-term PV energy forecasting technique that is easily comprehensible to users. This research contributes to the implementation of the Smart Grid with possible IoT features in DC pico-grids.