12 research outputs found

    Classification of Households Receiving Electricity Subsidies in Gorontalo Province in 2019 Using the K-Nearest Neighbor and Support Vector Machine Methods

    The electricity subsidy program is one of the government's poverty-alleviation programs, under which low-income households receive an electricity subsidy paid by the government to PT Perusahaan Listrik Negara (PLN). The problem is that some households that are economically well off still receive the subsidy. This study aims to classify households receiving the electricity subsidy using data mining and to compare the classification results of the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) methods. These methods were chosen over other data-mining methods because KNN is representative of lazy learning, while SVM is a classification method capable of good generalization. The data used are from the 2019 Susenas survey of Gorontalo Province. The class variable is electricity-subsidy receipt status, and the explanatory variables (attributes) comprise the number of household members, building ownership status, floor area, main roof material, main wall material, main floor material, main source of drinking water, main cooking fuel, and type of final sewage disposal. The data were processed in R. The results show that KNN achieved the better classification accuracy, at 98.07%. Overall, the KNN and SVM classifications differ significantly, with KNN performing far better than SVM.
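
    The study ran its comparison in R on Susenas microdata, which are not public. A minimal Python sketch of the same KNN-vs-SVM comparison, using a synthetic stand-in for the nine household attributes, could look like:

```python
# Hedged sketch: synthetic stand-in data, not the study's Susenas microdata.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic binary classification data standing in for the 9 household attributes
X, y = make_classification(n_samples=1000, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit both classifiers and compare held-out accuracy
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

acc_knn = accuracy_score(y_test, knn.predict(X_test))
acc_svm = accuracy_score(y_test, svm.predict(X_test))
```

On real survey data the relative ranking of the two methods would of course depend on the attributes and preprocessing, as the abstract's 98.07% figure does.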

    Machine learning based prediction of esterases' promiscuity

    Enzymes are of great interest to a wide variety of industries; however, their experimental characterization is very time-consuming and expensive, which has driven the development of technologies for predicting enzyme activities. Industrial enzymes need to adapt to non-biological conditions while maintaining high activity, promiscuity, and stereo-selectivity, properties that are currently not well covered by prediction tools, meaning their characterization still relies solely on experimentation. This project aims to mitigate the problem by developing binary classifiers and multi-classifiers that can predict the promiscuity of esterases, one of the many industrially relevant enzyme families.
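
    The thesis's actual features and training data are not given in this abstract. As an illustration only, one common setup for sequence-based binary classification uses amino-acid composition features feeding a standard classifier; the sequences and labels below are hypothetical:

```python
# Hedged sketch: features, sequences, and labels here are illustrative
# assumptions, not the project's actual pipeline.
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each amino acid in the sequence (20-dim feature vector)."""
    counts = Counter(seq)
    total = len(seq)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]

# Toy labelled sequences (hypothetical): 1 = promiscuous, 0 = not promiscuous
seqs = ["MKTLLVAGA", "GGSSLLPPA", "MKKVVAGTA", "PPLLSSGGA"]
labels = [1, 0, 1, 0]

X = [aa_composition(s) for s in seqs]
clf = RandomForestClassifier(random_state=0).fit(X, labels)
pred = clf.predict([aa_composition("MKTVVAGLA")])[0]
```

Real esterase-promiscuity prediction would use far richer features (e.g. structural or embedding-based descriptors) and many more labelled enzymes.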

    Harnessing Data-Driven Insights: Predictive Modeling for Diamond Price Forecasting using Regression and Classification Techniques

    In the multi-faceted world of gemology, understanding diamond valuations plays a pivotal role for traders, customers, and researchers alike. This study delves deep into predicting diamond prices, both as exact monetary values and as broader price categories. The purpose was to harness advanced machine learning techniques to achieve precise estimations and categorisations, thereby assisting stakeholders in informed decision-making. The research methodology comprised a rigorous data preprocessing phase, ensuring the data's readiness for model training. A range of machine learning models was employed, from traditional linear regression to more advanced ensemble methods like Random Forest and Gradient Boosting. The dataset was also transformed to facilitate classification into predefined price tiers, exploring the viability of models like Logistic Regression and Support Vector Machines in this context. The conceptual model follows a systematic flow, beginning with data acquisition, transitioning through preprocessing, regression, and classification analyses, and culminating in a comparative study of the performance metrics. This structured approach underscores the originality and value of the research, offering a holistic view of diamond price prediction from both regression and classification lenses. Findings from the analysis highlighted the superior performance of the Random Forest regressor in predicting exact prices, with an R² value of approximately 0.975. For classification into price tiers, both Logistic Regression and Support Vector Machines emerged as frontrunners, with accuracy exceeding 95%. These results provide valuable insights for stakeholders in the diamond industry, emphasising the potential of machine learning in refining valuation processes.
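
    The two-track flow the abstract describes (regress on exact price, then bin prices into tiers and classify) can be sketched as follows; the synthetic data and the three-tier quantile binning are assumptions, not the study's dataset or thresholds:

```python
# Hedged sketch of a regression + tier-classification pipeline on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score, accuracy_score

# Synthetic stand-in for diamond features and prices
X, price = make_regression(n_samples=800, n_features=6, noise=5.0, random_state=1)

# Regression track: predict the exact price
Xtr, Xte, ytr, yte = train_test_split(X, price, test_size=0.25, random_state=1)
reg = RandomForestRegressor(random_state=1).fit(Xtr, ytr)
r2 = r2_score(yte, reg.predict(Xte))

# Classification track: three price tiers via quantile binning (assumed scheme)
tiers = np.digitize(price, np.quantile(price, [1 / 3, 2 / 3]))
Xtr, Xte, ttr, tte = train_test_split(X, tiers, test_size=0.25, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ttr)
acc = accuracy_score(tte, clf.predict(Xte))
```

The comparative step in the study would then tabulate R² for the regressors against accuracy for the classifiers, as the abstract's 0.975 and 95% figures summarise.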

    Detecting fake news and disinformation using artificial intelligence and machine learning to avoid supply chain disruptions

    Fake news and disinformation (FNaD) are increasingly circulated through various online and social networking platforms, causing widespread disruptions and influencing decision-making perceptions. Despite the growing importance of detecting fake news in politics, relatively limited research effort has been made to develop artificial intelligence (AI) and machine learning (ML) oriented FNaD detection models suited to minimizing supply chain disruptions (SCDs). Using a combination of AI and ML, and case studies based on data collected from Indonesia, Malaysia, and Pakistan, we developed an FNaD detection model aimed at preventing SCDs. This model, based on multiple data sources, has shown evidence of its effectiveness in managerial decision-making. Our study further contributes to the supply chain and AI-ML literature, provides practical insights, and points to future research directions.
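
    The abstract does not specify the model architecture. A common baseline for this kind of text classification, shown purely as an illustration, is TF-IDF features feeding a linear classifier; the headlines and labels below are hypothetical:

```python
# Hedged sketch: a generic text-classification baseline, not the paper's model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled headlines (hypothetical): 1 = fake/disinformation, 0 = genuine
texts = [
    "Port closure rumor spreads on social media",
    "Official statement confirms shipping schedule",
    "Viral post claims factory shutdown, no source cited",
    "Logistics provider publishes audited delivery report",
]
labels = [1, 0, 1, 0]

# TF-IDF turns each text into a sparse term-weight vector;
# the linear model then separates the two classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
pred = model.predict(["Unverified claim of warehouse fire circulates online"])[0]
```

A production FNaD detector for supply chains would combine multiple data sources, as the paper describes, rather than headlines alone.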

    Value loss as a result of poor project planning decisions

    The forecasts of development costs, schedules, and future production that support investment decisions for projects have a substantial impact on valuation and decision-making. However, many large projects, including petroleum projects on the Norwegian Continental Shelf (NCS), encounter cost and schedule overruns as well as production shortfalls. Flyvbjerg [1] argued that exceeding budgeted time and cost is so prevalent in megaprojects that it can be considered the rule. Although many studies have examined cost and schedule overruns in various industries, production underperformance has not been widely covered. This study builds on the research by Bratvold [2] and Nesvold [3], [4] on production forecasts for projects on the NCS. It analyses production shortfalls alongside development time and cost overruns to evaluate the value loss arising from inaccurate forecasts. The actual cost, time, and production values are publicly available on the Norwegian Petroleum Directorate's (NPD) website [5]. Only the forecasted development costs and time schedules were publicly available, while the projected production values remained confidential. Using the forecasted and actual values, this study aimed to identify overruns and underperformance, assess the economic value erosion they caused, and examine the correlation between cumulative underproduction in the first 10 years of production and total cost overruns. For this purpose, Present Value (PV) calculations were performed using two methods suggested by Mohus [6], in addition to Pearson, Spearman, and Support Vector Regression (SVR) analyses, using MS Excel and Python. After summarizing previous research on overruns and underperformance in various industries, an overview of the development project phases on the NCS and the study's methodology is provided.
Based on the analyses conducted in this study, the results can be summarized as follows: 1) Using development cost data from 142 projects with Plan for Development and Operation (PDO) approvals between 2001 and 2022, a total loss of NOK 268 billion was attributed to cost overruns. This indicates a 12.7% overrun of actual costs compared to the initially forecasted costs. 2) Analysing development time data for 76 oil fields with PDO approvals from 1990 to 2019, the study found an average development delay of 101 days, a 12% schedule overrun compared to the initially forecasted development times. The forecasted value loss due to these delays amounted to approximately NOK 39 billion. 3) Examining production data from 67 oil fields with production spanning 1995 to 2021, an average of around 50 million Sm3 of oil was not delivered during the first 10 years of production as initially forecasted, a 5% underproduction. The forecasted value loss due to production shortfalls, considering only the mean forecasts, amounted to approximately NOK 80 billion. 4) The total value loss resulting from poor forecasts over the first 10 years of production was calculated to be NOK 387 billion, the cumulative sum of the previous figures. 5) The average correlation coefficients between cumulative underproduction in the first 10 years and total cost overruns, calculated using the Pearson, Spearman, and SVR methods, were 0.28, 0.08, and 0.41, respectively. The Pearson coefficient suggests a positive linear correlation, albeit a weak one. The Spearman coefficient indicates a very weak positive correlation between the ranks of the data. The SVR coefficient suggests a moderate, non-linear correlation, considering that the linear correlation coefficient is considerably smaller.
Moreover, the reasons for such poor forecasts, attributed to human bias, can be categorized as delusion, deception, and bad luck [7]. It is recommended that forecasters be trained about the biases and uncertainties involved in their forecasts, leverage the expertise of superforecasters, utilize historical data, and employ external methods like Reference Class Forecasting (RCF) to improve their forecasting accuracy [8]. To the best of the author's knowledge, based on data available until 2023, no prior work has combined a value-loss analysis of development time, cost, and production data for the first 10 years of production with regression analysis; this distinguishes the present study from previous works. Considering the substantial value erosion resulting from inadequate forecasts in petroleum projects, it is vital for both the industry and the public to maintain diligent monitoring of project performance.
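
    The analysis steps above (discounting a forecast-vs-actual gap to present value, then correlating underproduction with cost overrun via Pearson, Spearman, and SVR) can be sketched as follows. All numbers are illustrative, the discount rate is an assumption, and reading the SVR fit as a "nonlinear correlation" by correlating its predictions with the target is one possible interpretation of the study's method, not a confirmed one:

```python
# Hedged sketch: illustrative numbers, assumed discount rate, assumed
# interpretation of the SVR-based correlation.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.svm import SVR

rate = 0.08  # assumed annual discount rate

def present_value(cash_flows, rate):
    """PV of yearly cash flows, first flow one year out."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Illustrative yearly revenue shortfalls (NOK bn), discounted to today
value_loss = present_value([10.0, 12.0, 8.0], rate)

# Illustrative per-field underproduction and cost-overrun figures
rng = np.random.default_rng(0)
underproduction = rng.normal(size=50)
cost_overrun = 0.3 * underproduction + rng.normal(scale=1.0, size=50)

r_pearson, _ = pearsonr(underproduction, cost_overrun)   # linear correlation
r_spearman, _ = spearmanr(underproduction, cost_overrun)  # rank correlation

# One reading of a "nonlinear correlation": correlation between SVR
# predictions and the target (this interpretation is an assumption).
svr = SVR(kernel="rbf").fit(underproduction.reshape(-1, 1), cost_overrun)
r_svr, _ = pearsonr(svr.predict(underproduction.reshape(-1, 1)), cost_overrun)
```

The study reports these three coefficients as 0.28, 0.08, and 0.41 on the real NCS data.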

    Computer vision-based wood identification and its expansion and contribution potentials in wood science: A review

    The remarkable developments in computer vision and machine learning have changed the methodologies of many scientific disciplines. They have also created a new research field in wood science called computer vision-based wood identification, which is making steady progress towards the goal of building automated wood identification systems to meet the needs of the wood industry and market. Nevertheless, computer vision-based wood identification is still only a small area in wood science and is still unfamiliar to many wood anatomists. To familiarize wood scientists with artificial intelligence-assisted wood anatomy and engineering methods, we have reviewed the published mainstream studies that used or developed machine learning procedures. This review could help researchers understand computer vision and machine learning techniques for wood identification and choose appropriate techniques or strategies for their study objectives in wood science. This study was supported by Grants-in-Aid for Scientific Research (Grant Number H1805485) from the Japan Society for the Promotion of Science.

    Fault detection in gearboxes using the Support Vector Machine (SVM) machine learning method

    The objective of this research was to create a predictive model under the machine learning approach and to verify its effectiveness in automatically classifying and detecting faults in gearboxes. A dataset of vibration signals was obtained from the Open Energy Data Initiative (OEDI) repository of the US Department of Energy. The model was built using the supervised machine learning method Support Vector Machine (SVM), with preprocessing and analysis of the dataset carried out in Python. Features in the time domain and frequency domain were extracted from the dataset, and the Recursive Feature Elimination with Cross-Validation (RFECV) method was applied to select the best ones. Before entering the SVM classifier, the data were split into 70% for training and 30% for testing. As a result, three fault-detection models were obtained: a first model using data collected by four accelerometers under a 50% load, a second model combining the data collected by four accelerometers under loads ranging from 0 to 90%, and a third model using the data from a single accelerometer of the second model. Each model was trained and tested with excellent results, the best model achieving an accuracy of 99.84% and a precision of 99.82%. The results show that the method classifies and predicts faults with high accuracy and precision, making it a promising approach and a valuable contribution to industrial maintenance. It is recommended to reduce and standardize the feature set, thereby reducing the computational load while improving model performance.
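
    The pipeline described (time- and frequency-domain features, RFECV feature selection, a 70/30 split, then an SVM) can be sketched on synthetic vibration-like signals; the OEDI gearbox data itself is not loaded here, and the specific features and fault signature below are assumptions:

```python
# Hedged sketch of the described pipeline on synthetic vibration-like data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score

rng = np.random.default_rng(42)

def extract_features(signal):
    """A few common time- and frequency-domain features per signal window."""
    spectrum = np.abs(np.fft.rfft(signal))
    return [
        signal.mean(), signal.std(),                      # time domain
        np.sqrt(np.mean(signal ** 2)),                    # RMS
        np.max(np.abs(signal)),                           # peak
        spectrum.mean(), spectrum.std(), spectrum.max(),  # frequency domain
    ]

# Synthetic "healthy" vs "faulty" windows: the fault adds a high-frequency tone
t = np.linspace(0, 1, 256)
healthy = [np.sin(2 * np.pi * 5 * t) + rng.normal(scale=0.3, size=t.size)
           for _ in range(60)]
faulty = [np.sin(2 * np.pi * 5 * t) + 0.8 * np.sin(2 * np.pi * 40 * t)
          + rng.normal(scale=0.3, size=t.size) for _ in range(60)]

X = np.array([extract_features(s) for s in healthy + faulty])
y = np.array([0] * 60 + [1] * 60)

# 70/30 split, RFECV feature selection, then a linear SVM classifier
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
selector = RFECV(SVC(kernel="linear"), cv=5).fit(X_train, y_train)
clf = SVC(kernel="linear").fit(selector.transform(X_train), y_train)
y_pred = clf.predict(selector.transform(X_test))
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
```

RFECV requires an estimator that exposes feature weights, hence the linear kernel during selection; the study's reported 99.84% accuracy comes from the real accelerometer data, not this toy setup.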