18 research outputs found
K-Means Optimization Algorithm to Improve Cluster Quality on Sparse Data
This research clusters sparse data using several K-Means optimization algorithms. The sparse data came from reviews of the game Citampi Stories on the Google Play Store. The methods compared are Density-Based Spatial Clustering of Applications with Noise-Kmeans (DB-Kmeans), Particle Swarm Optimization-Kmeans (PSO-Kmeans), and Robust Sparse Kmeans Clustering (RSKC), each evaluated using the silhouette score. Clustering sparse data is challenging because sparsity complicates the analysis and can lead to suboptimal or non-representative results. To address this, the research divided the data by number of terms into three scenarios to reduce sparsity. The results showed that DB-Kmeans had the potential to improve clustering quality in most data scenarios. Dividing the data by number of terms also mitigated sparsity effectively and significantly influenced how well topics formed within each cluster. The research concludes that this approach improves clustering quality for sparse data and yields more diverse and more easily interpretable information. The results could be valuable for developers seeking to understand user preferences and improve game quality.
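The abstract evaluates each clustering by its silhouette score. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch, assuming Euclidean distance and a mean over all points; the function name and the toy points are hypothetical and are not taken from the study.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.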
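Penerapan Geographically Weighted Panel Regression dan Data Envelopment Analysis dalam Pemodelan Kemiskinan di Kalimantan Timur (Application of Geographically Weighted Panel Regression and Data Envelopment Analysis in Modeling Poverty in East Kalimantan)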
Indonesia still needs to focus on achieving the Sustainable Development Goals (SDGs) agreed by all countries in the world. Indonesia currently ranks 82nd out of 163 nations in SDG achievement, which indicates considerable room for improvement. One of the goals not yet achieved is "no poverty". Among all provinces in Indonesia, East Kalimantan is an important case for poverty analysis because Penajam Paser Utara and Kutai Kartanegara in East Kalimantan are scheduled to become the site of Indonesia's next capital, Nusantara. The goal of this research is to investigate the variables that influence poverty in East Kalimantan and to determine the effectiveness of poverty alleviation in its regencies/cities. The research used poverty indicator data for 2019-2021 retrieved from Statistics Indonesia. The methods are a spatial panel regression, Geographically Weighted Panel Regression (GWPR), and Data Envelopment Analysis (DEA). For the GWPR model, the research compared the adaptive Gaussian, adaptive bisquare, adaptive exponential, fixed Gaussian, fixed bisquare, and fixed exponential kernels. The fixed exponential kernel had the lowest AIC and the highest adjusted R². The variables that determine poverty in the regencies/cities of East Kalimantan are expenditure per capita, life expectancy, and the number of villages with higher education facilities. Furthermore, according to the DEA, only three regencies/cities were effective in addressing poverty: Mahakam Ulu, Paser, and Penajam Paser Utara.
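The six kernels compared in the abstract differ in how the weight of a neighbouring location decays with distance and in whether the bandwidth is fixed or adaptive. The sketch below uses common textbook forms of the Gaussian, exponential, and bisquare kernels; the exact formulas, distances, and bandwidth values used in the study are not given in the abstract, so every number and function name here is an assumption for illustration only.

```python
from math import exp


def gaussian_kernel(d, h):
    """Gaussian weight for distance d and bandwidth h."""
    return exp(-0.5 * (d / h) ** 2)


def exponential_kernel(d, h):
    """Exponential weight for distance d and bandwidth h."""
    return exp(-d / h)


def bisquare_kernel(d, h):
    """Bisquare weight: drops to exactly zero at and beyond the bandwidth."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def adaptive_bandwidth(distances, k):
    """Adaptive bandwidth: distance to the k-th nearest neighbour."""
    return sorted(distances)[k - 1]


# Hypothetical distances (km) from one regression point to five other locations.
distances = [12.0, 30.0, 45.0, 80.0, 150.0]

# Fixed kernel: the same bandwidth is used at every regression point.
fixed_h = 50.0
fixed_weights = [exponential_kernel(d, fixed_h) for d in distances]

# Adaptive kernel: the bandwidth depends on local data density (here k = 3).
adaptive_h = adaptive_bandwidth(distances, k=3)
adaptive_weights = [bisquare_kernel(d, adaptive_h) for d in distances]
```

In a geographically weighted model these weights enter a weighted least-squares fit at each location, so nearer observations influence the local coefficients more than distant ones.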
PENGGUNAAN SOCIOGRAM UNTUK MENGIDENTIFIKASI POLA JARINGAN SOSIAL PEMBELAJARAN MANDIRI MAHASISWA (Identification of Social Network of Student’s Independent Learning using Sociogram)
This paper presents a tool that can help universities improve graduate outcomes by using information about the social network among students. The tool is the sociogram, a graph that depicts how students interact with one another. The graph can be generated easily once the pattern of connections among individuals is known. We apply a sociogram to portray the network of a class of students in the Department of Statistics, Bogor Agricultural University, representing how they interact when they want to discuss academic problems. We found some interesting results that are of practical value to those responsible for students' academic results. Some results are not new, but this approach can provide more informative features than conventional tables. Keywords: sociogram, social network analysis
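Because a sociogram is just a directed graph of who turns to whom, its basic structure can be derived from a list of nominations. The following is a minimal sketch under that assumption; the student labels and nominations are hypothetical, and the paper's actual data and tooling are not described in the abstract.

```python
def build_sociogram(nominations):
    """Build a directed graph {student: set of students they turn to}."""
    graph = {}
    for source, target in nominations:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if source != target:  # ignore self-nominations
            graph[source].add(target)
    return graph


def in_degree(graph):
    """Count how many classmates nominate each student."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def mutual_pairs(graph):
    """Return pairs of students who nominate each other."""
    return {
        frozenset((a, b))
        for a, targets in graph.items()
        for b in targets
        if a in graph[b]
    }


# Hypothetical nominations: (student, classmate they discuss coursework with).
nominations = [("A", "B"), ("B", "A"), ("C", "A"), ("D", "A")]
graph = build_sociogram(nominations)
popularity = in_degree(graph)
pairs = mutual_pairs(graph)
```

Students with a high in-degree act as hubs for academic discussion, while those with an in-degree of zero may be isolated, which is the kind of feature a table of raw responses tends to hide.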
PEMODELAN DATA PANEL SPASIAL DENGAN DIMENSI RUANG DAN WAKTU (Spatial Panel Data Modeling with Space and Time Dimensions)
Spatial panel data modeling is a method of analysis that includes the dimensions of both space and time. The analysis requires data that combine cross-section and time series data, that is, data observed periodically at each observation location. Panel data modeling has three approaches: the pooled least squares model, the fixed effects model, and the random effects model. Spatial panel data modeling offers several approaches that combine these three with the spatial autoregressive model (SAR) and the spatial error model (SEM). This research aims to apply spatial panel data analysis so that both space and time are included in a single model. The data used are GDP, local revenues, total population, and total regional expenditures of ten districts in Jambi province for the years 2000-2008. The analysis found that the spatial panel regression model that best fits the data is the fixed effects panel model combined with the spatial error model. The results also show an increase in R² compared with ordinary panel data analysis. Keywords: panel data modeling, spatial panel data modeling, SAR, SEM
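Perbandingan Kinerja Metode Arima, Multi-Layer Perceptron, dan Random Forest dalam Peramalan Harga Logam Mulia Berjangka yang Mengandung Pencilan (Performance Comparison of the ARIMA, Multi-Layer Perceptron, and Random Forest Methods in Forecasting Precious Metal Futures Prices Containing Outliers)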
Forecasting accuracy, as a benchmark for the performance of time series methods, depends on several things, including data characteristics, method selection, data fluctuations, and the presence of outliers in the data. Outliers are often unavoidable and can therefore impair forecasting accuracy. With this in mind, this research compares the performance of the Autoregressive Integrated Moving Average (ARIMA), Multi-Layer Perceptron (MLP), and Random Forest (RF) methods in forecasting time series data containing outliers, using a case study of precious metal futures prices (gold, silver, and platinum) and the Mean Absolute Percentage Error (MAPE) as the criterion. ARIMA with Linear Interpolation suppresses the influence of outliers better than ARIMA with Winsorized Mean and ARIMA without outlier handling, with average MAPE values of 10.67% versus 12.33% and 11.79%, respectively, on the test data. The MLP method performs no better than ARIMA with Linear Interpolation, with an average MAPE of 11.13% on the test data. Overall, the best performance was produced by the RF method, whose average MAPE of 2.85% on the test data is much smaller than that of the other methods. The study concludes that the RF method performs best among all the methods compared. This is because the RF method is based on the decision tree principle, which makes it more robust to the presence of outliers in the data. Based on these results, the RF method can be an option for modeling time series data that contain outliers.
Forecasting accuracy, as a benchmark for the performance of time series methods, depends on several things, including data characteristics, method selection, and the forecast horizon, in addition to data fluctuations and the presence of outliers in the data. Outliers are often unavoidable and can impair the accuracy and precision of forecasts. On that basis, this article discusses the results of a study comparing the performance of the ARIMA, Multi-Layer Perceptron (MLP), and Random Forest (RF) methods in forecasting time series data containing outliers, specifically precious metal futures prices (gold, silver, and platinum), based on the Mean Absolute Percentage Error (MAPE). ARIMA with Linear Interpolation suppresses the influence of outliers better than ARIMA with Winsorized Mean and ARIMA without outlier handling, with average MAPE values of 10.67% versus 12.33% and 11.79%, respectively, on the test data. In addition, the MLP method performs no better than ARIMA with Linear Interpolation, with an average MAPE of 11.13% on the test data. Overall, the best performance was produced by the RF method, whose average MAPE of 2.85% on the test data is much smaller than that of the other methods. In this study, the RF method appears to perform best among all the methods tried for forecasting time series data, based on the empirical data used, namely precious metal futures prices.
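The abstract reports that replacing outliers by linear interpolation before fitting ARIMA gave the lowest MAPE among the ARIMA variants. The sketch below illustrates that preprocessing step only. The abstract does not say how outliers were detected, so the 1.5 × IQR rule used here is an assumption, as are the function names and the toy price series.

```python
from statistics import quantiles


def flag_outliers_iqr(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed detection rule)."""
    q1, _, q3 = quantiles(series, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < low or x > high for x in series]


def interpolate_outliers(series, flags):
    """Replace flagged values by linear interpolation between clean neighbours."""
    clean = [i for i, flagged in enumerate(flags) if not flagged]
    if not clean:
        raise ValueError("every observation is flagged; nothing to interpolate from")
    result = list(series)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        left = max((j for j in clean if j < i), default=None)
        right = min((j for j in clean if j > i), default=None)
        if left is None:        # outlier at the start: copy the next clean value
            result[i] = series[right]
        elif right is None:     # outlier at the end: copy the last clean value
            result[i] = series[left]
        else:
            t = (i - left) / (right - left)
            result[i] = series[left] + t * (series[right] - series[left])
    return result


# Hypothetical price series with one obvious spike at index 3.
prices = [10, 11, 12, 50, 14, 15, 16]
flags = flag_outliers_iqr(prices)
cleaned = interpolate_outliers(prices, flags)
```

The cleaned series would then be passed to the forecasting model in place of the raw one. Winsorizing, the alternative named in the abstract, would instead replace extreme values with a boundary value rather than a value interpolated from neighbouring time points.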
Comparative Analysis of ARIMA and LSTM for Forecasting Maximum Wind Speed in Kupang City, East Nusa Tenggara
This study compares the Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) models for predicting maximum wind speed, with accuracy measured by Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). The results show that the LSTM model outperforms the ARIMA model in predicting maximum wind speed in Kupang City, East Nusa Tenggara Province. The best LSTM model uses 200 epochs, a batch size of 32, a learning rate of 0.001, and 8 neurons. When its predictions are evaluated against the actual data, the LSTM model has a MAPE of 19.40%. This research contributes to the literature on wind utilization as a basis for building renewable power plants on small islands, particularly in Kupang City, East Nusa Tenggara.
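RMSE and MAPE are the two accuracy measures used to rank the models here. A minimal sketch of both follows, using their standard definitions; the function names and the numbers in the example are hypothetical and unrelated to the wind speed data.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observations and forecasts.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
example_mape = mape(actual, predicted)  # each forecast is off by 10 percent
example_rmse = rmse(actual, predicted)  # sqrt((10**2 + 20**2) / 2)
```

MAPE is scale-free, which makes it convenient for comparing series with different units, whereas RMSE stays in the units of the data and penalizes large errors more heavily.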
PREFERENSI MAHASISWA IPB TERHADAP MATA KULIAH METODE STATISTIKA MENGGUNAKAN ANALISIS KONJOIN (IPB Students' Preferences for the Statistical Methods Course Using Conjoint Analysis)
Statistical Methods is one of the interdepartmental courses at Bogor Agricultural University (BAU); an evaluation is therefore needed to learn students' preferences regarding the course. Conjoint analysis can be used to determine students' preferences regarding the teaching methods of the Statistical Methods course. The combinations of teaching methods were constructed using a fractional factorial design, with factor levels determined from a preliminary survey. The sampling technique was multistage sampling of students who had taken the Statistical Methods course in 2009/2010. Based on the conjoint analysis, the module, the number of students, and the lecture time period are the top three factors. Students tend to prefer material relevant to their major, well-structured modules, a communicative lecturer, students acting as teachers in review sessions, classes of fewer than 50 students, and lectures held between 7 a.m. and 12 noon. Keywords: statistical methods, preferences, conjoint analysis
PERFORMANCE COMPARISON OF SARIMA INTERVENTION AND PROPHET MODELS FOR FORECASTING THE NUMBER OF AIRLINE PASSENGER AT SOEKARNO-HATTA INTERNATIONAL AIRPORT
The COVID-19 pandemic has had a significant impact on the air transportation sector, particularly at Soekarno-Hatta (Soetta) International Airport. The number of passengers at Soetta Airport decreased because of the pandemic, although flight activities have continued. An accurate forecasting model is needed to predict the number of airline passengers at Soetta Airport, with the COVID-19 pandemic treated as an intervention. This study compares the performance of two models, SARIMA intervention and Prophet, in forecasting the number of domestic passengers at Soetta Airport. The best SARIMA intervention model was SARIMA(0,1,1)(1,0,0)₁₂ with intervention orders b = 0, s = 20, r = 0, which gave a Mean Absolute Percentage Error (MAPE) of 28% and a Root Mean Square Error (RMSE) of 433473. The Prophet model yielded a MAPE of 37% and an RMSE of 497154. In terms of both MAPE and RMSE, the SARIMA intervention model provides better results than the Prophet model in forecasting the number of domestic passengers at Soetta Airport.