Analysis of Random Forest, Multiple Regression, and Backpropagation Methods in Predicting Apartment Price Index in Indonesia
This study focuses on predicting the apartment price index in Indonesia using property survey data from Bank Indonesia. In the era of the Covid-19 pandemic, accurately predicting apartment sale and purchase prices is essential to minimizing losses, which makes the apartment price index an attractive prediction target. The machine learning approaches used to predict the apartment price index are the Random Forest method, the Multiple Regression method, and the Backpropagation method. This study aims to determine which method predicts most accurately on a small dataset. The data used are apartment price index data from 2012 to 2019 in the JABODEBEK area. The research produces prediction accuracies that determine the effectiveness of each method. The Random Forest method with parameters n_estimators=100 and max_features="log2" produces an R2 of 0.977. The Multiple Regression method, with a correlation of 0.746 between the selling price and rental price variables and of 0.042 for the rental inflation variable, produces an R2 of 0.559. The Backpropagation method with a 1000-4000-1 hidden-layer scheme and 20,000 iterations produces an R2 of 0.996. Therefore, the Backpropagation method is the most suitable of the three in this study: it achieves nearly perfect accuracy, which helps minimize losses when investing in apartment purchases and sales during the Covid-19 pandemic era.
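As a rough illustration of the Random Forest configuration reported above (n_estimators=100, max_features="log2", R2 scoring), the following sketch fits scikit-learn's RandomForestRegressor to a small synthetic index series. The data, lag features, and train/test split are illustrative assumptions, not the Bank Indonesia survey data.

```python
# Sketch: Random Forest regression with the paper's reported parameters,
# scored by R2. The index series below is synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Synthetic quarterly index, 2012-2019 (32 quarters): smooth cycle plus noise.
t = np.arange(32)
index = 100 + 5 * np.sin(t / 3) + rng.normal(0, 1.0, size=t.size)

# Predict the index from its previous two values (assumed lag features).
X = np.column_stack([index[:-2], index[1:-1]])
y = index[2:]
split = 24  # train on the earlier part, test on the later part

model = RandomForestRegressor(n_estimators=100, max_features="log2",
                              random_state=0)
model.fit(X[:split], y[:split])
r2 = r2_score(y[split:], model.predict(X[split:]))
print(round(r2, 3))
```

On the real survey data the paper reports R2 = 0.977 for this configuration; the toy series here only demonstrates the fitting and scoring mechanics.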
Large-scale assessment of Prophet for multi-step ahead forecasting of monthly streamflow
We assess the performance of the recently introduced Prophet model in
multi-step ahead forecasting of monthly streamflow by using a large dataset.
Our aim is to compare the results derived through two different approaches.
The first approach uses only past information about the time series to be
forecasted (standard approach), while the second approach uses exogenous
predictor variables alongside the endogenous ones. The
additional information used in the fitting and forecasting processes includes
monthly precipitation and/or temperature time series, and their forecasts
respectively. Specifically, the exploited exogenous (observed or forecasted)
information considered at each time step exclusively concerns the time of
interest. In total, four algorithms based on the Prophet model are assessed. Their
forecasts are also compared with those obtained using two classical
algorithms and two benchmarks. The comparison is performed in terms of four
metrics. The findings suggest that the compared approaches are equally
useful.
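The endogenous-versus-exogenous comparison described above can be sketched with a much simpler stand-in for Prophet: below, an ordinary linear regression forecasts a synthetic monthly streamflow series, first from lagged streamflow only and then with a concurrent synthetic precipitation regressor added, mirroring how the exogenous information concerns only the time of interest. The data, the single lag, and the linear model are all illustrative assumptions; the paper's four algorithms are built on Prophet itself.

```python
# Stand-in sketch: compare a forecast using past streamflow only (endogenous)
# against one that also uses a concurrent exogenous predictor (precipitation).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 120  # ten years of synthetic monthly data
precip = rng.gamma(2.0, 50.0, size=n)              # synthetic precipitation
flow = 0.6 * precip + rng.normal(0, 10.0, size=n)  # flow driven by precip
flow[1:] += 0.3 * flow[:-1]                        # mild persistence

lag = flow[:-1]       # endogenous predictor: previous month's flow
target = flow[1:]
exog = precip[1:]     # exogenous predictor at the time of interest
split = 96

endo = LinearRegression().fit(lag[:split, None], target[:split])
both = LinearRegression().fit(np.column_stack([lag, exog])[:split],
                              target[:split])

rmse_endo = mean_squared_error(target[split:],
                               endo.predict(lag[split:, None])) ** 0.5
rmse_both = mean_squared_error(
    target[split:], both.predict(np.column_stack([lag, exog])[split:])) ** 0.5
print(round(rmse_endo, 2), round(rmse_both, 2))
```

In this toy setup the exogenous regressor helps because the synthetic flow is constructed to depend on precipitation; the paper's large-scale assessment finds the two approaches roughly equally useful on real streamflow.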
Evaluation of random forests and Prophet for daily streamflow forecasting
We assess the performance of random forests and Prophet in
forecasting daily streamflow up to seven days ahead in a river in the US.
Both the assessed forecasting methods use past streamflow observations, while
random forests additionally use past precipitation information. For
benchmarking purposes we also implement a naïve method based on the
previous streamflow observation, as well as a multiple linear regression
model utilizing the same information as random forests. Our aim is to
illustrate important points about the forecasting methods when implemented
for the examined problem. Therefore, the assessment is made in detail at a
sufficient number of starting points and for several forecast horizons. The
results suggest that random forests perform better in general terms, while
Prophet outperforms the naïve method for forecast horizons longer than
three days. Finally, random forests forecast the abrupt streamflow
fluctuations more satisfactorily than the three other methods.
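The random forest setup described above, using past streamflow and past precipitation as predictors with a last-value naive benchmark, can be sketched as follows. The rainfall-runoff series, the single one-day lag, and the forest size are illustrative assumptions; the paper uses daily observations from a US river and horizons up to seven days.

```python
# Sketch: one-day-ahead streamflow forecasting with a random forest on lagged
# flow and precipitation, against a naive previous-observation benchmark.
# All data here is synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
n = 400
precip = rng.gamma(0.5, 8.0, size=n)      # spiky synthetic daily rainfall
flow = np.zeros(n)
for t in range(1, n):                      # simple rainfall-runoff recursion
    flow[t] = 0.8 * flow[t - 1] + 0.5 * precip[t - 1] + rng.normal(0, 0.3)

# Features: yesterday's flow and yesterday's precipitation.
X = np.column_stack([flow[:-1], precip[:-1]])
y = flow[1:]
split = 300

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:split], y[:split])
mae_rf = mean_absolute_error(y[split:], rf.predict(X[split:]))
mae_naive = mean_absolute_error(y[split:], flow[:-1][split:])  # persistence
print(round(mae_rf, 3), round(mae_naive, 3))
```

The precipitation feature is what lets the forest anticipate the abrupt fluctuations that a persistence forecast necessarily misses by one day.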
Feature Selection with Annealing for Forecasting Financial Time Series
Stock market and cryptocurrency forecasting is very important to investors as
they aspire to achieve even the slightest improvement to their buy or hold
strategies so that they may increase profitability. However, obtaining accurate
and reliable predictions is challenging, noting that accuracy does not equate
to reliability, especially when financial time-series forecasting is applied
owing to its complex and chaotic tendencies. To mitigate this complexity, this
study provides a comprehensive method for forecasting financial time series
based on tactical input-output feature mapping techniques using machine
learning (ML) models. During the prediction process, selecting the relevant
indicators is vital to obtaining the desired results. In the financial field,
limited attention has been paid to this problem with ML solutions. We
investigate the use of feature selection with annealing (FSA) for the first
time in this field, and we apply the least absolute shrinkage and selection
operator (Lasso) method to select the features from more than 1,000 candidates
obtained from 26 technical classifiers with different periods and lags. Boruta
(BOR) feature selection, a wrapper method, is used as a baseline for
comparison. Logistic regression (LR), extreme gradient boosting (XGBoost), and
long short-term memory (LSTM) are then applied to the selected features for
forecasting purposes using 10 different financial datasets containing
cryptocurrencies and stocks. The dependent variables consisted of daily
logarithmic returns and trends. The mean-squared error for regression, area
under the receiver operating characteristic curve, and classification accuracy
were used to evaluate model performance, and the statistical significance of
the forecasting results was tested using paired t-tests. Experiments indicate
that the FSA algorithm increased the performance of ML models, regardless of
problem type.Comment: 37 pages, 1 figures and 12 table
Data Balancing Techniques for Predicting Student Dropout Using Machine Learning
This research article was published by MDPI, 2023. Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to a model that does not generalize. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over Sampling (SMOTE), SMOTE with Edited Nearest Neighbor, and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of the balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. Furthermore, Logistic Regression correctly classified the largest number of dropout students (57,348 for the Uwezo dataset and 13,430 for the India dataset), using the confusion matrix as the evaluation metric. The application of these models allows for the precise prediction of at-risk students and the reduction of dropout rates.
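The simplest technique listed above, Random Over Sampling, can be sketched without any specialized library: duplicate minority-class rows (with replacement) until the classes are balanced, then train the classifier on the balanced set. The synthetic features, the 9:1 class ratio, and the use of Logistic Regression alone are illustrative assumptions standing in for the paper's dropout datasets and model lineup.

```python
# Sketch: Random Over Sampling via sklearn.utils.resample, then logistic
# regression. Data is synthetic; class 1 plays the minority (dropout) class.
import numpy as np
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_major, n_minor = 900, 100               # 9:1 imbalance, like dropout data
X_major = rng.normal(0.0, 1.0, size=(n_major, 3))
X_minor = rng.normal(1.5, 1.0, size=(n_minor, 3))  # shifted minority class

# Over-sample the minority class with replacement up to the majority size.
X_minor_up = resample(X_minor, replace=True, n_samples=n_major,
                      random_state=0)
X_bal = np.vstack([X_major, X_minor_up])
y_bal = np.concatenate([np.zeros(n_major), np.ones(n_major)])

clf = LogisticRegression().fit(X_bal, y_bal)
print(clf.predict(np.array([[1.5, 1.5, 1.5]]))[0])
```

SMOTE and its Edited Nearest Neighbor and Tomek-links variants, which the study finds work best, replace the duplication step with synthetic interpolated minority samples and a cleaning pass, but the train-on-balanced-data workflow is the same.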