A novel improved model for building energy consumption prediction based on model integration
Building energy consumption prediction plays an irreplaceable role in energy planning, management, and conservation. Constantly improving the performance of prediction models is key to ensuring the efficient operation of energy systems. Moreover, accuracy is no longer the only factor revealing model performance; it is more important to evaluate a model from multiple perspectives, considering the characteristics of engineering applications. Based on the idea of model integration, this paper proposes a novel improved integration model (stacking model) for forecasting building energy consumption. The stacking model combines the advantages of various base prediction algorithms, forming their outputs into "meta-features" so that the final model can observe the dataset from different spatial and structural angles. Two cases demonstrate practical engineering applications of the stacking model. A comparative analysis evaluates the prediction performance of the stacking model against existing well-known prediction models, including Random Forest, Gradient Boosted Decision Tree, Extreme Gradient Boosting, Support Vector Machine, and K-Nearest Neighbor. The results indicate that the stacking method achieves better performance than the other models regarding accuracy (improvement of 9.5%–31.6% for Case A and 16.2%–49.4% for Case B), generalization (improvement of 6.7%–29.5% for Case A and 7.1%–34.6% for Case B), and robustness (improvement of 1.5%–34.1% for Case A and 1.8%–19.3% for Case B). The proposed model enriches the diversity of the algorithm libraries of empirical models.
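The stacking idea described above can be sketched in a few lines. This is an illustrative scikit-learn stand-in on synthetic data, not the authors' implementation: base learners generate out-of-fold predictions (the "meta-features"), and a meta-learner combines them.

```python
# Minimal stacking sketch (illustrative; synthetic data, not the paper's cases).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=400, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("knn", KNeighborsRegressor())],
    final_estimator=Ridge(),  # meta-learner sees base predictions as features
    cv=5,                     # out-of-fold predictions avoid target leakage
)
stack.fit(X_tr, y_tr)
score = r2_score(y_te, stack.predict(X_te))
print(round(score, 3))
```

The `cv=5` argument is the key design choice: each base learner's meta-features for a sample come from a fold that did not train on it, which is what lets the meta-learner combine the base models without overfitting to their training error.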
Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates
The optimization of algorithm (hyper-)parameters is crucial for achieving
peak performance across a wide range of domains, ranging from deep neural
networks to solvers for hard combinatorial problems. The resulting algorithm
configuration (AC) problem has attracted much attention from the machine
learning community. However, the proper evaluation of new AC procedures is
hindered by two key hurdles. First, AC benchmarks are hard to set up. Second
and even more significantly, they are computationally expensive: a single run
of an AC procedure involves many costly runs of the target algorithm whose
performance is to be optimized in a given AC benchmark scenario. One common
workaround is to optimize cheap-to-evaluate artificial benchmark functions
(e.g., Branin) instead of actual algorithms; however, these have different
properties than realistic AC problems. Here, we propose an alternative
benchmarking approach that is similarly cheap to evaluate but much closer to
the original AC problem: replacing expensive benchmarks by surrogate benchmarks
constructed from AC benchmarks. These surrogate benchmarks approximate the
response surface corresponding to true target algorithm performance using a
regression model, and the original and surrogate benchmark share the same
(hyper-)parameter space. In our experiments, we construct and evaluate
surrogate benchmarks for hyperparameter optimization as well as for AC problems
that involve performance optimization of solvers for hard combinatorial
problems, drawing training data from the runs of existing AC procedures. We
show that our surrogate benchmarks capture important overall characteristics of
the AC scenarios from which they were derived, such as high- and low-performing
regions, while being much easier to use and orders of magnitude cheaper to
evaluate.
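The surrogate-benchmark construction can be sketched as follows. This is a simplified illustration under assumed names, with a cheap synthetic function standing in for the expensive target-algorithm runs: collect (configuration, performance) pairs, fit a regression model over the parameter space, then query the model instead of the algorithm.

```python
# Sketch of a model-based surrogate benchmark (illustrative; the "expensive"
# target run is faked with a cheap quadratic plus noise).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def expensive_target_run(config):
    # Stand-in for a costly run of the target algorithm on one configuration.
    x, z = config
    return (x - 0.3) ** 2 + (z - 0.7) ** 2 + rng.normal(0, 0.01)

# 1) Gather training data, e.g. from the runs of existing AC procedures.
configs = rng.random((200, 2))
perf = np.array([expensive_target_run(c) for c in configs])

# 2) Fit a regression model over the same (hyper-)parameter space.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(configs, perf)

# 3) The surrogate is now a cheap drop-in benchmark for evaluating AC procedures.
candidate = np.array([[0.31, 0.69],   # near the true optimum
                      [0.90, 0.10]])  # in a low-performing region
pred = surrogate.predict(candidate)
print(pred)
```

The point of the check at the end is the property the abstract emphasizes: the surrogate should preserve the relative ordering of high- and low-performing regions, even if its absolute predictions are imperfect.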
Tree Based Boosting Algorithm to Tackle the Overfitting in Healthcare Data
Healthcare data refers to information about an individual's or population's health issues, reproductive outcomes, causes of mortality, and quality of life. When people interact with healthcare systems, a variety of health data is collected and used. However, these healthcare data are noisy and prone to over-fitting. Over-fitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points; as a result, the model learns the noise in the training data to the point where its performance on fresh data degrades. The tree-based boosting approach works well on over-fitted data and is well suited to healthcare data. Improved PaloBoost trims gradients and updates the learning rate using out-of-bag errors computed on out-of-bag data, i.e., the data not present in the in-bag (bootstrap) sample. Improved PaloBoost protects against over-fitting on noisy healthcare data and outperforms all tree-based baseline models. According to experimental results on healthcare datasets, Improved PaloBoost is better at avoiding over-fitting and is less sensitive to noise.
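The out-of-bag mechanism described above can be sketched as follows. This is an illustrative simplification in the spirit of PaloBoost, not the published algorithm: each boosting tree is fit on a bagged sample, and its learning rate is damped whenever the full step fails to reduce the error on that tree's own out-of-bag rows.

```python
# OOB-guided boosting sketch (illustrative simplification, synthetic noisy data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 300)  # noisy target

pred = np.full_like(y, y.mean())
base_lr = 0.1
for _ in range(50):
    in_bag = rng.random(len(y)) < 0.7      # bagged sample; the rest is OOB
    oob = ~in_bag
    resid = y - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X[in_bag], resid[in_bag])
    step = tree.predict(X)
    # OOB check: keep the full step only if it helps on rows the tree never saw.
    before = np.mean(resid[oob] ** 2)
    after = np.mean((resid[oob] - base_lr * step[oob]) ** 2)
    lr = base_lr if after < before else base_lr * 0.1  # damp overfitting trees
    pred += lr * step

mse = float(np.mean((y - pred) ** 2))
print(round(mse, 3))
```

Because each tree is judged on data it did not train on, trees that merely memorize in-bag noise get a reduced learning rate, which is the over-fitting protection the abstract describes.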
Developing Children's Oral Health Assessment Toolkits Using Machine Learning Algorithm.
Objectives: Evaluating children's oral health status and treatment needs is challenging. We aim to build oral health assessment toolkits to predict the Children's Oral Health Status Index (COHSI) score and referral for treatment needs (RFTN). The parent and child toolkits consist of short-form survey items (12 for children and 8 for parents), with and without children's demographic information (7 questions), to predict the child's oral health status and need for treatment.
Methods: Data were collected from 12 dental practices in Los Angeles County from 2015 to 2016. We predicted the COHSI score and RFTN using random bootstrap samples with manually introduced Gaussian noise, together with machine learning algorithms such as Extreme Gradient Boosting and naive Bayes (using R). The toolkits predict the probability of treatment needs and the COHSI score with its percentile (ranking). The performance of the toolkits was evaluated internally and externally by root-mean-square error (RMSE), correlation, sensitivity, and specificity.
Results: The toolkits were developed based on survey responses from 545 families with children aged 2 to 17 y. The sensitivity and specificity for predicting RFTN were 93% and 49%, respectively, on the external data. The correlation between predicted and clinically determined COHSI was 0.88 (0.91 for its percentile). The RMSE of the toolkit was 4.2 for COHSI (1.3 for its percentile).
Conclusions: Survey responses from children and their parents/guardians are predictive of clinical outcomes. The toolkits can be used by oral health programs at baseline among school populations, and to quantify differences between pre- and post-implementation of dental care programs. The toolkits' predicted oral health scores can also be used to stratify samples in oral health research.
Knowledge transfer statement: This study creates oral health toolkits that combine self- and proxy-reported short forms with children's demographic characteristics to predict children's oral health and treatment needs using machine learning algorithms. The toolkits can be used by oral health programs at baseline among school populations, to quantify differences between pre- and post-dental-care-program implementation, and to stratify samples according to treatment needs and oral health status.
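The training scheme mentioned in the Methods (random bootstrap samples with manually introduced Gaussian noise, aggregated over machine-learning models) can be sketched generically. This is an illustrative stand-in on synthetic data: the study used R with Extreme Gradient Boosting and naive Bayes, while here a simple regressor shows the resampling-plus-noise idea only.

```python
# Noise-augmented bootstrap ensemble sketch (illustrative; synthetic survey-like
# data, not the study's dental dataset or models).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((545, 12))                    # e.g. 12 short-form survey items
y = (X @ rng.random(12)) * 10 + rng.normal(0, 1, 545)  # synthetic "COHSI" score

models = []
for _ in range(25):
    idx = rng.integers(0, len(y), len(y))    # bootstrap resample of families
    X_b = X[idx] + rng.normal(0, 0.05, X[idx].shape)  # added Gaussian noise
    models.append(LinearRegression().fit(X_b, y[idx]))

# Ensemble prediction = average over the bootstrap models.
pred = np.mean([m.predict(X) for m in models], axis=0)
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print(round(rmse, 2))
```

The injected noise acts as a mild regularizer: each bootstrap model sees slightly perturbed survey responses, so the averaged ensemble is less tied to any one family's exact answers.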
An XGBoost Algorithm for Predicting Purchasing Behaviour on E-Commerce Platforms
To improve the prediction of consumer purchasing behaviour on e-commerce platforms, this paper develops a new prediction method. The study introduces the basic principles of the XGBoost algorithm, analyses the historical data of an e-commerce platform, pre-processes the original data, and constructs an e-commerce consumer purchase prediction model based on the XGBoost algorithm. Using the traditional random forest algorithm for comparative analysis, the K-fold cross-validation method was further applied, combined with model performance indicators such as accuracy, precision, recall, and F1-score, to evaluate the classification accuracy of the model. Feature-importance characteristics were identified through visual analysis. The results indicate that using the XGBoost algorithm to predict the purchasing behaviour of e-commerce platform consumers improves performance and yields a better prediction effect. This study provides a reference for improving the accuracy of purchasing-behaviour prediction for e-commerce platform consumers, and has important practical significance for the efficient operation of e-commerce platforms.
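The evaluation pipeline described above (boosting vs. random forest under K-fold cross-validation, scored on accuracy, precision, recall, and F1) can be sketched as below. This is an illustrative stand-in on synthetic data: the paper uses the xgboost library, while here scikit-learn's gradient boosting plays its role.

```python
# K-fold comparison sketch (illustrative; synthetic imbalanced data stands in
# for e-commerce purchase records, and sklearn boosting for XGBoost).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2],
                           random_state=0)  # most sessions end without a purchase
scoring = ["accuracy", "precision", "recall", "f1"]
results = {}
for name, clf in [("boosting", GradientBoostingClassifier(random_state=0)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    cv = cross_validate(clf, X, y, cv=5, scoring=scoring)  # K-fold (K=5)
    results[name] = {s: round(cv["test_" + s].mean(), 3) for s in scoring}
print(results)
```

Reporting precision, recall, and F1 alongside accuracy matters here because purchase data is imbalanced: a model that predicts "no purchase" for everyone can still score high accuracy while being useless.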
XGBoost Hyper-Parameter Tuning Using Particle Swarm Optimization for Stock Price Forecasting
Investment in the capital market has become a lifestyle for millennials in Indonesia, as seen from the increase in the number of SIDs (Single Investor Identifications) from 2.4 million in 2019 to 10.3 million in December 2022. The increase has various causes, ranging from the Covid-19 pandemic, which limited in-person social interaction, to the ease of investing in the capital market through various e-commerce platforms. Investors generally use fundamental and technical analysis to maximize profits and minimize the risk of loss in stock investment. These methods, however, can suffer from subjectivity and differing interpretations, and they are time consuming because they require in-depth research into financial statements, economic conditions, and company reports. Machine learning on historical stock-price data, which is time-series data, is one method that can be used for stock-price forecasting. This paper proposes XGBoost optimized by Particle Swarm Optimization (PSO) for stock-price forecasting. XGBoost is known for its ability to make predictions accurately and efficiently; PSO is used to optimize the hyper-parameter values of XGBoost. Optimizing the hyper-parameters of the XGBoost algorithm with PSO achieved the best performance when compared with standard XGBoost, Long Short-Term Memory (LSTM), Support Vector Regression (SVR), and Random Forest. The RMSE, MAE, and MAPE show the lowest values with the proposed method (0.0011, 0.0008, and 0.0772%, respectively), while the R² reaches the highest value. The PSO-optimized XGBoost is thus able to predict stock prices with a low error rate and can be a promising model for stock-price forecasting.
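PSO-based hyper-parameter tuning can be sketched in a few lines. This is an illustrative stand-in under assumed settings: it tunes the learning rate and subsample rate of a scikit-learn gradient-boosting model on synthetic data, whereas the paper applies the same idea to XGBoost on stock-price series. Each particle is a hyper-parameter vector; its fitness is the validation error of a model trained with those values.

```python
# Minimal PSO hyper-parameter tuning sketch (illustrative; small swarm and
# few iterations to keep it cheap).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

LOW, HIGH = np.array([0.01, 0.5]), np.array([0.5, 1.0])  # (learning_rate, subsample)

def fitness(p):
    # One particle = one hyper-parameter vector; fitness = validation MSE.
    lr, sub = np.clip(p, LOW, HIGH)
    model = GradientBoostingRegressor(n_estimators=50, learning_rate=lr,
                                      subsample=sub, random_state=0)
    return mean_squared_error(y_va, model.fit(X_tr, y_tr).predict(X_va))

rng = np.random.default_rng(0)
pos = rng.uniform(LOW, HIGH, (6, 2))   # 6 particles in the 2-D parameter space
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
for _ in range(5):
    gbest = pbest[pbest_f.argmin()]    # swarm-wide best position
    r1, r2 = rng.random((2, 6, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]

best_f = float(pbest_f.min())
print(np.clip(pbest[pbest_f.argmin()], LOW, HIGH), round(best_f, 2))
```

The velocity update is the core of PSO: each particle is pulled toward both its own best-seen position and the swarm's best, so the search balances exploitation of good regions with exploration, without needing gradients of the (black-box) validation error.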