4,678 research outputs found
Recommended from our members
Building thermal load prediction through shallow machine learning and deep learning
Building thermal load prediction informs the optimization of cooling plant and thermal energy storage. Physics-based prediction models of building thermal load are constrained by the model and input complexity. In this study, we developed 12 data-driven models (7 shallow learning, 2 deep learning, and 3 heuristic methods) to predict building thermal load and compared shallow machine learning and deep learning. The 12 prediction models were compared with the measured cooling demand. It was found XGBoost (Extreme Gradient Boost) and LSTM (Long Short Term Memory) provided the most accurate load prediction in the shallow and deep learning category, and both outperformed the best baseline model, which uses the previous day's data for prediction. Then, we discussed how the prediction horizon and input uncertainty would influence the load prediction accuracy. Major conclusions are twofold: first, LSTM performs well in short-term prediction (1 h ahead) but not in long term prediction (24 h ahead), because the sequential information becomes less relevant and accordingly not so useful when the prediction horizon is long. Second, the presence of weather forecast uncertainty deteriorates XGBoost's accuracy and favors LSTM, because the sequential information makes the model more robust to input uncertainty. Training the model with the uncertain rather than accurate weather data could enhance the model's robustness. Our findings have two implications for practice. First, LSTM is recommended for short-term load prediction given that weather forecast uncertainty is unavoidable. Second, XGBoost is recommended for long term prediction, and the model should be trained with the presence of input uncertainty
Ensemble Sales Forecasting Study in Semiconductor Industry
Sales forecasting plays a prominent role in business planning and business
strategy. The value and importance of advance information is a cornerstone of
planning activity, and a well-set forecast goal can guide sale-force more
efficiently. In this paper CPU sales forecasting of Intel Corporation, a
multinational semiconductor industry, was considered. Past sale, future
booking, exchange rates, Gross domestic product (GDP) forecasting, seasonality
and other indicators were innovatively incorporated into the quantitative
modeling. Benefit from the recent advances in computation power and software
development, millions of models built upon multiple regressions, time series
analysis, random forest and boosting tree were executed in parallel. The models
with smaller validation errors were selected to form the ensemble model. To
better capture the distinct characteristics, forecasting models were
implemented at lead time and lines of business level. The moving windows
validation process automatically selected the models which closely represent
current market condition. The weekly cadence forecasting schema allowed the
model to response effectively to market fluctuation. Generic variable
importance analysis was also developed to increase the model interpretability.
Rather than assuming fixed distribution, this non-parametric permutation
variable importance analysis provided a general framework across methods to
evaluate the variable importance. This variable importance framework can
further extend to classification problem by modifying the mean absolute
percentage error(MAPE) into misclassify error. Please find the demo code at :
https://github.com/qx0731/ensemble_forecast_methodsComment: 14 pages, Industrial Conference on Data Mining 2017 (ICDM 2017
Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning
A novel hybrid data-driven approach is developed for forecasting power system
parameters with the goal of increasing the efficiency of short-term forecasting
studies for non-stationary time-series. The proposed approach is based on mode
decomposition and a feature analysis of initial retrospective data using the
Hilbert-Huang transform and machine learning algorithms. The random forests and
gradient boosting trees learning techniques were examined. The decision tree
techniques were used to rank the importance of variables employed in the
forecasting models. The Mean Decrease Gini index is employed as an impurity
function. The resulting hybrid forecasting models employ the radial basis
function neural network and support vector regression. Apart from introduction
and references the paper is organized as follows. The section 2 presents the
background and the review of several approaches for short-term forecasting of
power system parameters. In the third section a hybrid machine learning-based
algorithm using Hilbert-Huang transform is developed for short-term forecasting
of power system parameters. Fourth section describes the decision tree learning
algorithms used for the issue of variables importance. Finally in section six
the experimental results in the following electric power problems are
presented: active power flow forecasting, electricity price forecasting and for
the wind speed and direction forecasting
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service
In this paper, we present machine learning approaches for characterizing and
forecasting the short-term demand for on-demand ride-hailing services. We
propose the spatio-temporal estimation of the demand that is a function of
variable effects related to traffic, pricing and weather conditions. With
respect to the methodology, a single decision tree, bootstrap-aggregated
(bagged) decision trees, random forest, boosted decision trees, and artificial
neural network for regression have been adapted and systematically compared
using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and
slope. To better assess the quality of the models, they have been tested on a
real case study using the data of DiDi Chuxing, the main on-demand ride hailing
service provider in China. In the current study, 199,584 time-slots describing
the spatio-temporal ride-hailing demand has been extracted with an
aggregated-time interval of 10 mins. All the methods are trained and validated
on the basis of two independent samples from this dataset. The results revealed
that boosted decision trees provide the best prediction accuracy (RMSE=16.41),
while avoiding the risk of over-fitting, followed by artificial neural network
(20.09), random forest (23.50), bagged decision trees (24.29) and single
decision tree (33.55).Comment: Currently under review for journal publicatio
Recommended from our members
California Almond Yield Prediction at the Orchard Level With a Machine Learning Approach.
California's almond growers face challenges with nitrogen management as new legislatively mandated nitrogen management strategies for almond have been implemented. These regulations require that growers apply nitrogen to meet, but not exceed, the annual N demand for crop and tree growth and nut production. To accurately predict seasonal nitrogen demand, therefore, growers need to estimate block-level almond yield early in the growing season so that timely N management decisions can be made. However, methods to predict almond yield are not currently available. To fill this gap, we have developed statistical models using the Stochastic Gradient Boosting, a machine learning approach, for early season yield projection and mid-season yield update over individual orchard blocks. We collected yield records of 185 orchards, dating back to 2005, from the major almond growers in the Central Valley of California. A large set of variables were extracted as predictors, including weather and orchard characteristics from remote sensing imagery. Our results showed that the predicted orchard-level yield agreed well with the independent yield records. For both the early season (March) and mid-season (June) predictions, a coefficient of determination (R 2) of 0.71, and a ratio of performance to interquartile distance (RPIQ) of 2.6 were found on average. We also identified several key determinants of yield based on the modeling results. Almond yield increased dramatically with the orchard age until about 7 years old in general, and the higher long-term mean maximum temperature during April-June enhanced the yield in the southern orchards, while a larger amount of precipitation in March reduced the yield, especially in northern orchards. Remote sensing metrics such as annual maximum vegetation indices were also dominant variables for predicting the yield potential. While these results are promising, further refinement is needed; the availability of larger data sets and incorporation of additional variables and methodologies will be required for the model to be used as a fertilization decision support tool for growers. Our study has demonstrated the potential of automatic almond yield prediction to assist growers to manage N adaptively, comply with mandated requirements, and ensure industry sustainability
Recommended from our members
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimate, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building to be used in another building with limited data, (3) lack of strong justification of costs and benefits of deploying machine learning, and (4) the performance might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as to inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science
Forecasting Workforce Requirement for State Transportation Agencies: A Machine Learning Approach
A decline in the number of construction engineers and inspectors available at State Transportation Agencies (STAs) to manage the ever-increasing lane miles has emphasized the importance of workforce planning in this sector. One of the crucial aspects of workforce planning involves forecasting the required workforce for any industry or agency. This thesis developed machine learning models to estimate the person-hour requirements of STAs at the agency and project levels. The Arkansas Department of Transportation (ARDOT) was used as a case study, using its employee data between 2012 and 2021. At the project level, machine learning regressors ranging from linear, tree ensembles, kernel-based, and neural network-based models were developed. At the agency level, a classic time series modeling approach, as well as neural networks-based models, were developed to forecast the monthly person-hour requirements of the agency. Parametric and non-parametric tests were employed in comparing the models across both levels. The results indicated a high performance from the random forest regressor, a tree ensemble with bagging, which recorded an average R-squared value of 0.91. The one-dimensional convolutional neural network model was the most effective model for forecasting the monthly person requirements at the agency level. It recorded an average RMSE of 4,500 person-hours monthly over short-range forecasting and an average of 5,000 person-hours monthly over long-range forecasting. These findings underscore the capability of machine learning models to provide more accurate workforce demand forecasts for STAs and the construction industry. This enhanced accuracy in workforce planning will contribute to improved resource allocation and management
- …