Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.
Feature selection is essential in the medical domain; however, the process is complicated by censoring, the defining characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, even though machine learning classifiers are often preferred. Such classifiers are rarely employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that do employ machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that handles censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased predictions, especially for highly censored data; other methods cannot cope with high censoring at all. This article therefore proposes a new hybrid feature selection method that addresses high levels of censoring. It combines support vector machine, neural network, and K-nearest-neighbor classifiers into a multiple classifier system using both simple majority voting and a new weighted majority voting scheme based on a survival metric. The hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature-ranking filter method to further reduce the feature set. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used in a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and Cox-model-based variable selection methods, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of log-rank test p values, sensitivity, and concordance index.
This indicates that the proposed classifier is more powerful at correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
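The weighted majority voting step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the classifier outputs, the use of per-classifier concordance indices as weights, and the 0.5 decision threshold are all assumptions.

```python
import numpy as np

def weighted_majority_vote(predictions, weights):
    """Combine binary risk predictions (0 = low risk, 1 = high risk)
    from several classifiers using per-classifier weights."""
    predictions = np.asarray(predictions, dtype=float)  # (n_classifiers, n_patients)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                   # normalize to sum to 1
    scores = weights @ predictions                      # weighted vote share for class 1
    return (scores >= 0.5).astype(int)

# Three hypothetical classifiers (e.g. SVM, NN, KNN) voting on 4 patients,
# weighted by an illustrative survival-based score such as the concordance index.
preds = [[1, 0, 1, 1],
         [1, 0, 0, 1],
         [0, 0, 1, 1]]
c_index = [0.72, 0.65, 0.58]
combined = weighted_majority_vote(preds, c_index)       # -> [1, 0, 1, 1]
```

With simple majority voting the weights would all be equal; the survival-metric weighting lets better-calibrated classifiers dominate ties.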
Groundwater Management Optimization and Saltwater Intrusion Mitigation under Uncertainty
Groundwater is a valuable source of fresh water for the public, industry, and agriculture. However, excessive pumping has caused groundwater storage depletion, water quality deterioration, and saltwater intrusion. Reliable groundwater flow and solute transport modeling is needed for sustainable groundwater management and aquifer remediation design. Challenges exist, however, because of highly complex subsurface environments, computationally intensive groundwater models, and inevitable uncertainties. The first research goal is to explore the conjunctive use of feasible hydraulic control approaches for groundwater management and aquifer remediation. A water budget analysis is conducted to understand how groundwater withdrawals affect water levels. A mixed-integer multi-objective optimization model is constructed to derive optimal freshwater pumping strategies and to investigate how to improve optimality by regulating pumping locations. A solute transport model for the Baton Rouge multi-aquifer system is developed to assess saltwater encroachment under current conditions, and a saltwater scavenging approach is proposed to mitigate the salinization issue in the Baton Rouge area. The second research goal is to develop robust surrogate-assisted simulation-optimization modeling methods for saltwater intrusion mitigation. Machine learning based surrogate models (a response surface regression model, an artificial neural network, and a support vector machine) were developed to replace a complex high-fidelity solute transport model for predicting saltwater intrusion. Two methods, Bayesian model averaging and Bayesian set pair analysis, are used to construct ensemble surrogates and quantify model prediction uncertainty. In addition, different optimization models that incorporate multiple ensemble surrogates are formulated to obtain optimal saltwater scavenging strategies.
Chance-constrained programming is used to account for model selection uncertainty in probabilistic nonlinear concentration constraints. The results show that conjunctive use of hydraulic control approaches would be effective in mitigating saltwater intrusion, but the process requires decades. Machine learning based ensemble surrogates can build accurate models with high computational efficiency and hence greatly reduce the effort needed in groundwater remediation design. Including model selection uncertainty through multimodel inference and model averaging provides more reliable remediation strategies than the single-surrogate-assisted approach.
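The ensemble-surrogate idea can be sketched with a simple Bayesian-model-averaging combination. The exp(-BIC/2) weighting is a common BMA approximation, and the surrogate predictions and BIC scores below are purely illustrative, not values from the study:

```python
import numpy as np

def bma_prediction(surrogate_preds, bic_scores):
    """Ensemble surrogate prediction via Bayesian model averaging:
    each surrogate is weighted by exp(-BIC/2), normalized over models
    (a standard approximation to posterior model probabilities)."""
    preds = np.asarray(surrogate_preds, dtype=float)   # (n_models, n_points)
    bic = np.asarray(bic_scores, dtype=float)
    w = np.exp(-0.5 * (bic - bic.min()))               # shift by min BIC for stability
    w /= w.sum()
    mean = w @ preds                                   # ensemble mean prediction
    var = w @ (preds - mean) ** 2                      # between-model variance
    return mean, var, w

# Hypothetical salinity predictions (mg/L) from three surrogates
# (regression, ANN, SVM) at two monitoring wells.
preds = [[250.0, 310.0], [240.0, 300.0], [260.0, 330.0]]
bic = [102.3, 100.1, 105.7]
mean, var, w = bma_prediction(preds, bic)
```

The between-model variance term is one way to expose the model-selection uncertainty that the chance constraints would then act on.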
Multilayer perceptron network optimization for chaotic time series modeling
Chaotic time series are widely present in practice, but due to their characteristics, such as internal randomness, nonlinearity, and long-term unpredictability, it is difficult to achieve high-precision medium- or long-term predictions. Multi-layer perceptron (MLP) networks are an effective tool for chaotic time series modeling. Focusing on chaotic time series modeling, this paper presents a generalized-degrees-of-freedom approximation method for MLPs. We then obtain the corresponding Akaike information criterion, which is used as the loss function for training, and thus develop an overall framework for chaotic time series analysis, including phase space reconstruction, model training, and model selection. To verify the effectiveness of the proposed method, it is applied to two artificial chaotic time series and two real-world chaotic time series. The numerical results show that the proposed optimization method is effective at selecting the best model from a group of candidates. Moreover, the optimized models perform very well in multi-step prediction tasks. This research was funded in part by NSFC grants 61972174 and 62272192, the Science-Technology Development Plan Project of Jilin Province grant 20210201080GX, the Jilin Province Development and Reform Commission grant 2021C044-1, the Guangdong Universities' Innovation Team grant 2021KCXTD015, and Key Disciplines Projects grant 2021ZDJS138.
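The role of AIC in comparing MLP candidates can be illustrated with the standard Gaussian-error form, AIC = n·ln(RSS/n) + 2k. Here k stands in for an effective parameter count (such as a generalized-degrees-of-freedom estimate); the residual magnitudes and parameter counts below are made-up numbers, not the paper's approximation:

```python
import numpy as np

def aic_gaussian(residuals, n_eff_params):
    """AIC for a regression model with Gaussian errors:
    AIC = n * ln(RSS / n) + 2 * k, where k is an effective number of
    parameters (e.g. a generalized-degrees-of-freedom estimate)."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rss = np.sum(r ** 2)
    return n * np.log(rss / n) + 2.0 * n_eff_params

# Compare two hypothetical MLP candidates fitted to the same series:
# the larger net fits slightly better but spends many more effective parameters.
aic_small = aic_gaussian(np.full(500, 0.10), n_eff_params=25)
aic_large = aic_gaussian(np.full(500, 0.09), n_eff_params=80)
best = "small" if aic_small < aic_large else "large"  # the smaller AIC wins
```

In this toy comparison the small network's AIC is lower: its modest loss in fit is outweighed by the complexity penalty on the larger network.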
Ensemble Sales Forecasting Study in Semiconductor Industry
Sales forecasting plays a prominent role in business planning and business
strategy. The value and importance of advance information is a cornerstone of
planning activity, and a well-set forecast goal can guide the sales force more
efficiently. In this paper, CPU sales forecasting for Intel Corporation, a
multinational semiconductor company, was considered. Past sales, future
bookings, exchange rates, gross domestic product (GDP) forecasts, seasonality,
and other indicators were innovatively incorporated into the quantitative
modeling. Benefiting from recent advances in computing power and software
development, millions of models built upon multiple regression, time series
analysis, random forests, and boosted trees were executed in parallel. The
models with smaller validation errors were selected to form the ensemble
model. To better capture distinct characteristics, forecasting models were
implemented at the lead-time and line-of-business level. The moving-window
validation process automatically selected the models that most closely
represent current market conditions. The weekly forecasting cadence allowed
the model to respond effectively to market fluctuations. A generic variable
importance analysis was also developed to increase model interpretability.
Rather than assuming a fixed distribution, this non-parametric permutation
variable importance analysis provides a general framework for evaluating
variable importance across methods. The framework can be further extended to
classification problems by replacing the mean absolute percentage error (MAPE)
with misclassification error. Demo code is available at
https://github.com/qx0731/ensemble_forecast_methods
Comment: 14 pages, Industrial Conference on Data Mining 2017 (ICDM 2017)
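A permutation variable importance analysis of the kind described can be sketched model-agnostically: shuffle one feature column at a time and measure how much the MAPE degrades. The toy linear "model" and the two-feature data below are illustrative assumptions, not the paper's models or data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def permutation_importance(model, X, y, seed=None):
    """Model-agnostic permutation importance: shuffle each column in turn
    and report the increase in MAPE over the unpermuted baseline."""
    rng = np.random.default_rng(seed)
    base = mape(y, model(X))
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])                  # break the feature-target link
        scores.append(mape(y, model(Xp)) - base)
    return np.array(scores)

# Hypothetical setting: sales depend strongly on feature 0, weakly on feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + 100.0
model = lambda X: 5.0 * X[:, 0] + 0.5 * X[:, 1] + 100.0
imp = permutation_importance(model, X, y, seed=1)  # imp[0] dominates imp[1]
```

Because only the error metric is touched, swapping MAPE for misclassification error extends the same procedure to classifiers, as the abstract notes.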
Hybrid statistical and mechanistic mathematical model guides mobile health intervention for chronic pain
Nearly a quarter of visits to the Emergency Department are for conditions
that could have been managed via outpatient treatment; improvements that allow
patients to quickly recognize and receive appropriate treatment are crucial.
The growing popularity of mobile technology creates new opportunities for
real-time adaptive medical intervention, and the simultaneous growth of big
data sources allows for preparation of personalized recommendations. Here we
focus on the reduction of chronic suffering in the sickle cell disease
community. Sickle cell disease is a chronic blood disorder in which pain is the
most frequent complication. There currently is no standard algorithm or
analytical method for real-time adaptive treatment recommendations for pain.
Furthermore, current state-of-the-art methods have difficulty in handling
continuous-time decision optimization using big data. Facing these challenges,
in this study we aim to develop new mathematical tools for incorporating mobile
technology into personalized treatment plans for pain. We present a new hybrid
model for the dynamics of subjective pain that consists of a dynamical systems
approach using differential equations to predict future pain levels, as well as
a statistical approach tying system parameters to patient data (both personal
characteristics and medication response history). Pilot testing of our approach
suggests that it has significant potential to predict pain dynamics given
patients' reported pain levels and medication usages. With more abundant data,
our hybrid approach should allow physicians to make personalized, data-driven
recommendations for treating chronic pain.
Comment: 13 pages, 15 figures, 5 tables
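The dynamical-systems half of such a hybrid model can be sketched as a simple ordinary differential equation integrated with Euler steps. Everything here is a toy assumption, not the paper's model: the decay rate, the medication effect, the dose times, and the form dP/dt = -k·P - drug(t) are all illustrative, with the statistical half (fitting k and the drug effect to patient data) omitted:

```python
import numpy as np

def simulate_pain(p0, k_decay, drug_effect, doses, dt=0.1, steps=100):
    """Toy pain-dynamics sketch: subjective pain decays toward zero and is
    pushed down further while medication is active (within 0.5 time units
    of a dose). Euler integration; pain is bounded below by 0."""
    p = p0
    trajectory = [p]
    for t in range(steps):
        active = any(abs(t * dt - d) < 0.5 for d in doses)
        drug = drug_effect if active else 0.0
        dp = -k_decay * p - drug            # relaxation minus medication effect
        p = max(0.0, p + dt * dp)           # clamp at the bottom of the pain scale
        trajectory.append(p)
    return np.array(trajectory)

# Pain starting at 8/10 with doses at t = 2 and t = 6 (arbitrary units).
traj = simulate_pain(p0=8.0, k_decay=0.05, drug_effect=2.0, doses=[2.0, 6.0])
```

In the hybrid approach described, the parameters of such a system would be tied to personal characteristics and medication-response history rather than fixed by hand.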
Prediction of Energy Consumption of an Administrative Building using Machine Learning and Statistical Methods
Energy management is now essential in light of current energy issues, particularly in the building sector, which accounts for a sizable share of global energy use. Predicting energy consumption is of great interest for developing an effective energy management strategy. This study aims to show that machine learning models outperform SARIMA models in predicting heating energy usage in an administrative building in Chefchaouen City, Morocco, while also highlighting the effectiveness of SARIMA models when the training data are limited. The prediction is carried out using machine learning methods (artificial neural networks, bagged trees, boosted trees, and support vector machines) and statistical methods (14 SARIMA models). External temperature, internal temperature, solar radiation, and a time factor are selected as model inputs. A building energy simulation is conducted in the TRNSYS environment to generate a database for training and validating the models. The models' performances are compared using three statistical indicators: normalized root mean square error (nRMSE), mean absolute error (MAE), and correlation coefficient (R). The results show that all studied models have good accuracy, with a correlation coefficient of 0.90 < R < 0.97. The artificial neural network outperforms all other models (R = 0.97, nRMSE = 12.60%, MAE = 0.19 kWh). Although the machine learning methods generally outperform the statistical methods, the SARIMA models reached good prediction accuracy without requiring much training data. Doi: 10.28991/CEJ-2023-09-05-01
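The three comparison indicators can be computed as below. Note the normalization choice for nRMSE (by the observed range) is an assumption, since several conventions exist, and the heating-load numbers are invented for illustration:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the three indicators used to compare the models:
    nRMSE (here normalized by the observed range, in percent),
    MAE, and Pearson's correlation coefficient R."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    nrmse = 100.0 * rmse / (y_true.max() - y_true.min())  # percent of range
    mae = np.mean(np.abs(y_true - y_pred))
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return nrmse, mae, r

# Hypothetical hourly heating loads (kWh) vs. one model's predictions.
obs  = [2.0, 3.5, 5.0, 4.0, 6.0]
pred = [2.2, 3.4, 4.8, 4.3, 5.9]
nrmse, mae, r = evaluate(obs, pred)
```

Reporting all three together is useful because MAE carries the physical units (kWh) while nRMSE and R are dimensionless and comparable across buildings.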