118 research outputs found

    Dynamic ensemble selection methods for heterogeneous data mining

    Big data is often collected from multiple sources with possibly different features, representations and granularity, and hence is defined as heterogeneous data. Such multiple datasets need to be fused together in some way for further analysis. Data fusion at feature level requires domain knowledge and can be time-consuming and ineffective, but it can be avoided if decision-level fusion is applied properly. Ensemble methods appear to be an appropriate paradigm to do just that, as each subset of heterogeneous data sources can be used separately to induce models independently, and their decisions are then aggregated by a decision fusion function in an ensemble. This study investigates how heterogeneous data can be used to generate more diverse classifiers to build more accurate ensembles. A Dynamic Ensemble Selection Optimisation (DESO) framework is proposed, using the local feature space of heterogeneous data to increase diversity among classifiers and Simulated Annealing for optimisation. An implementation example of DESO, BaggingDES, is provided with Bagging as the base platform of DESO, to test its performance and also to explore the relationship between diversity and accuracy. Experiments were carried out with heterogeneous datasets derived from real-world benchmark datasets. The statistical analyses of the results show that BaggingDES performed significantly better than the baseline method, a decision tree, and reasonably better than the classic Bagging.
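    The selection step behind dynamic ensemble selection can be sketched as follows. This is a minimal, illustrative version using local accuracy on a validation set and plain-Python classifiers; it does not reproduce DESO's Simulated Annealing optimisation, and all names are ours:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_and_vote(pool, X_val, y_val, x, k=3):
    """Pick the classifier(s) from the pool that are most accurate on
    the k validation points nearest to x (the local region), then
    majority-vote their predictions for x."""
    # indices of the k nearest validation samples
    nearest = sorted(range(len(X_val)),
                     key=lambda i: euclidean(X_val[i], x))[:k]
    # local accuracy of each classifier in the pool
    scores = [sum(clf(X_val[i]) == y_val[i] for i in nearest)
              for clf in pool]
    best = max(scores)
    competent = [clf for clf, s in zip(pool, scores) if s == best]
    votes = Counter(clf(x) for clf in competent)
    return votes.most_common(1)[0][0]
```

    Only the locally most competent classifiers vote, so a classifier that is weak in the region around the query instance is excluded even if it is strong globally.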

    Risk prediction of product-harm events using rough sets and multiple classifier fusion: an experimental study of listed companies in China

    With the increasing frequency and destructiveness of product-harm events, the study of enterprise crisis management has become essentially important, but little literature thoroughly explores risk prediction methods for product-harm events. In this study, an initial index system for risk prediction was built based on an analysis of the key drivers of a product-harm event's evolution; ultimately, nine risk-forecasting indexes were obtained using rough set attribute reduction. With the four indexes of cumulative abnormal returns as the input, fuzzy clustering was used to classify the risk level of a product-harm event into four grades. In order to control the uncertainty and instability of single classifiers in risk prediction, multiple classifier fusion was introduced and combined with self-organising data mining (SODM). Further, an SODM-based multiple classifier fusion (SB-MCF) model was presented for risk prediction related to a product-harm event. The experimental results based on 165 Chinese listed companies indicated that the SB-MCF model improved the average predictive accuracy and reduced the degree of variation simultaneously. The statistical analysis demonstrated that the SB-MCF model significantly outperformed six widely used single classification models (e.g. neural networks, support vector machines, and case-based reasoning) and six other commonly used multiple classifier fusion methods (e.g. majority voting, the Bayesian method, and genetic algorithms).
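    As a point of reference for the fusion baselines mentioned above, decision-level fusion by majority voting, one of the methods SB-MCF is compared against, can be sketched in a few lines (the SB-MCF model itself is more involved and is not reproduced here):

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse one predicted label per base classifier into a single
    decision; ties are broken by the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]
```

    For example, fusing the risk grades predicted by five base classifiers, `majority_vote(["low", "high", "high", "low", "high"])` returns `"high"`.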

    Committee Machines for Hourly Water Demand Forecasting in Water Supply Systems

    Prediction models have become essential for the improvement of decision-making processes in public management and, particularly, for water supply utilities. Accurate estimation often needs to solve multimeasurement, mixed-mode, and space-time problems, typical of many engineering applications. As a result, accurate estimation of real-world variables is still one of the major problems in mathematical approximation. Several individual techniques have shown very good estimation abilities. However, none of them is free from drawbacks. This paper faces the challenge of creating accurate water demand predictive models at urban scale by using so-called committee machines, which are ensemble frameworks of single machine learning models. The proposal is able to combine models of varied nature. Specifically, this paper analyzes combinations of such techniques as multilayer perceptrons, support vector machines, extreme learning machines, random forests, adaptive neural fuzzy inference systems, and the group method of data handling. The analyses are checked on two water demand datasets from Franca (Brazil). As an ensemble tool, the combined response of a committee machine outperforms any single constituent model.
    Ambrosio, JK.; Brentan, BM.; Herrera Fernández, AM.; Luvizotto, E.; Ribeiro, L.; Izquierdo Sebastián, J. (2019). Committee Machines for Hourly Water Demand Forecasting in Water Supply Systems. Mathematical Problems in Engineering. 2019:1-11. https://doi.org/10.1155/2019/9765468
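    A committee machine in the sense used above can be sketched as a weighted average of its members' point forecasts, e.g. with weights inversely proportional to each member's validation error. The member models and the weighting scheme here are illustrative, not the paper's exact configuration:

```python
def inverse_error_weights(members, X_val, y_val):
    """One common weighting: inverse mean absolute validation error,
    so more accurate members get larger weights."""
    weights = []
    for m in members:
        mae = sum(abs(m(x) - y) for x, y in zip(X_val, y_val)) / len(y_val)
        weights.append(1.0 / (mae + 1e-9))  # epsilon avoids division by zero
    return weights

def committee_forecast(members, weights, x):
    """Weighted-average fusion of the members' forecasts for input x."""
    total = sum(weights)
    return sum(w * m(x) for m, w in zip(members, weights)) / total
```

    Members of entirely different natures (an MLP, an SVM, a random forest, ...) can be combined this way, since the committee only consumes their numeric outputs.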

    Prediction of Banks Financial Distress

    In this research we conduct a comprehensive review of the existing literature on prediction techniques that have been used to assist in the prediction of bank distress. We categorized the review results into groups depending on the prediction technique used. Our categorization started with the publication period of the literature found: we mark the literature published in the period 1990-2010 as the history of prediction techniques, and the work from then until 2013 as recent prediction techniques, and then present the strengths and weaknesses of both. We conclude that no single type of technique fits every bank distress issue, although we found that intelligent hybrid techniques are considered the strongest candidates in terms of accuracy and reputation.

    An Extensive Analysis of Machine Learning Based Boosting Algorithms for Software Maintainability Prediction

    Software maintainability is an indispensable factor in the quality of a particular piece of software. It describes the ease of performing the various maintenance activities needed to adapt software to a modified environment. The availability and growing popularity of a wide range of Machine Learning (ML) algorithms for data analysis further provide the motivation for predicting this maintainability. However, an extensive analysis and comparison of various ML based Boosting Algorithms (BAs) for Software Maintainability Prediction (SMP) has not yet been made. Therefore, the current study analyzes and compares five different BAs, i.e., AdaBoost, GBM, XGB, LightGBM, and CatBoost, for SMP using open-source datasets. The performance of the proposed prediction models has been evaluated using Root Mean Square Error (RMSE), Mean Magnitude of Relative Error (MMRE), Pred(0.25), Pred(0.30), and Pred(0.75) as prediction accuracy measures, followed by a non-parametric statistical test and a post hoc analysis to account for the differences in the performances of the various BAs. Based on the residual errors obtained, GBM was the best performer, followed by LightGBM, for RMSE, whereas in the case of MMRE, XGB performed best for six of the seven datasets, i.e., 85.71% of the datasets, providing minimum MMRE values ranging from 0.90 to 3.82. Further, on applying the statistical test and performing the post hoc analysis, it was found that significant differences exist in the performance of the different BAs, and XGB and CatBoost outperformed all other BAs for MMRE. Lastly, a comparison of the BAs with four other ML algorithms has also been made to bring out the BAs' superiority over the other algorithms. This study should open new doors for software developers to carry out comparatively more precise predictions well in time and hence reduce overall maintenance costs.
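    The accuracy measures named above follow their standard definitions; a minimal sketch (function and variable names are ours, and `actual` values are assumed non-zero for the relative-error measures):

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |a - p| / a."""
    return sum(abs(a - p) / a
               for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, q):
    """Pred(q): fraction of predictions with relative error <= q."""
    hits = sum(abs(a - p) / a <= q for a, p in zip(actual, predicted))
    return hits / len(actual)
```

    Note the opposite orientations: lower is better for RMSE and MMRE, while higher is better for Pred(q), which is why the study reports both kinds of measure.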

    Improving binary classification using filtering based on k-NN proximity graphs

    © 2020, The Author(s). One way of increasing recognition ability in a classification problem is removing outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to catch wrong patterns in the training data. A common approach to data filtering is using proximity graphs. However, the problem of optimal filtering parameter selection is still insufficiently researched. In this paper, a filtering procedure based on the k-nearest neighbours proximity graph is used. Filtering parameter selection is posed as an outlier minimization problem: the k of the k-NN proximity graph, the power of the distance, and the threshold parameter are selected so as to minimize the outlier percentage in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine, and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the level of competence of each classifier in a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the level of competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case the combiner is based on the local accuracy of the single classifiers, and its output is a linear combination of the single classifiers' rankings. After filtering, the accuracy of the DES-LA combiner increases substantially on low-accuracy datasets, but filtering does not have a significant impact on DES-LA performance on high-accuracy datasets. The results are discussed, and the classifiers whose performance was most affected by the pre-processing filtering step are identified. The main contributions of the paper are modifications to the DES-LA combiner, as well as a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
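    The core idea of proximity-graph filtering can be sketched as follows: a training point whose label disagrees with most of its k nearest neighbours is treated as an outlier and dropped. This is an illustrative simplification; the paper additionally tunes k, the power of the distance, and the threshold, which this sketch fixes:

```python
import math

def knn_filter(X, y, k=3):
    """Return the indices of training points to keep: a point survives
    only if at least half of its k nearest neighbours share its label."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: dist(X[j], xi))[:k]
        agree = sum(y[j] == yi for j in nearest)
        if agree * 2 >= k:  # at least half the neighbours agree
            keep.append(i)
    return keep
```

    A classifier is then trained on the surviving points only, which removes label noise at the cost of discarding some genuine boundary cases.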

    Data-Driven Machine Learning for Fault Detection and Diagnosis in Nuclear Power Plants: A Review

    Data-driven machine learning (DDML) methods for fault detection and diagnosis (FDD) in nuclear power plants (NPPs) have attracted growing interest in recent years. However, research comprehensively reviewing the state-of-the-art progress of DDML for FDD in NPPs is still lacking. In this review, the classifications, principles, and characteristics of DDML are first introduced, including supervised learning, unsupervised learning, and other types. Then, the latest applications of DDML for FDD, covering the reactor system, reactor components, and reactor condition monitoring, are illustrated, showing how DDML can better predict NPP behaviour. Lastly, directions for the future development of DDML for FDD in NPPs are outlined.