2,923 research outputs found

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting low-probability events using machine learning and data analysis. Because the data distributions are imbalanced, with common events vastly outnumbering rare ones, it requires specialized methods at each step of the machine learning pipeline, from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers. Comment: 44 pages
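The abstract's point that imbalanced data calls for specialized evaluation can be illustrated with a minimal sketch. The toy labels below (10 rare events in 1,000 samples) and the helper names are hypothetical and are not drawn from the survey. The sketch only shows why plain accuracy can look strong for a model that never predicts the rare class, while precision, recall, and F1 expose the failure.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the chosen positive (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 10 rare events (label 1) among 1,000 samples.
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that always predicts the common class.
always_common = [0] * 1000

acc = accuracy(y_true, always_common)
prec, rec, f1 = precision_recall_f1(y_true, always_common)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

On this toy data the always-common predictor reaches 99% accuracy while detecting none of the rare events, which is one reason evaluations in this setting often report class-specific metrics instead.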

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSNs) are envisioned to revolutionize the monitoring of complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints introduce uncertainties and limit the potential use of WSNs in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSNs. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling

    A Comparative Analysis of Supervised Classification Algorithms and Missing Data Handling for Enhancing Chronic Kidney Disease Prediction

    Chronic kidney disease (CKD) is a growing public health concern, characterized by a gradual but worrying increase in morbidity and death, particularly in its early, asymptomatic stages. Risk factors for CKD, including genetic predisposition, obesity, diabetes, and hypertension, affect the illness's prevalence. When an illness shows no outward signs, it is challenging to diagnose and treat it in its early stages. To tackle this pressing issue, our research conducts a comprehensive comparative analysis of supervised classification techniques. In particular, we examine how well the Random Forest, Decision Tree, and Support Vector Machine (SVM) techniques predict CKD. We also look into a number of approaches to handling missing data. Our research presents a thorough evaluation of these algorithms' performance under different data cleaning methods, pointing out both their benefits and drawbacks. Ultimately, our research aims to clarify the early detection and treatment of CKD and pave the way for larger-scale public health initiatives to tackle this quickly escalating health emergency

    A Hierarchical, Fuzzy Inference Approach to Data Filtration and Feature Prioritization in the Connected Manufacturing Enterprise

    In the current big data landscape, the technology and capability to capture and store data have preceded and outpaced the corresponding capability to analyze and interpret it. This has led naturally to the development of elegant and powerful algorithms for data mining, machine learning, and artificial intelligence to harness the potential of the big data environment. A competing reality, however, is that limitations exist in how and to what extent human beings can process complex information. The convergence of these realities is a tension between the technical sophistication or elegance of a solution and its transparency or interpretability by the human data scientist or decision maker. This dissertation, contextualized in the connected manufacturing enterprise, presents an original Fuzzy Approach to Feature Reduction and Prioritization (FAFRAP) that is designed to assist the data scientist in filtering and prioritizing data for inclusion in supervised machine learning models. A set of sequential filters reduces the initial set of independent variables, and a fuzzy inference system outputs a crisp numeric value for each feature, which is used to rank and prioritize features for inclusion in model training. Additionally, the fuzzy inference system outputs a descriptive label to assist in the interpretation of the feature’s usefulness with respect to the problem of interest. Model testing is performed using three publicly available datasets from an online machine learning data repository, and the approach is later applied to a case study in electronic assembly manufacture. Consistency of model results is experimentally verified using Fisher’s Exact Test, and results of filtered models are compared to results obtained with the unfiltered sets of features using a proposed novel metric, the performance-size ratio (PSR)

    Forecasting and Optimizing Dual Media Filter Performance via Machine Learning

    Five different machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Multivariable Linear Regression (MLR), Support Vector Regression (SVR), and Gaussian Process Regression (GPR), were applied to predict the performance of a multi-media filter as a function of raw water quality and plant operating variables. The models were trained using data collected over a seven-year period covering water quality and operating variables, including true colour, turbidity, plant flow, and chemical dose for chlorine, KMnO4, FeCl3, and Cationic Polymer (PolyDADMAC). The machine learning algorithms showed that the best prediction is obtained at a 1-day time lag between input variables and unit filter run volume (UFRV). Furthermore, the RF algorithm with grid search, using the input variables mentioned above with a 1-day time lag, provided the highest reliability in predicting UFRV, with an RMSE and R² of 31.58 and 0.98, respectively. RF with grid search also performed best on training time, prediction accuracy, and event forecasting in extreme wet weather, as assessed by ROC-AUC analysis (AUC over 0.8). Therefore, Random Forest with grid search and a 1-day time lag is an effective and robust machine learning algorithm that can predict filter performance and aid water treatment operators in their decision making by providing real-time warning of potential turbidity breakthrough from the filters
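As an illustration of the two figures of merit the abstract reports, the following sketch computes RMSE and R² from their standard definitions. The UFRV values and function names are made up for illustration and are not the study's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted UFRV values (units are illustrative only).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
print(f"RMSE={model_rmse:.2f} R2={model_r2:.3f}")
```

RMSE is expressed in the units of the target, so it is only meaningful relative to the scale of UFRV, whereas R² is unitless and describes the share of variance explained.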

    17th Symposium "Materials and Metallurgy": Supplement to the "Book of Abstracts"

    The "Book of Abstracts" (224) was published in Metalurgija 63 (2024) 2, 303-320. The deadline for receipt of abstracts was November 30, 2023. Many authors requested a new deadline of March 25, 2024, and the Organizing Committee of the Symposium accepted it. A supplement of 103 abstracts (160 according to the Croatian-language version of this notice) is now published

    Extraction of decision rules via imprecise probabilities

    "This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of General Systems on 2017, available online: https://www.tandfonline.com/doi/full/10.1080/03081079.2017.1312359" Data analysis techniques can be applied to discover important relations among features. This is the main objective of the Information Root Node Variation (IRNV) technique, a new method to extract knowledge from data via decision trees. The decision trees used by the original method were built using classic split criteria. The performance of new split criteria based on imprecise probabilities and uncertainty measures, called credal split criteria, differs significantly from the performance obtained using the classic criteria. This paper extends the IRNV method using two credal split criteria: one based on a mathematical parametric model, and another based on a non-parametric model. The performance of the method is analyzed using a case study of traffic accident data to identify patterns related to the severity of an accident. We found that a larger number of rules is generated, significantly supplementing the information obtained using the classic split criteria. This work has been supported by the Spanish "Ministerio de Economia y Competitividad" [Project number TEC2015-69496-R] and FEDER funds. Abellán, J.; López-Maldonado, G.; Garach, L.; Castellano, JG. (2017). Extraction of decision rules via imprecise probabilities. International Journal of General Systems. 46(4):313-331. https://doi.org/10.1080/03081079.2017.1312359