Extraction of decision rules via imprecise probabilities

Abstract

"This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of General Systems on 2017, available online: https://www.tandfonline.com/doi/full/10.1080/03081079.2017.1312359"

Data analysis techniques can be applied to discover important relations among features. This is the main objective of the Information Root Node Variation (IRNV) technique, a new method to extract knowledge from data via decision trees. The decision trees used by the original method were built using classic split criteria. The performance of new split criteria based on imprecise probabilities and uncertainty measures, called credal split criteria, differs significantly from the performance obtained using the classic criteria. This paper extends the IRNV method using two credal split criteria: one based on a mathematical parametric model, and the other based on a non-parametric model. The performance of the method is analyzed using a case study of traffic accident data to identify patterns related to the severity of an accident. We found that the extended method generates a larger number of rules, significantly supplementing the information obtained using the classic split criteria.

This work has been supported by the Spanish "Ministerio de Economia y Competitividad" [Project number TEC2015-69496-R] and FEDER funds.

Abellán, J.; López-Maldonado, G.; Garach, L.; Castellano, JG. (2017). Extraction of decision rules via imprecise probabilities. International Journal of General Systems. 46(4):313-331. https://doi.org/10.1080/03081079.2017.1312359
