291 research outputs found

    Performance Analysis of Tree-Based Algorithms in Predicting Employee Attrition

    Data from 2022 show many workforce reductions, both globally and in Indonesia. The reductions were made to adapt to changing conditions and keep businesses afloat amid increasingly fierce competition. However, reducing the number of employees is not an easy decision, because it can affect many aspects of how a business or company develops and operates. Decisions about termination of employment in particular need careful and thorough consideration. Assessment and decision-making cannot rest on a single aspect; other aspects need to be taken into account. Additional aspects that strengthen decision-making can be drawn from data. Data has no value until it is processed, and prediction is one such approach. Grounded in data, prediction results provide a more appropriate basis for a decision. This study compares three decision tree algorithms in terms of accuracy. The best accuracy for each algorithm was C4.5 = 83.44; Random Forests = 85.85; LMT = 88.29, with a linear precision value, and the model with the highest accuracy is the Logistic Model Tree (LMT) algorithm.

    Air quality and urban sustainable development: the application of machine learning tools

    [EN] Air quality has an effect on a population's quality of life. As a dimension of sustainable urban development, governments have been concerned about this indicator. This is reflected in the references consulted, which demonstrate progress in forecasting pollution events to issue early warnings using conventional tools that, as a result of the new era of big data, are becoming obsolete. There are a limited number of studies with applications of machine learning tools to characterize and forecast behavior of the environmental, social and economic dimensions of sustainable development as they pertain to air quality. This article presents an analysis of studies that developed machine learning models to forecast sustainable development and air quality. Additionally, this paper sets out to present research that studied the relationship between air quality and urban sustainable development to identify the reliability and possible applications in different urban contexts of these machine learning tools. To that end, a systematic review was carried out, revealing that machine learning tools have been primarily used for clustering and classifying variables and indicators according to the problem analyzed, while tools such as artificial neural networks and support vector machines are the most widely used to predict different types of events. The nonlinear nature and synergy of the dimensions of sustainable development are of great interest for the application of machine learning tools.
    Molina-Gómez, NI.; Díaz-Arévalo, JL.; López Jiménez, PA. (2021). Air quality and urban sustainable development: the application of machine learning tools. International Journal of Environmental Science and Technology. 18(4):1-18. https://doi.org/10.1007/s13762-020-02896-6

    Data Mining Generating Decision Trees to Alert System Against Death and Losses in Egg Production

    Climatic changes and high temperatures have been affecting animal production and the well-being of laying birds, causing heat stress and high mortality rates and generating economic losses. Legacy databases can contain information that helps model thermal comfort at climatic extremes. Through data mining, they can be used to build decision trees that help prevent mortality and production losses. The objective of this study is therefore to develop decision trees, for application as an alert system, for the incidence of heat stress in layer production. We used a database of three aviaries located in the city of Bastos-SP, collected in 2013. The data were organized in Excel® spreadsheets and processed in the Weka® software with the J48 (C4.5) algorithm. The technique produced decision trees that, for the three chosen sheds, classified 99.73%, 99.61%, and 98.71% of instances correctly, with Kappa indexes of 0.9958, 0.9907 and 0.9663 respectively, which indicates that the three classifiers are excellent. The decision trees can thus serve as a basis for an alert system applied to the three sheds simultaneously.
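The abstract above reports both the share of correct answers and a Kappa index for each classifier. As a rough illustration of how the two relate, the sketch below computes accuracy and Cohen's kappa from a confusion matrix. The function name `cohen_kappa` and the 2x2 counts are hypothetical and are not taken from the study, which used Weka rather than hand-written code.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix has no observations")
    size = len(confusion)
    # Observed agreement: share of instances on the diagonal (plain accuracy).
    observed = sum(confusion[i][i] for i in range(size)) / n
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[j] for row in confusion) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for a two-class alert (e.g. "stress" vs "no stress").
matrix = [[45, 5],
          [10, 40]]
accuracy = (matrix[0][0] + matrix[1][1]) / 100
print(f"accuracy = {accuracy:.2f}, kappa = {cohen_kappa(matrix):.2f}")
```

Kappa discounts the agreement a classifier would reach by chance, which is why it is usually lower than accuracy and is worth reporting alongside it when classes are unevenly represented.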

    Harvesting Data from Advanced Technologies

    Data streams are emerging everywhere, for example in Web logs, Web page click streams, sensor data streams, and credit card transaction flows. Unlike traditional data sets, data streams are generated sequentially and arrive one by one rather than being available for random access before learning begins, and they are potentially so large, or even infinite, that it is impractical to store all of the data. To study learning from data streams, we target online learning, which generates a best-so-far model on the fly by sequentially feeding in newly arrived data, updates the model as needed, and then applies the learned model for accurate real-time prediction or classification in real-world applications. Several challenges arise from this scenario: first, data is not available for random access or even multiple access; second, data imbalance is a common situation; third, the performance of the model should be reasonable even when the amount of data is limited; fourth, the model should be updated easily but not frequently; and finally, the model should always be ready for prediction and classification. To meet these challenges, we investigate streaming feature selection by taking advantage of mutual information and group structures among candidate features. Streaming feature selection reduces the number of features by removing noisy, irrelevant, or redundant features and selecting relevant features on the fly, and brings tangible benefits for applications: speeding up the learning process, improving learning accuracy, enhancing generalization capability, and improving model interpretation. Compared with traditional feature selection, which can only handle pre-given data sets without considering the potential group structures among candidate features, streaming feature selection is able to handle streaming data and select meaningful and valuable feature sets with or without group structures on the fly. 
In this research, we propose 1) a novel streaming feature selection algorithm (GFSSF, Group Feature Selection with Streaming Features) by exploring mutual information and group structures among candidate features for both group and individual levels of feature selection from streaming data, 2) a lazy online prediction model with data fusion, feature selection and weighting technologies for real-time traffic prediction from heterogeneous sensor data streams, 3) a lazy online learning model (LB, Live Bayes) with dynamic resampling technology to learn from imbalanced embedded mobile sensor data streams for real-time activity recognition and user recognition, and 4) a lazy update online learning model (CMLR, Cost-sensitive Multinomial Logistic Regression) with streaming feature selection for accurate real-time classification from imbalanced and small sensor data streams. Finally, by integrating traffic flow theory, advanced sensors, data gathering, data fusion, feature selection and weighting, online learning and visualization technologies to estimate and visualize the current and future traffic, a real-time transportation prediction system named VTraffic is built for the Vermont Agency of Transportation
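The abstract above describes selecting features on the fly using mutual information. The sketch below illustrates only that general idea and is not the GFSSF algorithm: it scores each arriving discrete feature by its mutual information with the label and keeps it when the score passes a threshold. The names `mutual_information` and `select_streaming`, the threshold value, and the toy data are all assumptions made for illustration; group structure and redundancy checks are deliberately left out.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_streaming(feature_stream, labels, threshold=0.1):
    """Keep features whose mutual information with the labels meets the threshold.

    feature_stream yields (name, values) pairs one at a time, so each feature
    is scored when it arrives and never revisited.
    """
    selected = []
    for name, values in feature_stream:
        if mutual_information(values, labels) >= threshold:
            selected.append(name)
    return selected


# Hypothetical toy stream: "a" determines the label, "b" is independent of it.
labels = [0, 0, 1, 1]
stream = iter([("a", [0, 0, 1, 1]), ("b", [0, 1, 0, 1])])
print(select_streaming(stream, labels))
```

Scoring each feature once keeps memory use bounded by the number of selected features, which matches the single-pass constraint the abstract lists among its challenges.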

    Temporal Information in Data Science: An Integrated Framework and its Applications

    Data science is a well-known buzzword that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on this consideration, the present work starts with the analysis of a real case scenario, the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks was developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although these initial applications rely on already available algorithms, as we shall see, some clever analysis workflows also had to be developed. Afterwards, continuously driven by real data and real-world applications, we turned to the question of how to handle temporal information within classical decision tree models. Our research led to the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied to several real-world analysis tasks, proving its worth. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in the automated reasoning and formal verification fields, a natural direction in which to proceed was to investigate how such solutions may be combined with machine learning, following two main tracks. 
First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems.

    Data Mining

    Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from a large amount of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment

    Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

    Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have emerged in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area, and a comprehensive critical review of literature related to ML in EEs, are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn.
    Comment: 93 pages, 18 figures, under review

    Comparison of Statistical and Machine Learning Models on Road Traffic Accident Severity Classification

    Portugal has the sixth highest road fatality rate among European Union members. This is a problem of different dimensions with serious consequences in people’s lives. This study analyses daily data from police and government authorities on road traffic accidents that occurred between 2016 and 2019 in a district of Portugal. This paper looks for the determinants that contribute to the existence of victims in road traffic accidents, as well as the determinants for fatalities and/or serious injuries in accidents with victims. We use logistic regression models, and the results are compared to the machine-learning model results. For the severity model, where the response variable indicates whether only property damage or casualties resulted in the traffic accident, we used a large sample with a small imbalance. For the serious injuries model, where the response variable indicates whether or not there were victims with serious injuries and/or fatalities in the traffic accident with victims, we used a small sample with very imbalanced data. Empirical analysis supports the conclusion that, with a small sample of imbalanced data, machine-learning models generally do not perform better than statistical models; however, they perform similarly when the sample is large and has a small imbalance

    Data mining methods to detect airborne pollen of spring flowering arboreal taxa

    Variations in the airborne pollen load are among the current and expected impacts on plant pollination driven by climate change. Due to the potential risk for pollen-allergy sufferers, this study aimed to analyze the trends of the three most abundant spring-tree pollen types, Pinus, Platanus and Quercus, and to evaluate the possible influence of meteorological conditions. An aerobiological study was performed during the 1993–2020 period in the city of Ourense (NW Spain) by means of a Hirst-type volumetric sampler. Meteorological data were obtained from the ‘Ourense’ meteorological station of METEOGALICIA. We found statistically significant trends for the Total Pollen in all cases. The positive slope values indicated an increase in pollen grains over the pollen season along the studied years, ranging from an increase of 107 to 442 pollen grains. The resulting C5.0 Decision Trees and Rule-Based Models agreed with the Spearman’s correlations, since both statistical analyses showed a strong and positive influence of temperature and sunlight on pollen release and dispersal, as well as a negative influence of rainfall due to washout processes. Specifically, we found that slight rainfall and moderate temperatures promote the presence of Pinus pollen in the atmosphere, and that the daily thermal amplitude has a marked effect on the presence of high Platanus pollen levels. The percentage of successful predictions of the C5.0 models ranged between 62.23–74.28%. The analysis of long-term datasets of pollen and meteorological information provides valuable models that can be used as an indicator of potential allergy risk in the short term by feeding the obtained models with weather forecasts.
    Xunta de Galicia | Ref. CO-0034-2021 00VT; Universidad de Vigo | Ref. INOU 2021; Universidad de Vigo | Ref. OUR1 131H 64
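The abstract above relies on Spearman's correlations between pollen counts and weather variables. As a minimal sketch of what that statistic measures, the code below ranks both series (averaging ranks for ties) and computes the Pearson correlation of the ranks. The function names and the five daily readings are hypothetical and do not come from the Ourense data set.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily readings, for illustration only.
temperature = [10, 12, 15, 18, 21]
rainfall = [8, 5, 3, 1, 0]
pollen = [5, 9, 20, 40, 80]
print(spearman(temperature, pollen), spearman(rainfall, pollen))
```

Because it works on ranks, the statistic captures any monotonic relationship, which suits pollen counts that rise steeply rather than linearly with temperature.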

    London plane tree pollen and Pla A 1 allergen concentrations assessment in urban environments

    The London plane tree is frequently used in gardens, parks, and avenues in European urban areas for ornamental purposes, to provide shade, and because of its tolerance to atmospheric pollution. Over recent decades, however, bioaerosols such as Platanus pollen grains have caused increasing human health problems such as allergies or respiratory tract infections. An aerobiological sampling of airborne Platanus pollen and Pla a 1 allergen was performed using two volumetric traps placed on the roof of the Science Faculty building of the city of Ourense from 2009 to 2020. A volumetric sampler Hirst–type Lanzoni VPPS 2000 (Lanzoni s.r.l. Bologna, Italy) was used for pollen sampling. Pla a 1 aeroallergen was sampled by using a Burkard Multi-Vial Cyclone Sampler (Burkard Manufacturing Co., Ltd., Hertfordshire, UK) and by means of the enzyme-linked immunosorbent assay (ELISA) technique. Data mining algorithms, C5.0 decision trees, and rule-based models were assessed to evaluate the effects of the main meteorological factors on the pollen or allergen concentrations. Plane trees bloom in late winter and spring months in Northwestern Spain. Regarding the trends of the parameters that define the Platanus pollen season, the allergen values fitted the concentrations of pollen in the air in most cases. In addition, it was observed that a decrease in maximum temperatures causes a decrease in both pollen and allergen concentrations. However, the presence of precipitation only increases the level of allergens. When the risk of allergy symptomatology was jointly assessed for both the concentration of pollen and allergens in the study area, the number of days with moderate and high risk for pollen allergy in sensitive people increased with respect to traditional alerts considering only the pollen values.
    Xunta de Galicia | Ref. ED431C 2017/62 BV1; Consellería de Sanidade, Xunta de Galicia | Ref. CO-0034-2021 00V