3,025 research outputs found

    An Analysis of the Predictive Capability of C5.0 and Chaid Decision Trees and Bayes Net in the Classification of fatal Traffic Accidents in the UK

    Road traffic accidents are a significant cause of death worldwide, and there is a global focus on understanding accident contributory factors and implementing prevention strategies. Although accident statistics are steadily improving, effective prevention must be persistent, evidence based and properly resourced. This research aimed to predict fatal traffic accidents from UK STATS19 accident data using C5.0 and CHAID decision trees and Bayes net classification models. Data was grouped as either fatal or non-fatal. The class imbalance due to the infrequency of fatal accidents was considered, and data transformation and sampling techniques were applied to increase prediction likelihood. CHAID was used for supervised discretisation and proved effective in identifying homogeneous subgroups. SPSS Modeler was used for data preparation and model building. Model performance was evaluated using accuracy, recall, precision and ROC curves. The experiment design and data preparation approach successfully predicted fatal accidents with high recall; however, significant misclassification of non-fatals as fatals led to poor accuracy and precision. Boosting was subsequently tested and achieved some accuracy improvement. Serious accidents were grouped as non-fatal in the initial data analysis; however, they are likely to share characteristics with fatal accidents, and the models therefore struggled to classify them correctly as non-fatal. Changing the experiment design to select fatal, serious and slight as targets may improve the models' accuracy. Overall, the models succeeded in classifying fatal traffic accidents correctly, which was the original objective of the research. Interpretation of business rules, by ranking rules and summarising them in a standard format, proved effective for understanding and comparing key predictors. Across the C5.0 and Bayes net models, the contributory factors identified were consistent, with road surface and urban/rural identified as the strongest predictors for both models. The experiment demonstrated that classification techniques can be used to predict infrequent events once sampling techniques are applied.
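
    The combination of high recall with poor accuracy and precision reported above is typical of imbalanced data. A minimal sketch follows, using hypothetical confusion-matrix counts that are not taken from the study; the function name `classification_metrics` is also an assumption made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive (fatal) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 100 fatal and 9,900 non-fatal accidents.
# The model finds 90 of the 100 fatals (high recall) but also
# flags 4,000 non-fatal accidents as fatal (many false positives).
accuracy, precision, recall = classification_metrics(tp=90, fp=4000, fn=10, tn=5900)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

    With these illustrative numbers, almost every fatal accident is caught, yet most "fatal" predictions are wrong, which is the pattern the abstract describes.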

    Accident prediction using machine learning: analyzing weather conditions, and model performance

    The primary focus of this study was to investigate the impact of weather and road conditions on the severity of accidents and to determine the feasibility of machine learning models in accurately predicting the likelihood of such incidents. The research was centered on two key research questions. Firstly, the study examined the influence of weather and road conditions on accident severity and identified the factors most closely related to accidents. We utilized an open-source accident dataset, which was preprocessed using techniques like variable selection, missing data elimination, and data balancing through the Synthetic Minority Over-sampling Technique (SMOTE). Chi-square statistical analysis was performed, suggesting that all weather-related variables are associated, to varying degrees, with the severity of accidents. Visibility and temperature were found to be the most critical factors affecting the severity of road accidents. Hence, appropriate measures such as implementing effective fog dispersal systems, heatwave alerts, or improved road maintenance during extreme temperatures could help reduce accident severity. Secondly, the research evaluated the ability of machine learning models including decision trees, random forests, naive Bayes, extreme gradient boosting, and neural networks to predict accident likelihood. The models’ performance was gauged using metrics like accuracy, precision, recall, and F1 score. The Random Forest model emerged as the most reliable and accurate model for predicting accidents, with an overall accuracy of 98.53%. The Decision Tree model also showed high overall accuracy (95.33%), indicating its reliability. However, the Naive Bayes model showed the lowest accuracy (63.31%) and was deemed less reliable in this context. It is concluded that machine learning models can be effectively used to predict the likelihood of accidents, with models like Random Forest and Decision Tree proving the most effective. However, the effectiveness of each model may vary depending on the dataset and context, necessitating further testing and validation for real-world implementation. These findings not only provide insight into the factors affecting accident severity but also open a promising avenue for employing machine learning techniques in proactive accident prediction and mitigation. Future studies can aim to refine the models further and potentially integrate them into traffic management systems to enhance road safety.
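
    The SMOTE balancing step mentioned above can be illustrated with a simplified sketch of its core idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. The function `smote_sketch`, the feature pairs and all values are hypothetical; this is not the study's preprocessing code and it omits details that a full implementation handles.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows: (visibility, temperature)
severe = [(0.5, 20.0), (1.0, 25.0), (0.8, 95.0), (2.0, 30.0)]
new_rows = smote_sketch(severe, n_synthetic=6)
print(len(new_rows), "synthetic rows created")
```

    Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region the minority class already occupies rather than being exact duplicates.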

    Short-term crash risk prediction considering proactive, reactive, and driver behavior factors

    Providing a safe and efficient transportation system is the primary goal of transportation engineering and planning. Highway crashes are among the most significant challenges to achieving this goal. They result in a significant societal toll reflected in numerous fatalities, personal injuries, property damage, and traffic congestion. To that end, much attention has been given to predictive models of crash occurrence and severity. Most of these models are reactive: they use data about crashes that have occurred in the past to identify the significant crash factors, crash hot-spots and crash-prone roadway locations, and to analyze and select the most effective countermeasures for reducing the number and severity of crashes. More recently, advances have been made in developing proactive crash risk models to assess short-term crash risks in near-real time. Such models could be applied as part of traffic management strategies to prevent and mitigate crashes. Driver behavior is found to be the leading cause of highway crashes. Nevertheless, due to data unavailability, few studies have explored and quantified the role of driver behavior in crashes. The Strategic Highway Research Program Naturalistic Driving Study (SHRP 2 NDS) offers an unprecedented opportunity to perform an in-depth analysis of the impacts of driver behavior on crash events. The research presented in this dissertation is divided into three parts, corresponding to the research objectives. The first part investigates the application of advanced data modeling methods for proactive crash risk analysis. Several proactive models for segment-level crash risk and severity assessment are developed and tested, considering the proactive data available to most transportation agencies in real time at a regional network scale. The data include roadway geometry characteristics, traffic flow characteristics, and weather condition data. The analysis methods include Random-effect Bayesian Logistic Regression, Random Forest, Gradient Boosting Machine, K-Nearest Neighbor, Gaussian Naive Bayes (GNB), and Multi-layer Feedforward Deep Neural Network (MLFDNN). The random oversampling technique is applied to deal with the problem of data imbalance associated with the injury severity analysis. The model training and testing are completed using a dataset containing records of 10,155 crashes that occurred on two interstate highways in New Jersey over a period of two years. The second part of the study analyzes the potential improvement in the prediction abilities of the proposed models by adding reactive data (such as vehicle characteristics and driver characteristics) to the analysis. Commonly, the reactive data is only available (known) after the crash occurs. In the proposed research, the crash analysis is performed by classifying crashes in multiple groupings (instead of a single group), constructed based on the age of drivers and vehicles to account for the impact of reactive data on driver injury severity outcomes. The results of the second part of the study show that while the simultaneous use of reactive and proactive data can improve the prediction performance of the models, the absolute crash probability values must be further improved for operational crash risk prediction. To this end, in the third part of the study, the Naturalistic Driving Study data is used to calibrate the crash risk models, including the driver behavior risk factors. The findings show significant improvement in crash prediction accuracy with the inclusion of driver behavior risk factors, which confirms driver behavior to be the most critical risk factor affecting crash likelihood and the associated injury severity.
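
    Random oversampling, named above as the remedy for imbalanced injury-severity classes, can be sketched as follows. The function `random_oversample`, the record layout and the class labels are assumptions made for illustration and are not drawn from the dissertation.

```python
import random
from collections import Counter


def random_oversample(records, label_of, seed=0):
    """Duplicate randomly chosen records of each smaller class until every
    class has as many records as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # sample with replacement to fill the gap to the majority class size
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical crash records: (crash id, severity label)
crashes = [(i, "no_injury") for i in range(8)] + [(100, "injury"), (101, "injury")]
balanced = random_oversample(crashes, label_of=lambda rec: rec[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

    Oversampling of this kind is usually applied to the training split only, so that duplicated records do not also appear in the test data.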

    Forecasting the Accident Frequency and Risk Factors: A Case Study of Erzurum, Turkey

    Modern life is closely tied to transportation, which gives rise to several problems. Numerous works address accident prediction techniques based on independent road and traffic features, while mixed parameters including time, geometry, traffic flow, and weather conditions are still rarely taken into consideration. This study aims to predict future accident frequency and the risk factors of traffic accidents. It utilizes the Generalized Linear Model (GLM) and Artificial Neural Network (ANN) approaches to process and predict traffic data efficiently, based on 21,500 records of traffic accidents that occurred in Erzurum, Turkey, from 2005 to 2019. The comparative evaluation demonstrated that the ANN model outperformed the GLM model. The study revealed that the most influential variable was the number of horizontal curves. Based on the ANN method, the annual average growth rate of accident occurrences is predicted to be 11.22% until 2030.

    Applying Machine Learning Techniques to Analyze the Pedestrian and Bicycle Crashes at the Macroscopic Level

    This thesis presents different data mining/machine learning techniques to analyze crashes involving vulnerable road users (i.e., pedestrians and bicyclists) by developing crash prediction models at the macro level. In this study, we developed a data mining approach (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To the author's knowledge, this is the first application of DTR models at the macro level in the growing traffic safety literature. The empirical analysis is based on Statewide Traffic Analysis Zone (STAZ) level crash count data for both pedestrians and bicycles from the state of Florida for the years 2010 to 2012. The model results highlight the most significant predictor variables for pedestrian and bicycle crash counts in terms of three broad categories: traffic, roadway, and socio-demographic characteristics. Furthermore, spatial predictor variables of neighboring STAZs were utilized along with the targeted STAZ variables in order to improve the prediction accuracy of both DTR models. The DTR model that considers spatial predictor variables (spatial DTR model) was compared with the model that does not (aspatial DTR model), and the comparison clearly showed that the spatial DTR model is superior to the aspatial DTR model in terms of prediction accuracy. Finally, this study contributed to the safety literature by applying three ensemble techniques (Bagging, Random Forest, and Boosting) in order to improve the prediction accuracy of the weak learners (DTR models) for macro-level crash counts. The model estimation results revealed that all the ensemble techniques performed better than the DTR model and that the gradient boosting technique outperformed the other competing ensemble techniques in macro-level crash prediction.

    Predicting Pilot Misperception of Runway Excursion Risk Through Machine Learning Algorithms of Recorded Flight Data

    The research used predictive models to determine pilot misperception of runway excursion risk associated with unstable approaches. The Federal Aviation Administration defined runway excursion as a veer-off or overrun of the runway surface. The Federal Aviation Administration also defined a stable approach as an aircraft meeting the following criteria: (a) on target approach airspeed, (b) correct attitude, (c) landing configuration, (d) nominal descent angle/rate, and (e) on a straight flight path to the runway touchdown zone. Continuing an unstable approach to landing was defined as Unstable Approach Risk Misperception in this research. A review of the literature revealed that an unstable approach followed by the failure to execute a rejected landing was a common contributing factor in runway excursions. Flight Data Recorder data were archived and made available by the National Aeronautics and Space Administration for public use. These data were collected over a four-year period from the flight data recorders of a fleet of 35 regional jets operating in the National Airspace System. The archived data were processed and explored for evidence of unstable approaches and to determine whether or not a rejected landing was executed. Once identified, those data revealing evidence of unstable approaches were processed for the purposes of building predictive models. SAS™ Enterprise Miner® was used to explore the data, as well as to build and assess predictive models. The advanced machine learning algorithms utilized included: (a) support vector machine, (b) random forest, (c) gradient boosting, (d) decision tree, (e) logistic regression, and (f) neural network. The models were evaluated and compared to determine the best prediction model. Based on the model comparison, the decision tree model was determined to have the highest predictive value. The Flight Data Recorder data were then analyzed to determine predictive accuracy of the target variable and to determine important predictors of the target variable, Unstable Approach Risk Misperception. Results of the study indicated that the predictive accuracy of the best performing model, decision tree, was 99%. Findings indicated that six variables stood out in the prediction of Unstable Approach Risk Misperception: (1) glideslope deviation, (2) selected approach speed deviation, (3) localizer deviation, (4) flaps not extended, (5) drift angle, and (6) approach speed deviation. These variables were listed in order of importance based on results of the decision tree predictive model analysis. The results of the study are of interest to aviation researchers as well as airline pilot training managers. It is suggested that the ability to predict the probability of pilot misperception of runway excursion risk could influence the development of new pilot simulator training scenarios and strategies. The research aids avionics providers in the development of predictive runway excursion alerting display technologies.

    Safety Performance Prediction of Large-Truck Drivers in the Transportation Industry

    The trucking industry and truck drivers play a key role in the United States commercial transportation sector. Accidents involving large trucks can cause serious problems for the driver, company, customer and other road users, resulting in property damage and loss of life. The objective of this research is to concentrate on an individual transportation company and use its historical data to build models based on statistical and machine learning methods to predict accidents. The focus is to build models that have high accuracy and correctly predict an accident. Logistic regression and penalized logistic regression models were tested initially to obtain some interpretation of the relationship between the predictor variables and the response variable. Random forest, gradient boosting machine (GBM) and deep learning methods are explored to deal with highly non-linear and complex data. The cost of fatal and non-fatal accidents is also discussed to weigh the cost of training a driver against the cost of an accident. Since accidents are very rare events, the model accuracy should be balanced between predicting non-accidents (specificity) and predicting accidents (sensitivity). This framework can be a baseline for transportation companies to emphasize the benefits of prediction in having safer and more productive drivers.
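
    The point about balancing sensitivity and specificity for rare events can be made concrete with a short sketch. The function `sensitivity_specificity` and the labels below are hypothetical, and balanced accuracy (the mean of the two rates) is only one possible way to combine them; the abstract does not say which measure the authors used.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy).
    Labels are 1 for an accident and 0 for no accident."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical data: 5 accidents among 100 driver records.
y_true = [1] * 5 + [0] * 95
always_no_accident = [0] * 100  # a model that never predicts an accident

plain_accuracy = sum(t == p for t, p in zip(y_true, always_no_accident)) / len(y_true)
sens, spec, balanced = sensitivity_specificity(y_true, always_no_accident)
print(f"accuracy={plain_accuracy:.2f} sensitivity={sens:.2f} "
      f"specificity={spec:.2f} balanced={balanced:.2f}")
```

    In this illustration a model that never predicts an accident scores 95% plain accuracy while detecting no accidents at all, which is why accuracy alone is a poor guide for rare events.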

    Network-Wide Pedestrian and Bicycle Crash Analysis with Statistical and Machine Learning Models in Utah

    Recent trends in crashes indicate a dramatic increase in both the number and share of pedestrian and bicyclist injuries and fatalities nationally and in many states. Crash frequency modeling was undertaken to identify crash-prone characteristics of segments and non-signalized intersections and to explore possible non-linear associations of explanatory variables with crashes. Crowdsourced “Strava” app data was used for bicycle volume, and pedestrian counts estimated from nearby signalized intersections were used as pedestrian volume. Multiple negative binomial models investigated crashes at different spatial scales to account for different levels of data availability and completeness. The models showed that high traffic volume, steeper vertical grades on roads, frequent bus and rail stations, greater driveway density, more legs at intersections, streets with high large truck presence, greater residential and employment density, and a larger share of low-income households and non-white race/ethnicity groups are indicators of locations with more pedestrian and bicycle crashes. Crash severity model results showed that crashes occurring at mid-blocks and near vertical grades were more severe compared to crashes at intersections. High daily temperature, driving under the influence, and distracted driving also increase injury severity in crashes. This study suggests potential countermeasures, policy implications, and the scope of future research for improving pedestrian and bicycle safety at segments and at non-signalized intersections.