
    High-Resolution Road Vehicle Collision Prediction for the City of Montreal

    Road accidents are an important issue in modern societies, responsible for millions of deaths and injuries worldwide every year. In Quebec alone, road accidents were responsible for 359 deaths and 33 thousand injuries in 2018. In this paper, we show how one can leverage the open datasets of a city like Montreal, Canada, to create high-resolution accident prediction models using big data analytics. Compared to other studies in road accident prediction, we achieve a much higher prediction resolution: our models predict the occurrence of an accident within an hour, on road segments defined by intersections. Such models could be used in the context of road accident prevention, but also to identify key factors that can lead to a road accident and, consequently, help elaborate new policies. We tested various machine learning methods to deal with the severe class imbalance inherent to accident prediction problems. In particular, we implemented the Balanced Random Forest algorithm, a variant of the Random Forest machine learning algorithm, in Apache Spark. Interestingly, we found that in our case, Balanced Random Forest does not perform significantly better than Random Forest. Experimental results show that 85% of road vehicle collisions are detected by our model with a false positive rate of 13%. The examples identified as positive are likely to correspond to high-risk situations. In addition, we identify the most important predictors of vehicle collisions for the area of Montreal: the count of accidents on the same road segment during previous years, the temperature, the day of the year, the hour, and the visibility.
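As a rough illustration of the class-imbalance problem this abstract describes, the sketch below contrasts a plain Random Forest with a class-weighted variant on synthetic data containing roughly 1% positives. scikit-learn's `class_weight="balanced_subsample"` stands in for the paper's Spark implementation of Balanced Random Forest; the dataset and parameters are invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in for the accident data: ~1% positive (collision) labels.
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "plain RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # Per-subsample reweighting approximates the balanced bootstrapping
    # used by Balanced Random Forest.
    "balanced RF": RandomForestClassifier(n_estimators=100, random_state=0,
                                          class_weight="balanced_subsample"),
}
recalls = {name: recall_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
           for name, m in models.items()}
for name, r in recalls.items():
    print(f"{name}: recall on collisions = {r:.2f}")
```

On heavily imbalanced data, accuracy is uninformative; recall on the rare class (as reported in the paper: 85% detected) is the metric to watch.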

    Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction

    Taxi demand prediction is an important building block for enabling intelligent transportation systems in a smart city. An accurate prediction model can help the city pre-allocate resources to meet travel demand and reduce the number of empty taxis on the streets, which waste energy and worsen traffic congestion. With the increasing popularity of taxi-requesting services such as Uber and Didi Chuxing (in China), we are able to collect large-scale taxi demand data continuously. How to utilize such big data to improve demand prediction is an interesting and critical real-world problem. Traditional demand prediction methods mostly rely on time series forecasting techniques, which fail to model the complex non-linear spatial and temporal relations. Recent advances in deep learning have shown superior performance on traditionally challenging tasks such as image classification by learning complex features and correlations from large-scale data. This breakthrough has inspired researchers to explore deep learning techniques for traffic prediction problems. However, existing methods for traffic prediction have only considered the spatial relation (e.g., using CNNs) or the temporal relation (e.g., using LSTMs) independently. We propose a Deep Multi-View Spatial-Temporal Network (DMVST-Net) framework to model both spatial and temporal relations. Specifically, our proposed model consists of three views: a temporal view (modeling correlations between future demand values and near time points via LSTM), a spatial view (modeling local spatial correlation via a local CNN), and a semantic view (modeling correlations among regions sharing similar temporal patterns). Experiments on large-scale real taxi demand data demonstrate the effectiveness of our approach over state-of-the-art methods. Comment: AAAI 2018 paper.
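A toy PyTorch module can make the three-view fusion concrete. This is not the paper's architecture: the layer sizes, 7x7 patch size, and simple concatenation fusion are all invented for illustration; DMVST-Net's actual design (e.g., its graph-based semantic view) is more elaborate.

```python
import torch
import torch.nn as nn

class MiniDMVST(nn.Module):
    """Toy fusion of the three views named in the abstract: temporal (LSTM),
    spatial (local CNN), semantic (here a plain region embedding)."""
    def __init__(self, n_regions=16, emb_dim=8):
        super().__init__()
        # Spatial view: a local CNN over a 7x7 demand patch around a region.
        self.cnn = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1),
                                 nn.ReLU(), nn.Flatten())
        # Temporal view: LSTM over the per-timestep CNN features.
        self.lstm = nn.LSTM(input_size=4 * 7 * 7, hidden_size=32,
                            batch_first=True)
        # Semantic view: learned embedding standing in for the paper's
        # similar-temporal-pattern region graph.
        self.sem = nn.Embedding(n_regions, emb_dim)
        self.head = nn.Linear(32 + emb_dim, 1)

    def forward(self, patches, region_ids):
        # patches: (batch, time, 1, 7, 7) local demand maps
        b, t = patches.shape[:2]
        feats = self.cnn(patches.reshape(b * t, 1, 7, 7)).reshape(b, t, -1)
        _, (h, _) = self.lstm(feats)              # last hidden state: (1, b, 32)
        fused = torch.cat([h[-1], self.sem(region_ids)], dim=1)
        return self.head(fused).squeeze(-1)       # predicted next-step demand

model = MiniDMVST()
out = model(torch.randn(2, 5, 1, 7, 7), torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2])
```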

    Severity Analysis of Large Truck Crashes: Comparison Between Regression Modeling Methods and Machine Learning Methods

    According to the Texas Department of Transportation’s Texas Motor Vehicle Crash Statistics, Texas has had the highest number of severe crashes involving large trucks in the US. As defined by the US Department of Transportation, a large truck is any vehicle with a gross vehicle weight rating greater than 10,000 pounds. Generally, large trucks require more time and much more space to accelerate, slow down, and stop. They also have large blind spots when making wide turns. Therefore, when an unexpected traffic situation arises, it is more difficult for large trucks than for regular vehicles to take evasive action and avoid a collision. Due to their large size and heavy weight, large truck crashes often result in huge economic and social costs. Predicting the severity level of a reported large truck crash with unknown severity, or of crashes that may be expected to occur sometime in the future, is useful: it can help prevent a crash from happening, or help rescue teams and hospitals provide proper medical care as fast as possible. To identify appropriate modeling approaches for predicting the severity of large truck crashes, this research selected four representative tree-based classification ML models (Extreme Gradient Boosting tree (XGBoost), Adaptive Boosting tree (AdaBoost), Random Forest (RF), and Gradient Boosted Decision Tree (GBDT)), two non-tree-based ML models (Support Vector Machines (SVM) and k-Nearest Neighbors (kNN)), and a logistic regression (LR) model. The results indicate that the GBDT model performs best among all seven models.
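A minimal sketch of the kind of model comparison the abstract describes, on a synthetic stand-in for the crash-severity data (three severity levels). scikit-learn's `GradientBoostingClassifier` plays the role of GBDT here; the data, feature counts, and hyperparameters are illustrative, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic 3-class stand-in for severity labels (e.g., minor/serious/fatal).
X, y = make_classification(n_samples=3000, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "GBDT": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Which family wins depends heavily on the dataset; the study's finding that GBDT performs best applies to its Texas large-truck crash data, not in general.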

    Black spots identification on rural roads based on extreme learning machine

    Accident black spots are usually defined as road locations with a high risk of fatal accidents. A thorough analysis of these areas is essential to determine the real causes of mortality due to these accidents and can thus help anticipate the decisions needed to mitigate their effects. In this context, this study aims to develop a model for the identification, classification and analysis of black spots on roads in Morocco. These areas are first identified using the extreme learning machine (ELM) algorithm, and then the infrastructure factors are analyzed by ordinal regression. The XGBoost model is adopted for weighted severity index (WSI) generation, which in turn produces the severity scores assigned to individual road segments. The segments are then classified into four classes (high, medium, low and safe) using a categorization approach. Finally, a bagging extreme learning machine is used to classify the severity of road segments according to infrastructure and environmental factors. Simulation results show that the proposed framework accurately and efficiently identified the black spots and outperformed reputable competing models, especially in terms of accuracy (98.6%). In conclusion, the ordinal analysis revealed that pavement width, road curve type, and shoulder width and position were the significant factors contributing to accidents on rural roads.
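The severity-index-then-categorize step can be sketched very simply. The weights and class thresholds below are fixed, invented values; in the paper the WSI is generated by an XGBoost model, not a hand-coded weighted sum.

```python
import numpy as np

# Hypothetical severity weights per casualty type (fatal > serious > minor).
weights = np.array([10.0, 4.0, 1.0])

# Accident counts per road segment, columns: (fatal, serious, minor).
segments = np.array([[3, 5, 10],
                     [0, 1, 2],
                     [0, 0, 1],
                     [0, 0, 0]])

# Weighted severity index (WSI) per segment: a weighted casualty count.
wsi = segments @ weights

# Four severity classes as in the abstract: safe, low, medium, high.
# Thresholds (1, 10, 30) are arbitrary illustrative cut points.
labels = np.array(["safe", "low", "medium", "high"])
classes = labels[np.digitize(wsi, bins=[1, 10, 30])]
for score, cls in zip(wsi, classes):
    print(f"WSI = {score:5.1f} -> {cls}")
```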

    Risk Analytics in Econometrics

    [eng] This thesis addresses the framework of risk analytics as a compendium of four main pillars: (i) big data, (ii) intensive programming, (iii) advanced analytics and machine learning, and (iv) risk analysis. Under the latter mainstay, this PhD dissertation reviews potential hazards known as “extreme events” that could negatively impact the wellbeing of people, the profitability of firms, or the economic stability of a country, but which have also been underestimated or incorrectly treated by traditional modelling techniques. The objective of this thesis is to develop econometric and machine learning algorithms that can improve the predictive capacity for such extreme events and improve the comprehension of the phenomena, in contrast to some modern advanced methods, which are black boxes in terms of interpretation. This thesis presents seven chapters that provide a methodological contribution to the existing literature by building techniques that transform the new valuable insights of big data into more accurate predictions that support decisions under risk and increase robustness for more reliable results. This PhD thesis focuses exclusively on extreme events encoded as a binary variable, commonly known as class-imbalanced data or rare events in binary response; in other words, data whose classes are not equally distributed. The research tackles real case studies in the field of risk and insurance, where it is highly important to specify the claim level of an event in order to foresee its impact and provide personalized treatment. After Chapter 1, the introduction, Chapter 2 proposes a weighting mechanism to be incorporated into the weighted likelihood estimation of a generalized linear model to improve the predictive performance of the highest and lowest deciles of prediction. Chapter 3 proposes two different weighting procedures for a logistic regression model with complex survey data or data from specific sampling designs; its objective is to control the randomness of the data and provide more sensitivity to the estimated model. Chapter 4 provides a rigorous review of trials with modern and classical predictive methods to uncover and discuss the efficiency of certain methods over others, and which gaps in the machine learning literature can be addressed efficiently, and how. Chapter 5 proposes a novel boosting-based method that outperforms certain existing methods in terms of predictive accuracy and also recovers some interpretability of the model with imbalanced data. Chapter 6 develops another boosting-based algorithm which is able to improve the predictive capacity for rare events and can be approximated as a generalized linear model in terms of interpretation. Finally, Chapter 7 includes the conclusions and final remarks. The present thesis highlights the importance of developing alternative modelling algorithms that reduce uncertainty, especially when there are limitations that prevent knowing all the prior factors that influence the presence of a rare event or an imbalanced-data phenomenon. This thesis merges two important approaches in the predictive modelling literature: “econometrics” and “machine learning”. All in all, this thesis contributes to enhancing the methodology of how empirical analysis in many experimental and non-experimental sciences has been conducted so far.
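The weighting idea behind Chapters 2 and 3 can be sketched with a weighted logistic regression: rare-event observations receive larger weights in the likelihood, shifting the fitted GLM toward detecting them. The inverse-class-frequency weights below are a generic choice for illustration, not the thesis's specific weighting scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic rare-event data: ~5% positives (e.g., extreme insurance claims).
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                           random_state=1)

# Inverse class frequency: each rare positive counts as many negatives.
w = np.where(y == 1, (y == 0).mean() / (y == 1).mean(), 1.0)

plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression().fit(X, y, sample_weight=w)

r_plain = recall_score(y, plain.predict(X))
r_weighted = recall_score(y, weighted.predict(X))
print(f"plain recall:    {r_plain:.2f}")
print(f"weighted recall: {r_weighted:.2f}")
```

The trade-off is classical: up-weighting the rare class raises recall at the cost of more false positives, which is often acceptable when missing an extreme event is the expensive error.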

    Accident prediction using machine learning: analyzing weather conditions and model performance

    Abstract. The primary focus of this study was to investigate the impact of weather and road conditions on the severity of accidents and to determine the feasibility of machine learning models in accurately predicting the likelihood of such incidents. The research was centered on two key research questions. Firstly, the study examined the influence of weather and road conditions on accident severity and identified the factors most related to accidents. We utilized an open-source accident dataset, which was preprocessed using techniques like variable selection, missing data elimination, and data balancing through the Synthetic Minority Over-sampling Technique (SMOTE). Chi-square statistical analysis was performed, suggesting that all weather-related variables are more or less associated with the severity of accidents. Visibility and temperature were found to be the most critical factors affecting the severity of road accidents. Hence, appropriate measures such as implementing effective fog dispersal systems, heatwave alerts, or improved road maintenance during extreme temperatures could help reduce accident severity. Secondly, the research evaluated the ability of machine learning models, including Decision Trees, Random Forests, Naive Bayes, Extreme Gradient Boosting, and Neural Networks, to predict accident likelihood. The models’ performance was gauged using metrics like accuracy, precision, recall, and F1 score. The Random Forest model emerged as the most reliable and accurate model for predicting accidents, with an overall accuracy of 98.53%. The Decision Tree model also showed high overall accuracy (95.33%), indicating its reliability. However, the Naive Bayes model showed the lowest accuracy (63.31%) and was deemed less reliable in this context. It is concluded that machine learning models can be effectively used to predict the likelihood of accidents, with models like Random Forest and Decision Tree proving the most effective. However, the effectiveness of each model may vary depending on the dataset and context, necessitating further testing and validation for real-world implementation. These findings not only provide insight into the factors affecting accident severity but also open a promising avenue for employing machine learning techniques in proactive accident prediction and mitigation. Future studies can aim to refine the models further and potentially integrate them into traffic management systems to enhance road safety.
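The chi-square screening step described above can be sketched with scikit-learn's `chi2` on binned, non-negative features. The data here are random stand-ins in which severity is deliberately tied to low visibility, so the test flags visibility and not the independent temperature column; the variable names and binning are illustrative, not the study's.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
n = 1000
visibility = rng.integers(0, 3, n)    # 0 = low, 1 = medium, 2 = high
temperature = rng.integers(0, 4, n)   # binned temperature, independent noise

# Construct severity so that severe crashes coincide with low visibility.
severity = (visibility == 0).astype(int)

X = np.column_stack([visibility, temperature])
stats, pvals = chi2(X, severity)
for name, p in zip(["visibility", "temperature"], pvals):
    print(f"{name}: p = {p:.3g}")
```

A small p-value indicates the feature's distribution differs across severity classes, which is the sense in which the study reports weather variables as "associated" with severity.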

    Fuel Consumption Evaluation of Connected Automated Vehicles Under Rear-End Collisions

    Connected automated vehicles (CAVs) can increase traffic efficiency, which is considered a critical factor in saving energy and reducing emissions in traffic congestion. In this paper, systematic traffic simulations are conducted for three car-following modes, the intelligent driver model (IDM), adaptive cruise control (ACC), and cooperative ACC (CACC), in congestion caused by rear-end collisions. From the perspectives of lane density, vehicle trajectory, and vehicle speed, the fuel consumption of vehicles under the three car-following modes is compared and analysed. Based on vehicle driving and accident environment parameters, an XGBoost-based fuel consumption prediction framework is proposed for traffic congestion caused by rear-end collisions. The results show that, compared with the IDM and ACC modes, vehicles in CACC car-following mode achieve the best performance in terms of total fuel consumption; moreover, the traffic flow in CACC mode is more stable, and the speed fluctuation is relatively small in different accident impact regions, which meets drivers' expectations.
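Of the three car-following modes, the IDM is a closed-form model and easy to sketch. Below is the standard IDM acceleration formula with typical textbook parameter values (not the paper's simulation settings): desired speed v0, time headway T, comfortable acceleration a and braking b, and minimum gap s0.

```python
import math

def idm_accel(v, dv, gap, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0):
    """Intelligent Driver Model acceleration (standard formulation).

    v   : own speed (m/s)
    dv  : approach rate to the leader, v - v_lead (m/s)
    gap : bumper-to-bumper distance to the leader (m)
    """
    # Desired dynamic gap: minimum gap + headway term + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a * b)))
    # Free-road acceleration reduced by the interaction (gap) term.
    return a * (1 - (v / v0) ** 4 - (s_star / gap) ** 2)

# Closing fast on a slow leader -> strong braking; free road -> mild gain.
print(idm_accel(v=20.0, dv=20.0, gap=15.0))   # negative (braking)
print(idm_accel(v=10.0, dv=0.0, gap=200.0))   # positive (accelerating)
```

ACC and CACC controllers replace this rule with gap-regulation laws (CACC additionally uses communicated leader data), which is what smooths the speed fluctuations the abstract reports.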