2,041 research outputs found

    Exploring Statistical and Machine Learning-Based Missing Data Imputation Methods to Improve Crash Frequency Prediction Models for Highway-Rail Grade Crossings

    Highway-rail grade crossings (HRGCs) are critical locations for transportation safety because crashes at HRGCs are often catastrophic, potentially causing multiple injuries and fatalities. Every year in the United States, a significant number of crashes occur at these crossings, prompting local and state organizations to conduct safety analyses and estimate crash frequency prediction models for resource allocation. These models provide valuable insights into safety and risk mitigation strategies for HRGCs. The estimation of these models is based on inventory details of HRGCs, and the quality of those details is crucial for reliable crash predictions. However, many of these models exclude crossings with missing inventory details, which can adversely affect their precision. In this study, a random sample of inventory details of 2000 HRGCs was taken from the Federal Railroad Administration’s HRGC inventory database. Data filters were applied to retain only those crossings that were at-grade, public, and operational (N=1096). Missing values were imputed using various statistical and machine learning methods, including Mean, Median, and Mode (MMM) imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbors (KNN) imputation, Expectation-Maximization (EM) imputation, Support Vector Machine (SVM) imputation, and Random Forest (RF) imputation. The results indicated that crash frequency models based on machine learning imputation methods yielded better-fitted models (lower AIC and BIC values). The findings underscore the importance of obtaining complete inventory data through machine learning imputation when developing crash frequency models for HRGCs. This approach can substantially enhance the precision of these models, improving their predictive capabilities and ultimately saving human lives.
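    As an illustration of the simplest baseline compared above, here is a minimal sketch of Mean, Median, and Mode (MMM) imputation in plain Python; the `traffic` column and its values are hypothetical examples, not taken from the study's data.

```python
from statistics import mean, median, mode

def mmm_impute(column, strategy="mean"):
    """Fill None entries in a numeric column with the mean, median, or mode
    of the observed values (the MMM baseline compared in the study)."""
    observed = [v for v in column if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

# Hypothetical daily-traffic counts for a set of crossings, with gaps.
traffic = [1200, None, 800, 1000, None]
print(mmm_impute(traffic, "mean"))  # gaps filled with the column mean (1000)
```

    In practice the choice among mean, median, and mode depends on the variable type: mode is the natural fill for categorical inventory fields, while mean or median suits numeric ones.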

    IoT Data Imputation with Incremental Multiple Linear Regression

    In this paper, we address the problem of missing data imputation in the IoT domain. More specifically, we propose an Incremental Space-Time-based Model (ISTM) for repairing missing values in real-time IoT data streams. ISTM is based on incremental multiple linear regression and processes data as follows: upon data arrival, ISTM updates the model by re-reading an intermediary data matrix instead of accessing all historical information. If a missing value is detected, ISTM provides an estimate for it based on recent historical data and the observations of sensors neighboring the faulty one. Experiments conducted on real traffic data show the performance of ISTM in comparison with known techniques.
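    The incremental update at the heart of this approach can be sketched with a univariate regression maintained from running sums, so each arriving observation updates the model without a pass over the full history; the sensor readings below are hypothetical, and the real ISTM fits a multiple regression over space and time rather than this single-regressor toy.

```python
class IncrementalLR:
    """Univariate linear regression updated from running sums, in the spirit
    of ISTM's incremental update: no pass over the full history is needed."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        # Constant-time update of the sufficient statistics per arrival.
        self.n += 1
        self.sx += x; self.sy += y
        self.sxx += x * x; self.sxy += x * y

    def predict(self, x):
        # Ordinary least-squares slope/intercept from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept + slope * x

# Hypothetical stream: (neighboring sensor's speed, this sensor's speed).
model = IncrementalLR()
for neighbor, own in [(50, 52), (60, 61), (40, 43), (55, 56)]:
    model.update(neighbor, own)
# Own reading missing; estimate it from the neighbor's current value.
print(round(model.predict(45), 1))  # ≈ 47.4
```

    The running sums play the role of the intermediary matrix: they summarize all past observations in constant space, so repair of a detected gap costs O(1) regardless of stream length.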

    A comparison of strategies for missing values in data on machine learning classification algorithms

    Abstract: Dealing with missing values in data is an important feature-engineering task in data science, as they can negatively affect the predictive accuracy of machine learning classification models. However, in real-life data it is often unclear what the underlying cause of the missing values is, or rather which missing data mechanism is causing the missingness. Thus, it becomes necessary to evaluate several missing data approaches for a given dataset. In this paper, we perform a comparative study of several approaches for handling missing values in data, namely listwise deletion, mean, mode, k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations. The comparison is performed on two real-world datasets, using the following evaluation metrics: accuracy, root mean squared error, receiver operating characteristics, and the F1 score. Most classifiers performed well across the missing data strategies. However, based on the results obtained, the support vector classifier performed marginally better overall for the numerical data, and the naïve Bayes classifier for the categorical data, when compared with the other evaluated missing value methods.
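    Two of the strategies compared above, listwise deletion and mean imputation, can be sketched as follows, together with the RMSE metric used in the evaluation; the toy rows and ground-truth values are hypothetical stand-ins for the real-world datasets used in the paper.

```python
from statistics import mean
from math import sqrt

def listwise_delete(rows):
    """Drop any row containing a missing (None) value."""
    return [r for r in rows if None not in r]

def mean_impute(rows):
    """Replace each missing entry with its column mean over observed values."""
    cols = list(zip(*rows))
    fills = [mean([v for v in c if v is not None]) for c in cols]
    return [[fills[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def rmse(true_vals, est_vals):
    return sqrt(mean((t - e) ** 2 for t, e in zip(true_vals, est_vals)))

rows = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
print(len(listwise_delete(rows)))  # 2 complete rows remain of 4
imputed = mean_impute(rows)        # gaps filled with 3.0 and 4.0

# Hypothetical true values of the two missing cells, to score the imputation.
truth = [2.0, 5.0]
est = [imputed[1][0], imputed[2][1]]
print(rmse(truth, est))  # → 1.0
```

    The example shows why the strategies must be compared empirically: deletion halves this dataset, while mean imputation keeps every row at the cost of some per-cell error.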

    ChatGPT is on the Horizon: Could a Large Language Model be Suitable for Intelligent Traffic Safety Research and Applications?

    Full text link
    ChatGPT embarks on a new era of artificial intelligence and will revolutionize the way we approach intelligent traffic safety systems. This paper begins with a brief introduction to the development of large language models (LLMs). Next, we exemplify using ChatGPT to address key traffic safety issues. Furthermore, we discuss the controversies surrounding LLMs, raise critical questions for their deployment, and provide our solutions. Moreover, we propose an idea of multi-modality representation learning for smarter traffic safety decision-making and open further questions for application improvement. We believe that LLMs will both shape and potentially facilitate components of traffic safety research. Comment: Submitted to Nature Machine Intelligence (revised and extended).

    Strikes, Scabs and Tread Separations: Labor Strife and the Production of Defective Bridgestone/Firestone Tires

    This paper provides a case study of the effect of labor relations on product quality. We consider whether a long, contentious strike and the hiring of permanent replacement workers by Bridgestone/Firestone in the mid-1990s contributed to the production of an excess number of defective tires. Using several independent data sources we find that labor strife in the Decatur plant closely coincided with lower product quality. Count data regression models based on two data sets of tire failures by plant, year and age show significantly higher failure rates for tires produced in Decatur during the labor dispute than before or after the dispute, or than at other plants. Also, an analysis of internal Firestone engineering tests indicates that P235 tires from Decatur performed less well if they were manufactured during the labor dispute compared with those produced after the dispute, or compared with those from other, non-striking plants. Monthly data suggest that the production of defective tires was particularly high around the time wage concessions were demanded by Firestone in early 1994 and when large numbers of replacement workers and permanent workers worked side by side in late 1995 and early 1996.

    A New Paradigm for Development of Data Imputation Approach for Missing Value Estimation

    A common issue in data analysis across many real-world applications is the presence of missing values, which pose a challenging task in domains such as wireless sensor networks, medical applications, and psychology. Learning and prediction in the presence of missing values can be treacherous in machine learning, data mining, and statistical analysis, yet a missing value can carry important information about the dataset in the mining process. Handling missing values is therefore a challenging task for the data mining process. In this paper, we propose a new paradigm for the development of a data imputation method for missing value estimation based on centroids and nearest neighbours. First, clusters are identified with the k-means algorithm, and centroids and the nearest neighbouring data records are calculated. Second, the nearest distances from the centroids are computed for both complete and incomplete records, and the nearest data record is estimated, a step that is prone to the curse of dimensionality. Finally, the missing value is imputed from the nearest-neighbour record using the statistical z-score measure. An experimental study demonstrates the strength of the proposed paradigm for missing value imputation. Tests were run on different types of datasets in order to validate the approach and compare the results with other imputation methods such as KNNI, SVMI, WKNNI, KMI, and FKNNI. The proposed approach is geared towards maximizing the utility of imputation with respect to missing value estimation.
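    The final imputation step can be sketched as a nearest-neighbour lookup over the observed features; this simplified version omits the k-means clustering and z-score scaling of the full paradigm, and the records below are hypothetical.

```python
from math import sqrt

def nearest_neighbour_impute(incomplete, complete_records):
    """Fill the missing (None) fields of one record with the values of the
    nearest complete record, measured on the observed features only.
    A simplified sketch of the centroid/nearest-neighbour idea: the k-means
    clustering and z-score scaling of the full paradigm are omitted."""
    observed = [i for i, v in enumerate(incomplete) if v is not None]

    def dist(rec):
        # Euclidean distance restricted to the features we actually have.
        return sqrt(sum((incomplete[i] - rec[i]) ** 2 for i in observed))

    nearest = min(complete_records, key=dist)
    return [nearest[i] if v is None else v for i, v in enumerate(incomplete)]

records = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0]]
print(nearest_neighbour_impute([1.2, None], records))  # → [1.2, 10.0]
```

    Restricting the distance to observed features is what lets incomplete records be matched against complete ones; the clustering step in the paper serves to shrink the candidate set before this lookup.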