
    On the "Poisson Trick" and its Extensions for Fitting Multinomial Regression Models

    This article is concerned with the fitting of multinomial regression models using the so-called "Poisson Trick". The work is motivated by Chen & Kuo (2001) and Malchow-Møller & Svarer (2003), which have been criticized for being computationally inefficient and sometimes producing nonsense results. We first discuss the case of independent data and offer a parsimonious fitting strategy when all covariates are categorical. We then propose a new approach for modelling correlated responses based on an extension of the Gamma-Poisson model, where the likelihood can be expressed in closed form. The parameters are estimated via an Expectation/Conditional Maximization (ECM) algorithm, which can be implemented using functions for fitting generalized linear models readily available in standard statistical software packages. Compared to existing methods, our approach avoids the need to approximate the intractable integrals and thus the inference is exact with respect to the approximating Gamma-Poisson model. The proposed method is illustrated via a reanalysis of the yogurt data discussed by Chen & Kuo (2001).
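
For readers unfamiliar with the "Poisson Trick" named in this abstract, the standard identity behind it can be sketched as follows. The notation is an assumption made for illustration and is not taken from the article: $y_{ij}$ is the count for unit $i$ in category $j$, $N_i = \sum_j y_{ij}$, $\mu_{i+} = \sum_j \mu_{ij}$, $\alpha_i$ is a nuisance intercept per unit, and $\beta_1 = 0$ for identifiability.

```latex
\begin{align*}
Y_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \qquad \log \mu_{ij} = \alpha_i + x_i^{\top}\beta_j, \\
\prod_{j=1}^{J} \frac{e^{-\mu_{ij}}\mu_{ij}^{y_{ij}}}{y_{ij}!}
  &= \underbrace{\frac{e^{-\mu_{i+}}\mu_{i+}^{N_i}}{N_i!}}_{\text{Poisson for } N_i}
     \times
     \underbrace{\frac{N_i!}{\prod_{j} y_{ij}!}\prod_{j=1}^{J}\pi_{ij}^{y_{ij}}}_{\text{multinomial given } N_i}, \\
\pi_{ij} &= \frac{\mu_{ij}}{\mu_{i+}}
  = \frac{\exp(x_i^{\top}\beta_j)}{\sum_{k=1}^{J}\exp(x_i^{\top}\beta_k)} .
\end{align*}
```

Because $\alpha_i$ is unrestricted, maximizing over it sets the fitted total $\hat{\mu}_{i+}$ equal to $N_i$, so the first factor carries no information about $\beta$ and the Poisson fit returns the same estimates of $\beta_j$ as the multinomial logit fit.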

    Exact Approaches for Bias Detection and Avoidance with Small, Sparse, or Correlated Categorical Data

    Every day, traditional statistical methodologies are used worldwide to study a variety of topics and provide insight into countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large-sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results. One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g., live/die, pass/fail, better/worse, etc.) is logistic regression. While some potential complications with this approach are widely known, many investigators are unaware that their particular data do not meet the foundational assumptions, since they are not easy to verify. We have developed a routine for determining if a researcher should be concerned about potential bias in logistic regression results, so they can take steps to mitigate the bias or use a different procedure altogether to model the data. Correlated data may arise from common situations such as multi-site medical studies, research on family units, or investigations of student achievement within classrooms. In these circumstances the associations between cluster members must be included in any statistical analysis testing the hypothesis of a connection between two variables in order for results to be valid. Previously, investigators had to choose between a method intended for small or sparse data that assumes independence between observations and a method that allows for correlation between observations but requires large samples to be reliable. We present a new method that allows small, clustered samples to be assessed for a relationship between a two-level predictor (e.g., treatment/control) and a categorical outcome (e.g., low/medium/high).

    A CONJOINT ANALYSIS STUDY OF PREFERENCES AND PURCHASING BEHAVIOR OF POTENTIAL ADOPTERS OF THE BUREAU OF LAND MANAGEMENT WILD HORSES

    This study uses conjoint analysis to examine the preferences of buyers for Bureau of Land Management (BLM) wild horses based on physical attributes of wild horses and individual characteristics of the buyers. Generalized ordered logit models and multinomial logit models are used to study the impact of the buyers’ demographic characteristics, such as age, gender, knowledge about wild horse care, and number of wild horses previously adopted, on physical attributes of the horses such as color, age, height, training status, temperament, conformation, and unique markings. Taken together in a choice experiment, these attributes determine a buyer’s preferences for a wild horse. This study reveals that characteristics of buyers have significant effects on their preferences for wild horses. Their gender, age, knowledge about wild horse care, and the number of horses previously adopted influence the importance that buyers place on physical attributes of a wild horse in their decision to purchase a wild horse.
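
As a rough illustration of how a multinomial logit model of the kind mentioned in this abstract turns attribute-based scores into choice probabilities, here is a minimal Python sketch. The function name and the utility values are hypothetical and are not taken from the study.

```python
import math


def multinomial_logit_probs(utilities):
    """Convert linear predictors (utilities) into choice probabilities."""
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(utilities)
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical utilities for three horse profiles (made-up numbers).
probs = multinomial_logit_probs([0.2, 1.0, -0.5])
print(probs)
```

The profile with the highest utility receives the highest choice probability, and the probabilities sum to one across the alternatives.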

    Severity Analysis of Crashes Using Structural Equation Modeling

    Population growth, increased travel demand and, consequently, increased motor vehicle use have led to concerns about road safety in today’s society. In transportation engineering, road safety levels are measured through the frequency and severity of motor vehicle crashes. Crash data have been used in road safety modeling to analyze factors that may reduce crash frequency and severity. Regarding crash severity analysis, modeling techniques have mainly attempted to incorporate road and traffic factors into a statistical model, building a direct relationship between independent and dependent (crash severity) variables. However, some explanatory variables can affect crash severity indirectly through one or more mediating variables. Moreover, while traditional techniques have only included measured variables, there might also be unobserved factors not included in the observed data affecting crash severity. Therefore, this thesis is aimed at investigating both observed and unobserved factors that influence the severity of crashes, directly and indirectly, using a statistical technique known as structural equation modeling (SEM). Two types of crashes that affect road safety in urban and rural areas were investigated in this thesis: red-light running related (RLR) crashes and wildlife-vehicle crashes (WVC), respectively. An SEM model was developed for each crash type. In effect, three unobserved variables were hypothesized for RLR crashes: pre-crash travel speed (TS) of the bullet vehicle (at fault), the kinetic energy (KEs) applied from the bullet vehicle to the subject vehicle(s), and crash severity. Similarly, three latent variables were introduced for WVCs: driver’s speeding attitude (SA), driver’s visibility impairment (VI), and crash severity. The results show that crash data support the main hypothesis, with measured/latent variables adequately predicting crash severity. 
Regarding the RLR data, results show that both TS and KEs positively influence the overall crash severity, and that an increase in TS could positively affect KEs. Regarding the WVC data, the results showed that both SA and VI positively influenced overall crash severity, and that higher VI would negatively affect SA, which would indirectly decrease crash severity. Overall, these findings could help transportation practitioners to prioritize strategies and countermeasures aimed at reducing crash severity outcomes at urban and rural road sites.

    Improving Traffic Safety And Drivers' Behavior In Reduced Visibility Conditions

    This study is concerned with the safety risk of reduced visibility on roadways. Inclement weather events such as fog/smoke (FS), heavy rain (HR), and high winds affect every road by impacting pavement conditions, vehicle performance, visibility distance, and drivers’ behavior. Moreover, they affect travel demand, traffic safety, and traffic flow characteristics. Visibility in particular is critical to the task of driving, and reduction in visibility due to FS or other weather events such as HR is a major factor that affects safety and proper traffic operation. A real-time measurement of visibility and an understanding of drivers’ responses when the visibility falls below a certain acceptable level may be helpful in reducing the chances of visibility-related crashes. In this regard, one way to improve safety under reduced visibility conditions (i.e., reduce the risk of visibility-related crashes) is to improve drivers’ behavior under such adverse weather conditions. Therefore, one of the objectives of this research was to investigate the factors affecting drivers’ stated behavior in adverse visibility conditions, and to examine whether drivers rely on and follow advisory or warning messages displayed on portable changeable message signs (CMS) and/or variable speed limit (VSL) signs in different visibility and traffic conditions and on two types of roadways: freeways and two-lane roads. The data used for the analyses were obtained from a self-reported questionnaire survey carried out among 566 drivers in Central Florida, USA. Several categorical data analysis techniques such as conditional distributions, odds ratios, and Chi-Square tests were applied. In addition, two modeling approaches, bivariate and multivariate probit models, were estimated. 
The results revealed that gender, age, road type, visibility condition, and familiarity with VSL signs were the significant factors affecting the likelihood of reducing speed following CMS/VSL instructions in reduced visibility conditions. Other objectives of this survey study were to determine the content of messages that would achieve the best perceived safety and drivers’ compliance and to examine the best way to improve safety during these adverse visibility conditions. The results indicated that "Caution-fog ahead-reduce speed" was the best message and that using CMS and VSL signs together was the best way to improve safety during such inclement weather situations. In addition, this research aimed to thoroughly examine drivers’ responses under low visibility conditions and quantify the impacts and values of various factors found to be related to drivers’ compliance and drivers’ satisfaction with VSL and CMS instructions in different visibility and traffic conditions. To achieve these goals, Explanatory Factor Analysis (EFA) and Structural Equation Modeling (SEM) approaches were adopted. The results revealed that drivers’ satisfaction with VSL/CMS was the most significant factor that positively affected drivers’ compliance with advice or warning messages displayed on VSL/CMS signs under different fog conditions, followed by driver factors. Moreover, it was found that roadway type affected drivers’ compliance with VSL instructions under medium and heavy fog conditions. Furthermore, drivers’ familiarity with VSL signs and driver factors were the significant factors affecting drivers’ satisfaction with VSL/CMS advice under reduced visibility conditions. Based on the findings of the survey-based study, several recommendations are suggested as guidelines to improve drivers’ behavior in such reduced visibility conditions by enhancing drivers’ compliance with VSL/CMS instructions. 
Underground loop detectors (LDs) are the most common freeway traffic surveillance technology used for various intelligent transportation system (ITS) applications such as travel time estimation and crash detection. Recently, the emphasis in freeway management has been shifting towards using LD data to develop real-time crash-risk assessment models. Numerous studies have established statistical links between freeway crash risk and traffic flow characteristics. However, the relationship between traffic flow variables (i.e., speed, volume and occupancy) and crashes that occur under reduced visibility (VR crashes) is not well understood. Thus, another objective of this research was to explore the occurrence of reduced visibility related (VR) crashes on freeways using real-time traffic surveillance data collected from loop detectors (LDs) and radar sensors. In addition, it examines the difference between VR crashes and those occurring in clear visibility conditions (CV crashes). To achieve these objectives, Random Forests (RF) and matched case-control logistic regression models were estimated. The results indicated that traffic flow variables leading to VR crashes are slightly different from those leading to CV crashes. It was found that higher occupancy observed about half a mile between the nearest upstream and downstream stations increases the risk for both VR and CV crashes. Moreover, an increase in the average speed observed on the same half a mile increases the probability of a VR crash. On the other hand, high speed variation coupled with lower average speed observed on the same half a mile increases the likelihood of CV crashes. 
Moreover, two issues that have not explicitly been addressed in prior studies are: (1) the possibility of predicting VR crashes using traffic data collected from the Automatic Vehicle Identification (AVI) sensors installed on Expressways and (2) which traffic data source, LDs or AVIs, is more advantageous for predicting VR crashes. Thus, this research attempts to examine the relationships between VR crash risk and real-time traffic data collected from LDs installed on two Freeways in Central Florida (I-4 and I-95) and from AVI sensors installed on two Expressways (SR 408 and SR 417). It also investigates which data source is better for predicting VR crashes. The approach adopted here involves developing Bayesian matched case-control logistic regression using the historical VR crashes, LD data and AVI data. Regarding models estimated based on LD data, the average speed observed at the nearest downstream station along with the coefficient of variation in speed observed at the nearest upstream station, all at 5-10 minutes prior to the crash time, were found to have a significant effect on VR crash risk. However, for the model developed based on AVI data, the coefficient of variation in speed observed at the crash segment, at 5-10 minutes prior to the crash time, affected the likelihood of VR crash occurrence. An argument concerning which traffic data source (LDs or AVI) is better for predicting VR crashes is also provided and discussed.
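
The survey analysis in this abstract mentions odds ratios and Chi-Square tests for categorical data. A minimal Python sketch of both statistics for a 2x2 table is given below. The counts are made up for illustration and are not from the study, and the function names are hypothetical.

```python
def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts: rows = familiar / not familiar with VSL signs,
# columns = would reduce speed / would not reduce speed.
table = [[30, 10], [20, 40]]
ratio = odds_ratio(table)
chi2 = pearson_chi_square(table)
print(ratio, chi2)
```

For a 2x2 table the statistic has one degree of freedom, so a value above roughly 3.84 would be significant at the 5% level.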

    Transitional modeling of experimental longitudinal data with missing values

    Longitudinal categorical data are often collected using an experimental design where the interest is in the differential development of the treatment group compared to the control group. Such differential development is often assessed based on average growth curves but can also be based on transitions. For longitudinal multinomial data we describe a transitional methodology for the statistical analysis based on a distance model. Such a distance approach has two advantages compared to a multinomial regression model: (1) sparse data can be handled more efficiently; (2) a graphical representation of the model can be made to enhance interpretation. Within this approach it is possible to jointly model the observations and missing values by adding a new category to the response variable representing the missingness condition. This approach is investigated in a Monte Carlo simulation study. The results show this is a promising way to deal with missing data, although the mechanism is not yet completely understood in all cases. Finally, an empirical example is presented where the advantages of the modeling procedure are highlighted. Multivariate analysis of psychological dat

    Multiple imputation of large scale complex surveys


    Financial distress and bankruptcy prediction using accounting, market and macroeconomic variables

    This thesis investigates the information content of different types of variables in the field of financial distress/default prediction. Specifically, the thesis tests empirically, for the first time, the utility of combining accounting data, market-based variables and macroeconomic indicators to explain corporate credit risk. Models for listed companies in the United Kingdom are developed for the prediction of financial distress and corporate failure. The models used a combination of accounting data, stock market information, proxies for changes in the macroeconomic environment, and industry controls. Furthermore, novel finance-based and technical definitions of firm distress and failure are introduced as outcome variables. The thesis produced binary and polytomous models with enhanced predictive accuracy, practical value, and macro dependent dynamics that have relevance for stress testing. The results unambiguously show the advantages, in terms of predictive accuracy and timeliness, of combining these types of variables. Unlike previous research works that employed discrete choice, non-linear regression methodologies, this thesis provided new evidence on the effects of the different types of variables on the probability of falling into each of the individual outcomes (e.g., financial distress, corporate failure). The analysis of graphic representations of changes in predicted probabilities, a primer in the field of risk modelling, offered new insights with regard to the behaviour of the vectors of predicted probabilities following a given change in the magnitude of a specific covariate. 
Additionally, and in line with the main area of study, the thesis provides historical evidence on the types of variables and the information sharing mechanisms employed by American and British investors and financial institutions to assess the riskiness of individuals, businesses and fixed-income instruments before the emergence of modern institutions such as the credit rating agencies and prior to the development of complex statistical models, thus filling a crucial gap in the credit risk literature.