The objective of this thesis is to develop statistical models for multivariate road accident data. Two directions of research are followed: graphical modelling of contingency tables cross-classified by accident characteristics, and hierarchical Bayesian models in which accident frequencies of several types are modelled jointly.
Multi-dimensional tables are analysed, and it is shown how collapsibility can be used to reduce the dimensionality of the analysis without running into Simpson's paradox. The analysis shows that accident severity and the number of casualties are associated, and that both variables are influenced mainly by the number of vehicles and the speed limit. Graphical chain models allow causal hypotheses to be formulated, and they are shown to be valuable tools for empirical research on road accident characteristics.
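The risk that collapsibility guards against can be illustrated with a small sketch. The counts below are invented for illustration and are not taken from the thesis data; the variable roles (a binary factor, a binary severity outcome, and a binary stratifying variable) are likewise assumptions. The sketch compares the odds ratio within each stratum with the odds ratio of the table obtained by summing over the strata.

```python
# Hypothetical counts, invented for illustration only (not thesis data).
# tables[z][x][y]: stratum z, row x (factor absent/present),
# column y (outcome slight/severe).
tables = [
    [[8, 2], [70, 30]],   # stratum z = 0
    [[30, 70], [2, 8]],   # stratum z = 1
]


def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], computed as ad / bc."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def collapse(strata):
    """Sum a list of 2x2 tables cell by cell, i.e. marginalise over the stratum."""
    return [[sum(t[x][y] for t in strata) for y in range(2)] for x in range(2)]


conditional = [odds_ratio(t) for t in tables]
marginal = odds_ratio(collapse(tables))

print("odds ratio within each stratum:", conditional)
print("odds ratio after collapsing:   ", marginal)
```

With these invented counts the association is positive in both strata but negative in the collapsed table, which is the reversal known as Simpson's paradox. Collapsing is safe only over variables for which the table is collapsible, and the graph of the model is what identifies such variables.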
The hierarchical Bayesian models developed here combine generalized linear models with random effects. Their novelty lies in the joint modelling of multiple response variables. The models account for overdispersion, and they are used for accident prediction and for ranking hazardous sites.
All models are fully Bayesian and are fitted using Markov chain Monte Carlo methods. It is shown that models with multiple response variables are superior to separate univariate response models.
Some theoretical problems concerning maximum likelihood estimation for the two-parameter negative binomial distribution are also examined. A condition is given that is equivalent to the existence of unique maximum likelihood estimators.
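As a rough illustration of the estimation problem, the sketch below finds the dispersion parameter of a negative binomial sample by solving the profile score equation with bisection, with the mean fixed at the sample mean. It is not the derivation given in the thesis. The check that the sample variance (divisor n) exceeds the sample mean is the existence condition commonly quoted in the literature and is used here as an assumption; the function names and the counts are hypothetical.

```python
import math


def profile_score(k, counts):
    """Derivative in k of the negative binomial profile log-likelihood,
    with the mean parameter fixed at the sample mean."""
    n = len(counts)
    mean = sum(counts) / n
    # digamma(x + k) - digamma(k) equals the sum of 1 / (k + j) for j = 0..x-1
    s = sum(1.0 / (k + j) for x in counts for j in range(x))
    return s + n * math.log(k / (k + mean))


def nb_dispersion_mle(counts, tol=1e-10):
    """Return the dispersion estimate k, or None when the sample is not
    overdispersed (assumed existence check: variance > mean)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / n
    if var <= mean:
        return None
    lo, hi = 1e-8, 1.0
    # Expand the bracket until the score changes sign.
    while profile_score(hi, counts) > 0:
        hi *= 2
        if hi > 1e12:  # safety cap for numerically difficult samples
            return None
    # Bisection on the bracketed sign change.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical accident counts at 16 sites, invented for illustration.
counts = [0, 0, 1, 0, 3, 7, 0, 2, 0, 12, 1, 0, 5, 0, 0, 4]
k_hat = nb_dispersion_mle(counts)
print("estimated dispersion k:", k_hat)
print("underdispersed sample: ", nb_dispersion_mle([2, 2, 2, 2]))
```

Bisection presupposes a single sign change of the score on the bracket, which is exactly what a uniqueness result of the kind described above would guarantee.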
The two directions of research are connected by using graphs to describe the models. In addition, a new Bayesian model selection procedure for contingency tables is proposed. This is based on Gibbs sampling and avoids problems associated with asymptotic tests.
The conclusions reached here can help practitioners to design better safety policies and to spend money more wisely on sites that are genuinely dangerous.