Supervised Classification Using Copula and Mixture Copula
Statistical classification is a field of study that has developed significantly since the 1960s and has a vast range of applications. For example, pattern recognition has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. The classical discrimination rule assumes normality. However, in many situations this assumption is questionable. In fact, for some data the pattern vector is a mixture of discrete and continuous random variables. In this dissertation, we use copula densities to model class conditional distributions. Such densities are useful when the marginal densities of a pattern vector are not normally distributed. Models of this type are also useful for mixed discrete and continuous feature types. Finite mixture density models are very flexible for building classifiers, for clustering, and for uncovering hidden structures in the data. We use mixture densities based on finite mixtures of Gaussian copulas and on copulas of the Archimedean family to build classifiers. The complexities of the estimation are presented. Under such mixture models, maximum likelihood estimation methods are not suitable, and the regular expectation maximization algorithm may not converge or, if it does, may not do so efficiently. We propose a new estimation method to evaluate such densities and build the classifier based on a finite mixture of copula densities. We develop simulation scenarios to compare the performance of the copula based classifier with classical normal distribution based models, the logistic regression based model, and the independent model. We also apply the techniques to real data and present the misclassification errors.
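The idea of modelling a class conditional density with a copula can be illustrated with a minimal sketch. It is not the dissertation's estimation method. It assumes two continuous features with exponential marginals, a single bivariate Gaussian copula per class, and parameters that are already known. All function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_EPS = 1e-12  # keeps CDF values strictly inside (0, 1) for inv_cdf


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1)."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def exp_pdf(x, rate):
    """Exponential density, an assumed non-normal marginal."""
    return rate * math.exp(-rate * x)


def exp_cdf(x, rate):
    """Exponential CDF, clamped away from 0 and 1."""
    return min(max(1.0 - math.exp(-rate * x), _EPS), 1.0 - _EPS)


def class_density(x1, x2, params):
    """Class conditional density: copula density times the marginal densities."""
    rate1, rate2, rho = params
    u, v = exp_cdf(x1, rate1), exp_cdf(x2, rate2)
    return gaussian_copula_density(u, v, rho) * exp_pdf(x1, rate1) * exp_pdf(x2, rate2)


def classify(x1, x2, class_params, priors):
    """Bayes rule: pick the class with the largest prior times density."""
    scores = {k: priors[k] * class_density(x1, x2, p) for k, p in class_params.items()}
    return max(scores, key=scores.get)


# Hypothetical classes that share marginals and differ only in dependence.
CLASS_PARAMS = {"pos": (1.0, 1.0, 0.8), "neg": (1.0, 1.0, -0.8)}
PRIORS = {"pos": 0.5, "neg": 0.5}
```

In this toy setup the two classes have identical marginals, so only the copula term separates them. A rule based on the marginals alone could not distinguish them.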
Supervised Classification Using Finite Mixture Copula
The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. Classical discrimination rules assume normality, but in the current data age this assumption is often questionable. In fact, the features of the data could be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class conditional distributions. Such densities are useful when the marginal densities of the feature vector are not normally distributed and the variables are of mixed type. Authors have shown that such mixture models are very useful for uncovering hidden structures in the data, and have used them for clustering in data mining. Under such mixture models, maximum likelihood estimation methods are not suitable, and the regular expectation maximization algorithm is inefficient and may not converge. A new estimation method is proposed to estimate such densities and build the classifier based on finite mixture Gaussian densities. Simulations are used to compare the performance of the copula based classifier with classical normal distribution based models, the logistic regression based model, and the independent model. The method is also applied to real data.
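For orientation, a finite mixture copula model of a class conditional density is commonly written in the following form. The notation is assumed here and is not taken from the paper: $k$ indexes the class, $M_k$ is the number of mixture components, $\pi_{km}$ are mixing weights, $c_{km}$ are copula densities with parameters $\theta_{km}$, $F_{kj}$ and $f_{kj}$ are the marginal distribution and density functions of feature $j$ (continuous case), and $p_k$ is the class prior.

```latex
f_k(\mathbf{x}) \;=\; \sum_{m=1}^{M_k} \pi_{km}\,
  c_{km}\!\bigl(F_{k1}(x_1), \dots, F_{kd}(x_d);\, \theta_{km}\bigr)
  \prod_{j=1}^{d} f_{kj}(x_j),
\qquad \pi_{km} \ge 0,\quad \sum_{m=1}^{M_k} \pi_{km} = 1,

\hat{k}(\mathbf{x}) \;=\; \arg\max_{k}\; p_k\, f_k(\mathbf{x}).
```

With $M_k = 1$ this reduces to a single copula density per class. The second line is the usual Bayes classification rule applied to the fitted densities.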
A Probabilistic Approach to Identifying Run Scoring Advantage in the Order of Playing Cricket
In the game of cricket, the result of the coin toss is assumed to be one of the
determinants of match outcome. The decision to bat first after winning the toss
is often taken to make the best use of superior pitch conditions and set a big
target for the opponent. However, the opponent may fail to show their natural
batting performance in the second innings due to a number of factors, including
deteriorated pitch conditions and excessive pressure of chasing a high target
score. The advantage of batting first has been highlighted in the literature
and expert opinions; however, the effect of batting and bowling order on match
outcome has not been investigated well enough to recommend a solution to any
potential bias. This study proposes a probability theory-based model to study
venue-specific scoring and chasing characteristics of teams under different
match outcomes. A total of 1117 one-day international matches held in ten
popular venues are analyzed to show a substantially high scoring advantage and
likelihood when the winning team bats in the first innings. Results suggest that
the same 'bat-first' winning team is very unlikely to score or chase such a
high score if they were to bat in the second innings. Therefore, the coin toss
decision may favor one team over the other. A Bayesian model is proposed to
revise the target score for each venue such that the winning and scoring
likelihood is equal regardless of the toss decision. The data and source code
have been shared publicly for future research in creating competitive match
outcomes by eliminating the advantage of batting order in run scoring.
A Bivariate Distribution with Conditional Gamma and its Multivariate Form
A bivariate distribution whose marginals are gamma and beta prime distributions is introduced. The distribution is derived, and the generation of such bivariate samples is shown. Extensions of the results are given for the multivariate case under a joint independent component analysis method. Simulated applications are given, and they show the consistency of our approach. Estimation procedures for the bivariate case are provided.
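One standard construction that yields a gamma marginal, a beta prime marginal, and a gamma conditional is sketched below. It is an assumption for illustration and not necessarily the parameterization used in the paper: Y is drawn from Gamma(shape a, scale 1), and then X given Y = y is drawn from Gamma(shape b, rate y). Integrating out y gives X a beta prime distribution with shape parameters (b, a), whose mean is b / (a - 1) when a > 1. The function names are hypothetical.

```python
import random


def sample_pair(a, b, rng):
    """Draw one (x, y) pair from the assumed conditional-gamma construction."""
    y = rng.gammavariate(a, 1.0)      # Y ~ Gamma(shape=a, scale=1)
    x = rng.gammavariate(b, 1.0 / y)  # X | Y=y ~ Gamma(shape=b, rate=y), i.e. scale 1/y
    return x, y


def sample(n, a, b, seed=0):
    """Draw n independent pairs with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_pair(a, b, rng) for _ in range(n)]


def sample_means(pairs):
    """Return the sample means of the X and Y coordinates."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return mean_x, mean_y
```

As a quick check, with a = 5 and b = 2 the sample mean of X should be near 2 / (5 - 1) = 0.5 and the sample mean of Y near 5 for a large sample.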