    Easily simulated multivariate binary distributions with given positive and negative correlations

    We consider the problem of defining a multivariate distribution of binary variables, with given first two moments, from which values can be easily simulated. Oman and Zucker [Oman, S.D., Zucker, D.M., 2001. Modelling and generating correlated binary variables. Biometrika 88, 287-290] have done this when the correlation matrix of the binary variables is the Schur product of a parametric correlation matrix appropriate for normal variables (intraclass, moving average or autoregressive), having non-negative entries, with a matrix whose entries comprise the Fréchet upper bounds on the pairwise correlations of the binary variables. We extend their method to include negative correlations; moreover, we extend the range of positive correlations allowed in the moving-average case. We present algorithms for simulation of data from these distributions, and examine the ranges of correlations obtained.
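
    The correlation structure described in this abstract can be illustrated with a minimal sketch. It assumes the standard Fréchet bounds for a pair of Bernoulli variables and an autoregressive matrix with entries rho**|i-j|. The function names `frechet_corr_bounds` and `schur_target_corr` are hypothetical, and the code only builds the target correlation matrix; it is not the simulation algorithm of either paper.

```python
from math import sqrt


def frechet_corr_bounds(p_i, p_j):
    """Return (lower, upper) Frechet bounds on the correlation of two
    Bernoulli variables with success probabilities p_i and p_j."""
    if not (0.0 < p_i < 1.0 and 0.0 < p_j < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # The joint probability P(X_i = 1, X_j = 1) is confined to this interval.
    joint_lo = max(0.0, p_i + p_j - 1.0)
    joint_hi = min(p_i, p_j)
    sd = sqrt(p_i * (1.0 - p_i) * p_j * (1.0 - p_j))
    return (joint_lo - p_i * p_j) / sd, (joint_hi - p_i * p_j) / sd


def schur_target_corr(p, rho):
    """Elementwise (Schur) product of an AR(1) matrix rho**|i-j| with the
    matrix of Frechet upper bounds for the marginal probabilities p.
    Assumes a non-negative rho, as in the Oman and Zucker setting."""
    n = len(p)
    return [
        [rho ** abs(i - j) * frechet_corr_bounds(p[i], p[j])[1] for j in range(n)]
        for i in range(n)
    ]
```

    For example, `frechet_corr_bounds(0.2, 0.5)` shows that two binary variables with means 0.2 and 0.5 cannot have a correlation outside roughly [-0.5, 0.5], which is why the parametric matrix is scaled entry by entry rather than used directly.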

    L-classifier chains classification and variable selection for multi-label datasets

    Thesis (MCom)--Stellenbosch University, 2016. ENGLISH SUMMARY: Multi-label classification extends binary and multi-class classification to scenarios where every data case is assigned several labels simultaneously. Applications include labelling images with tags, identifying the instruments playing in a musical piece, and classifying text according to two or more labels. Variable selection is an important part of multi-label data analysis, but it has received little attention in the literature. Multi-label variable selection is more complex than variable selection for binary classification, mainly because there is more than one response and because the labels are dependent. In this thesis, a multi-label classification approach called L-classifier chains (LCC) is proposed. This method is a compromise between the simple classifier chains procedure and the ensemble of classifier chains procedure. The LCC approach uses an ensemble of classifier chains with a semi-random chain structure and random forests as base learners to perform variable selection. The specific structural assumptions of the LCC method allow for variable importance inference based on the output from the random forests. The results from LCC include multi-label predictions and a matrix of variable importance values. This thesis illustrates the application of the LCC classifier through empirical work on multi-label benchmark datasets, simulated datasets and a practical dataset obtained from a South African credit bureau. Throughout the practical applications, it compares the performance of LCC with that of three other classifiers, namely binary relevance, classifier chains and ensemble of classifier chains. AFRIKAANS SUMMARY: No summary available.
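
    The two baseline procedures named in this abstract, classifier chains and the ensemble of classifier chains, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the LCC method: a 1-nearest-neighbour rule stands in for the random forest base learner, chain orders are fully random rather than semi-random, and the names `ClassifierChain`, `fit_ensemble` and `ensemble_predict` are hypothetical.

```python
import random


def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour base learner (a stand-in, not a random forest)."""
    nearest = min(
        range(len(train_X)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(train_X[k], x)),
    )
    return train_y[nearest]


class ClassifierChain:
    """Each label is predicted from the features plus the labels earlier in the chain."""

    def __init__(self, order):
        self.order = list(order)

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Augment the features with the true values of the earlier labels.
            augmented = [list(x) + [y[l] for l in earlier] for x, y in zip(X, Y)]
            self.models.append((augmented, [y[label] for y in Y]))
        return self

    def predict(self, x):
        predicted = {}
        for pos, label in enumerate(self.order):
            augmented, target = self.models[pos]
            # At prediction time the earlier labels are themselves predictions.
            feats = list(x) + [predicted[l] for l in self.order[:pos]]
            predicted[label] = nn_predict(augmented, target, feats)
        return [predicted[l] for l in sorted(predicted)]


def fit_ensemble(X, Y, n_chains, seed=0):
    """Fit several chains, each with its own random label order."""
    rng = random.Random(seed)
    n_labels = len(Y[0])
    return [
        ClassifierChain(rng.sample(range(n_labels), n_labels)).fit(X, Y)
        for _ in range(n_chains)
    ]


def ensemble_predict(chains, x):
    """Majority vote per label across the chains (ties count as 1)."""
    votes = [chain.predict(x) for chain in chains]
    return [int(2 * sum(v[j] for v in votes) >= len(votes)) for j in range(len(votes[0]))]
```

    Binary relevance, the third baseline mentioned, corresponds to dropping the augmentation step so that each label is predicted from the original features alone.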