Fast Multi-Class Probabilistic Classifier by Sparse Non-parametric Density Estimation
Model interpretation is essential in many application scenarios, and a
classification model that is easy to interpret may provide useful
information for further studies and improvement. It is common to encounter
a lengthy set of variables in modern data analysis, especially when data are
collected automatically. Such datasets may not have been collected with a
specific analysis target in mind and usually contain redundant features that
contribute nothing to the analysis task of interest. Variable selection is a
common way to improve model interpretability and is widely used with some
parametric classification models. There is a lack of studies on variable
selection in nonparametric classification models such as density
estimation-based methods, and this is especially the case for multi-class
classification. In this study we address multi-class classification problems
using the idea of sparse non-parametric density estimation and propose a
method for identifying high-impact variables for each class. We present the
asymptotic properties and the computation procedure for the proposed method,
together with suggested sample sizes. We also report numerical results using
both synthesized and real data sets.
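
To make the idea of a density estimation-based classifier with per-class
variable subsets concrete, here is a minimal sketch. It is not the method
proposed in the paper. It assumes a product of one-dimensional Gaussian kernel
density estimates, a Silverman-style rule-of-thumb bandwidth, and a
hand-supplied dictionary `selected` mapping each class label to the variables
treated as informative for that class. Variables outside a class's subset fall
back to a density pooled over all classes, so they carry no class-specific
information. All function and parameter names are hypothetical.

```python
import numpy as np


def kde_1d(train, query, bandwidth):
    """Gaussian kernel density estimate of `train`, evaluated at `query`."""
    diff = (query[:, None] - train[None, :]) / bandwidth
    dens = np.exp(-0.5 * diff ** 2).sum(axis=1)
    return dens / (len(train) * bandwidth * np.sqrt(2.0 * np.pi))


def rule_of_thumb_bandwidth(x):
    """Silverman-style bandwidth (an assumption, not taken from the paper)."""
    return 1.06 * x.std(ddof=1) * len(x) ** (-0.2)


def log_kde(train, query):
    """Log density with a tiny offset so that log(0) never occurs."""
    h = rule_of_thumb_bandwidth(train)
    return np.log(kde_1d(train, query, h) + 1e-300)


def sparse_kde_predict(X_train, y_train, X_new, selected):
    """Classify rows of X_new by the largest log prior + log density score.

    `selected[c]` lists the variable indices modelled with a class-specific
    density for class c; every other variable uses the pooled density.
    """
    classes = np.unique(y_train)
    n_new, d = X_new.shape

    # Pooled (class-agnostic) log density for every variable.
    pooled = np.column_stack(
        [log_kde(X_train[:, j], X_new[:, j]) for j in range(d)]
    )

    scores = np.empty((n_new, len(classes)))
    for k, c in enumerate(classes):
        Xc = X_train[y_train == c]
        log_dens = pooled.copy()
        # Overwrite only the variables selected for this class.
        for j in selected[c.item()]:
            log_dens[:, j] = log_kde(Xc[:, j], X_new[:, j])
        log_prior = np.log(len(Xc) / len(X_train))
        scores[:, k] = log_prior + log_dens.sum(axis=1)
    return classes[np.argmax(scores, axis=1)]
```

In this sketch the subsets are given by hand. Choosing them from data, which
is the actual subject of the abstract above, is not attempted here.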
Some New Copula Based Distribution-free Tests of Independence among Several Random Variables
Over the last couple of decades, several copula based methods have been
proposed in the literature to test for the independence among several random
variables. But these existing tests are not invariant under monotone
transformations of the variables, and they often perform poorly if the
dependence among the variables is highly non-monotone in nature. In this
article, we propose a copula based measure of dependency and use it to
construct some new distribution-free tests of independence. The proposed
measure and the resulting tests are all invariant under permutations and
monotone transformations of the variables. Our dependency measure involves a
kernel function, and we use the Gaussian kernel for that purpose. We adopt a
multi-scale approach, where we look at the results obtained for several choices
of the bandwidth parameter associated with the Gaussian kernel and aggregate
them judiciously. Large sample properties of the dependency measure and the
resulting tests are derived under appropriate regularity conditions. Several
simulated and real data sets are analyzed to compare the performance of the
proposed tests with some popular tests available in the literature.
Comment: arXiv admin note: text overlap with arXiv:1708.0748
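
The ingredients named in the abstract above (a rank-based copula transform, a
Gaussian kernel, and several bandwidths) can be illustrated with a short
sketch. This is not the authors' dependency measure. As a stand-in it uses a
biased squared maximum mean discrepancy (MMD) between the rank-transformed
sample and a reference sample obtained by shuffling each column independently,
and it aggregates over bandwidths by simply taking the maximum. The bandwidth
grid, the aggregation rule, and all function names are assumptions made for
illustration, and the data are assumed to contain no ties.

```python
import numpy as np


def pseudo_observations(X):
    """Column-wise normalized ranks in (0, 1); assumes no ties."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1.0)


def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(U, V, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between U and V."""
    return (
        gaussian_gram(U, U, bandwidth).mean()
        + gaussian_gram(V, V, bandwidth).mean()
        - 2.0 * gaussian_gram(U, V, bandwidth).mean()
    )


def multiscale_dependence(X, bandwidths=(0.1, 0.25, 0.5), seed=0):
    """Largest squared MMD over a small, arbitrary grid of bandwidths."""
    rng = np.random.default_rng(seed)
    U = pseudo_observations(X)
    # Reference sample: shuffling each column on its own keeps the marginal
    # ranks but breaks any dependence between the columns.
    V = np.column_stack(
        [rng.permutation(U[:, j]) for j in range(U.shape[1])]
    )
    return max(mmd_squared(U, V, h) for h in bandwidths)
```

Because the statistic depends on the data only through ranks, applying a
strictly increasing transformation to any variable leaves it unchanged, which
is the kind of invariance the abstract refers to. Turning the statistic into a
test would additionally need a null distribution, for example by simulation,
which is omitted here.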