5,626 research outputs found

    The Mysterious Optimality of Naive Bayes: Estimation of the Probability in the System of "Classifiers"

    Full text link
    Bayes classifiers are widely used for recognition, identification and knowledge discovery, with applications in fields such as image processing, medicine and chemistry (QSAR). Yet, mysteriously, the Naive Bayes classifier usually gives very good recognition results that cannot be improved considerably by more complex Bayes classifier models. We present a simple proof of the optimality of the Naive Bayes classifier that explains this interesting fact. The derivation in the current paper is based on arXiv:cs/0202020v1. Comment: 9 pages, 1 figure; all changes in the second version were made by Kupervasser only
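    One standard way to make the optimality claim precise (stated here as background, not quoted from the paper) is under zero-one loss: Naive Bayes is optimal at a point whenever it selects the same class as the full Bayes classifier, even if its probability estimates are badly calibrated.

    ```latex
    % \hat{p}(c \mid x): Naive Bayes posterior estimate; p(c \mid x): true posterior.
    % Naive Bayes is optimal at x under zero-one loss iff it agrees with the
    % Bayes classifier on the argmax:
    \operatorname*{arg\,max}_{c} \hat{p}(c \mid x) \;=\; \operatorname*{arg\,max}_{c} p(c \mid x).
    % For two classes, this holds whenever the estimated and true posteriors
    % fall on the same side of 1/2:
    \left(\hat{p}(c_1 \mid x) - \tfrac{1}{2}\right)\left(p(c_1 \mid x) - \tfrac{1}{2}\right) > 0.
    ```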

    A Survey of Naïve Bayes Machine Learning approach in Text Document Classification

    Full text link
    Text document classification aims at associating one or more predefined categories with a document, based on the likelihood suggested by a training set of labeled documents. Many machine learning algorithms play a vital role in training the system with predefined categories, among which Naïve Bayes has some intriguing properties: it is simple, easy to implement, and achieves good accuracy on large datasets in spite of the naive independence assumption. Given the importance of the Naïve Bayes machine learning approach, this study takes up text document classification and the statistical event models available. In this survey, the various feature selection methods are discussed and compared, along with the metrics related to text document classification. Comment: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947-5500, http://sites.google.com/site/ijcsis
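    As a concrete illustration of the multinomial event model such surveys discuss, here is a minimal text classifier using scikit-learn; the corpus and labels are toy placeholders, not from the paper.

    ```python
    # Multinomial Naive Bayes on bag-of-words counts (toy data).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["cheap meds online", "meeting at noon", "win cash now", "lunch tomorrow?"]
    labels = ["spam", "ham", "spam", "ham"]

    vectorizer = CountVectorizer()          # term-count features
    X = vectorizer.fit_transform(docs)      # sparse document-term matrix

    clf = MultinomialNB(alpha=1.0)          # alpha=1.0 is Laplace smoothing
    clf.fit(X, labels)

    print(clf.predict(vectorizer.transform(["cash meds now"])))  # -> ['spam']
    ```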

    Risk-Sensitive Variational Bayes: Formulations and Bounds

    Full text link
    We study data-driven decision-making problems in a parametrized Bayesian framework. We adopt a risk-sensitive approach to modeling the interplay between statistical estimation of parameters and optimization, by computing a risk measure over a loss/disutility function with respect to the posterior distribution over the parameters. While this forms the standard Bayesian decision-theoretic approach, we focus on problems where calculating the posterior distribution is intractable, a typical situation in modern applications with large datasets, heterogeneity due to observed covariates, and latent group structure. The key methodological innovation we introduce in this paper is to leverage a dual representation of the risk measure to obtain an optimization-based framework for approximately computing the posterior risk-sensitive objective, as opposed to using standard sampling-based methods such as Markov chain Monte Carlo. Our analytical contributions include rigorously proving finite-sample bounds on the 'optimality gap' between optimizers obtained using the computational methods in this paper and the 'true' optimizers of a given decision-making problem. We illustrate our results by comparing the theoretical bounds with simulations of a newsvendor problem for two methods extracted from our computational framework.
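    The abstract does not name the risk measure, but the role of the dual representation is easy to see for the entropic risk measure, a standard example (an assumption here, not necessarily the paper's choice): its Donsker-Varadhan dual replaces an intractable expectation under the posterior with an optimization over distributions.

    ```latex
    % Entropic risk of loss \ell(\theta, a) under posterior \pi(\theta \mid \mathcal{D}),
    % with risk-aversion parameter \gamma > 0 (illustrative choice):
    \rho_{\gamma}(a) = \frac{1}{\gamma} \log \mathbb{E}_{\theta \sim \pi}\!\left[ e^{\gamma \ell(\theta, a)} \right]
    % Donsker--Varadhan duality: evaluation becomes an optimization over
    % distributions Q, enabling variational approximation when \pi is intractable:
    = \sup_{Q \ll \pi} \left\{ \mathbb{E}_{\theta \sim Q}\!\left[ \ell(\theta, a) \right] - \frac{1}{\gamma} \mathrm{KL}\!\left( Q \,\|\, \pi \right) \right\}.
    ```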

    Using genotype abundance to improve phylogenetic inference

    Full text link
    Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this paper, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation.

    A geometric characterisation of sensitivity analysis in monomial models

    Full text link
    Sensitivity analysis in probabilistic discrete graphical models is usually conducted by varying one probability value at a time and observing how this affects output probabilities of interest. When one probability is varied, the others are covaried proportionally to respect the sum-to-one condition of probability laws. The choice of proportional covariation is justified by a variety of optimality conditions, under which the original and the varied distributions are as close as possible under different measures of closeness. For variations of more than one parameter at a time, proportional covariation is justified in some special cases only. In this work, for the large class of discrete statistical models entertaining a regular monomial parametrisation, we demonstrate the optimality of newly defined proportional multi-way schemes with respect to an optimality criterion based on the notion of I-divergence. We demonstrate that there are choices of varying parameters for which proportional covariation is not optimal, and we identify the sub-family of model distributions for which the distance between the original distribution and the one where probabilities are covaried proportionally is minimal. This is shown by adopting a new formal, geometric characterization of sensitivity analysis in monomial models, which include a wide array of probabilistic graphical models. We also demonstrate the optimality of proportional covariation for multi-way analyses in Naive Bayes classifiers.
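    For a single parameter, proportional covariation has a one-line implementation; the sketch below applies it to a discrete distribution and measures the resulting I-divergence (this is the standard one-way scheme, not the paper's new multi-way schemes).

    ```python
    # Proportional covariation: set p[k] to a new value and rescale the other
    # entries so the distribution still sums to one; then compute I-divergence.
    import numpy as np

    def covary_proportionally(p, k, new_value):
        """Vary p[k]; scale remaining entries by (1 - new_value) / (1 - p[k])."""
        q = p * (1.0 - new_value) / (1.0 - p[k])
        q[k] = new_value
        return q

    def i_divergence(p, q):
        """I-divergence (Kullback-Leibler divergence) of q from p."""
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    p = np.array([0.5, 0.3, 0.2])
    q = covary_proportionally(p, k=0, new_value=0.6)   # -> [0.6, 0.24, 0.16]
    print(q.sum(), i_divergence(p, q))                  # q sums to 1.0
    ```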

    Naive Bayes and Text Classification I - Introduction and Theory

    Full text link
    Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes' probability theorem, are known for creating simple yet well-performing models, especially in the fields of document classification and disease prediction. In this article, we will look at the main concepts of naive Bayes classification in the context of document categorization. Comment: 20 pages, 5 figures
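    The core equations such an introduction builds on are standard and worth restating (our notation, not quoted from the article): Bayes' theorem combined with the naive conditional-independence assumption over a document's terms.

    ```latex
    % Bayes' theorem for class c and document d:
    P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)},
    % the "naive" assumption: terms w_1, \dots, w_m are independent given c:
    P(d \mid c) = \prod_{i=1}^{m} P(w_i \mid c),
    % decision rule (P(d) is constant across classes and can be dropped):
    \hat{c} = \operatorname*{arg\,max}_{c}\; P(c) \prod_{i=1}^{m} P(w_i \mid c).
    ```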

    Asymptotically optimal nonparametric empirical Bayes via predictive recursion

    Full text link
    An empirical Bayes problem has an unknown prior that must be estimated from data. The predictive recursion (PR) algorithm provides fast nonparametric estimation of mixing distributions and is ideally suited for empirical Bayes applications. This paper presents a general notion of empirical Bayes asymptotic optimality, and it is shown that PR-based procedures satisfy this property under certain conditions. As an application, the problem of in-season prediction of baseball batting averages is considered. There the PR-based empirical Bayes rule performs well in terms of prediction error and ability to capture the distribution of the latent features. Comment: 15 pages, 1 figure, 1 table; accepted for publication in Communications in Statistics - Theory and Methods
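    The PR algorithm itself is a single pass through the data: each observation multiplicatively tilts the current estimate of the mixing density and blends it with the previous estimate using a vanishing weight. A minimal grid-based sketch for a normal-mean mixture follows; the kernel and weight sequence are common defaults, not necessarily the paper's.

    ```python
    # Predictive recursion for a N(theta, 1) mixture, on a grid of theta values.
    import numpy as np

    def predictive_recursion(x, grid):
        dx = grid[1] - grid[0]
        f = np.ones_like(grid)
        f /= f.sum() * dx                         # flat initial guess, normalized
        for i, xi in enumerate(x, start=1):
            w = (i + 1.0) ** (-0.67)              # slowly vanishing weight
            k = np.exp(-0.5 * (xi - grid) ** 2) / np.sqrt(2.0 * np.pi)
            m = (k * f).sum() * dx                # current marginal density at xi
            f = (1.0 - w) * f + w * k * f / m     # PR update
        return f

    rng = np.random.default_rng(0)
    theta = rng.choice([-2.0, 2.0], size=500)     # true two-point mixing distribution
    x = theta + rng.standard_normal(500)
    grid = np.linspace(-6.0, 6.0, 400)
    f_hat = predictive_recursion(x, grid)         # concentrates near -2 and +2
    ```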

    Kullback-Leibler Principal Component for Tensors is not NP-hard

    Full text link
    We study the problem of nonnegative rank-one approximation of a nonnegative tensor, and show that the globally optimal solution that minimizes the generalized Kullback-Leibler divergence can be obtained efficiently, i.e., the problem is not NP-hard. This result holds for arbitrary nonnegative tensors with an arbitrary number of modes (including two, i.e., matrices). We derive a closed-form expression for the KL principal component, which is easy to compute and has an intuitive probabilistic interpretation. For generalized KL approximation with higher ranks, the problem is shown for the first time to be equivalent to multinomial latent variable modeling, and an iterative algorithm is derived that resembles the expectation-maximization algorithm. On the Iris dataset, we showcase how the derived results help us learn the model in an unsupervised manner, and obtain strikingly close performance to that from supervised methods. Comment: Asilomar 201
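    Setting the gradient of the generalized KL objective to zero at rank one yields a natural candidate closed form: the outer product of the tensor's mode marginals, scaled by the total mass (the independence model, matching the "intuitive probabilistic interpretation"; treat the exact expression as our reconstruction, not a quotation from the paper).

    ```python
    # Candidate rank-one nonnegative approximation minimizing generalized KL:
    # outer product of mode marginals divided by total ** (ndim - 1).
    import numpy as np

    def kl_rank_one(X):
        """Closed-form rank-one KL approximation of a nonnegative tensor X."""
        total = X.sum()
        Y = np.ones_like(X)
        for axis in range(X.ndim):
            marginal = X.sum(axis=tuple(a for a in range(X.ndim) if a != axis))
            shape = [1] * X.ndim
            shape[axis] = -1
            Y = Y * marginal.reshape(shape)   # broadcasted outer product
        return Y / total ** (X.ndim - 1)

    X = np.random.rand(4, 5, 3)
    Y = kl_rank_one(X)   # same shape as X; for a matrix: row_sums * col_sums / total
    ```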

    Compound decision in the presence of proxies with an application to spatio-temporal data

    Full text link
    We study the problem of incorporating covariates in a compound decision setup. The goal is to estimate the means of n response variables, which are independent and normally distributed, each accompanied by a vector of covariates. We suggest a method that involves nonparametric empirical Bayes techniques and may be viewed as a generalization of the celebrated Fay-Herriot (1979) method. Some optimality properties of our method are proved. We also compare it numerically with Fay-Herriot and other methods, using a 'semi-real' data set involving spatio-temporal covariates, where the goal is to estimate certain proportions in many small areas (Statistical Areas).
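    For context, the Fay-Herriot model that the proposed method generalizes has a simple shrinkage form (standard formulation; the notation is ours, not the paper's):

    ```latex
    % Sampling and linking models, i = 1, ..., n:
    y_i = \theta_i + e_i, \quad e_i \sim N(0, D_i), \qquad
    \theta_i = x_i^{\top}\beta + u_i, \quad u_i \sim N(0, A).
    % Empirical Bayes estimate: shrink each observation toward its
    % covariate-based prediction:
    \hat{\theta}_i = \gamma_i\, y_i + (1 - \gamma_i)\, x_i^{\top}\hat{\beta},
    \qquad \gamma_i = \frac{A}{A + D_i}.
    ```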

    Optimal properties of centroid-based classifiers for very high-dimensional data

    Full text link
    We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on the dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier, and show that these features reflect theoretical accounts of performance. Comment: Published at http://dx.doi.org/10.1214/09-AOS736 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
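    A minimal sketch of what a scale adjustment can look like in practice: standardize each feature by a pooled within-class scale before computing distances to the centroids (illustrative only; the paper's exact adjustment may differ).

    ```python
    # Scale-adjusted nearest-centroid classifier (toy implementation).
    import numpy as np

    class ScaledCentroidClassifier:
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
            # pooled within-class standard deviation per feature
            resid = np.concatenate([X[y == c] - X[y == c].mean(axis=0) for c in self.classes_])
            self.scale_ = resid.std(axis=0) + 1e-12
            return self

        def predict(self, X):
            # squared distances to centroids in scale-adjusted coordinates
            d = ((X[:, None, :] - self.centroids_[None, :, :]) / self.scale_) ** 2
            return self.classes_[d.sum(axis=2).argmin(axis=1)]

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 200)), rng.normal(0.3, 2.0, (50, 200))])
    y = np.array([0] * 50 + [1] * 50)
    clf = ScaledCentroidClassifier().fit(X, y)
    print((clf.predict(X) == y).mean())   # training accuracy
    ```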