5,626 research outputs found
The Mysterious Optimality of Naive Bayes: Estimation of the Probability in the System of "Classifiers"
Bayes classifiers are currently widely used for recognition, identification,
and knowledge discovery, with applications in fields such as image processing,
medicine, and chemistry (QSAR). Yet, in a seemingly mysterious way, the Naive
Bayes classifier usually gives very good recognition performance, and it cannot
be improved considerably by more complex Bayes classifier models. We present
here a simple proof of the optimality of the Naive Bayes classifier that can
explain this interesting fact. The derivation in the current paper is based on
arXiv:cs/0202020v1. Comment: 9 pages, 1 figure; all changes in the second
version were made by Kupervasser only
A Survey of Na\"ive Bayes Machine Learning approach in Text Document Classification
Text document classification aims to associate one or more predefined
categories with a document, based on the likelihood suggested by a training set
of labeled documents. Many machine learning algorithms play a vital role in
training the system with predefined categories; among them, Na\"ive Bayes is
intriguing because it is simple, easy to implement, and achieves good accuracy
on large datasets in spite of its na\"ive independence assumption. Given the
importance of the Na\"ive Bayes machine learning approach, this study takes up
text document classification and the available statistical event models. In
this survey, the various feature selection methods are discussed and compared,
along with the metrics related to text document classification. Comment: IEEE
format; International Journal of Computer Science and Information Security,
IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947-5500,
http://sites.google.com/site/ijcsis
Risk-Sensitive Variational Bayes: Formulations and Bounds
We study data-driven decision-making problems in a parametrized Bayesian
framework. We adopt a risk-sensitive approach to modeling the interplay between
statistical estimation of parameters and optimization, by computing a risk
measure over a loss/disutility function with respect to the posterior
distribution over the parameters. While this forms the standard Bayesian
decision-theoretic approach, we focus on problems where calculating the
posterior distribution is intractable, a typical situation in modern
applications with large datasets, heterogeneity due to observed covariates,
and latent group structure. The key methodological innovation we introduce in
this paper is to leverage a dual representation of the risk measure to obtain
an optimization-based framework for approximately computing the posterior
risk-sensitive objective, as opposed to using standard sampling-based methods
such as Markov chain Monte Carlo. Our analytical contributions include
rigorously proving finite-sample bounds on the `optimality gap' between
optimizers obtained using the computational methods in this paper and the
`true' optimizers of a given decision-making problem. We illustrate our results
by comparing the theoretical bounds with simulations of a newsvendor problem
for two methods extracted from our computational framework.
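The "dual representation of the risk measure" mentioned in this abstract can be illustrated for one concrete choice, the entropic risk, whose dual over reweightings of the base measure is attained in closed form by exponential tilting. The choice of risk measure here is ours for illustration, not necessarily the paper's:

```python
import numpy as np

# For the entropic risk rho(L) = (1/t) log E[exp(t L)], a standard dual is
# rho(L) = sup_Q { E_Q[L] - KL(Q || P) / t }, attained by the exponentially
# tilted measure Q* proportional to exp(t L) dP. We check primal == dual
# numerically on a discrete base measure P with uniform weights.
rng = np.random.default_rng(0)
loss = rng.normal(1.0, 2.0, size=200)   # losses under P
t = 0.5
p = np.full(loss.size, 1.0 / loss.size)

primal = np.log(np.sum(p * np.exp(t * loss))) / t

q = p * np.exp(t * loss)
q /= q.sum()                            # optimal tilted measure Q*
kl = np.sum(q * np.log(q / p))
dual = np.sum(q * loss) - kl / t

print(np.isclose(primal, dual))  # True
```

The attaining measure Q* makes the supremum computable without any search, which is the kind of structure a dual representation exposes.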
Using genotype abundance to improve phylogenetic inference
Modern biological techniques enable very dense genetic sampling of unfolding
evolutionary histories, and thus frequently sample some genotypes multiple
times. This motivates strategies to incorporate genotype abundance information
in phylogenetic inference. In this paper, we synthesize a stochastic process
model with standard sequence-based phylogenetic optimality, and show that tree
estimation is substantially improved by doing so. Our method is validated with
extensive simulations and an experimental single-cell lineage tracing study of
germinal center B cell receptor affinity maturation.
A geometric characterisation of sensitivity analysis in monomial models
Sensitivity analysis in probabilistic discrete graphical models is usually
conducted by varying one probability value at a time and observing how this
affects output probabilities of interest. When one probability is varied then
others are proportionally covaried to respect the sum-to-one condition of
probability laws. The choice of proportional covariation is justified by a
variety of optimality conditions, under which the original and the varied
distributions are as close as possible under different measures of closeness.
For variations of more than one parameter at a time, proportional covariation
is justified only in some special cases. In this work, for the large class of
discrete statistical models admitting a regular monomial parametrisation, we
demonstrate the optimality of newly defined proportional multi-way schemes with
respect to an optimality criterion based on the notion of I-divergence. We
demonstrate that there are choices of varying parameters for which proportional
covariation is not optimal, and we identify the sub-family of model
distributions where the distance between the original distribution and the one
whose probabilities are covaried proportionally is minimal. This is shown by
adopting a new formal, geometric characterisation of sensitivity analysis in
monomial models, which include a wide array of probabilistic graphical models.
We also demonstrate the optimality of proportional covariation for multi-way
analyses in Naive Bayes classifiers.
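The single-parameter proportional covariation scheme described above admits a short sketch; the function name and example values below are invented for illustration:

```python
# Proportional covariation: when one probability in a distribution is
# changed, rescale the remaining entries so the vector still sums to one.
def covary_proportionally(probs, index, new_value):
    """Set probs[index] to new_value and scale the other entries proportionally."""
    remaining = 1.0 - probs[index]
    scale = (1.0 - new_value) / remaining  # mass left for the other entries
    return [new_value if i == index else p * scale
            for i, p in enumerate(probs)]

p = covary_proportionally([0.2, 0.3, 0.5], index=0, new_value=0.4)
print([round(x, 3) for x in p])  # [0.4, 0.225, 0.375], still sums to one
```

Among all re-distributions of the remaining mass, this proportional choice is the one the abstract's optimality results single out for single-parameter variations.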
Naive Bayes and Text Classification I - Introduction and Theory
Naive Bayes classifiers, a family of classifiers that are based on the
popular Bayes' probability theorem, are known for creating simple yet well
performing models, especially in the fields of document classification and
disease prediction. In this article, we will look at the main concepts of naive
Bayes classification in the context of document categorization. Comment: 20 pages, 5 figures
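As a toy sketch of the concepts the article covers, a minimal multinomial naive Bayes document classifier with add-one (Laplace) smoothing might look like this; the tiny corpus and helper names are invented for the example:

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial naive Bayes for document classification.
def train(docs):  # docs: list of (list_of_words, label)
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(words, class_counts, word_counts, vocab):
    n_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, c_count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(c_count / n_docs)  # log prior
        for w in words:  # log likelihood with add-one smoothing
            score += math.log((word_counts[label][w] + 1) /
                              (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("buy cheap pills now".split(), "spam"),
        ("meeting agenda for monday".split(), "ham"),
        ("cheap pills cheap".split(), "spam"),
        ("monday project meeting".split(), "ham")]
model = train(docs)
print(predict("cheap pills".split(), *model))  # spam
```

Working in log space avoids underflow from multiplying many small word probabilities, which matters for realistic document lengths.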
Asymptotically optimal nonparametric empirical Bayes via predictive recursion
An empirical Bayes problem has an unknown prior to be estimated from data.
The predictive recursion (PR) algorithm provides fast nonparametric estimation
of mixing distributions and is ideally suited for empirical Bayes applications.
This paper presents a general notion of empirical Bayes asymptotic optimality,
and it is shown that PR-based procedures satisfy this property under certain
conditions. As an application, the problem of in-season prediction of baseball
batting averages is considered. There the PR-based empirical Bayes rule
performs well in terms of prediction error and ability to capture the
distribution of the latent features. Comment: 15 pages, 1 figure, 1 table;
accepted for publication in Communications in Statistics - Theory and Methods.
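Newton's predictive recursion can be sketched on a fixed grid as follows; the normal kernel, grid, and weight sequence here are illustrative assumptions, not necessarily the paper's setup:

```python
import math
import random

# Sketch of predictive recursion (PR) for estimating a mixing distribution
# on a fixed grid, assuming a normal N(theta, 1) kernel and weights that
# decay like 1/i. One pass over the data; each point tilts the current
# mixing estimate toward grid values that explain it well.
def predictive_recursion(data, grid):
    f = [1.0 / len(grid)] * len(grid)  # uniform initial guess (pmf on grid)
    for i, x in enumerate(data):
        w = 1.0 / (i + 2)
        k = [math.exp(-0.5 * (x - t) ** 2) / math.sqrt(2 * math.pi)
             for t in grid]  # kernel density at each grid point
        m = sum(kj * fj for kj, fj in zip(k, f))  # predictive density at x
        f = [(1 - w) * fj + w * kj * fj / m for kj, fj in zip(k, f)]
    return f

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(500)]
grid = [g / 10 for g in range(-50, 51)]  # grid on [-5, 5], spacing 0.1
f = predictive_recursion(data, grid)
mean = sum(t * p for t, p in zip(grid, f))
print(round(mean, 1))  # mixing estimate should concentrate near theta = 2
```

Each update is a convex combination, so `f` remains a probability vector; this single-pass structure is what makes PR fast relative to sampling-based alternatives.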
Kullback-Leibler Principal Component for Tensors is not NP-hard
We study the problem of nonnegative rank-one approximation of a nonnegative
tensor, and show that the globally optimal solution that minimizes the
generalized Kullback-Leibler divergence can be efficiently obtained, i.e., it
is not NP-hard. This result works for arbitrary nonnegative tensors with an
arbitrary number of modes (including two, i.e., matrices). We derive a
closed-form expression for the KL principal component, which is easy to compute
and has an intuitive probabilistic interpretation. For generalized KL
approximation with higher ranks, the problem is for the first time shown to be
equivalent to multinomial latent variable modeling, and an iterative algorithm
is derived that resembles the expectation-maximization algorithm. On the Iris
dataset, we showcase how the derived results help us learn the model in an
\emph{unsupervised} manner, and obtain strikingly close performance to that
from supervised methods. Comment: Asilomar 201
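Assuming the matrix case, the closed-form KL principal component claimed in this abstract can be sketched and sanity-checked numerically. The specific closed form used below, the outer product of the marginal sums normalized by the grand total, is our reading of the result, not a quote from the paper:

```python
import numpy as np

# Candidate closed-form rank-one approximation of a nonnegative matrix
# under generalized KL divergence: outer product of row and column sums,
# divided by the grand total (an assumption based on the abstract's claim).
def kl_rank_one(X):
    return np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()

def gen_kl(X, W):
    """Generalized KL divergence D(X || W)."""
    return np.sum(X * np.log(X / W) - X + W)

rng = np.random.default_rng(0)
X = rng.random((4, 3)) + 0.1        # strictly positive matrix
W = kl_rank_one(X)

# Sanity check: the closed form beats many random rank-one candidates.
best_random = min(
    gen_kl(X, np.outer(rng.random(4) + 0.1, rng.random(3) + 0.1))
    for _ in range(1000))
print(gen_kl(X, W) <= best_random)  # True
```

This candidate coincides with the independence-model fit of a contingency table, which is one intuitive reading of the "intuitive probabilistic interpretation" the abstract mentions.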
Compound decision in the presence of proxies with an application to spatio-temporal data
We study the problem of incorporating covariates in a compound decision
setup. It is desired to estimate the means of response variables, which are
independent and normally distributed, and each is accompanied by a vector of
covariates. We suggest a method that involves non-parametric empirical Bayes
techniques and may be viewed as a generalization of the celebrated Fay-Herriot
(1979) method.
Some optimality properties of our method are proved. We also compare it
numerically with Fay-Herriot and other methods, using a `semi-real' data set
that involves spatio-temporal covariates, where the goal is to estimate certain
proportions in many small areas (Statistical Areas).
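The classical Fay-Herriot estimator that this method generalizes can be sketched as follows, with a simple moment estimate of the between-area variance; all data are simulated and all names are illustrative:

```python
import numpy as np

# Fay-Herriot (1979) model sketch: y_i = x_i' beta + u_i + e_i with known
# sampling variances D_i and u_i ~ N(0, A). Each small-area mean is shrunk
# toward its regression prediction, with more shrinkage for noisier areas.
def fay_herriot(y, X, D):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # moment estimator of the between-area variance A (truncated at zero)
    A = max(0.0, (resid @ resid - D.sum()) / (len(y) - X.shape[1]))
    B = D / (A + D)                       # shrinkage factors
    return (1 - B) * y + B * (X @ beta)

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, size=n)
D = np.full(n, 1.0)                       # known sampling variances
y = theta + rng.normal(0, np.sqrt(D), size=n)
est = fay_herriot(y, X, D)

# shrinkage should reduce mean squared error relative to the raw estimates
print(np.mean((est - theta) ** 2) < np.mean((y - theta) ** 2))  # True
```

The compound-decision gain comes entirely from borrowing strength across areas through the shared regression and variance estimates.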
Optimal properties of centroid-based classifiers for very high-dimensional data
We show that scale-adjusted versions of the centroid-based classifier enjoy
optimal properties when used to discriminate between two very high-dimensional
populations where the principal differences are in location. The scale
adjustment removes the tendency of scale differences to confound differences in
means. Certain other distance-based methods, for example, those founded on
nearest-neighbor distance, do not have optimal performance in the sense that we
propose. Our results permit varying degrees of sparsity and signal strength to
be treated, and require only mild conditions on dependence of vector
components. Additionally, we permit the marginal distributions of vector
components to vary extensively. In addition to providing theory we explore
numerical properties of a centroid-based classifier, and show that these
features reflect theoretical accounts of performance. Comment: Published at
http://dx.doi.org/10.1214/09-AOS736 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
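A rough sketch of a scale-adjusted centroid classifier in the spirit of this abstract follows; the pooled-standard-deviation adjustment below is an illustrative assumption, not the paper's exact construction:

```python
import numpy as np

# Scale-adjusted centroid classifier sketch: components are standardized
# by a pooled within-class scale estimate so that scale differences do not
# confound the location differences the classifier relies on.
def fit_centroids(X, y):
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    # pooled within-class standard deviation, per component
    resid = np.concatenate([X[y == c] - X[y == c].mean(axis=0)
                            for c in classes])
    scale = resid.std(axis=0) + 1e-12
    return classes, centroids, scale

def predict(X, classes, centroids, scale):
    d = ((X[:, None, :] - centroids[None]) / scale) ** 2
    return classes[d.sum(axis=2).argmin(axis=1)]

rng = np.random.default_rng(1)
# two high-dimensional populations differing only in location
X0 = rng.normal(0.0, 1.0, size=(50, 200))
X1 = rng.normal(0.3, 1.0, size=(50, 200))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
model = fit_centroids(X, y)
acc = (predict(X, *model) == y).mean()
print(acc > 0.9)  # small per-coordinate shift, but 200 coordinates add up
```

A per-coordinate shift of 0.3 is invisible in any single dimension; the centroid rule aggregates it across all 200 coordinates, which is the high-dimensional regime the abstract studies.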