Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs
Existing Bayesian models, especially nonparametric Bayesian methods, rely on
specially conceived priors to incorporate domain knowledge for discovering
improved latent representations. While priors can affect posterior
distributions through Bayes' rule, imposing posterior regularization is
arguably more direct and in some cases more natural and general. In this paper,
we present regularized Bayesian inference (RegBayes), a novel computational
framework that performs posterior inference with a regularization term on the
desired post-data posterior distribution under an information-theoretic
formulation. RegBayes is more flexible than the procedure that elicits expert
knowledge via priors, and it covers both directed Bayesian networks and
undirected Markov networks whose Bayesian formulation results in hybrid chain
graph models. When the regularization is induced from a linear operator on the
posterior distributions, such as the expectation operator, we present a general
convex-analysis theorem to characterize the solution of RegBayes. Furthermore,
we present two concrete examples of RegBayes, infinite latent support vector
machines (iLSVM) and multi-task infinite latent support vector machines
(MT-iLSVM), which explore the large-margin idea in combination with a
nonparametric Bayesian model for discovering predictive latent features for
classification and multi-task learning, respectively. We present efficient
inference methods and report empirical studies on several benchmark datasets,
which appear to demonstrate the merits inherited from both large-margin
learning and Bayesian nonparametrics. Such results were not available until
now, and they help push forward the interface between these two important
subfields, which have largely been treated as isolated in the community.
Comment: 49 pages, 11 figures
Fraud/Uncollectible Debt Detection Using a Bayesian Network Based Learning System: A Rare Binary Outcome with Mixed Data Structures
The fraud/uncollectible debt problem in the telecommunications industry
presents two technical challenges: the detection and the treatment of the
account given the detection. In this paper, we focus on the first problem of
detection using Bayesian network models, and we briefly discuss the application
of a normative expert system for the treatment at the end. We apply Bayesian
network models to the problem of fraud/uncollectible debt detection for
telecommunication services. In addition to being quite successful at predicting
rare event outcomes, the Bayesian network model is able to handle a mixture of
categorical and continuous data. We present a performance comparison using
linear and non-linear discriminant analysis, classification and regression
trees, and Bayesian network models.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995)
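As a loose illustration of the detection setting — not the paper's actual network — the sketch below is a two-feature naive Bayes classifier with a rare positive class, mixing one categorical feature with one Gaussian one. All feature names, distributions, and numbers are invented.

```python
import math

# Toy mixed-data classifier: categorical CPT + Gaussian class-conditional,
# combined through Bayes' rule with a rare-event prior.

p_fraud = 0.02                                # rare positive class

# P(plan | class), toy conditional probability table
p_plan = {"fraud": {"prepaid": 0.7, "contract": 0.3},
          "ok":    {"prepaid": 0.2, "contract": 0.8}}

# Class-conditional Gaussian for monthly usage: (mean, std), toy values
p_usage = {"fraud": (900.0, 200.0), "ok": (300.0, 150.0)}

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_fraud(plan, usage):
    lf = p_fraud * p_plan["fraud"][plan] * gauss(usage, *p_usage["fraud"])
    lo = (1 - p_fraud) * p_plan["ok"][plan] * gauss(usage, *p_usage["ok"])
    return lf / (lf + lo)

high = posterior_fraud("prepaid", 950.0)   # suspicious-looking account
low = posterior_fraud("contract", 280.0)   # typical-looking account
```

Even with a 2% prior, strong mixed evidence drives the posterior toward either extreme, which is the behavior a detection threshold exploits.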
On data analysis and variable selection: the minimum entropy analysis
In this work, we present a minimum entropy analysis scheme for variable
selection and preliminary data analysis. Variable selection is achieved by
ranking variables in increasing order of preference. We show that such a
preference has a unique form, given by the entropy of the models associated
with the variables.
Evaluating the entropy provides a complete ranking scheme of variables. This
scheme not only indicates preferred variables but also may reveal the system's
nature and properties. We illustrate the proposed scheme by analyzing a set of
geological data for three carbonate rock units in Texas and Oklahoma, and
compare it with discriminant function analysis. The results suggest that the
scheme provides a quick and robust analysis and is promising for data analysis.
Comment: 9 pages, 2 tables
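One assumed reading of entropy-based variable ranking, sketched with toy data (this is a simplification, not the paper's estimator): discretize each variable, estimate its empirical entropy, and prefer variables whose one-variable model is most concentrated, i.e. has the lowest entropy.

```python
import math
from collections import Counter

def entropy(values, bins=4):
    """Empirical entropy (nats) of values after equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # guard against constant columns
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values())

# Toy measurements: one concentrated variable, one spread-out variable
data = {
    "porosity": [0.11, 0.12, 0.11, 0.13, 0.12, 0.11, 0.12, 0.12],
    "grain":    [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6],
}

ranking = sorted(data, key=lambda v: entropy(data[v]))  # low entropy first
```

Evaluating the entropy once per variable yields a complete ranking, which is what makes the scheme cheap relative to combinatorial subset search.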
Estimating Information-Theoretic Quantities with Uncertainty Forests
Information-theoretic quantities, such as conditional entropy and mutual
information, are critical data summaries for quantifying uncertainty. Existing
estimators for these quantities either have strong theoretical guarantees or
effective performance in high-dimensional data, but not both. We propose a
decision forest method, Uncertainty Forests (UF), which combines quantile
regression forests, honest sampling, and a finite sample correction. We prove
UF provides consistent estimates for these information-theoretic quantities,
including in multivariate settings. Empirically, UF reduces finite sample bias
and variance in a range of both low- and high-dimensional simulated settings
for estimating posterior probabilities, conditional entropies, and mutual
information. In a real-world connectome application, UF quantifies the
uncertainty about neuron type given various cellular features in the Drosophila
larva mushroom body, a key challenge for modern neuroscience.
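UF itself combines quantile regression forests with honest sampling and a finite-sample correction; as a simpler illustration of the quantities it targets, here is a plug-in estimate of H(Y), H(Y|X), and I(X;Y) = H(Y) - H(Y|X) from discrete samples. The data are toy, and the plug-in estimator is a stand-in, not the paper's method.

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in entropy (nats) of a sample of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

# Toy (x, y) pairs with a clear dependence between x and y
pairs = [("a", 0)] * 40 + [("a", 1)] * 10 + [("b", 0)] * 10 + [("b", 1)] * 40

ys = [y for _, y in pairs]
h_y = entropy(ys)                            # marginal entropy H(Y)

# H(Y|X): entropy of Y within each X-group, weighted by group frequency
groups = {}
for x, y in pairs:
    groups.setdefault(x, []).append(y)
h_y_given_x = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())

mi = h_y - h_y_given_x                       # mutual information I(X; Y)
```

Conditioning on an informative feature reduces the entropy of Y, so the mutual information comes out strictly positive but bounded by H(Y).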
An eigenanalysis of data centering in machine learning
Many pattern recognition methods rely on statistical information from
centered data, with the eigenanalysis of an empirical central moment, such as
the covariance matrix in principal component analysis (PCA), as well as partial
least squares regression, canonical-correlation analysis and Fisher
discriminant analysis. Recently, many researchers advocate working on
non-centered data. This is the case for instance with the singular value
decomposition approach, with the (kernel) entropy component analysis, with the
information-theoretic learning framework, and even with nonnegative matrix
factorization. Moreover, one can also consider a non-centered PCA by using the
second-order non-central moment.
The main purpose of this paper is to bridge the gap between these two
viewpoints in designing machine learning methods. To provide a study at the
cornerstone of kernel-based machines, we conduct an eigenanalysis of the inner
product matrices from centered and non-centered data. We derive several results
connecting their eigenvalues and their eigenvectors. Furthermore, we explore
the outer product matrices, by providing several results connecting the largest
eigenvectors of the covariance matrix and its non-centered counterpart. These
results lay the groundwork to several extensions beyond conventional centering,
with the weighted mean shift, the rank-one update, and the multidimensional
scaling. Experiments conducted on simulated and real data illustrate the
relevance of this work.
Comment: 14 pages
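One of the basic identities behind this centered/non-centered comparison can be checked numerically: with the centering matrix H = I - (1/n)11ᵀ, the Gram matrix of centered data equals H K H, and centering annihilates the all-ones direction. The sketch below verifies both facts on random data; it is an illustration of the setup, not the paper's eigenvalue results.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))        # toy data: 6 samples, 3 features
n = X.shape[0]

K = X @ X.T                        # non-centered Gram (inner-product) matrix
H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
Xc = X - X.mean(axis=0)            # centered data

# Identity: Gram matrix of centered data equals H K H
gram_match = bool(np.allclose(Xc @ Xc.T, H @ K @ H))

# Centering removes the all-ones direction: (H K H) 1 = 0, so the centered
# Gram matrix is rank-deficient along 1 while K generally is not.
Kc = H @ K @ H
resid = float(np.linalg.norm(Kc @ np.ones(n)))
```

This is why centered and non-centered spectra must differ: the centered Gram matrix always has the all-ones vector in its null space, and the paper's results quantify how the remaining eigenpairs relate.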
The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015
In this paper we retrace the recent history of statistics by analyzing all
the papers published in five prestigious statistical journals since 1970,
namely: Annals of Statistics, Biometrika, Journal of the American Statistical
Association, Journal of the Royal Statistical Society, series B and Statistical
Science. The aim is to construct a kind of "taxonomy" of the statistical papers
by organizing and by clustering them in main themes. In this sense being
identified in a cluster means being important enough to be uncluttered in the
vast and interconnected world of statistical research. Since the main
statistical research topics are naturally born, evolve, or die over time, we
also develop a dynamic clustering strategy, in which a group in one time period
is allowed to migrate or to merge into different groups in the following one.
Results show that statistics is a very dynamic and evolving science, stimulated
by the rise of new research questions and types of data.
Nonparametric discriminant HMM and application to facial expression recognition
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 2090-2096
This paper presents a nonparametric discriminant HMM and applies it to facial expression recognition. In the proposed HMM, we introduce an effective nonparametric output probability estimation method to increase the discrimination ability at both the hidden-state level and the class level. The proposed method uses a nonparametric adaptive kernel to utilize information from all classes and improve the discrimination at the class level. The discrimination between hidden states is increased by defining membership coefficients which associate each reference vector with the hidden states. The adaptation of such coefficients is obtained by the Expectation Maximization (EM) method. Furthermore, we present a general formula for the estimation of the output probability, which provides a way to develop new HMMs. Finally, we evaluate the performance of the proposed method on the CMU expression database and compare it with other nonparametric HMMs. © 2009 IEEE.
Stable Estimation of a Covariance Matrix Guided by Nuclear Norm Penalties
Estimation of covariance matrices or their inverses plays a central role in
many statistical methods. For these methods to work reliably, estimated
matrices must not only be invertible but also well-conditioned. In this paper
we present an intuitive prior that shrinks the classic sample covariance
estimator towards a stable target. We prove that our estimator is consistent
and asymptotically efficient. Thus, it gracefully transitions towards the
sample covariance matrix as the number of samples grows relative to the number
of covariates. We also demonstrate the utility of our estimator in two standard
situations -- discriminant analysis and EM clustering -- when the number of
samples is dominated by or comparable to the number of covariates.
Comment: 25 pages, 3 figures
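The paper's estimator is guided by nuclear-norm penalties; as a simpler stand-in illustrating the same goal — a well-conditioned estimate that approaches the sample covariance as n grows — here is linear shrinkage toward a scaled identity. The shrinkage-weight rule below is a toy choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 20, 25                      # covariates comparable to sample size
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)        # sample covariance: ill-conditioned here
target = np.trace(S) / p * np.eye(p)   # stable target: scaled identity

alpha = p / (p + n)                # toy rule: shrink less as n grows
S_shrunk = (1 - alpha) * S + alpha * target

cond_sample = np.linalg.cond(S)
cond_shrunk = np.linalg.cond(S_shrunk)
```

Because every eigenvalue moves as λ → (1-α)λ + ατ with τ > 0, the eigenvalue ratio — and hence the condition number — strictly improves, while α → 0 recovers the sample covariance in the large-n limit, mirroring the graceful transition described in the abstract.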
Large-Scale Mode Identification and Data-Driven Sciences
Bump-hunting or mode identification is a fundamental problem that arises in
almost every scientific field of data-driven discovery. Surprisingly, very few
data modeling tools are available for automatic (not requiring manual
case-by-case investigation), objective (not subjective), and nonparametric (not
based on restrictive parametric model assumptions) mode discovery, which can
scale to large data sets. This article introduces LPMode--an algorithm based on
a new theory for detecting multimodality of a probability density. We apply
LPMode to answer important research questions arising in various fields from
environmental science, ecology, econometrics, analytical chemistry to astronomy
and cancer genomics.
Comment: I would like to express my sincere thanks to the Editor and the
anonymous reviewers for their in-depth comments, which have greatly improved
the manuscript.
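LPMode rests on its own theory for detecting multimodality; as a minimal illustration of the underlying task, the sketch below builds a plain Gaussian kernel density estimate on toy bimodal data and counts its local maxima on a grid. Bandwidth and data are invented.

```python
import math

# Toy bimodal sample: one cluster near -2, one near +2
data = [-2.1, -1.9, -2.0, -1.8, -2.2, 1.9, 2.1, 2.0, 2.2, 1.8]
bw = 0.5   # kernel bandwidth (toy choice)

def kde(x):
    """Gaussian kernel density estimate at x."""
    return sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data) \
        / (len(data) * bw * math.sqrt(2 * math.pi))

grid = [i / 10 - 4.0 for i in range(81)]     # grid over [-4, 4], step 0.1
dens = [kde(x) for x in grid]

# A mode is a grid point strictly higher than both neighbors
modes = sum(1 for i in range(1, len(grid) - 1)
            if dens[i] > dens[i - 1] and dens[i] > dens[i + 1])
```

The hard parts LPMode addresses — choosing the smoothing objectively and deciding which bumps are statistically real rather than bandwidth artifacts — are exactly what this naive grid count leaves open.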
IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms
Relief-based algorithms have often been claimed to uncover feature
interactions. However, it is still unclear whether and how interaction terms
can be differentiated from marginal effects. In this paper, we propose the
IMMIGRATE algorithm, which includes and trains weights for interaction terms.
Besides applying the large-margin principle, we focus on the robustness of the
contributors to the margin and consider local and global information
simultaneously. Moreover, IMMIGRATE is shown to enjoy attractive properties,
such as robustness and compatibility with Boosting. We evaluate the proposed
method on several tasks and achieve state-of-the-art results.
Comment: R package ('Immigrate') available on CRAN
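IMMIGRATE extends Relief-style margin weighting to interaction terms; as background, here is a minimal classic Relief sketch on toy data. Each feature's weight grows with its difference from the nearest miss (opposite class) and shrinks with its difference from the nearest hit (same class); interaction terms, the paper's contribution, are omitted.

```python
def dist(a, b, idx):
    """Squared Euclidean distance over the feature indices in idx."""
    return sum((a[i] - b[i]) ** 2 for i in idx)

# Toy samples: feature 0 separates the classes, feature 1 is noise
data = [([0.0, 0.3], 0), ([0.1, 0.9], 0), ([0.2, 0.5], 0),
        ([1.0, 0.4], 1), ([0.9, 0.8], 1), ([1.1, 0.6], 1)]

n_feat = 2
idx = range(n_feat)
w = [0.0] * n_feat
for xi, yi in data:
    hits = [x for x, y in data if y == yi and x is not xi]
    misses = [x for x, y in data if y != yi]
    near_hit = min(hits, key=lambda x: dist(xi, x, idx))
    near_miss = min(misses, key=lambda x: dist(xi, x, idx))
    for j in idx:
        # reward separation from the miss, penalize spread among hits
        w[j] += abs(xi[j] - near_miss[j]) - abs(xi[j] - near_hit[j])
```

The informative feature accumulates a large positive weight while the noise feature does not, which is the margin signal IMMIGRATE then extends with trained interaction-term weights.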