Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 2014
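The corrupt-then-train idea the abstract describes can be sketched for a single-layer (logistic) classifier: features are zeroed at random during training and rescaled so the corrupted input matches the clean input in expectation, while test inputs are left untouched. Everything below (function names, learning rate, rescaling scheme) is an illustrative assumption, not the paper's construction:

```python
import random
import math

def train_logreg_dropout(X, y, p_drop=0.5, lr=0.1, epochs=200, seed=0):
    """Train logistic regression with input dropout (illustrative sketch).

    Each update, every feature is zeroed independently with probability
    p_drop and the survivors are rescaled by 1/(1-p_drop), so the
    corrupted input equals the clean input in expectation.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            # Corrupt the training example -- "practice at altitude".
            xc = [0.0 if rng.random() < p_drop else v / (1 - p_drop)
                  for v in x]
            z = sum(wi * xi for wi, xi in zip(w, xc)) + b
            pred = 1 / (1 + math.exp(-max(-30.0, min(30.0, z))))
            g = pred - t  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, xc)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    # At test time the input is left uncorrupted -- "race at sea level".
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0
```

A classifier trained this way must tolerate losing half its evidence per example, which is the "altitude" handicap the abstract's analogy refers to.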
Consistency in Models for Distributed Learning under Communication Constraints
Motivated by sensor networks and other distributed settings, several models
for distributed learning are presented. The models differ from classical works
in statistical pattern recognition by allocating observations of an independent
and identically distributed (i.i.d.) sampling process amongst members of a
network of simple learning agents. The agents are limited in their ability to
communicate to a central fusion center and thus, the amount of information
available for use in classification or regression is constrained. For several
basic communication models in both the binary classification and regression
frameworks, we ask whether there exist agent decision rules and fusion rules
that result in a universally consistent ensemble. The answers to this question
present new issues to consider with regard to universal consistency. Insofar as
these models present a useful picture of distributed scenarios, this paper
addresses the issue of whether or not the guarantees provided by Stone's
Theorem in centralized environments hold in distributed settings.
Comment: To appear in the IEEE Transactions on Information Theory
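The communication-constrained setting the abstract describes can be illustrated with a minimal sketch: each agent holds a single labeled observation, sends the fusion center at most one bit about a query point (its label if the query is nearby, otherwise an abstention), and the center takes a majority vote. The abstain-or-vote rule, the radius, and all names here are illustrative assumptions, not the paper's models:

```python
import math

def agent_bit(sample, query, radius=1.0):
    """One agent holds a single labeled observation (x, y) and reports
    its label y if the query falls within `radius` of x, else abstains
    (None) -- a one-bit-per-agent communication constraint."""
    x, y = sample
    return y if math.dist(x, query) <= radius else None

def fusion_center(votes):
    """Majority vote over non-abstaining agents (ties -> class 0)."""
    live = [v for v in votes if v is not None]
    if not live:
        return 0
    return 1 if sum(live) > len(live) / 2 else 0

def ensemble_classify(samples, query, radius=1.0):
    """Classify a query from the agents' one-bit messages alone."""
    return fusion_center([agent_bit(s, query, radius) for s in samples])
```

Whether rules of roughly this shape can be universally consistent as the number of agents grows, in the spirit of Stone's Theorem, is the question the paper studies.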
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of joint densities as
a linear combination of those of marginal densities. It is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical analysis, FANS is compared
with competing methods, so as to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing.
Comment: 30 pages, 2 figures
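The augmentation step the abstract describes can be sketched as follows: estimate each feature's class-conditional marginal density nonparametrically, then replace the raw feature with its estimated log density ratio. This is a minimal sketch with a fixed-bandwidth Gaussian kernel estimate, and it stops before the penalized logistic regression that FANS proper fits on the transformed features; all names are ours, not the paper's:

```python
import math

def kde(points, h=0.5):
    """Gaussian kernel density estimate from 1-D samples (fixed
    bandwidth h for brevity; a real run would select it from data)."""
    n = len(points)
    def f(x):
        return sum(math.exp(-0.5 * ((x - p) / h) ** 2)
                   for p in points) / (n * h * math.sqrt(2 * math.pi))
    return f

def fans_transform(X, y):
    """Build the feature map x_j -> log f1_j(x_j) / f0_j(x_j), the
    estimated marginal log density ratio per coordinate -- the most
    powerful univariate classifier for that feature."""
    d = len(X[0])
    ratios = []
    for j in range(d):
        f1 = kde([x[j] for x, t in zip(X, y) if t == 1])
        f0 = kde([x[j] for x, t in zip(X, y) if t == 0])
        ratios.append((f0, f1))
    def transform(x):
        # Floor the densities so the log ratio stays finite.
        return [math.log(max(f1(v), 1e-12) / max(f0(v), 1e-12))
                for v, (f0, f1) in zip(x, ratios)]
    return transform
```

In the full method, the transformed features are fed to a penalized logistic regression, which supplies the "global simplicity" on top of this locally complex augmentation.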
Naive Feature Selection: Sparsity in Naive Bayes
Due to its linear complexity, naive Bayes classification remains an
attractive supervised learning method, especially in very large-scale settings.
We propose a sparse version of naive Bayes, which can be used for feature
selection. This leads to a combinatorial maximum-likelihood problem, for which
we provide an exact solution in the case of binary data, or a bound in the
multinomial case. We prove that our bound becomes tight as the marginal
contribution of additional features decreases. Both binary and multinomial
sparse models are solvable in time almost linear in problem size, representing
a very small extra relative cost compared to the classical naive Bayes.
Numerical experiments on text data show that the naive Bayes feature selection
method is as statistically effective as state-of-the-art feature selection
methods such as recursive feature elimination, ℓ1-penalized logistic
regression, and LASSO, while being orders of magnitude faster. For a large data
set with millions of training points and features, and with a non-optimized CPU
implementation, our sparse naive Bayes model can be trained in less than 15
seconds.
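The per-feature decomposition behind the binary-case solution can be sketched as follows: score each binary feature by how much class-specific Bernoulli parameters improve the maximized log-likelihood over a single shared parameter, then keep the top k. This is an illustrative sketch of that decomposition, with our own variable names, not the paper's exact formulation:

```python
import math

def bernoulli_ll(ones, n):
    """Maximized Bernoulli log-likelihood of `ones` successes in n trials."""
    if ones in (0, n):
        return 0.0
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1 - p)

def sparse_nb_select(X, y, k):
    """Rank binary features by the likelihood gain of class-specific
    parameters over a pooled one, and keep the top k. The objective
    separates across features, which is what makes the sparse problem
    tractable in (almost) linear time."""
    n = len(X)
    d = len(X[0])
    n1 = sum(y)
    n0 = n - n1
    scores = []
    for j in range(d):
        c1 = sum(x[j] for x, t in zip(X, y) if t == 1)
        c0 = sum(x[j] for x, t in zip(X, y) if t == 0)
        gain = (bernoulli_ll(c1, n1) + bernoulli_ll(c0, n0)
                - bernoulli_ll(c1 + c0, n))
        scores.append((gain, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```

A feature whose frequency is identical in both classes scores zero and is dropped first; a perfectly class-separating feature scores highest.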
Predicting regression test failures using genetic algorithm-selected dynamic performance analysis metrics
A novel framework for predicting regression test failures is proposed. The basic principle embodied in the framework is to use performance analysis tools to capture the runtime behaviour of a program as it executes each test in a regression suite. The performance information is then used to build a dynamically predictive model of test outcomes. Our framework is evaluated using a genetic algorithm for dynamic metric selection in combination with state-of-the-art machine learning classifiers. We show that if a program is modified and some tests subsequently fail, then it is possible to predict with considerable accuracy which of the remaining tests will also fail; this can be used to help prioritise tests in time-constrained testing environments.
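The metric-selection loop the abstract describes can be sketched with a toy genetic algorithm: candidate solutions are bitmasks over the dynamic performance metrics, and fitness is the accuracy of a simple classifier restricted to the selected metrics. The nearest-centroid classifier, the mutate-and-keep-the-fittest loop, and all names are illustrative simplifications, not the framework's actual components:

```python
import random

def centroid_accuracy(X, y, mask):
    """Fitness: nearest-centroid training accuracy using only the
    metrics selected by `mask` (one bit per dynamic metric)."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        v = [x[j] for j in cols]
        pred = min((sum((a - b) ** 2 for a, b in zip(v, cents[l])), l)
                   for l in (0, 1))[1]
        correct += pred == t
    return correct / len(y)

def ga_select(X, y, generations=30, pop=12, seed=1):
    """Toy genetic algorithm over metric masks: keep the fitter half,
    refill with single-bit mutations of the survivors. A real run would
    add crossover and cross-validated fitness."""
    rng = random.Random(seed)
    d = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=lambda m: centroid_accuracy(X, y, m),
                        reverse=True)
        parents = population[: pop // 2]
        children = []
        for p in parents:
            c = p[:]
            c[rng.randrange(d)] ^= 1  # flip one metric in or out
            children.append(c)
        population = parents + children
    return max(population, key=lambda m: centroid_accuracy(X, y, m))
```

Rows of X here stand for per-test vectors of captured runtime metrics and y for pass/fail outcomes; the GA's job is to discard metrics that carry no signal about failure.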