A regularized attribute weighting framework for naive Bayes
The Bayesian classification framework has been widely used in many fields, but the covariance matrix is usually difficult to estimate reliably. To alleviate this problem, many naive Bayes (NB) approaches with good performance have been developed. However, the assumption of conditional independence between attributes that underlies NB rarely holds in reality, and various attribute-weighting schemes have been developed to address this. Among them, class-specific attribute weighted naive Bayes (CAWNB) has recently achieved good performance by using classification feedback to optimize the attribute weights of each class. However, the derived model may overfit the training dataset, especially when the dataset is too small to train a model with good generalization performance. This paper proposes a regularization technique that improves the generalization capability of CAWNB by balancing the trade-off between discrimination power and generalization capability. More specifically, by introducing a regularization term, the proposed method, regularized naive Bayes (RNB), captures the data characteristics well when the dataset is large and exhibits good generalization performance when the dataset is small. RNB is compared with state-of-the-art naive Bayes methods, and experiments on 33 machine-learning benchmark datasets demonstrate that RNB significantly outperforms them.
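A minimal sketch of the idea, under assumptions the abstract does not spell out: discrete attributes, class-specific weights applied as exponents on the NB conditional likelihoods, and an L2 penalty shrinking the weights back toward vanilla NB (all weights equal to one). The function names, the softmax surrogate loss, and the hyperparameters are illustrative, not the paper's exact formulation.

```python
# Sketch: class-specific attribute-weighted naive Bayes with an L2
# regularizer pulling the weights toward vanilla NB (all weights = 1).
# Illustrative only; the actual RNB objective and optimizer differ.
import numpy as np

def fit_nb_log_probs(X, y, n_classes, n_values, alpha=1.0):
    """Laplace-smoothed log P(x_j = v | c) for discrete attributes."""
    n_attrs = X.shape[1]
    log_cond = np.zeros((n_classes, n_attrs, n_values))
    log_prior = np.zeros(n_classes)
    for c in range(n_classes):
        Xc = X[y == c]
        log_prior[c] = np.log((len(Xc) + alpha) / (len(X) + alpha * n_classes))
        for j in range(n_attrs):
            counts = np.bincount(Xc[:, j], minlength=n_values) + alpha
            log_cond[c, j] = np.log(counts / counts.sum())
    return log_prior, log_cond

def posterior_logits(X, log_prior, log_cond, W):
    """log P(c) + sum_j W[c, j] * log P(x_j | c), for every class c."""
    n = X.shape[0]
    logits = np.tile(log_prior, (n, 1))            # (n, C)
    for j in range(X.shape[1]):
        logits += W[:, j] * log_cond[:, j, X[:, j]].T
    return logits

def train_rnb_weights(X, y, log_prior, log_cond, lam=0.1, lr=0.05, epochs=200):
    n_classes, n_attrs, _ = log_cond.shape
    W = np.ones((n_classes, n_attrs))              # start at vanilla NB
    Y = np.eye(n_classes)[y]                       # one-hot labels
    for _ in range(epochs):
        logits = posterior_logits(X, log_prior, log_cond, W)
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        err = (P - Y) / len(X)                     # softmax CE gradient
        grad = np.empty_like(W)
        for j in range(n_attrs):
            ll = log_cond[:, j, X[:, j]].T         # (n, C) per-attribute terms
            grad[:, j] = (err * ll).sum(axis=0)
        grad += 2.0 * lam * (W - 1.0)              # regularizer: shrink toward NB
        W -= lr * grad
    return W
```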
Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 2014
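A minimal sketch of what "artificially corrupted by dropout" means for a single-layer model, assuming a logistic-regression classifier on bag-of-words counts trained by SGD; the dropout rate, learning rate, and inverted-dropout scaling are illustrative choices, not details from the paper.

```python
# Sketch: single-layer dropout training. Each SGD step zeroes random
# coordinates of the training example; the test set is left uncorrupted.
import numpy as np

def train_dropout_logreg(X, y, p_drop=0.5, lr=0.1, epochs=10, seed=0):
    """X: (n, d) bag-of-words counts; y: (n,) labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Artificially corrupt the example: each feature survives
            # with probability 1 - p_drop, rescaled so the expected
            # input matches the uncorrupted test-time input.
            mask = rng.random(d) >= p_drop
            x = X[i] * mask / (1.0 - p_drop)
            p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
            g = p - y[i]                 # logistic-loss gradient
            w -= lr * g * x
            b -= lr * g
    return w, b  # at test time, score the uncorrupted x with x @ w + b
```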
Generative and Discriminative Text Classification with Recurrent Neural Networks
We empirically characterize the performance of discriminative and generative
LSTM models for text classification. We find that although RNN-based generative
models are more powerful than their bag-of-words ancestors (e.g., they account
for conditional dependencies across words in a document), they have higher
asymptotic error rates than discriminatively trained RNN models. However we
also find that generative models approach their asymptotic error rate more
rapidly than their discriminative counterparts: the same pattern that Ng &
Jordan (2001) proved holds for linear classification models that make more
naive conditional independence assumptions. Building on this finding, we
hypothesize that RNN-based generative classification models will be more robust
to shifts in the data distribution. This hypothesis is confirmed in a series of
experiments in zero-shot and continual learning settings that show that
generative models substantially outperform discriminative models.
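A minimal sketch of the two decision rules being compared, written in PyTorch (an assumption; the paper's architectures and training details differ): the generative classifier fits one LSTM language model per class and scores log p(x | c) + log p(c), while the discriminative classifier maps the document to class logits directly.

```python
# Sketch: generative vs. discriminative LSTM text classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMLanguageModel(nn.Module):
    """One of these is trained per class for the generative rule."""
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def log_prob(self, tokens):              # tokens: (batch, length)
        h, _ = self.lstm(self.emb(tokens[:, :-1]))
        lp = F.log_softmax(self.out(h), dim=-1)   # next-token log-probs
        tgt = tokens[:, 1:].unsqueeze(-1)
        return lp.gather(-1, tgt).squeeze(-1).sum(dim=1)  # log p(x)

def generative_classify(lms, log_priors, tokens):
    # argmax_c  log p(x | c) + log p(c), one language model per class
    scores = torch.stack([lm.log_prob(tokens) for lm in lms], dim=1)
    return torch.argmax(scores + log_priors, dim=1)

class DiscriminativeLSTM(nn.Module):
    """Directly models p(c | x) with a softmax over classes."""
    def __init__(self, vocab, n_classes, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, tokens):
        _, (h, _) = self.lstm(self.emb(tokens))
        return self.cls(h[-1])                # class logits
```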
Exchangeable Variable Models
A sequence of random variables is exchangeable if its joint distribution is
invariant under variable permutations. We introduce exchangeable variable
models (EVMs) as a novel class of probabilistic models whose basic building
blocks are partially exchangeable sequences, a generalization of exchangeable
sequences. We prove that a family of tractable EVMs is optimal under zero-one
loss for a large class of functions, including parity and threshold functions,
and strictly subsumes existing tractable independence-based model families.
Extensive experiments show that EVMs outperform state-of-the-art classifiers
such as SVMs, as well as probabilistic models that are based solely on
independence assumptions.
Comment: ICML 2014
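A minimal sketch of why tractable exchangeable models can capture parity: a finite exchangeable distribution over binary vectors is determined by the mass q(k) it assigns to each count k of ones, spread uniformly over the C(n, k) configurations with that count. The class and variable names here are hypothetical, and this shows only the exchangeable building block, not a full EVM.

```python
# Sketch: an exchangeable distribution over {0,1}^n, parameterized by
# the total probability q(k) of each count k of ones. Parity is
# exchangeable, so it is representable exactly, unlike in models built
# purely on independence assumptions.
from math import comb
import itertools

class ExchangeableBlock:
    def __init__(self, n, q):
        """n: block length; q: dict mapping count k -> total probability."""
        assert abs(sum(q.values()) - 1.0) < 1e-9
        self.n, self.q = n, q

    def prob(self, x):
        """P(x) depends on x only through k = sum(x)."""
        k = sum(x)
        return self.q.get(k, 0.0) / comb(self.n, k)

# Even-parity distribution on {0,1}^4: q(k) proportional to C(4, k)
# for even k, zero otherwise, i.e. uniform over the 8 even-count vectors.
n = 4
even_mass = sum(comb(n, k) for k in range(0, n + 1, 2))   # = 2**(n-1) = 8
q = {k: comb(n, k) / even_mass for k in range(0, n + 1, 2)}
evm = ExchangeableBlock(n, q)

for x in itertools.product([0, 1], repeat=n):
    expected = 1 / even_mass if sum(x) % 2 == 0 else 0.0
    assert abs(evm.prob(x) - expected) < 1e-12
```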