2,712 research outputs found
Recommended from our members
Augmenting Naive Bayes Classifiers with Statistical Language Models
We augment naive Bayes models with statistical n-gram language models to address short- comings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations; a model we re- fer to as the Chain Augmented Naive Bayes (CAN) Bayes classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the indepen- dence assumptions of naive Bayes—allowing a local Markov chain dependence in the observed variables—while still permitting efficient inference and learning. Second, they permit straight- forward application of sophisticated smoothing techniques from statistical language modeling, which allows one to obtain better parameter estimates than the standard Laplace smoothing used in naive Bayes classification. In this paper, we introduce CAN models and apply them to various text classification problems. To demonstrate the language independent and task independent nature of these classifiers, we present experimental results on several text clas- sification problems—authorship attribution, text genre classification, and topic detection—in several languages—Greek, English, Japanese and Chinese. We then systematically study the key factors in the CAN model that can influence the classification performance, and analyze the strengths and weaknesses of the model
Energy Disaggregation for Real-Time Building Flexibility Detection
Energy is a limited resource which has to be managed wisely, taking into
account both supply-demand matching and capacity constraints in the
distribution grid. One aspect of the smart energy management at the building
level is given by the problem of real-time detection of flexible demand
available. In this paper we propose the use of energy disaggregation techniques
to perform this task. Firstly, we investigate the use of existing
classification methods to perform energy disaggregation. A comparison is
performed between four classifiers, namely Naive Bayes, k-Nearest Neighbors,
Support Vector Machine and AdaBoost. Secondly, we propose the use of Restricted
Boltzmann Machine to automatically perform feature extraction. The extracted
features are then used as inputs to the four classifiers and consequently shown
to improve their accuracy. The efficiency of our approach is demonstrated on a
real database consisting of detailed appliance-level measurements with high
temporal resolution, which has been used for energy disaggregation in previous
studies, namely the REDD. The results show robustness and good generalization
capabilities to newly presented buildings with at least 96% accuracy.Comment: To appear in IEEE PES General Meeting, 2016, Boston, US
Asymptotic Analysis of Generative Semi-Supervised Learning
Semisupervised learning has emerged as a popular framework for improving
modeling accuracy while controlling labeling cost. Based on an extension of
stochastic composite likelihood we quantify the asymptotic accuracy of
generative semi-supervised learning. In doing so, we complement
distribution-free analysis by providing an alternative framework to measure the
value associated with different labeling policies and resolve the fundamental
question of how much data to label and in what manner. We demonstrate our
approach with both simulation studies and real world experiments using naive
Bayes for text classification and MRFs and CRFs for structured prediction in
NLP.Comment: 12 pages, 9 figure
- …