
    Using online linear classifiers to filter spam emails

    The performance of two online linear classifiers - the Perceptron and Littlestone's Winnow - is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study performance for varying numbers of features and for three feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron, and both online classifiers perform much better than a standard Naïve Bayes method. Both classifiers have very low theoretical and implementation complexity and are easily updated adaptively. They outperform most published results while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
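
    As a rough illustration (not the paper's implementation), the two online update rules being compared can be sketched as follows, assuming binary bag-of-words features selected by IG or DF and labels +1 (spam) / -1 (legitimate):

    # Minimal sketch of the additive Perceptron and multiplicative Winnow updates.
    import numpy as np

    def perceptron_update(w, x, y, eta=1.0):
        """Additive update: change w only when the current w misclassifies (x, y)."""
        if y * np.dot(w, x) <= 0:
            w = w + eta * y * x
        return w

    def winnow_update(w, x, y, theta, alpha=2.0):
        """Littlestone's Winnow: multiplicatively promote or demote the weights
        of active features when the thresholded prediction is wrong."""
        y_hat = 1 if np.dot(w, x) >= theta else -1
        if y_hat != y:
            w = w * np.power(alpha, y * x)   # alpha for promotion, 1/alpha for demotion
        return w

    # Toy usage: d features, one pass over a simulated message stream.
    d = 1000
    w_p = np.zeros(d)            # Perceptron weights
    w_w = np.ones(d)             # Winnow weights start positive
    theta = d / 2.0              # a common choice of Winnow threshold
    stream = [(np.random.randint(0, 2, d), np.random.choice([-1, 1])) for _ in range(100)]
    for x, y in stream:
        w_p = perceptron_update(w_p, x, y)
        w_w = winnow_update(w_w, x, y, theta)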

    Book reports


    Numerical analysis of least squares and perceptron learning for classification problems

    This work presents a study of regularized and non-regularized versions of perceptron learning and least squares algorithms for classification problems. The Fréchet derivatives for the least squares and perceptron algorithms are derived. Different Tikhonov regularization techniques for choosing the regularization parameter are discussed. Numerical experiments demonstrate the performance of the perceptron and least squares algorithms in classifying simulated and experimental data sets.
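
    A minimal numerical sketch of a Tikhonov-regularized least squares classifier of the kind described above (an illustration under standard assumptions, not the authors' code; the parameter lam stands in for the regularization parameter whose choice is discussed):

    # Regularized least squares for binary classification with labels in {-1, +1}.
    import numpy as np

    def regularized_least_squares(X, y, lam=1e-2):
        """Solve (X^T X + lam * I) w = X^T y for the weight vector w."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    def predict(X, w):
        """Classify by the sign of the linear score."""
        return np.sign(X @ w)

    # Toy usage on simulated data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))
    w = regularized_least_squares(X, y, lam=0.1)
    accuracy = np.mean(predict(X, w) == y)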

    Geometric approaches to discriminative training

    Discriminative training, as a general machine learning approach, has wide applications in tasks such as Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). In this thesis, we are interested in online methods for discriminative training because of their simplicity, efficiency and scalability. The novel methods we propose are summarized as follows. First, an interesting subclass of online learning algorithms adopts multiplicative rather than additive strategies to update the parameters of linear models, but none of them can be used directly for structured prediction as required by many NLP tasks. We extend the multiplicative Winnow algorithm to a structured version and the additive MIRA algorithm to a multiplicative version, and apply them to NLP tasks. We also interpret the relationship between EG and prod, two multiplicative algorithms, from an information-geometric perspective. Second, although general online learning frameworks, notably Online Mirror Descent (OMD), exist and subsume many specific algorithms, they are not well suited to deriving multiplicative algorithms. We therefore propose a new general framework, the Generalized Multiplicative Update (GMU), that is multiplicative in nature and easily yields many specific multiplicative algorithms. We then propose a subclass of GMU, the q-Exponentiated Gradient (qEG) method, that elucidates the relationship among several of these algorithms. To better understand the difference between OMD and GMU, we further analyze both from a Riemannian geometric perspective, and we extend OMD and GMU to accelerated versions by adding momentum terms. Third, although natural gradient descent (NGD) is often hard to apply in practice because of its computational cost, we propose a novel approach to CRF training that allows NGD to be applied efficiently. The loss functions, defined by Bregman divergences, generalize the log-likelihood objective and can easily be coupled with NGD for optimization. The proposed framework is flexible, allowing us to choose convex functions that lead to better training performance. Finally, traditional vector-space linear models require estimating as many parameters as there are model features. In the presence of millions of features, a common situation in NLP tasks, this can complicate training, especially when labeled data is scarce. We propose a novel online learning approach that shifts from vector space to tensor space, dramatically reducing the number of parameters to be estimated. The resulting model is highly regularized and is particularly suitable for training in low-resource environments.
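
    For context, the textbook Exponentiated Gradient (EG) step, one of the multiplicative updates this line of work builds on, can be sketched as follows (this is the standard unstructured rule, not the thesis's structured, GMU or qEG variants; loss_grad is a hypothetical per-example gradient):

    # Standard EG update for a linear model with weights on the probability simplex.
    import numpy as np

    def eg_update(w, grad, eta=0.1):
        """Multiplicative step followed by renormalization onto the simplex."""
        w = w * np.exp(-eta * grad)
        return w / w.sum()

    def loss_grad(w, x, y):
        """Gradient of the squared loss 0.5 * (w.x - y)^2 with respect to w."""
        return (np.dot(w, x) - y) * x

    # Toy usage: repeated EG steps on a single example (x, y).
    d = 10
    w = np.ones(d) / d               # uniform start on the simplex
    x, y = np.random.rand(d), 0.5
    for _ in range(20):
        w = eg_update(w, loss_grad(w, x, y))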

    Regularized Winnow methods

    In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort to enhance the Perceptron algorithm with regularization, leading to a class of linear classification methods called support vector machines. Similarly, the regularization idea can be applied to the Winnow algorithm, giving methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows much as a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods, and provide experimental results to illustrate the different methods.
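
    The analogy can be sketched as follows, assuming the usual soft-margin formulation; the exact regularizer used in the paper may differ. Regularizing the Perceptron gives the familiar SVM objective, while the Winnow-side analogue replaces the quadratic penalty with an entropy-type penalty over nonnegative weights:

    % SVM-style regularization of the Perceptron:
    \min_{w}\; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i} \max\bigl(0,\; 1 - y_i\, w^\top x_i\bigr)
    % Assumed entropy-type analogue for regularized Winnow (see the paper for the exact form):
    \min_{w \ge 0}\; \sum_{j} w_j \ln \frac{w_j}{\mu_j} + C \sum_{i} \max\bigl(0,\; 1 - y_i\, w^\top x_i\bigr)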