
    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of OCC studies based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies, with a focus on their significance, limitations, and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research. (Comment: 24 pages + 11 pages of references, 8 figures)
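    The boundary-from-positives-only idea can be sketched with a deliberately simple novelty detector: fit a centroid to the positive class and reject points whose distance exceeds a quantile threshold. This is an illustrative toy (the function names and the quantile rule are ours, not from the survey), standing in for the one-class-SVM/SVDD style methods the paper reviews.

```python
import math

def fit_centroid_detector(positives, quantile=0.95):
    """Learn a centroid and a distance threshold from positive samples only.

    No negative examples are needed: the boundary is the `quantile`-th
    distance of the positives from their own centroid (illustrative rule).
    """
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    threshold = dists[min(int(quantile * len(dists)), len(dists) - 1)]
    return centroid, threshold

def is_inlier(x, centroid, threshold):
    """Accept x as a member of the positive class if it lies inside the boundary."""
    return math.dist(x, centroid) <= threshold
```

    The point of the sketch is the OCC training regime, not the model: only positive samples ever reach `fit_centroid_detector`.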

    Phoneme and sentence-level ensembles for speech recognition

    We address the question of whether and how boosting and bagging can be used for speech recognition. To do so, we compare two different boosting schemes, one at the phoneme level and one at the utterance level, with a phoneme-level bagging scheme. We control for many parameters and other choices, such as the state inference scheme used. In an unbiased experiment, we clearly show that the gain of boosting methods over a single hidden Markov model is in all cases only marginal, while bagging significantly outperforms all other methods. We thus conclude that bagging methods, which have so far been overlooked in favour of boosting, should be examined more closely as a potentially useful ensemble learning technique for speech recognition.
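    The bagging scheme favoured by these results rests on a generic recipe: train each ensemble member on a bootstrap resample of the training set and combine predictions by majority vote. The sketch below illustrates that recipe with a one-dimensional decision stump as a stand-in for the paper's phoneme-level HMMs; all names here are illustrative, not from the paper.

```python
import random
from collections import Counter

def train_stump(data):
    """Fit the 1-D threshold (and orientation) with the fewest label errors."""
    best_thr, best_err = None, None
    for thr, _ in data:
        err = sum((x >= thr) != y for x, y in data)
        err = min(err, len(data) - err)  # either orientation is allowed
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    flip = sum((x >= best_thr) != y for x, y in data) > len(data) / 2
    return lambda x, thr=best_thr, f=flip: int((x >= thr) != f)

def bag(data, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict
```

    Bagging's variance reduction comes entirely from the resampling: each stump sees a slightly different training set, and the vote averages out their individual errors.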

    An empirical comparison of supervised machine learning techniques in bioinformatics

    Research in bioinformatics is driven by experimental data: current biological databases are populated by vast amounts of it. Machine learning has been widely applied to bioinformatics and has achieved considerable success in this research area. At present, with various learning algorithms available in the literature, researchers face difficulty choosing the method best suited to their data. We performed an empirical study of 7 individual learning systems and 9 different combined methods on 4 different biological data sets, and suggest issues to consider when answering the following questions: (i) How does one choose the algorithm best suited to a given data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm to the others?
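    Question (i) is typically approached by estimating each candidate's accuracy under a common protocol such as k-fold cross-validation. A minimal sketch of that protocol follows; the helper name and the convention that `train` returns a predictor function are our assumptions, not the paper's code.

```python
def kfold_accuracy(data, train, k=4):
    """Estimate a learner's accuracy by k-fold cross-validation.

    `data` is a list of (x, y) pairs; `train` takes such a list and
    returns a predict(x) function (illustrative convention).
    """
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train_set = [d for j, f in enumerate(folds) if j != i for d in f]
        predict = train(train_set)
        accs.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(accs) / k
```

    Running `kfold_accuracy` with the same folds for every candidate learner gives the directly comparable estimates that an empirical comparison of this kind needs.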

    Boosting Applied to Word Sense Disambiguation

    In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms. (Comment: 12 pages)
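    One family of acceleration strategies shrinks the feature space before boosting ever runs. As a hedged illustration of that general idea (not LazyBoosting's actual selection rule), one can simply discard features that occur fewer than a minimum number of times in the training corpus:

```python
from collections import Counter

def prune_features(examples, min_count=3):
    """Drop rare features to shrink the space a booster must search.

    `examples` is a list of feature-name lists (illustrative representation);
    features seen fewer than `min_count` times across the corpus are removed.
    """
    counts = Counter(f for ex in examples for f in ex)
    keep = {f for f, c in counts.items() if c >= min_count}
    return [[f for f in ex if f in keep] for ex in examples]
```

    Since each boosting round scans candidate features for its weak rule, reducing the feature space directly reduces per-round cost, which is what makes boosting tractable across thousands of words.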

    New trends in data mining.

    Keywords: Trends; Data; Data mining.

    Efficient Intrusion Detection Model Using Ensemble Methods

    Ensemble methods train multiple learners to solve classification or regression problems: rather than constructing a single learner from the training data, as ordinary learning approaches do, they construct a set of learners and combine them. Boosting is one of the most important recent developments in classification methodology. It belongs to a family of algorithms that can convert a group of weak learners into a strong learner. Boosting works sequentially: each new classifier is trained on a re-weighted version of the training samples, and the resulting sequence of classifiers is combined by weighted majority voting. The boosting method thus combines weak models into a powerful one and reduces the bias of the combined model. AdaBoost is the most influential such algorithm; it efficiently combines weak learners to generate a strong classifier that labels the training data with better accuracy. AdaBoost differs from existing boosting methods in detection accuracy, error-cost minimization, computational time, and detection rate. Detection accuracy and computational cost are the two main metrics used to analyze the performance of the AdaBoost classification algorithm. The simulation results show that AdaBoost achieves higher detection accuracy with less computational time and lower cost than a single classifier. We propose a predictive model to classify the normal class and the attack class, with an online inference engine deployed to either allow or deny access to the network.
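    The sequential re-weighting and weighted voting described above can be sketched as a minimal binary AdaBoost with decision stumps over a single feature (labels in {-1, +1}). This is a generic textbook illustration of the algorithm, not the proposed intrusion-detection model:

```python
import math

def adaboost(X, y, rounds=10):
    """Minimal binary AdaBoost over 1-D data with decision stumps."""
    n = len(X)
    w = [1.0 / n] * n                    # uniform initial sample weights
    ensemble = []                        # (alpha, threshold, polarity)
    for _ in range(rounds):
        # pick the stump h(x) = polarity * sign(x - thr) with least weighted error
        best = None
        for thr in X:
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if pol * (1 if xi >= thr else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = max(err, 1e-10)            # guard against log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # re-weight: misclassified samples gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * pol * (1 if xi >= thr else -1))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(x):
        s = sum(a * p * (1 if x >= t else -1) for a, t, p in ensemble)
        return 1 if s >= 0 else -1
    return predict
```

    The weight update is the heart of the method: each round's mistakes dominate the next round's training distribution, and the final strong classifier is the alpha-weighted vote of all the stumps.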

    Ensemble Approach for Fine-Grained Question Classification in Bengali
