38 research outputs found

    Promoter prediction using physico-chemical properties of DNA

    Get PDF
    Locating promoters within a section of DNA is known to be a difficult and important task in DNA analysis. We describe an approach that treats DNA as a complex molecule, using several models of its physico-chemical properties. A support vector machine is trained to recognise promoters by their distinctive physical and chemical properties. We demonstrate that by combining models we can improve upon the classification accuracy obtained with a single model. We also show that by examining how the predictive accuracy of these properties varies over the promoter, we can reduce the number of attributes needed. Finally, we apply this method to a real-world problem. The results demonstrate that such an approach has significant merit in its own right. Furthermore, they suggest that a planned combined approach to promoter prediction, using both physico-chemical and sequence-based techniques, would yield better results.
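    As a rough illustration of the idea (not the authors' pipeline), the sketch below encodes a DNA window with two dinucleotide property tables and trains a support vector machine on the concatenated profiles. The property values, window length, feature encoding, and toy labels are all placeholders introduced for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative dinucleotide property tables (placeholder values,
# not the published physico-chemical models).
STACKING_ENERGY = {"AA": -5.4, "AT": -6.6, "TA": -3.8, "GC": -10.5, "CG": -9.7}
PROPELLER_TWIST = {"AA": -17.3, "AT": -16.9, "TA": -11.1, "GC": -8.1, "CG": -10.0}

def encode(seq, models):
    """Concatenate one numeric profile per property model for a DNA window."""
    feats = []
    for table in models:
        feats.extend(table.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))
    return feats

# Toy data: equal-length windows labelled promoter (1) or non-promoter (0).
sequences = ["ATGCATAT", "GCGCATAA", "ATATATAT", "CGCGCGCG"]
labels = [1, 1, 0, 0]

models = [STACKING_ENERGY, PROPELLER_TWIST]
X = np.array([encode(s, models) for s in sequences])
y = np.array(labels)

clf = SVC(kernel="rbf").fit(X, y)   # one SVM over the combined feature set
print(clf.predict(X))
```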

    Multi-level Boundary Classification for Information Extraction

    No full text
    We investigate the application of classification techniques to the problem of information extraction (IE). In particular, we use support vector machines and several different feature sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art IE algorithms based on specialized learning algorithms. We also introduce a new technique for improving the recall of our IE algorithm. This approach uses a two-level ensemble of classifiers to improve the recall of the extracted fragments while maintaining high precision. We show that this approach outperforms current state-of-the-art IE algorithms on several benchmark IE tasks.
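    A minimal sketch of how such a two-level scheme might look, under the assumption that a strict first-level threshold gives precision and a relaxed second level is consulted only when exactly one boundary has been detected. The features, labels, and thresholds here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # toy per-token feature vectors
y_start = (X[:, 0] > 1.0).astype(int)      # toy "token starts a fragment" labels
y_end = (X[:, 1] > 1.0).astype(int)        # toy "token ends a fragment" labels

start_clf = SVC(probability=True).fit(X, y_start)
end_clf = SVC(probability=True).fit(X, y_end)

def predict_boundaries(x, strict=0.9, relaxed=0.5):
    """Level one uses a strict threshold for precision; level two relaxes the
    threshold for the missing boundary when exactly one boundary was found."""
    p_start = start_clf.predict_proba([x])[0, 1]
    p_end = end_clf.predict_proba([x])[0, 1]
    is_start, is_end = p_start > strict, p_end > strict
    if is_start and not is_end:
        is_end = p_end > relaxed           # second level recovers the end
    elif is_end and not is_start:
        is_start = p_start > relaxed       # second level recovers the start
    return is_start, is_end

print(predict_boundaries(X[0]))
```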

    A Comparison of Text-Categorization Methods Applied to N-Gram Frequency Statistics

    No full text
    This paper gives an analysis of multi-class e-mail categorization performance, comparing a character n-gram document representation against a word-frequency-based representation. Furthermore, the impact of using available e-mail-specific meta-information on classification performance is explored and the findings are presented.
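    The contrast between the two representations is easy to reproduce in a few lines. The sketch below is not the paper's setup, just a toy comparison of a character 3-gram vectoriser against a word-frequency vectoriser with the same classifier; the example e-mails and class names are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["meeting moved to 3pm tomorrow", "cheap meds buy now limited offer",
          "agenda attached for project review", "win a free prize click here"]
labels = ["work", "spam", "work", "spam"]

# Character 3-gram representation vs. word-frequency representation.
char_model = make_pipeline(CountVectorizer(analyzer="char_wb", ngram_range=(3, 3)),
                           MultinomialNB())
word_model = make_pipeline(CountVectorizer(analyzer="word"),
                           MultinomialNB())

for name, model in [("char 3-grams", char_model), ("word frequencies", word_model)]:
    model.fit(emails, labels)
    print(name, model.predict(["free meds offer", "project meeting agenda"]))
```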

    Learning Named Entity Classifiers Using Support Vector Machines

    No full text

    Discriminative vs. generative classifiers for cost sensitive learning

    No full text
    This paper experimentally compares the performance of discriminative and generative classifiers for cost-sensitive learning. There is some evidence that learning a discriminative classifier is more effective for a traditional classification task. This paper explores the advantages and disadvantages of using a generative classifier when the misclassification costs and class frequencies are not fixed. The paper details experiments built around commonly used algorithms modified to be cost sensitive. This allows a clear comparison with the same algorithm used to produce a discriminative classifier. The paper compares the performance of these variants over multiple data sets and for the full range of misclassification costs and class frequencies. It concludes that although some of these variants are better than a single discriminative classifier, the right choice of training-set distribution plus careful calibration are needed to make them competitive with multiple discriminative classifiers.
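    One standard way to make either kind of classifier cost sensitive, sketched below on synthetic data, is to threshold its predicted probabilities at the point implied by the misclassification costs. This illustrates the role of calibration in the comparison but is not the paper's experimental protocol; the costs and data are invented.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
cost_fp, cost_fn = 1.0, 5.0                # missing a positive costs five times more
threshold = cost_fp / (cost_fp + cost_fn)  # cost-minimising decision threshold

for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    pred = (p > threshold).astype(int)     # cost-sensitive decision rule
    cost = (cost_fp * ((pred == 1) & (y == 0)).sum()
            + cost_fn * ((pred == 0) & (y == 1)).sum())
    print(type(clf).__name__, "total cost:", cost)
```

    The threshold cost_fp / (cost_fp + cost_fn) is the point at which the expected cost of predicting positive equals that of predicting negative, which is why well-calibrated probabilities matter for this comparison.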

    Ensemble Learning with Biased Classifiers: The Triskel Algorithm

    No full text
    We propose a novel ensemble learning algorithm called Triskel, which has two interesting features. First, Triskel learns an ensemble of classifiers that are biased to have high precision (as opposed to, for example, boosting, where the ensemble members are biased to ignore portions of the instance space). Second, Triskel uses weighted voting like most ensemble methods, but the weights are assigned so that certain pairs of biased classifiers outweigh the rest of the ensemble if their predictions agree. Our experiments on a variety of real-world tasks demonstrate that Triskel often outperforms boosting, in terms of both accuracy and training time.
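    A rough sketch of the voting idea as described above, not the published Triskel implementation: classifiers biased towards precision via class weights form a designated pair, and the pair overrides the remaining voters whenever its two members agree. The bias mechanism, weights, and base learners are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=1)

def precision_biased(class_label, weight=5.0):
    """A classifier nudged toward high precision on one class via class_weight:
    penalising the other class makes it predict class_label only when confident."""
    return LogisticRegression(max_iter=1000,
                              class_weight={class_label: 1.0, 1 - class_label: weight})

pair = [precision_biased(1).fit(X, y), precision_biased(0).fit(X, y)]  # the biased pair
rest = [LogisticRegression(max_iter=1000, C=c).fit(X, y) for c in (0.1, 1.0, 10.0)]

def triskel_like_vote(x):
    x = x.reshape(1, -1)
    pair_votes = [c.predict(x)[0] for c in pair]
    if pair_votes[0] == pair_votes[1]:      # agreeing pair outweighs the ensemble
        return pair_votes[0]
    votes = [c.predict(x)[0] for c in rest] # otherwise fall back to majority of the rest
    return int(np.mean(votes) >= 0.5)

preds = np.array([triskel_like_vote(x) for x in X])
print("training accuracy:", (preds == y).mean())
```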