38 research outputs found

    Promoter prediction using physico-chemical properties of DNA

    Get PDF
    Locating promoters within a section of DNA is known to be a difficult and important task in DNA analysis. We describe an approach that treats DNA as a complex molecule, using several models of its physico-chemical properties. A support vector machine is trained to recognise promoters by their distinctive physical and chemical properties. We demonstrate that by combining models we can improve upon the classification accuracy obtained with a single model. We also show that by examining how the predictive accuracy of these properties varies over the promoter, we can reduce the number of attributes needed. Finally, we apply this method to a real-world problem. The results demonstrate that such an approach has significant merit in its own right. Furthermore, they suggest that a planned combined approach to promoter prediction, using both physico-chemical and sequence-based techniques, would yield better results.
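    As a rough illustration of the idea (not the authors' pipeline), the sketch below encodes a DNA window with two dinucleotide property tables and trains a support vector machine on the concatenated profiles. The property values, window length, feature encoding, and toy labels are all placeholders introduced for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative dinucleotide property tables (placeholder values,
# not the published physico-chemical models).
STACKING_ENERGY = {"AA": -5.4, "AT": -6.6, "TA": -3.8, "GC": -10.5, "CG": -9.7}
PROPELLER_TWIST = {"AA": -17.3, "AT": -16.9, "TA": -11.1, "GC": -8.1, "CG": -10.0}

def encode(seq, models):
    """Concatenate one numeric profile per property model for a DNA window."""
    feats = []
    for table in models:
        feats.extend(table.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))
    return feats

# Toy data: equal-length windows labelled promoter (1) or non-promoter (0).
sequences = ["ATGCATAT", "GCGCATAA", "ATATATAT", "CGCGCGCG"]
labels = [1, 1, 0, 0]

models = [STACKING_ENERGY, PROPELLER_TWIST]
X = np.array([encode(s, models) for s in sequences])
y = np.array(labels)

clf = SVC(kernel="rbf").fit(X, y)   # one SVM over the combined feature set
print(clf.predict(X))
```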

    Multi-level Boundary Classification for Information Extraction

    No full text
    We investigate the application of classification techniques to the problem of information extraction (IE). In particular, we use support vector machines and several different feature sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art IE algorithms based on specialized learning algorithms. We also introduce a new technique for improving the recall of our IE algorithm. This approach uses a two-level ensemble of classifiers to improve the recall of the extracted fragments while maintaining high precision. We show that this approach outperforms current state-of-the-art IE algorithms on several benchmark IE tasks.
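    A minimal sketch of how such a two-level scheme might look, under the assumption that a strict first-level threshold gives precision and a relaxed second level is consulted only when exactly one boundary has been detected. The features, labels, and thresholds here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # toy per-token feature vectors
y_start = (X[:, 0] > 1.0).astype(int)      # toy "token starts a fragment" labels
y_end = (X[:, 1] > 1.0).astype(int)        # toy "token ends a fragment" labels

start_clf = SVC(probability=True).fit(X, y_start)
end_clf = SVC(probability=True).fit(X, y_end)

def predict_boundaries(x, strict=0.9, relaxed=0.5):
    """Level one uses a strict threshold for precision; level two relaxes the
    threshold for the missing boundary when exactly one boundary was found."""
    p_start = start_clf.predict_proba([x])[0, 1]
    p_end = end_clf.predict_proba([x])[0, 1]
    is_start, is_end = p_start > strict, p_end > strict
    if is_start and not is_end:
        is_end = p_end > relaxed           # second level recovers the end
    elif is_end and not is_start:
        is_start = p_start > relaxed       # second level recovers the start
    return is_start, is_end

print(predict_boundaries(X[0]))
```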

    A Comparison of Text-Categorization Methods Applied to N-Gram Frequency Statistics

    No full text
    This paper gives an analysis of multi-class e-mail categorization performance, comparing a character n-gram document representation against a word-frequency-based representation. Furthermore, the impact of using available e-mail-specific meta-information on classification performance is explored and the findings are presented.
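    The contrast between the two representations is easy to reproduce in a few lines. The sketch below is not the paper's setup, just a toy comparison of a character 3-gram vectoriser against a word-frequency vectoriser with the same classifier; the example e-mails and class names are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["meeting moved to 3pm tomorrow", "cheap meds buy now limited offer",
          "agenda attached for project review", "win a free prize click here"]
labels = ["work", "spam", "work", "spam"]

# Character 3-gram representation vs. word-frequency representation.
char_model = make_pipeline(CountVectorizer(analyzer="char_wb", ngram_range=(3, 3)),
                           MultinomialNB())
word_model = make_pipeline(CountVectorizer(analyzer="word"),
                           MultinomialNB())

for name, model in [("char 3-grams", char_model), ("word frequencies", word_model)]:
    model.fit(emails, labels)
    print(name, model.predict(["free meds offer", "project meeting agenda"]))
```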

    Learning Named Entity Classifiers Using Support Vector Machines

    No full text

    Discriminative vs. generative classifiers for cost sensitive learning

    No full text
    This paper experimentally compares the performance of discriminative and generative classifiers for cost-sensitive learning. There is some evidence that learning a discriminative classifier is more effective for a traditional classification task. This paper explores the advantages and disadvantages of using a generative classifier when the misclassification costs and class frequencies are not fixed. The paper details experiments built around commonly used algorithms modified to be cost sensitive. This allows a clear comparison with the same algorithm used to produce a discriminative classifier. The paper compares the performance of these variants over multiple data sets and for the full range of misclassification costs and class frequencies. It concludes that although some of these variants are better than a single discriminative classifier, the right choice of training-set distribution plus careful calibration are needed to make them competitive with multiple discriminative classifiers.
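    One standard way to make either kind of classifier cost sensitive, sketched below on synthetic data, is to threshold its predicted probabilities at the point implied by the misclassification costs. This illustrates the role of calibration in the comparison but is not the paper's experimental protocol; the costs and data are invented.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
cost_fp, cost_fn = 1.0, 5.0                # missing a positive costs five times more
threshold = cost_fp / (cost_fp + cost_fn)  # cost-minimising decision threshold

for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    pred = (p > threshold).astype(int)     # cost-sensitive decision rule
    cost = (cost_fp * ((pred == 1) & (y == 0)).sum()
            + cost_fn * ((pred == 0) & (y == 1)).sum())
    print(type(clf).__name__, "total cost:", cost)
```

    The threshold cost_fp / (cost_fp + cost_fn) is the point at which the expected cost of predicting positive equals that of predicting negative, which is why well-calibrated probabilities matter for this comparison.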

    Ensemble Learning with Biased Classifiers: The Triskel Algorithm

    No full text
    We propose a novel ensemble learning algorithm called Triskel, which has two interesting features. First, Triskel learns an ensemble of classifiers that are biased to have high precision (as opposed to, for example, boosting, where the ensemble members are biased to ignore portions of the instance space). Second, Triskel uses weighted voting like most ensemble methods, but the weights are assigned so that certain pairs of biased classifiers outweigh the rest of the ensemble if their predictions agree. Our experiments on a variety of real-world tasks demonstrate that Triskel often outperforms boosting, in terms of both accuracy and training time.
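    A rough sketch of the voting idea as described above, not the published Triskel implementation: classifiers biased towards precision via class weights form a designated pair, and the pair overrides the remaining voters whenever its two members agree. The bias mechanism, weights, and base learners are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=1)

def precision_biased(class_label, weight=5.0):
    """A classifier nudged toward high precision on one class via class_weight:
    penalising the other class makes it predict class_label only when confident."""
    return LogisticRegression(max_iter=1000,
                              class_weight={class_label: 1.0, 1 - class_label: weight})

pair = [precision_biased(1).fit(X, y), precision_biased(0).fit(X, y)]  # the biased pair
rest = [LogisticRegression(max_iter=1000, C=c).fit(X, y) for c in (0.1, 1.0, 10.0)]

def triskel_like_vote(x):
    x = x.reshape(1, -1)
    pair_votes = [c.predict(x)[0] for c in pair]
    if pair_votes[0] == pair_votes[1]:      # agreeing pair outweighs the ensemble
        return pair_votes[0]
    votes = [c.predict(x)[0] for c in rest] # otherwise fall back to majority of the rest
    return int(np.mean(votes) >= 0.5)

preds = np.array([triskel_like_vote(x) for x in X])
print("training accuracy:", (preds == y).mean())
```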