14 research outputs found

Recommended from our members

Detecting and removing noisy instances from concept descriptions

Author: Aha David W.
Kibler Dennis
Publication venue: eScholarship, University of California
Publication date: 12/12/1988

Several published results show that instance-based learning algorithms record high classification accuracies and low storage requirements when applied to supervised learning tasks. However, these learning algorithms are highly sensitive to training set noise. This paper describes a simple extension of instance-based learning algorithms for detecting and removing noisy instances from concept descriptions. The extension requires evidence that saved instances are significantly good classifiers before it allows them to be used for subsequent classification tasks. We show that this extension's performance degrades more slowly in the presence of noise, improves classification accuracies, and further reduces storage requirements in several artificial and real-world databases.
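
The acceptance idea in this abstract (use a saved instance only once there is evidence it classifies well) can be illustrated with a small sketch. This is not the paper's procedure: the Wilson score interval, the `z` value, and the names `wilson_interval` and `is_acceptable` are assumptions chosen for illustration. The sketch trusts an instance only when the lower bound on its observed accuracy exceeds the upper bound on the observed frequency of its class.

```python
import math

def wilson_interval(successes, trials, z=1.0):
    """Return (low, high) Wilson score bounds for a success proportion."""
    if trials == 0:
        return 0.0, 1.0  # no evidence yet: the proportion could be anything
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - spread) / denom, (centre + spread) / denom

def is_acceptable(correct, attempts, class_count, total_seen, z=1.0):
    """Trust an instance only if its accuracy is credibly above its class frequency.

    correct / attempts: how often the instance classified neighbours correctly.
    class_count / total_seen: how often its class has appeared so far.
    """
    acc_low, _ = wilson_interval(correct, attempts, z)
    _, freq_high = wilson_interval(class_count, total_seen, z)
    return acc_low > freq_high
```

An instance with 9 correct classifications out of 10, in a class seen in half of the data, would pass this check; one with 2 out of 10 would not.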

eScholarship - University of California

Incremental learning of independent, overlapping, and graded concept descriptions with an instance-based process framework

Author: Aha David W.
Publication venue: eScholarship, University of California
Publication date: 23/05/1989

Supervised learning algorithms make several simplifying assumptions concerning the characteristics of the concept descriptions to be learned. For example, concepts are often assumed to (1) be defined with respect to the same set of relevant attributes, (2) be disjoint in instance space, and (3) have uniform instance distributions. While these assumptions constrain the learning task, they unfortunately limit an algorithm's applicability. We believe that supervised learning algorithms should learn attribute relevancies independently for each concept, allow instances to be members of any subset of concepts, and represent graded concept descriptions. This paper introduces a process framework for instance-based learning algorithms that exploit only specific instance and performance feedback information to guide their concept learning processes. We also introduce Bloom, a specific instantiation of this framework. Bloom is a supervised, incremental, instance-based learning algorithm that learns relative attribute relevancies independently for each concept, allows instances to be members of any subset of concepts, and represents graded concept memberships. We describe empirical evidence to support our claims that Bloom can learn independent, overlapping, and graded concept descriptions.

eScholarship - University of California

A theory of cross-validation error

Author: Turney Peter D.
Publication date: 01/01/1994

This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced in order to minimize cross-validation error. A general theory is presented and then developed in detail for linear regression and instance-based learning.
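
To make the quantity discussed in this abstract concrete, here is a minimal sketch of leave-one-out cross-validation error for an instance-based predictor of a real-valued attribute. It does not implement the paper's theory. The use of k-nearest-neighbour averaging, one-dimensional inputs, and the names `knn_predict` and `loocv_mse` are assumptions; the neighbourhood size `k` stands in for the simplicity side of the trade-off (larger `k` gives a smoother, simpler model).

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loocv_mse(data, k):
    """Leave-one-out cross-validation mean squared error for k-NN regression."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        held_out = data[:i] + data[i + 1:]  # train on everything except point i
        total += (knn_predict(held_out, x, k) - y) ** 2
    return total / len(data)
```

Comparing `loocv_mse(data, k)` across several values of `k` on the same data is one simple way to see how cross-validation error responds to model simplicity.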

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

Markov Mean Properties for Cell Death-Related Protein Classification

Author: Dorado Julián
Fernández-Lozano Carlos
Gestal M.
González-Díaz Humberto
Munteanu Cristian-Robert
Pazos A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Cell death (CD) is a dynamic biological function involved in physiological and pathological processes. Due to the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors were calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atom physicochemical properties, and the protein 3D regions. Machine learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm for this problem was K*. After several feature subset methods were applied, the best model found is based on only 11 variables and is characterized by an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% (validation set). The best model was then used to analyze 7409 protein chains labeled with “unknown function” in the PDB Databank in order to predict CD-related biological activity. Several proteins were predicted to have CD-related function in Homo sapiens: 3DRX, involved in virus-host interaction biological process and protein homooligomerization; 4DWF, involved in cell differentiation, chromatin modification, DNA damage response, and protein stabilization; 1IUR, involved in ATP binding and chaperone binding; 1J7D, involved in DNA double-strand break processing, histone ubiquitination, and nucleotide-binding oligomerization; 1UTU, linked with DNA repair and regulation of transcription; 3EEC, participating in cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and apoptotic process. Other proteins from bacteria predicted as CD-related are 2G3V, a CAG pathogenicity island protein 13 from Helicobacter pylori; 4G5A, a hypothetical protein in Bacteroides thetaiotaomicron; 1YLK, involved in the nitrogen metabolism of Mycobacterium tuberculosis; and 1XSV, with possible DNA/RNA binding domains. The results demonstrate that CD-related proteins can be predicted using molecular information encoded in the protein 3D structure. Thus, the current work demonstrates the possibility of predicting new molecular targets involved in cell-death processes.

Funding: Xunta de Galicia, 10SIN105004PR; Instituto de Salud Carlos III, PI13/0028
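
The two figures of merit quoted in this abstract, AUROC and true positive rate, can be computed from labels and classifier outputs as in the sketch below. It is a generic illustration, not the Weka evaluation used in the study. The function names, the 0/1 label encoding, and the pairwise definition of AUROC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) are assumptions of this example.

```python
def true_positive_rate(labels, predictions):
    """Fraction of actual positives (label 1) that were predicted as positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    return tp / positives

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of examples, so suited to small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions assume at least one positive example, and `auroc` also needs at least one negative example.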

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

A weighted nearest neighbor algorithm for learning with symbolic features

Author: B.W. Mathews
C. Stanfill
D. Aha
D. Fisher
D. Medin
D. Rumelhart
D. Waltz
F. Cohen
F. Crick
F. Preparata
G. Towell
J. Garnier
J. McClelland
J. Shavlik
L. Holley
M. O'Neill
N. Qian
P. Chou
R. Lathrop
R. Mooney
R. Nosofsky
S. Fertig
S. Hanson
S. Reed
S. Salzberg
S. Weiss
T. Cover
T. Dietterich
T. Sejnowski
V. Lim
W. Kabsch
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A nearest hyperrectangle learning method

Author: A. Blumer
A. Bundy
B. Buchanan
B. Everitt
B. Porter
C. Thornton
D. Aha
D. Ashley
D. Fisher
D. Helmbold
D. Kibler
D. Medin
D. Osherson
E. Rissland
E. Smith
G. Kan
J. Kolodner
J.R. Quinlan
L. Breiman
L. Valiant
R. Bareiss
R. Barr
R. Michalski
R.A. Fisher
S. Crawford
S. Reed
S. Salzberg
S. Vere
S. Weiss
Steven Salzberg
T. Cover
T. Dietterich
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A study of instance-based algorithms for supervised learning tasks : mathematical, empirical, and psychological evaluations

Author: Aha David W.
Publication venue: eScholarship, University of California
Publication date: 27/11/1990

This dissertation introduces a framework for specifying instance-based algorithms that can solve supervised learning tasks. These algorithms input a sequence of instances and yield a partial concept description, which is represented by a set of stored instances and associated information. This description can be used to predict values for subsequently presented instances. The thesis of this framework is that extensional concept descriptions and lazy generalization strategies can support efficient supervised learning behavior. The instance-based learning framework consists of three components. The pre-processor component transforms an instance into a more palatable form for the performance component, which computes the instance's similarity with a set of stored instances and yields a prediction for its target value(s). Therefore, the similarity and prediction functions impose generalizations on the stored instances to inductively derive predictions. The learning component assesses the accuracy of these prediction(s) and updates partial concept descriptions to improve their predictive accuracy. This framework is evaluated in four ways. First, its generality is evaluated by mathematically determining the classes of symbolic concepts and numeric functions that can be closely approximated by IB_1, a simple algorithm specified by this framework. Second, this framework is empirically evaluated for its ability to specify algorithms that improve IB_1's learning efficiency. Significant efficiency improvements are obtained by instance-based algorithms that reduce storage requirements, tolerate noisy data, and learn domain-specific similarity functions, respectively. Alternative component definitions for these algorithms are empirically analyzed in a set of five high-level parameter studies. Third, this framework is evaluated for its ability to specify psychologically plausible process models for categorization tasks. Results from subject experiments indicate a positive correlation between a model's ability to utilize attribute correlation information and its ability to explain psychological phenomena. Finally, this framework is evaluated for its ability to explain and relate a dozen prominent instance-based learning systems. The survey shows that this framework requires only slight modifications to fit these highly diverse systems. Relationships with edited nearest neighbor algorithms, case-based reasoners, and artificial neural networks are also described.
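
The three components named in this abstract (pre-processor, performance, learning) can be sketched as a toy nearest-neighbour learner. This is an illustration under stated assumptions, not the dissertation's IB_1 specification: the class name `InstanceBasedLearner`, range normalization as the pre-processing step, Euclidean distance as the similarity measure, and the `store_only_errors` switch (a simple way to reduce storage) are all choices made for this example.

```python
import math

class InstanceBasedLearner:
    """Toy learner with pre-processor, performance, and learning components."""

    def __init__(self, store_only_errors=False):
        self.store_only_errors = store_only_errors
        self.stored = []   # partial concept description: (attributes, label)
        self.low = None    # per-attribute minimum seen so far
        self.high = None   # per-attribute maximum seen so far

    def _update_ranges(self, x):
        if self.low is None:
            self.low, self.high = list(x), list(x)
        else:
            self.low = [min(a, b) for a, b in zip(self.low, x)]
            self.high = [max(a, b) for a, b in zip(self.high, x)]

    def _normalize(self, x):
        # Pre-processor: scale each attribute by the range observed so far.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(x, self.low, self.high)]

    def predict(self, x):
        # Performance component: return the label of the most similar instance.
        if not self.stored:
            return None
        q = self._normalize(x)
        nearest = min(self.stored,
                      key=lambda inst: math.dist(q, self._normalize(inst[0])))
        return nearest[1]

    def train(self, x, label):
        # Learning component: assess the prediction, then update the description.
        self._update_ranges(x)
        correct = self.predict(x) == label
        if not (self.store_only_errors and correct):
            self.stored.append((tuple(x), label))
        return correct
```

With `store_only_errors=False` every training instance is kept; with `store_only_errors=True` only misclassified instances are stored, which trades some accuracy for a smaller description.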

eScholarship - University of California

Efficacy of different protein descriptors in predicting protein functional families using support vector machine

Author: ONG AI KIANG SERENE
Publication date: 29/01/2008

Master's, MASTER OF SCIENCE (PHARMACY)

ScholarBank@NUS