22 research outputs found

    Learning from Ambiguity

    Get PDF
    There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple-Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple-instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful examples of applying Diverse Density techniques

    Multivariate Correlation Analysis for Supervised Feature Selection in High-Dimensional Data

    Get PDF
    The main theme of this dissertation focuses on multivariate correlation analysis on different data types and we identify and define various research gaps in the same. For the defined research gaps we develop novel techniques that address relevance of features to the target and redundancy of features amidst themselves. Our techniques aim at handling homogeneous data, i.e., only continuous or categorical features, mixed data, i.e., continuous and categorical features, and time serie

    SVM-Based Negative Data Mining to Binary Classification

    Get PDF
    The properties of training data set such as size, distribution and the number of attributes significantly contribute to the generalization error of a learning machine. A not well-distributed data set is prone to lead to a partial overfitting model. Two approaches proposed in this dissertation for the binary classification enhance useful data information by mining negative data. First, an error driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on the new data set in which each label is a transformation of the label from the negative data set, further producing the positive and negative child data subsets in subsequent iterations. This procedure refines the base hypothesis by the k child hypotheses created in k iterations. A prediction method is also proposed to trace the relationship between negative subsets and testing data set by a vector similarity technique. Second, a statistical negative example learning approach based on theoretical analysis improves the performance of the base learning algorithm learner by creating one or two additional hypotheses audit and booster to mine the negative examples output from the learner. The learner employs a regular Support Vector Machine to classify main examples and recognize which examples are negative. The audit works on the negative training data created by learner to predict whether an instance is negative. However, the boosting learning booster is applied when audit does not have enough accuracy to judge learner correctly. Booster works on training data subsets with which learner and audit do not agree. The classifier for testing is the combination of learner, audit and booster. The classifier for testing a specific instance returns the learner\u27s result if audit acknowledges learner\u27s result or learner agrees with audit\u27s judgment, otherwise returns the booster\u27s result. The error of the classifier is decreased to O(e^2) comparing to the error O(e) of a base learning algorithm

    Learning visual concepts for image classification

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (leaves 166-174).by Aparna Lakshmi Ratan.Ph.D
    corecore