Knowledge transfer techniques for dynamic environments

Author: Suju Rajan
Publication date: 01/01/2006

The expense involved in obtaining class labels for data has led to the emergence of semi-supervised learning techniques which try to make use of both the labeled and the unlabeled data to obtain classifiers with better generalization capabilities. Most existing semi-supervised methods assume that the unlabeled data have the same underlying distribution as the training data. However, data acquired for actual problems often suffer from population drift over time or space, and consequently classifiers learned from existing labeled data tend to become obsolete over time or extended geographic areas. In this dissertation, semi-supervised techniques are considered for updating existing classifiers, while allowing for the possibility of population drift in the incoming data. The proposed techniques make use of meta-information that is not explicitly provided by the data to aid in semi-supervision. First, a framework that exploits the contextual information in an existing hierarchical binary classifier is presented to rapidly construct a new classifier for a new but related classification problem. The knowledge transfer technique is augmented with active learning to efficiently update the classifier using far fewer data points than simple semi-supervised methods. The proposed technique is shown to be well-suited for adapting classifiers, even when there is a significant difference between the labeled and unlabeled data. The knowledge transfer approach detailed in this thesis assumes the existence of a pre-defined hierarchy of classes. However, it is possible that several different class hierarchies are defined or obtained for the same domain. A maximum likelihood framework is proposed for integrating available hierarchies into a single ‘master hierarchy’. The taxonomy integration method is shown to result in more natural mappings between existing taxonomies compared to alternative approaches that do not exploit the class hierarchy information. 
A technique that automatically generates n-ary class hierarchies is also presented. The n-ary trees are shown to better reflect inter-class relationships and are generally more effective for knowledge transfer than binary trees. The new techniques are evaluated in the domain of hyperspectral data, on the problem of classifying spatially or temporally varying hyperspectral images. The empirical results clearly demonstrate the utility of exploiting ‘contextual’ information for knowledge transfer in dynamic environments.

Electrical and Computer Engineering

Texas ScholarWorks
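
The dissertation's own framework relies on a class hierarchy and active learning, which the abstract does not describe in enough detail to reproduce. As a much simpler illustration of the underlying idea, semi-supervised updating of an existing classifier under population drift, the sketch below assumes a nearest-centroid classifier and re-estimates its class means from unlabeled new-domain data with a hard-EM (k-means-style) loop. The function names, the two-class toy data, and the choice of hard EM are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(means: Dict[str, Point], x: Point) -> str:
    """Label of the class mean closest to x (Euclidean distance)."""
    return min(means, key=lambda c: math.dist(means[c], x))


def adapt_centroids(labeled: Dict[str, List[Point]],
                    unlabeled: List[Point],
                    max_iters: int = 10) -> Dict[str, Point]:
    """Hypothetical hard-EM update of a nearest-centroid classifier.

    Starts from class means learned on the old labeled data, then lets the
    means follow the unlabeled new-domain data.
    """
    means = {c: centroid(pts) for c, pts in labeled.items()}
    for _ in range(max_iters):
        # E-step (hard): assign each unlabeled point to its nearest class mean.
        groups: Dict[str, List[Point]] = {c: [] for c in means}
        for x in unlabeled:
            groups[nearest(means, x)].append(x)
        # M-step: re-estimate each mean from its assigned new-domain points;
        # a class that attracts no points keeps its previous mean.
        new_means = {c: centroid(g) if g else means[c] for c, g in groups.items()}
        if new_means == means:  # assignments have stabilized
            break
        means = new_means
    return means


# Toy example: the new data are the old classes shifted by (+2, +2).
old = {"water": [(0.0, 0.0), (2.0, 0.0)], "forest": [(8.0, 8.0), (10.0, 8.0)]}
new = [(2.0, 2.0), (4.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
adapted = adapt_centroids(old, new)
print(adapted)  # {'water': (3.0, 2.0), 'forest': (11.0, 10.0)}
```

Re-estimating the means purely from unlabeled data lets the model follow drift, but with overlapping classes it can also lock onto the wrong clusters; that risk is one motivation for the extra supervision (hierarchy context, actively chosen labels) mentioned in the abstract.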

Automatic construction of n-ary tree based taxonomies

Authors: Joydeep Ghosh, Kunal Punera, Suju Rajan
Publication date: 01/01/2006

{kunal,suju,ghosh}@ece.utexas.edu

Hierarchies are an intuitive and effective organization paradigm for data. Recently there has been considerable research on automatically learning hierarchical organizations of data. In this paper, we explore the problem of learning n-ary tree-based hierarchies of categories with no user-defined parameters. We propose a framework that characterizes a “good” taxonomy and also provide an algorithm to find it. This algorithm works completely automatically (with no user input) and is significantly less greedy than existing algorithms in the literature. We evaluate our approach on multiple real-life datasets from diverse domains, such as text mining, hyperspectral analysis, and written character recognition. Our experimental results show that n-ary tree-based taxonomies are not only more “natural”, but that, for many datasets, the output space decompositions they induce also yield better classification accuracy than binary tree-based taxonomies.

CiteSeerX

Crossref
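
The abstract does not give the paper's criterion for a “good” taxonomy or its search algorithm, so neither is reproduced here. The sketch below only illustrates the output space decomposition that a taxonomy induces: assuming a hand-built n-ary tree and a simple nearest-mean decision at each internal node (a stand-in for whatever classifiers the paper uses), a test point is routed from the root down to a leaf class. The class names, means, and helper functions are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Taxonomy node: a leaf holds a class label; an internal node holds 2+ children."""
    label: str = ""
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """All leaf class labels under this node, left to right."""
        if not self.children:
            return [self.label]
        return [c for child in self.children for c in child.leaves()]


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def classify(root: Node, class_means: Dict[str, Point], x: Point) -> str:
    """Top-down prediction: at each internal node, choose among its children
    with a nearest-mean rule, where a child's mean is the average of the
    class means of the leaves beneath it."""
    node = root
    while node.children:
        node = min(node.children,
                   key=lambda ch: math.dist(centroid([class_means[c] for c in ch.leaves()]), x))
    return node.label


# Hand-built n-ary taxonomy (hypothetical): one 3-way internal node plus a leaf.
taxonomy = Node(children=[
    Node(label="vegetation", children=[Node("oak"), Node("pine"), Node("grass")]),
    Node("water"),
])
means = {"oak": (0.0, 0.0), "pine": (1.0, 0.0), "grass": (0.0, 1.0), "water": (10.0, 10.0)}
print(classify(taxonomy, means, (0.9, 0.1)))  # pine
print(classify(taxonomy, means, (9.0, 9.0)))  # water
```

The n-ary node separates the three vegetation classes in a single 3-way decision; a binary tree over the same classes would need two nested splits and an arbitrary choice of which two classes to group first.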

Automatically Learning Document Taxonomies for Hierarchical Classification

Authors: Joydeep Ghosh, Kunal Punera, Suju Rajan
Publication date: 01/01/2005

While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new technique that automatically extracts a suitable hierarchical structure from a corpus of labeled documents. We show that our technique groups similar classes closer together in the tree and discovers relationships among documents that are not encoded in the class labels. The learned taxonomy is then used, along with binary SVMs, for multi-class classification. We demonstrate the efficacy of our approach by testing it on the 20 Newsgroups dataset.

CiteSeerX

Crossref
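
The abstract does not state how the hierarchy is extracted. One common generic way to obtain a binary tree in which similar classes sit close together, assumed here purely for illustration, is agglomerative clustering of per-class mean vectors (for documents, these could be mean TF-IDF vectors). The function name and the 2-D toy means are hypothetical; in the pipeline the abstract describes, each internal node of such a tree would then get its own binary SVM separating its two subtrees.

```python
import math
from itertools import combinations
from typing import Dict, Tuple, Union

Point = Tuple[float, ...]
# A taxonomy is either a class label (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def build_binary_taxonomy(class_means: Dict[str, Point]) -> Tree:
    """Bottom-up (agglomerative) construction of a binary class hierarchy.

    Repeatedly merges the two clusters whose mean vectors are closest, so
    similar classes end up near each other in the tree.
    Requires at least one class.
    """
    # Active clusters: subtree -> (mean vector, number of classes merged into it)
    clusters: Dict[Tree, Tuple[Point, int]] = {
        label: (mean, 1) for label, mean in class_means.items()
    }
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: math.dist(clusters[pair[0]][0], clusters[pair[1]][0]))
        (mean_a, n_a), (mean_b, n_b) = clusters.pop(a), clusters.pop(b)
        # Class-count-weighted average of the two cluster means
        merged = tuple((n_a * u + n_b * v) / (n_a + n_b) for u, v in zip(mean_a, mean_b))
        clusters[(a, b)] = (merged, n_a + n_b)
    return next(iter(clusters))


# Toy 2-D class means (hypothetical values)
means = {
    "comp.graphics": (0.0, 0.0),
    "comp.windows.x": (1.0, 0.0),
    "rec.autos": (10.0, 0.0),
    "rec.motorcycles": (12.0, 0.0),
}
print(build_binary_taxonomy(means))
# (('comp.graphics', 'comp.windows.x'), ('rec.autos', 'rec.motorcycles'))
```

Merging greedily by closest means is simple but, as the companion paper above argues for existing methods, greedy; each merge is final even if a different grouping would give a better tree overall.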

A large-scale active learning system for topical categorization on the web

Authors: Adwait Ratnaparkhi, Dragomir Yankov, Scott J. Gaffney, Suju Rajan
Publication date: 01/01/2010

Many web applications, such as ad-matching systems, vertical search engines, and page categorization systems, require identifying a particular type or class of pages on the Web. The sheer number and diversity of pages on the Web, however, make it hard to obtain a good sample of the class of interest. In this paper, we describe a successfully deployed end-to-end system that starts from a biased training sample and uses several state-of-the-art machine learning algorithms working in tandem, including a powerful active learning component, to build a good classification system. The system is evaluated on traffic from a real-world ad-matching platform and is shown to achieve high categorization effectiveness with a significant reduction in editorial effort and labeling time.

CiteSeerX

Crossref
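
The abstract does not describe the system's active learning component in detail. As a generic illustration of the idea, spending editorial labels only where the current model is least certain, the sketch below assumes a nearest-centroid classifier and margin-based uncertainty sampling over a pool of unlabeled pages, with a function standing in for the human editors. All names, the 1-D features, and the threshold are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def margin(means: Dict[str, Point], x: Point) -> float:
    """Gap between the distances to the two nearest class means.
    A small gap means x lies near the decision boundary (high uncertainty)."""
    d = sorted(math.dist(m, x) for m in means.values())
    return d[1] - d[0]


def active_learn(seed: Dict[str, List[Point]], pool: List[Point],
                 oracle: Callable[[Point], str], budget: int):
    """Pool-based active learning with margin-based uncertainty sampling.
    `seed` must contain at least two classes; `oracle` plays the editor."""
    labeled = {c: list(pts) for c, pts in seed.items()}
    pool = list(pool)  # copy so the caller's list is not modified
    queried: List[Point] = []
    for _ in range(min(budget, len(pool))):
        means = {c: centroid(pts) for c, pts in labeled.items()}
        # Query the unlabeled point the current model is least sure about.
        x = min(pool, key=lambda p: margin(means, p))
        pool.remove(x)
        labeled.setdefault(oracle(x), []).append(x)
        queried.append(x)
    return {c: centroid(pts) for c, pts in labeled.items()}, queried


def oracle(p: Point) -> str:
    """Hypothetical editor: pages with feature >= 3.0 are 'sports'."""
    return "sports" if p[0] >= 3.0 else "other"


# Biased seed: one example per class, both far from the true boundary at 3.0.
seed = {"other": [(0.0,)], "sports": [(10.0,)]}
pool = [(1.0,), (2.0,), (4.0,), (7.0,), (8.0,)]
final_means, asked = active_learn(seed, pool, oracle, budget=2)
print(asked)        # [(4.0,), (2.0,)]
print(final_means)  # {'other': (1.0,), 'sports': (7.0,)}
```

After two queries the implied boundary (the midpoint between the class means) moves from 5.0 to 4.0, toward the true threshold of 3.0, while only two of the five pool points needed a label.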