9 research outputs found

    An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

    Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval

    SVM active learning for image retrieval is known to perform poorly when it starts from only a few labelled images. This paper addresses the problem by incorporating unlabelled images into the bootstrapping of the learning process: the initial SVM classifier is trained on the few labelled images together with unlabelled images randomly selected from the image database. Both theoretical analysis and experimental results show that incorporating unlabelled images in the bootstrapping improves the efficiency of SVM active learning, and thus the overall retrieval performance.

    A reduced labeled samples (RLS) framework for classification of imbalanced concept-drifting streaming data.

    Stream processing frameworks are designed to process data as it arrives over time, such as the stream of emails a user receives every day. Most real-world data streams are also imbalanced: an email stream, for example, contains few spam messages compared with many legitimate ones. Classifying an imbalanced data stream is challenging for several reasons. First, data streams are huge and cannot be stored in memory to be processed all at once. Second, when the data is imbalanced, accuracy on the majority class dominates the results. Third, data streams change over time, which degrades model performance, so the model should be updated when such changes are detected. Finally, the true labels of all samples are not available immediately after classification, and in real-world applications only a fraction of the data can be labeled, because labeling is expensive and time-consuming. This thesis proposes a framework, called Reduced Labeled Samples (RLS), for modeling streaming data with imbalanced classes. RLS is a chunk-based learning framework that builds a model from a partially labeled data stream when the characteristics of the data change. In RLS, only a fraction of the samples are labeled and used in modeling, yet the performance is not significantly different from that obtained with 100% labeling. RLS maintains an ensemble of classifiers to boost performance. It uses the information in labeled data in a supervised fashion, and is extended to use the information in unlabeled data in a semi-supervised fashion. RLS handles both binary and multi-class partially labeled data streams, and the results show that the basis of RLS is effective even for multi-class classification problems. Overall, RLS is shown to be an effective framework for processing imbalanced and partially labeled data streams.
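The chunk-based, partially labeled ensemble idea described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the nearest-centroid base learner, the names `CentroidClassifier`, `ensemble_predict`, and `process_stream`, the random choice of which samples to label, and the simplification of training a new ensemble member on every chunk instead of only when a change is detected. It also does nothing special about class imbalance.

```python
import random
from collections import Counter, deque


class CentroidClassifier:
    """Nearest-centroid classifier on 1-D features (stand-in base learner)."""

    def __init__(self, xs, ys):
        sums, counts = Counter(), Counter()
        for x, y in zip(xs, ys):
            sums[y] += x
            counts[y] += 1
        self.centroids = {y: sums[y] / counts[y] for y in counts}

    def predict(self, x):
        # Return the class whose centroid is closest to x.
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def ensemble_predict(ensemble, x):
    """Majority vote over the current ensemble members."""
    votes = Counter(member.predict(x) for member in ensemble)
    return votes.most_common(1)[0][0]


def process_stream(chunks, label_fraction=0.2, max_members=3, seed=0):
    """Classify each chunk, then train a new member on a labeled fraction of it.

    chunks: iterable of lists of (feature, true_label) pairs. The true label
    is only read for the samples selected for labeling.
    """
    rng = random.Random(seed)
    ensemble = deque(maxlen=max_members)  # the oldest member is dropped first
    predictions = []
    for chunk in chunks:
        # Predict before any labels for this chunk are requested.
        if ensemble:
            predictions.append([ensemble_predict(ensemble, x) for x, _ in chunk])
        # Request labels for only a fraction of the chunk (at least one sample).
        k = max(1, int(label_fraction * len(chunk)))
        labeled = rng.sample(chunk, k)
        ensemble.append(
            CentroidClassifier([x for x, _ in labeled], [y for _, y in labeled])
        )
    return predictions
```

The first chunk produces no predictions because no model exists yet. Bounding the ensemble with `deque(maxlen=...)` is one simple way to let old members expire as the stream drifts; a change detector deciding when to retrain would be closer to the description above.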

    Learning on relevance feedback in content-based image retrieval.

    Hoi, Chu-Hong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 89-103). Abstracts in English and Chinese.

    Abstract
    Acknowledgement
    Chapter 1: Introduction
        1.1 Content-based Image Retrieval
        1.2 Relevance Feedback
        1.3 Contributions
        1.4 Organization of This Work
    Chapter 2: Background
        2.1 Relevance Feedback
            2.1.1 Heuristic Weighting Methods
            2.1.2 Optimization Formulations
            2.1.3 Various Machine Learning Techniques
        2.2 Support Vector Machines
            2.2.1 Setting of the Learning Problem
            2.2.2 Optimal Separating Hyperplane
            2.2.3 Soft-Margin Support Vector Machine
            2.2.4 One-Class Support Vector Machine
    Chapter 3: Relevance Feedback with Biased SVM
        3.1 Introduction
        3.2 Biased Support Vector Machine
        3.3 Relevance Feedback Using Biased SVM
            3.3.1 Advantages of BSVM in Relevance Feedback
            3.3.2 Relevance Feedback Algorithm by BSVM
        3.4 Experiments
            3.4.1 Datasets
            3.4.2 Image Representation
            3.4.3 Experimental Results
        3.5 Discussions
        3.6 Summary
    Chapter 4: Optimizing Learning with SVM Constraint
        4.1 Introduction
        4.2 Related Work and Motivation
        4.3 Optimizing Learning with SVM Constraint
            4.3.1 Problem Formulation and Notations
            4.3.2 Learning boundaries with SVM
            4.3.3 OPL for the Optimal Distance Function
            4.3.4 Overall Similarity Measure with OPL and SVM
        4.4 Experiments
            4.4.1 Datasets
            4.4.2 Image Representation
            4.4.3 Performance Evaluation
            4.4.4 Complexity and Time Cost Evaluation
        4.5 Discussions
        4.6 Summary
    Chapter 5: Group-based Relevance Feedback
        5.1 Introduction
        5.2 SVM Ensembles
        5.3 Group-based Relevance Feedback Using SVM Ensembles
            5.3.1 (x+1)-class Assumption
            5.3.2 Proposed Architecture
            5.3.3 Strategy for SVM Combination and Group Aggregation
        5.4 Experiments
            5.4.1 Experimental Implementation
            5.4.2 Performance Evaluation
        5.5 Discussions
        5.6 Summary
    Chapter 6: Log-based Relevance Feedback
        6.1 Introduction
        6.2 Related Work and Motivation
        6.3 Log-based Relevance Feedback Using SLSVM
            6.3.1 Problem Statement
            6.3.2 Soft Label Support Vector Machine
            6.3.3 LRF Algorithm by SLSVM
        6.4 Experimental Results
            6.4.1 Datasets
            6.4.2 Image Representation
            6.4.3 Experimental Setup
            6.4.4 Performance Comparison
        6.5 Discussions
        6.6 Summary
    Chapter 7: Application: Web Image Learning
        7.1 Introduction
        7.2 A Learning Scheme for Searching Semantic Concepts
            7.2.1 Searching and Clustering Web Images
            7.2.2 Learning Semantic Concepts with Relevance Feedback
        7.3 Experimental Results
            7.3.1 Dataset and Features
            7.3.2 Performance Evaluation
        7.4 Discussions
        7.5 Summary
    Chapter 8: Conclusions and Future Work
        8.1 Conclusions
        8.2 Future Work
    Chapter A: List of Publications
    Bibliography

    Bridging semantic gap: learning and integrating semantics for content-based retrieval

    Digital cameras have entered ordinary homes and produced an incredibly large number of photos. As a typical example of a broad image domain, unconstrained consumer photos vary significantly. Unlike professional or domain-specific images, the objects in these photos are ill-posed, occluded, and cluttered, with poor lighting, focus, and exposure. Content-based image retrieval research has yet to bridge the semantic gap between computable low-level information and high-level user interpretation. In this thesis, we address the semantic gap with a structured learning framework that allows modular extraction of visual semantics. Semantic image regions (e.g. face, building, sky) are learned statistically, detected directly from the image without segmentation, reconciled across multiple scales, and aggregated spatially to form a compact semantic index. To circumvent the ambiguity and subjectivity in a query, a new query method that allows spatial arrangement of visual semantics is proposed. A query is represented as a disjunctive normal form of visual query terms and processed using fuzzy set operators. A drawback of supervised learning is the manual labeling of regions as training samples. In this thesis, a new learning framework has been developed to discover local semantic patterns and to generate their training samples with minimal human intervention. The discovered patterns can be visualized and used in semantic indexing. In addition, three new class-based indexing schemes are explored. The winner-take-all scheme supports class-based image retrieval. The class relative scheme and the local classification scheme compute inter-class memberships and local class patterns, respectively, as indexes for similarity matching. A Bayesian formulation is proposed to unify local and global indexes in image comparison and ranking, resulting in image retrieval performance superior to that of single indexes. Query-by-example experiments on 2400 consumer photos with 16 semantic queries show that the proposed approaches achieve significantly better (18% to 55%) average precision than a high-dimension feature fusion approach. The thesis opens two promising research directions, namely the semantics design approach and the semantics discovery approach. They form elegant dual frameworks that exploit pattern classifiers in learning and integrating local and global image semantics.
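The query representation in this abstract, a disjunctive normal form of visual query terms evaluated with fuzzy set operators, can be illustrated with a small sketch. The abstract does not say which fuzzy operators are used, so the standard min (AND) and max (OR) operators are assumed here. The function name `score`, the image names, and the membership degrees are all hypothetical, and the spatial arrangement of terms mentioned in the abstract is not modeled.

```python
from typing import Dict, List

# A query in disjunctive normal form: OR over clauses, AND over the terms
# inside each clause. Every clause is assumed to be non-empty.
Query = List[List[str]]


def score(query: Query, memberships: Dict[str, float]) -> float:
    """Fuzzy DNF score: max over clauses of min over the terms in a clause."""
    return max(
        min(memberships.get(term, 0.0) for term in clause)
        for clause in query
    )


# Hypothetical per-image membership degrees for a few semantic terms.
images = {
    "beach.jpg": {"sky": 0.9, "face": 0.1, "building": 0.2},
    "portrait.jpg": {"sky": 0.3, "face": 0.8, "building": 0.1},
    "street.jpg": {"sky": 0.6, "face": 0.4, "building": 0.7},
}

# (sky AND building) OR face
query = [["sky", "building"], ["face"]]

# Rank images by descending query score.
ranked = sorted(images, key=lambda name: score(query, images[name]), reverse=True)
print(ranked)
```

With min and max, an image scores highly when at least one clause has all of its terms strongly present, which is why the portrait ranks first on the strength of the `face` clause alone.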

    Semi-supervised learning for image classification

    Object class recognition is an active topic in computer vision that still presents many challenges. In most approaches, this task is addressed by supervised learning algorithms that need a large quantity of labels to perform well. This leads either to small datasets (< 10,000 images) that capture only a subset of the real-world class distribution (but with a controlled and verified labeling procedure), or to large datasets that are more representative but also add more label noise. Semi-supervised learning is therefore a promising direction: it requires only a few labels while making use of the vast number of images available today. We address object class recognition with semi-supervised learning. These algorithms depend on the underlying structure given by the data, the image description, and the similarity measure, as well as on the quality of the labels. This insight leads to the main research questions of this thesis: Is the structure given by labeled and unlabeled data more important than the algorithm itself? Can we improve this neighborhood structure with a better similarity metric or with more representative unlabeled data? Is there a connection between the quality of labels and the overall performance, and how can we get more representative labels? We answer all these questions: we provide an extensive evaluation, we propose several graph improvements, and we introduce a novel active learning framework to get more representative labels.

    Object classification is an active research area in computer vision that has so far been solved only inadequately. Most approaches try to solve the task with supervised learning, but these algorithms need a large amount of training data to work well. This often leads either to very small datasets (< 10,000 images) that do not reflect the real data distribution of a class, or to very large datasets for which the correctness of the labels can no longer be guaranteed. Semi-supervised learning is a good alternative to these methods, since it needs only very few labels and can at the same time draw on data resources such as the Internet. In this thesis we address object classification with semi-supervised learning methods. These algorithms depend both on the underlying structure, which arises from the data, the image description, and the distance measure, and on the quality of the labels. This insight raised the following research questions: Is the structure more important than the algorithm itself? Can we deliberately improve this structure, e.g. with a better metric or with more data? Is there a connection between the quality of the labels and the overall performance of the algorithms? In this thesis we answer these questions by evaluating these methods. In addition, we develop new methods to improve the graph structure and the labels.
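The dependence of semi-supervised learning on the neighborhood graph, which this abstract emphasizes, can be seen in a minimal label propagation sketch. The abstract does not name a specific algorithm, so the symmetric k-nearest-neighbor graph, the clamped averaging update, and the names `knn_graph` and `propagate` are assumptions chosen for illustration only.

```python
def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as adjacency sets (Euclidean distance)."""
    n = len(points)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        for _, j in dists[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # symmetrise the edge
    return neighbours


def propagate(neighbours, labels, n_classes, iterations=50):
    """Spread the few known labels over the graph.

    labels: dict mapping point index -> class for the labelled points.
    Labelled points stay clamped; every other point repeatedly takes the
    average class scores of its neighbours.
    """
    n = len(neighbours)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in labels.items():
        scores[i][c] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(scores[i])  # keep known labels fixed
            else:
                nb = neighbours[i]
                updated.append(
                    [sum(scores[j][c] for j in nb) / len(nb) for c in range(n_classes)]
                )
        scores = updated
    # Predicted class per point: the class with the highest score.
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Two well-separated clusters with one labelled point each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = knn_graph(points, k=2)
predicted = propagate(graph, labels={0: 0, 3: 1}, n_classes=2)
print(predicted)
```

Changing the distance function or `k` inside `knn_graph` changes which points are connected, and therefore which labels reach which points, without touching `propagate` at all. That separation is the sense in which the graph structure can matter more than the propagation algorithm itself.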