
    Biased classification for relevance feedback in content-based image retrieval.

    Peng, Xiang. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 98-115). Abstracts in English and Chinese.
    Contents: Abstract; Acknowledgement.
    Chapter 1 Introduction: 1.1 Problem Statement; 1.2 Major Contributions; 1.3 Thesis Outline.
    Chapter 2 Background Study: 2.1 Content-based Image Retrieval (2.1.1 Image Representation; 2.1.2 High Dimensional Indexing; 2.1.3 Image Retrieval Systems Design); 2.2 Relevance Feedback (2.2.1 Self-Organizing Map in Relevance Feedback; 2.2.2 Decision Tree in Relevance Feedback; 2.2.3 Bayesian Classifier in Relevance Feedback; 2.2.4 Nearest Neighbor Search in Relevance Feedback; 2.2.5 Support Vector Machines in Relevance Feedback); 2.3 Imbalanced Classification; 2.4 Active Learning (2.4.1 Uncertainty-based Sampling; 2.4.2 Error Reduction; 2.4.3 Batch Selection); 2.5 Convex Optimization (2.5.1 Overview of Convex Optimization; 2.5.2 Linear Program; 2.5.3 Quadratic Program; 2.5.4 Quadratically Constrained Quadratic Program; 2.5.5 Cone Program; 2.5.6 Semi-definite Program).
    Chapter 3 Imbalanced Learning with BMPM for CBIR: 3.1 Research Motivation; 3.2 Background Review (3.2.1 Relevance Feedback for CBIR; 3.2.2 Minimax Probability Machine; 3.2.3 Extensions of Minimax Probability Machine); 3.3 Relevance Feedback using BMPM (3.3.1 Model Definition; 3.3.2 Advantages of BMPM in Relevance Feedback; 3.3.3 Relevance Feedback Framework by BMPM); 3.4 Experimental Results (3.4.1 Experiment Datasets; 3.4.2 Performance Evaluation; 3.4.3 Discussions); 3.5 Summary.
    Chapter 4 BMPM Active Learning for CBIR: 4.1 Problem Statement and Motivation; 4.2 Background Review; 4.3 Relevance Feedback by BMPM Active Learning (4.3.1 Active Learning Concept; 4.3.2 General Approaches for Active Learning; 4.3.3 Biased Minimax Probability Machine; 4.3.4 Proposed Framework); 4.4 Experimental Results (4.4.1 Experiment Setup; 4.4.2 Performance Evaluation); 4.5 Summary.
    Chapter 5 Large Scale Learning with BMPM: 5.1 Introduction (5.1.1 Motivation; 5.1.2 Contribution); 5.2 Background Review (5.2.1 Second Order Cone Program; 5.2.2 General Methods for Large Scale Problems; 5.2.3 Biased Minimax Probability Machine); 5.3 Efficient BMPM Training (5.3.1 Proposed Strategy; 5.3.2 Kernelized BMPM and Its Solution); 5.4 Experimental Results (5.4.1 Experimental Testbeds; 5.4.2 Experimental Settings; 5.4.3 Performance Evaluation); 5.5 Summary.
    Chapter 6 Conclusion and Future Work: 6.1 Conclusion; 6.2 Future Work.
    Appendix A List of Symbols and Notations; Appendix B List of Publications; Bibliography.

    Learning on relevance feedback in content-based image retrieval.

    Hoi, Chu-Hong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 89-103). Abstracts in English and Chinese.
    Contents: Abstract; Acknowledgement.
    Chapter 1 Introduction: 1.1 Content-based Image Retrieval; 1.2 Relevance Feedback; 1.3 Contributions; 1.4 Organization of This Work.
    Chapter 2 Background: 2.1 Relevance Feedback (2.1.1 Heuristic Weighting Methods; 2.1.2 Optimization Formulations; 2.1.3 Various Machine Learning Techniques); 2.2 Support Vector Machines (2.2.1 Setting of the Learning Problem; 2.2.2 Optimal Separating Hyperplane; 2.2.3 Soft-Margin Support Vector Machine; 2.2.4 One-Class Support Vector Machine).
    Chapter 3 Relevance Feedback with Biased SVM: 3.1 Introduction; 3.2 Biased Support Vector Machine; 3.3 Relevance Feedback Using Biased SVM (3.3.1 Advantages of BSVM in Relevance Feedback; 3.3.2 Relevance Feedback Algorithm by BSVM); 3.4 Experiments (3.4.1 Datasets; 3.4.2 Image Representation; 3.4.3 Experimental Results); 3.5 Discussions; 3.6 Summary.
    Chapter 4 Optimizing Learning with SVM Constraint: 4.1 Introduction; 4.2 Related Work and Motivation; 4.3 Optimizing Learning with SVM Constraint (4.3.1 Problem Formulation and Notations; 4.3.2 Learning Boundaries with SVM; 4.3.3 OPL for the Optimal Distance Function; 4.3.4 Overall Similarity Measure with OPL and SVM); 4.4 Experiments (4.4.1 Datasets; 4.4.2 Image Representation; 4.4.3 Performance Evaluation; 4.4.4 Complexity and Time Cost Evaluation); 4.5 Discussions; 4.6 Summary.
    Chapter 5 Group-based Relevance Feedback: 5.1 Introduction; 5.2 SVM Ensembles; 5.3 Group-based Relevance Feedback Using SVM Ensembles (5.3.1 (x+1)-class Assumption; 5.3.2 Proposed Architecture; 5.3.3 Strategy for SVM Combination and Group Aggregation); 5.4 Experiments (5.4.1 Experimental Implementation; 5.4.2 Performance Evaluation); 5.5 Discussions; 5.6 Summary.
    Chapter 6 Log-based Relevance Feedback: 6.1 Introduction; 6.2 Related Work and Motivation; 6.3 Log-based Relevance Feedback Using SLSVM (6.3.1 Problem Statement; 6.3.2 Soft Label Support Vector Machine; 6.3.3 LRF Algorithm by SLSVM); 6.4 Experimental Results (6.4.1 Datasets; 6.4.2 Image Representation; 6.4.3 Experimental Setup; 6.4.4 Performance Comparison); 6.5 Discussions; 6.6 Summary.
    Chapter 7 Application: Web Image Learning: 7.1 Introduction; 7.2 A Learning Scheme for Searching Semantic Concepts (7.2.1 Searching and Clustering Web Images; 7.2.2 Learning Semantic Concepts with Relevance Feedback); 7.3 Experimental Results (7.3.1 Dataset and Features; 7.3.2 Performance Evaluation); 7.4 Discussions; 7.5 Summary.
    Chapter 8 Conclusions and Future Work: 8.1 Conclusions; 8.2 Future Work.
    Appendix A List of Publications; Bibliography.

    Interactive Machine Learning with Applications in Health Informatics

    Recent years have witnessed unprecedented growth of health data, including millions of biomedical research publications, electronic health records, patient discussions on health forums and social media, fitness tracker trajectories, and genome sequences. Information retrieval and machine learning techniques are powerful tools for unlocking the invaluable knowledge in these data, yet they need to be guided by human experts. Unlike training machine learning models in other domains, labeling and analyzing health data requires highly specialized expertise, and the time of medical experts is extremely limited. How can we mine big health data with little expert effort? In this dissertation, I develop state-of-the-art interactive machine learning algorithms that bring together human intelligence and machine intelligence in health data mining tasks. By making efficient use of human experts' domain knowledge, we can achieve high-quality solutions with minimal manual effort. I first introduce a high-recall information retrieval framework that helps human users efficiently harvest not just one but as many relevant documents as possible from a searchable corpus. This is a common need in professional search scenarios such as medical search and literature review. Then I develop two interactive machine learning algorithms that leverage a human expert's domain knowledge to combat the curse of "cold start" in active learning, with applications in clinical natural language processing. A consistent empirical observation is that the overall learning process can be reliably accelerated by a knowledge-driven "warm start", followed by machine-initiated active learning. As a theoretical contribution, I propose a general framework for interactive machine learning. Under this framework, a unified optimization objective explains many existing algorithms used in practice and inspires the design of new algorithms.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147518/1/raywang_1.pd
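    As a rough illustration of the "warm start, then active learning" pattern described above (not the dissertation's actual algorithms), the sketch below seeds labels from a hypothetical keyword rule and then queries the expert on the most uncertain examples. The data, keyword rule, and oracle are placeholders, and scikit-learn is assumed.

```python
# Illustrative sketch only: knowledge-driven warm start followed by
# uncertainty-based active learning. Not the dissertation's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def warm_start_labels(texts, positive_keywords):
    """Knowledge-driven seed labels: a document mentioning any keyword is
    provisionally positive. Stands in for expert-authored domain rules."""
    return np.array([int(any(k in t.lower() for k in positive_keywords)) for t in texts])

def active_learning_loop(X, texts, oracle, positive_keywords, budget=50, seed_size=20):
    """X: feature matrix; texts: raw documents; oracle(i) returns the expert's
    label for document i. Assumes the warm-start seed covers both classes."""
    n = len(texts)
    y = np.full(n, -1)                                   # -1 marks "unlabeled"
    seed = list(range(seed_size))
    y[seed] = warm_start_labels([texts[i] for i in seed], positive_keywords)
    labeled = set(seed)
    clf = LogisticRegression(max_iter=1000)
    for _ in range(budget):
        idx = sorted(labeled)
        clf.fit(X[idx], y[idx])
        probs = clf.predict_proba(X)[:, 1]
        candidates = [i for i in range(n) if i not in labeled]
        if not candidates:
            break
        # Uncertainty sampling: query the unlabeled point closest to 0.5.
        query = min(candidates, key=lambda i: abs(probs[i] - 0.5))
        y[query] = oracle(query)                         # ask the human expert
        labeled.add(query)
    return clf
```

    In a clinical setting, the warm-start rules would encode medical domain knowledge and the oracle would be a clinician's annotation.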

    A picture is worth a thousand words : content-based image retrieval techniques

    In my dissertation I investigate techniques for improving the state of the art in content-based image retrieval. To place my work into context, I highlight the current trends and challenges in my field by analyzing over 200 recent articles. Next, I propose a novel paradigm called "artificial imagination", which gives the retrieval system the power to imagine and think along with the user in terms of what she is looking for. I then introduce a new user interface for visualizing and exploring image collections, empowering the user to navigate large collections based on her own needs and preferences, while simultaneously providing her with an accurate sense of what the database has to offer. In the later chapters I present work dealing with millions of images and focus in particular on high-performance techniques that minimize memory and computational use for both near-duplicate image detection and web search. Finally, I show early work on a scene completion-based image retrieval engine, which synthesizes realistic imagery that matches what the user has in mind.
    LEI Universiteit Leiden; NWO; Imagin

    Fairly Adaptive Negative Sampling for Recommendations

    Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data; they usually learn user preference by discriminating between positive items (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling). However, the sizes of the different item groups (specified by an item attribute) are usually unevenly distributed. We empirically find that the commonly used uniform negative sampling strategy for pairwise algorithms (e.g., BPR) can inherit such data bias and oversample the majority item group as negative instances, severely countering group fairness on the item side. In this paper, we propose a Fairly adaptive Negative sampling approach (FairNeg), which improves item group fairness by adaptively adjusting the group-level negative sampling distribution during training. In particular, it first perceives the model's unfairness status at each step and then adjusts the group-wise sampling distribution with an adaptive momentum update strategy to better facilitate fairness optimization. Moreover, a negative sampling distribution Mixup mechanism is proposed, which gracefully incorporates existing importance-aware sampling techniques intended for mining informative negative samples, thus allowing multiple optimization purposes to be achieved. Extensive experiments on four public datasets show our proposed method's superiority in group fairness enhancement and fairness-utility tradeoff.
    Comment: Accepted by TheWebConf202
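    As a loose sketch of the idea described above (adaptively shifting group-level negative-sampling mass with a momentum update and mixing it with an importance-aware distribution), the toy class below is my own illustration, not the FairNeg implementation; the fairness signal, momentum value, and mixing weight are placeholder assumptions.

```python
# Minimal sketch of adaptive, group-aware negative sampling (not FairNeg's code).
# Assumes every item carries a group id and every group has at least one item;
# group_disadvantage is a per-group unfairness signal supplied by the caller.
import numpy as np

class GroupNegativeSampler:
    def __init__(self, item_groups, n_groups, momentum=0.9, mix=0.5, rng=None):
        self.item_groups = np.asarray(item_groups)            # group id per item
        self.n_groups = n_groups
        self.group_probs = np.full(n_groups, 1.0 / n_groups)  # group-level distribution
        self.momentum = momentum
        self.mix = mix                                         # weight of fairness distribution
        self.rng = rng or np.random.default_rng(0)

    def update(self, group_disadvantage):
        """Shift sampling mass toward currently advantaged groups so they are
        drawn as negatives more often; smoothed with a momentum term."""
        target = np.exp(-np.asarray(group_disadvantage, dtype=float))
        target /= target.sum()
        self.group_probs = self.momentum * self.group_probs + (1 - self.momentum) * target

    def sample(self, importance_probs, size=1):
        """Mix the fairness-driven group distribution with an importance-aware
        per-item distribution (e.g., popularity- or hardness-based)."""
        counts = np.bincount(self.item_groups, minlength=self.n_groups)
        per_item_fair = self.group_probs[self.item_groups] / counts[self.item_groups]
        p = self.mix * per_item_fair + (1 - self.mix) * importance_probs
        p /= p.sum()
        return self.rng.choice(len(self.item_groups), size=size, p=p)
```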

    Spam elimination and bias correction : ensuring label quality in crowdsourced tasks.

    Crowdsourcing is proposed as a powerful mechanism for accomplishing large-scale tasks via anonymous workers online. It has been demonstrated as an effective and important approach for collecting labeled data in application domains which require human intelligence, such as image labeling, video annotation, and natural language processing. Despite the promise, one big challenge still exists in crowdsourcing systems: the difficulty of controlling the quality of crowds. Workers usually have diverse education levels, personal preferences, and motivations, leading to unknown work performance while completing a crowdsourced task. Among them, some are reliable, and some might provide noisy feedback. It is therefore natural to apply worker-filtering approaches, which recognize and tackle noisy workers, to crowdsourcing applications in order to obtain high-quality labels.
    The work presented in this dissertation discusses this area of research and proposes efficient probabilistic worker-filtering models to distinguish different types of poor-quality workers. Most of the existing work on worker filtering either concentrates only on binary labeling tasks, or fails to separate the low-quality workers whose label errors can be corrected from the other spam workers (whose label errors cannot be corrected). As such, we first propose a Spam Removing and De-biasing Framework (SRDF) to handle the worker-filtering procedure in labeling tasks with numerical label scales. The developed framework can detect spam workers and biased workers separately. Biased workers are defined as those who tend to provide higher (or lower) labels than the truths, and their errors can be corrected. To tackle the biasing problem, an iterative bias detection approach is introduced to recognize the biased workers. The spam filtering algorithm eliminates three types of spam workers: random spammers who provide random labels, uniform spammers who give the same labels for most of the items, and sloppy workers who offer low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregating algorithms, which infer truths from labels obtained from crowds, can lead to high-quality consensus results.
    The common characteristic of random spammers and uniform spammers is that they provide useless feedback without making an effort on a labeling task, so it is not necessary to distinguish them separately. In addition, the removal of sloppy workers has a great impact on the detection of biased workers within the SRDF framework. To combat these problems, a different way of classifying workers is presented in this dissertation; in particular, the biased workers are treated as a subcategory of sloppy workers. Finally, an ITerative Self Correcting - Truth Discovery (ITSC-TD) framework is proposed, which can reliably recognize biased workers in ordinal labeling tasks based on a probabilistic bias detection model. ITSC-TD estimates true labels through an optimization-based truth discovery method, which minimizes overall label errors by assigning different weights to workers.
    The typical tasks posted on popular crowdsourcing platforms, such as MTurk, are simple tasks, which are low in complexity, independent, and require little time to complete. Complex tasks, however, in many cases require the crowd workers to possess specialized skills in the task domains. As a result, this type of task is more prone to poor-quality feedback from crowds than simple tasks. We therefore propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks. In this approach, each view is defined as a labeling critique or rubric, which guides the workers to become aware of the desirable work characteristics or goals. Combining the view labels yields the overall estimated label for each item. The multiple-views approach is developed under the hypothesis that workers' performance might differ from one view to another, so varied weights are assigned to different views for each worker. Additionally, the ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view.
    Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. The SWF approach conducts worker filtering with a limited set of gold truths available a priori. Each worker is associated with a spammer score, estimated via the developed semi-supervised model, and low-quality workers are efficiently detected by comparing the spammer score with a predefined threshold value. The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. By comparing the proposed frameworks to a set of state-of-the-art methodologies in the crowdsourcing domain, such as the expectation-maximization-based aggregating algorithm, GLAD, and the optimization-based truth discovery approach, up to 28.0% improvement can be obtained in the accuracy of true label estimation.
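    For concreteness, the following is a minimal sketch of the general optimization-based truth discovery iteration the abstract refers to: alternate between estimating truths as worker-weighted averages and re-weighting workers by their accumulated error. It is an illustrative stand-in, not the ITSC-TD or SRDF code, and the particular weighting formula is one common choice.

```python
# Illustrative sketch of optimization-based truth discovery for numeric labels.
# labels: array of shape (n_workers, n_items), with np.nan where a worker
# skipped an item. Returns (estimated_truths, worker_weights).
import numpy as np

def truth_discovery(labels, n_iters=20, eps=1e-8):
    n_workers, n_items = labels.shape
    answered = ~np.isnan(labels)                      # which worker labeled which item
    filled = np.where(answered, labels, 0.0)
    weights = np.ones(n_workers)
    truths = np.nanmean(labels, axis=0)               # initialize with plain averages
    for _ in range(n_iters):
        # Step 1: per-item truth = worker-weighted average over available answers.
        num = (weights[:, None] * filled).sum(axis=0)
        den = (weights[:, None] * answered).sum(axis=0) + eps
        truths = num / den
        # Step 2: a worker's weight shrinks as its total squared error grows.
        errors = np.nansum((labels - truths) ** 2 * answered, axis=1) + eps
        weights = -np.log(errors / errors.sum())
        weights = np.clip(weights, eps, None)          # keep weights positive
    return truths, weights
```

    A bias-aware variant, in the spirit of the dissertation, would additionally estimate a per-worker offset and subtract it from that worker's labels before the weighted averaging step.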

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents--or short passages--in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms--such as a person's name or a product model number--not seen during training, and avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
    Comment: PhD thesis, Univ College London (2020
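    As a small illustration of the inverted index data structure mentioned above (not the thesis's neural models), the sketch below builds a term-to-postings map and answers a query with a simple term-frequency score over only the documents that contain at least one query term; the tokenization and scoring are deliberately minimal.

```python
# Toy inverted index: term -> {doc_id: term frequency}, queried term-at-a-time.
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict mapping doc_id -> text."""
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index

def search(index, query, k=10):
    """Score only documents that share at least one query term."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

docs = {1: "neural ranking of documents", 2: "inverted index for web search"}
index = build_inverted_index(docs)
print(search(index, "neural web search"))   # ranks doc 2 above doc 1
```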