22,104 research outputs found

    A novel active learning technique for multi-label remote sensing image scene classification

    Get PDF
    Copyright 2018 Society of Photo‑Optical Instrumentation Engineers (SPIE). One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this publication for a fee or for commercial purposes, and modification of the contents of the publication are prohibited.This paper presents a novel multi-label active learning (MLAL) technique in the framework of multi-label remote sensing (RS) image scene classification problems. The proposed MLAL technique is developed in the framework of the multi-label SVM classifier (ML-SVM). Unlike the standard AL methods, the proposed MLAL technique redefines active learning by evaluating the informativeness of each image based on its multiple land-cover classes. Accordingly, the proposed MLAL technique is based on the joint evaluation of two criteria for the selection of the most informative images: i) multi-label uncertainty and ii) multi-label diversity. The multi-label uncertainty criterion is associated to the confidence of the multi-label classification algorithm in correctly assigning multi-labels to each image, whereas multi-label diversity criterion aims at selecting a set of un-annotated images that are as more diverse as possible to reduce the redundancy among them. In order to evaluate the multi-label uncertainty of each image, we propose a novel multi-label margin sampling strategy that: 1) considers the functional distances of each image to all ML-SVM hyperplanes; and then 2) estimates the occurrence on how many times each image falls inside the margins of ML-SVMs. If the occurrence is small, the classifiers are confident to correctly classify the considered image, and vice versa. In order to evaluate the multi-label diversity of each image, we propose a novel clustering-based strategy that clusters all the images inside the margins of the ML-SVMs and avoids selecting the uncertain images from the same clusters. The joint use of the two criteria allows one to enrich the training set of images with multi-labels. Experimental results obtained on a benchmark archive with 2100 images with their multi-labels show the effectiveness of the proposed MLAL method compared to the standard AL methods that neglect the evaluation of the uncertainty and diversity on multi-labels.EC/H2020/759764/EU/Accurate and Scalable Processing of Big Data in Earth Observation/BigEart

    Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission

    Full text link
    By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models, which enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of retransmission decision in each communication round to ensure both reliability and quantity of those training data for accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike the classic ARQ focusing merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty which helps learning and can be measured using the model under training. Underpinning the proposed protocol is a derived elegant communication-learning relation between two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold based policy for importance ARQ. The policy is first derived based on the classic classifier model of support vector machine (SVM), where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex model of convolutional neural networks (CNN) where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and CNN using real datasets with balanced and imbalanced distributions. Experimental results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition to achieve faster model convergence than the conventional channel-aware ARQ.Comment: This is an updated version: 1) extension to general classifiers; 2) consideration of imbalanced classification in the experiments. Submitted to IEEE Journal for possible publicatio

    Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems

    Full text link
    A growing number of applications, e.g. video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels of most informative instances. This paper focuses on AL methods for instance classification problems in multiple instance learning (MIL), where data is arranged into sets, called bags, that are weakly labeled. Most AL methods focus on single instance learning problems. These methods are not suitable for MIL problems because they cannot account for the bag structure of data. In this paper, new methods for bag-level aggregation of instance informativeness are proposed for multiple instance active learning (MIAL). The \textit{aggregated informativeness} method identifies the most informative instances based on classifier uncertainty, and queries bags incorporating the most information. The other proposed method, called \textit{cluster-based aggregative sampling}, clusters data hierarchically in the instance space. The informativeness of instances is assessed by considering bag labels, inferred instance labels, and the proportion of labels that remain to be discovered in clusters. Both proposed methods significantly outperform reference methods in extensive experiments using benchmark data from several application domains. Results indicate that using an appropriate strategy to address MIAL problems yields a significant reduction in the number of queries needed to achieve the same level of performance as single instance AL methods

    Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks

    Get PDF
    How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data? In this work we make a first contribution to answer this question in the context of image classification. We frame this quest as an active learning problem and use zero-shot classifiers to guide the learning process by linking the new task to the existing classifiers. By revisiting the dual formulation of adaptive SVM, we reveal two basic conditions to choose greedily only the most relevant samples to be annotated. On this basis we propose an effective active learning algorithm which learns the best possible target classification model with minimum human labeling effort. Extensive experiments on two challenging datasets show the value of our approach compared to the state-of-the-art active learning methodologies, as well as its potential to reuse past datasets with minimal effort for future tasks

    Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking

    Full text link
    Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach for bi-partite ranking construct a quadratic number of pairs to solve the problem, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired from active learning and can reach a competitive ranking performance while focusing only on a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to accurately conduct bipartite ranking. The framework unifies point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-word large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency.Comment: a shorter version was presented in ACML 201

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Full text link
    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research
    • …
    corecore