79,867 research outputs found

    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Full text link
    Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implication for future text miners aspiring to use AL and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201

    S-Rocket: Selective Random Convolution Kernels for Time Series Classification

    Full text link
    Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction using a large number of independent randomly initialized 1-D convolution kernels of different configurations. The output of the convolution operation on each time series is represented by a partial positive value (PPV). A concatenation of PPVs from all kernels is the input feature vector to a Ridge regression classifier. Unlike typical deep learning models, the kernels are not trained and there is no weighted/trainable connection between kernels or concatenated features and the classifier. Since these kernels are generated randomly, a portion of these kernels may not positively contribute in performance of the model. Hence, selection of the most important kernels and pruning the redundant and less important ones is necessary to reduce computational complexity and accelerate inference of Rocket for applications on the edge devices. Selection of these kernels is a combinatorial optimization problem. In this paper, we propose a scheme for selecting these kernels while maintaining the classification performance. First, the original model is pre-trained at full capacity. Then, a population of binary candidate state vectors is initialized where each element of a vector represents the active/inactive status of a kernel. A population-based optimization algorithm evolves the population in order to find a best state vector which minimizes the number of active kernels while maximizing the accuracy of the classifier. This activation function is a linear combination of the total number of active kernels and the classification accuracy of the pre-trained classifier with the active kernels. Finally, the selected kernels in the best state vector are utilized to train the Ridge regression classifier with the selected kernels

    Multi texture analysis of colorectal cancer continuum using multispectral imagery

    Get PDF
    Purpose This paper proposes to characterize the continuum of colorectal cancer (CRC) using multiple texture features extracted from multispectral optical microscopy images. Three types of pathological tissues (PT) are considered: benign hyperplasia, intraepithelial neoplasia and carcinoma. Materials and Methods In the proposed approach, the region of interest containing PT is first extracted from multispectral images using active contour segmentation. This region is then encoded using texture features based on the Laplacian-of-Gaussian (LoG) filter, discrete wavelets (DW) and gray level co-occurrence matrices (GLCM). To assess the significance of textural differences between PT types, a statistical analysis based on the Kruskal-Wallis test is performed. The usefulness of texture features is then evaluated quantitatively in terms of their ability to predict PT types using various classifier models. Results Preliminary results show significant texture differences between PT types, for all texture features (p-value < 0.01). Individually, GLCM texture features outperform LoG and DW features in terms of PT type prediction. However, a higher performance can be achieved by combining all texture features, resulting in a mean classification accuracy of 98.92%, sensitivity of 98.12%, and specificity of 99.67%. Conclusions These results demonstrate the efficiency and effectiveness of combining multiple texture features for characterizing the continuum of CRC and discriminating between pathological tissues in multispectral images

    Online learning and detection of faces with low human supervision

    Get PDF
    The final publication is available at link.springer.comWe present an efficient,online,and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection using small human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to compute progressively an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to reduce considerably the degree of human supervision during learning. While the first strategy corresponds to query-by-boosting active learning, that requests human assistance over difficult samples in function of the classifier confidence, the second strategy refers to a memory-based learning which uses Âż Exemplar-based Nearest Neighbors (ÂżENN) to assist automatically the classifier. A pre-trained Convolutional Neural Network (CNN) is used to perform ÂżENN with high-level feature descriptors. The proposed approach is therefore fast (WilFs run in 1 FPS using a code not fully optimized), accurate (we obtain detection rates over 82% in complex datasets), and labor-saving (human assistance percentages of less than 20%). As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning, as while the classifier is being computed, WiLFs are discovering faces instances in input images which are used subsequently for training online the classifier. The advantages of our approach are demonstrated in synthetic and publicly available databases, showing comparable detection rates as offline approaches that require larger amounts of handmade training data.Peer ReviewedPostprint (author's final draft

    Interactive multiple object learning with scanty human supervision

    Get PDF
    © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/We present a fast and online human-robot interaction approach that progressively learns multiple object classifiers using scanty human supervision. Given an input video stream recorded during the human robot interaction, the user just needs to annotate a small fraction of frames to compute object specific classifiers based on random ferns which share the same features. The resulting methodology is fast (in a few seconds, complex object appearances can be learned), versatile (it can be applied to unconstrained scenarios), scalable (real experiments show we can model up to 30 different object classes), and minimizes the amount of human intervention by leveraging the uncertainty measures associated to each classifier.; We thoroughly validate the approach on synthetic data and on real sequences acquired with a mobile platform in indoor and outdoor scenarios containing a multitude of different objects. We show that with little human assistance, we are able to build object classifiers robust to viewpoint changes, partial occlusions, varying lighting and cluttered backgrounds. (C) 2016 Elsevier Inc. All rights reserved.Peer ReviewedPostprint (author's final draft

    Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization

    Full text link
    Recommender systems leverage user demographic information, such as age, gender, etc., to personalize recommendations and better place their targeted ads. Oftentimes, users do not volunteer this information due to privacy concerns, or due to a lack of initiative in filling out their online profiles. We illustrate a new threat in which a recommender learns private attributes of users who do not voluntarily disclose them. We design both passive and active attacks that solicit ratings for strategically selected items, and could thus be used by a recommender system to pursue this hidden agenda. Our methods are based on a novel usage of Bayesian matrix factorization in an active learning setting. Evaluations on multiple datasets illustrate that such attacks are indeed feasible and use significantly fewer rated items than static inference methods. Importantly, they succeed without sacrificing the quality of recommendations to users.Comment: This is the extended version of a paper that appeared in ACM RecSys 201
    • …
    corecore