    A lazy learning approach for building classification models

    In this paper, we propose a lazy learning strategy for building classification learning models. Instead of learning the models with the whole training data set before observing the new instance, a selection of patterns is made depending on the new query received and a classification model is learnt with those selected patterns. The selection of patterns is not homogeneous, in the sense that the number of selected patterns depends on the position of the query instance in the input space. That selection is made using a weighting function to give more importance to the training patterns that are more similar to the query instance. Our intention is to provide a lazy learning mechanism suited to any machine learning classification algorithm. For this reason, we study two different methods to avoid fixing any parameter. Experimental results show that classification rates of traditional machine learning algorithms based on trees, rules, or functions can be improved when they are learnt with the lazy learning approach proposed. This work has been funded by the Spanish Ministry of Science under contract TIN2008-06491-C04-03 (MSTAR project).
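
A minimal sketch of the query-dependent selection idea, not the authors' method: it selects the training patterns closest to the query and weights them by inverse distance. Two simplifications are assumptions made here for brevity: the neighbourhood size `k` is fixed (the abstract describes a selection whose size varies with the query position), and a distance-weighted vote stands in for the classification model that the paper trains on the selected patterns. The function name `lazy_classify` is hypothetical.

```python
import math
from collections import defaultdict


def lazy_classify(train_X, train_y, query, k=5):
    """Classify `query` using only training patterns selected for this query."""
    # Distance from the query to every training pattern.
    dists = [math.dist(x, query) for x in train_X]
    # Query-dependent selection: keep the k most similar patterns (assumed fixed k).
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Weighting function: closer patterns count more (inverse distance, an assumption).
    votes = defaultdict(float)
    for i in nearest:
        votes[train_y[i]] += 1.0 / (dists[i] + 1e-9)
    # Stand-in "model": the class with the largest total weight.
    return max(votes, key=votes.get)


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
label_near_origin = lazy_classify(train_X, train_y, (0.2, 0.2), k=3)
label_far = lazy_classify(train_X, train_y, (5.5, 5.5), k=3)
```

Nothing is fitted before the query arrives, which is the defining property of a lazy learner; any classifier that accepts per-sample weights could replace the vote.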

    Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling

    Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold. Comment: UNCV ICCV 202
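
One way to combine the two sampling criteria is sketched below. The abstract does not say how uncertainty is scored or how diversity is enforced, so both are assumptions here: per-image uncertainty scores are taken as given, and diversity is approximated with a greedy farthest-first traversal over image feature vectors. The name `two_step_select` and its parameters are hypothetical.

```python
import math


def two_step_select(uncertainty, features, pool_size, budget):
    """Pick `budget` image indices: uncertain first, then spread out in feature space."""
    # Step 1 (uncertainty): keep the pool_size most uncertain images.
    ranked = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    pool = ranked[:pool_size]
    if budget <= 0 or not pool:
        return []
    # Step 2 (diversity): greedy farthest-first traversal inside the pool,
    # seeded with the single most uncertain image.
    chosen = [pool[0]]
    while len(chosen) < min(budget, len(pool)):
        best = max(
            (i for i in pool if i not in chosen),
            key=lambda i: min(math.dist(features[i], features[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


uncertainty = [0.9, 0.8, 0.85, 0.1, 0.7]
features = [(0, 0), (0, 0.1), (5, 5), (9, 9), (0.2, 0)]
selected = two_step_select(uncertainty, features, pool_size=4, budget=2)
```

In this toy run, image 3 is filtered out in the first step because its uncertainty is low, and the second step prefers image 2 over the near-duplicates of image 0.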

    Multi-graph learning

    University of Technology Sydney. Faculty of Engineering and Information Technology. Multi-instance learning (MIL) is a special learning task where labels are only available for a bag of instances. Although MIL has been used for many applications, existing MIL algorithms cannot handle complex data objects, and all require that instances inside each bag are represented as feature vectors (e.g. being represented in an instance-feature format). In reality, many real-world objects are inherently complicated, and an object can be represented as multiple instances with dependency structures (i.e. graphs). Such dependency allows relationships between objects to play important roles, which, unfortunately, remain unaddressed in traditional instance-feature representations. Motivated by these challenges, this thesis formulates a new multi-graph learning paradigm for representing and classifying complicated objects. With the proposed multi-graph representation, the thesis systematically addresses several key learning tasks. Multi-Graph Learning: A graph bag contains one or multiple graphs, and each bag is labeled as either positive or negative. The aim of multi-graph learning is to build a learning model from a number of labeled training bags to predict previously unseen bags with maximum accuracy. To solve the problem, we propose two types of approaches: 1) a Multi-Graph Feature based Learning (gMGFL) algorithm that explores and selects an optimal set of subgraphs as features to transfer each bag into a single instance for further learning; and 2) a Boosting based Multi-Graph Classification framework (bMGC), which employs dynamic weight adjustment, at both graph- and bag-levels, to select one subgraph in each iteration to form a set of weak graph classifiers.
Multi-Instance Multi-Graph learning: A bag contains a number of instances and graphs in pairs, and the learning objective is to derive classification models from labeled bags, containing both instances and graphs, to predict previously unseen bags with maximum accuracy. In the thesis, we propose a Dual Embedding Multi-Instance Multi-Graph Learning (DE-MIMG) algorithm, which employs a dual embedding learning approach to (1) embed instance distributions into the informative subgraphs discovery process, and (2) embed discovered subgraphs into the instance feature selection process. Positive and Unlabeled Multi-Graph Learning: The training set only contains positive and unlabeled bags, where labels are only available for bags but not for individual graphs inside the bag. This problem setting raises significant challenges because the bag-of-graph setting does not have features available to directly represent graph data, and no negative bags exist for deriving discriminative classification models. To solve the challenge, we propose a puMGL learning framework which relies on two iteratively combined processes: (1) deriving features to represent graphs for learning; and (2) deriving discriminative models with only positive and unlabeled graph bags. Multi-Graph-View Learning: A multi-graph-view model utilizes graphs constructed from multiple graph-views to represent an object. In our research, we formulate a new multi-graph-view learning task for graph classification, where each object to be classified is represented by graphs under multiple graph-views. To solve the problem, we propose a Cross Graph-View Subgraph Feature based Learning (gCGVFL) algorithm that explores an optimal set of subgraph features across multiple graph-views. In addition, a bag based multi-graph model is further used to relax the labeling by only requiring one label for each graph bag, which corresponds to one object.
For learning classification models, we propose a multi-graph-view bag learning algorithm (MGVBL) to explore subgraphs from multiple graph-views for learning. Experiments on real-world data validate and demonstrate the performance of the proposed methods for classifying complicated objects using multi-graph learning.

    Are Machine Learning Based Intrusion Detection System Always Secure?:An insight into tampered learning

    Machine learning is successful in many applications, including securing a network from unseen attacks. Applying learning algorithms to detect anomalies in a network has been fundamental for several years. With the increasing use of machine learning techniques, it has become important to study to what extent it is safe to depend on them. A separate discipline called ‘Adversarial Learning’ has emerged as a distinct dimension of study. The work in this paper tests the robustness of online machine learning based IDS to packets carefully crafted by an attacker, called poison packets. The objective is to observe how a remote attacker can shift the normal behavior of a machine learning based classifier in the IDS by externally injecting carefully crafted packets into the network that seem normal to the classification algorithm, so that the instances are made part of its future training set. This behavior can eventually lead to poisoned learning by the classification algorithm in the long run, resulting in misclassification of true attack instances. This work explores one such approach with SOM and SVM as the online learning based classification algorithms.

    ISBDD model for classification of hyperspectral remote sensing imagery

    The diverse density (DD) algorithm was proposed to handle the problem of low classification accuracy when training samples contain interference such as mixed pixels. The DD algorithm can learn a feature vector from training bags, which comprise instances (pixels). However, the feature vector learned by the DD algorithm cannot always effectively represent one type of ground cover. To handle this problem, an instance space-based diverse density (ISBDD) model that employs a novel training strategy is proposed in this paper. In the ISBDD model, DD values of each pixel are computed instead of learning a feature vector, and as a result, the pixel can be classified according to its DD values. Airborne hyperspectral data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor and the Push-broom Hyperspectral Imager (PHI) are applied to evaluate the performance of the proposed model. Results show that the overall classification accuracy of ISBDD model on the AVIRIS and PHI images is up to 97.65% and 89.02%, respectively, while the kappa coefficient is up to 0.97 and 0.88, respectively
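
For orientation, the classic noisy-or form of diverse density can be sketched as below. This is the standard DD definition, not the ISBDD training strategy, whose details the abstract does not give. The unscaled Gaussian-like similarity in `instance_prob` and the function names are assumptions made for the sketch.

```python
import math


def instance_prob(instance, target):
    """Assumed similarity: Pr(instance matches target) = exp(-||instance - target||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(instance, target)))


def diverse_density(target, positive_bags, negative_bags):
    """Noisy-or diverse density of a candidate point `target`."""
    dd = 1.0
    for bag in positive_bags:
        # A positive bag supports the target if at least one instance is close to it.
        dd *= 1.0 - math.prod(1.0 - instance_prob(x, target) for x in bag)
    for bag in negative_bags:
        # A negative bag supports the target only if every instance is far from it.
        dd *= math.prod(1.0 - instance_prob(x, target) for x in bag)
    return dd


positive_bags = [[(0, 0), (5, 5)], [(0.1, 0), (9, 9)]]
negative_bags = [[(5, 5.1)]]
dd_origin = diverse_density((0, 0), positive_bags, negative_bags)
dd_far = diverse_density((5, 5), positive_bags, negative_bags)
```

The point `(0, 0)` scores high because both positive bags contain an instance near it and the negative bag does not, while `(5, 5)` is penalised by the nearby negative instance. Evaluating such a score at each pixel rather than searching for one feature vector is the general direction the abstract describes.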

    A genetic prototype learner

    Supervised classification problems have received considerable attention from the machine learning community. We propose a novel genetic algorithm based prototype learning system, PLEASE, for this class of problems. Given a set of prototypes for each of the possible classes, the class of an input instance is determined by the prototype nearest to this instance. We assume ordinal attributes and prototypes are represented as sets of feature-value pairs. A genetic algorithm is used to evolve the number of prototypes per class and their positions on the input space as determined by corresponding feature-value pairs. Comparisons with C4.5 on a set of artificial problems of controlled complexity demonstrate the effectiveness of the proposed system.
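
The classification rule described above (the nearest prototype decides the class) can be sketched as follows. Only the decision rule is shown; the genetic search that evolves the prototypes is omitted. The squared-difference distance over a prototype's feature-value pairs and the names `prototype_distance` and `classify` are assumptions, not details taken from PLEASE.

```python
def prototype_distance(prototype, instance):
    """Squared distance over the features the prototype specifies (assumed metric)."""
    return sum((value - instance[feature]) ** 2 for feature, value in prototype.items())


def classify(prototypes, instance):
    """Return the class label of the prototype nearest to `instance`.

    `prototypes` is a list of (label, {feature: value}) pairs; a class may
    own several prototypes.
    """
    label, _ = min(prototypes, key=lambda p: prototype_distance(p[1], instance))
    return label


prototypes = [
    ("low", {"x": 1, "y": 1}),
    ("high", {"x": 8, "y": 9}),
    ("high", {"x": 9, "y": 2}),
]
first = classify(prototypes, {"x": 2, "y": 1})
second = classify(prototypes, {"x": 8, "y": 3})
```

A genetic algorithm in this setting would treat the list of prototypes as the individual being evolved and use classification accuracy as the fitness signal.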

    Large-width machine learning algorithm

    We introduce an algorithm, called Large Width (LW), that produces a multi-category classifier (defined on a distance space) with the property that the classifier has a large ‘sample width.’ (Width is a notion similar to classification margin.) LW is an incremental instance-based (also known as ‘lazy’) learning algorithm. Given a sample of labeled and unlabeled examples, it iteratively picks the next unlabeled example and classifies it while maintaining a large distance between each labeled example and its nearest-unlike prototype. (A prototype is either a labeled example or an unlabeled example which has already been classified.) Thus, LW gives a higher priority to unlabeled points whose classification decision ‘interferes’ less with the labeled sample. On a collection of UCI benchmark datasets, the LW algorithm ranks at the top when compared to 11 instance-based learning algorithms (or configurations). When compared to the best candidate from instance-based learners, MLP, SVM, decision tree learner (C4.5) and Naive Bayes, LW is ranked second, behind only MLP, which takes first place by a single extra win against LW. The LW algorithm can be implemented in parallel distributed processing to yield a high speedup factor and is suitable for any distance space, with a distance function which need not necessarily satisfy the conditions of a metric.

    A Novel Multiinstance Learning Approach for Liver Cancer Recognition on Abdominal CT Images Based on CPSO-SVM and IO

    A novel multi-instance learning (MIL) method is proposed to recognize liver cancer with abdominal CT images based on instance optimization (IO) and support vector machine with parameters optimized by a combination algorithm of particle swarm optimization and local optimization (CPSO-SVM). Introducing MIL into liver cancer recognition can solve the problem of multiple regions of interest classification. The images we use in the experiments are liver CT images extracted from abdominal CT images. The proposed method consists of two main steps: (1) obtaining the key instances through IO by texture features and a classification threshold in classification of instances with CPSO-SVM and (2) predicting unknown samples with the key instances and the classification threshold. By extracting the instances equally based on the entire image, the proposed method can skip the procedure of tumor region segmentation and lower the demand on segmentation accuracy of the liver region. The normal SVM method and two MIL algorithms, the Citation-kNN algorithm and the WEMISVM algorithm, have been chosen as comparison algorithms. The experimental results show that the proposed method can effectively recognize liver cancer images from two kinds of cancer CT images and greatly improve the recognition accuracy.