5 research outputs found

    A Flexible Semi-supervised Feature Extraction Method for Image Classification

    Abstract. This paper proposes a novel discriminant semi-supervised feature extraction method for generic classification and recognition tasks. The paper has two main contributions. First, we propose a flexible linear semi-supervised feature extraction method that seeks a non-linear subspace that is close to a linear one. The proposed method is based on a criterion that simultaneously exploits the discrimination information provided by the labeled samples, maintains the graph-based smoothness associated with all samples, regularizes the complexity of the linear transform, and minimizes the discrepancy between the unknown linear regression and the unknown non-linear projection. Second, we provide extensive experiments on four benchmark databases in order to study the performance of the proposed method. These experiments demonstrate a substantial improvement over state-of-the-art algorithms that are based on either label propagation or semi-supervised graph-based embedding.

    A flocking-like technique to perform semi-supervised learning

    We present a nature-inspired semi-supervised learning technique based on the flocking formation of certain living species, such as birds and fish. Each data item is treated as an individual in the flock. Starting from random directions, each data item moves according to its surrounding items, by getting closer to them (but not too close) and taking the same direction of motion. Labeled items play special roles, ensuring that data from different classes will belong to different, distant flocks. Experiments on both artificial and benchmark datasets were performed to evaluate its classification accuracy. Despite the rich behavior, we argue that this technique has a sub-quadratic asymptotic time complexity, which makes it feasible to use on large datasets. In order to achieve such performance, a space-partitioning technique is introduced. We also argue that the richness behind this dynamic, self-organizing model is quite robust and may be used to do much more than simply propagate the labels from labeled to unlabeled data. It could be used to detect class overlap, wrong labeling, etc. Funding: The State of São Paulo Research Foundation (FAPESP); Brazilian National Research Council (CNPq).
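The motion rule described in this abstract (move toward nearby items, but not too close, and adopt their direction) can be illustrated with a generic boids-style update. This is only a sketch under stated assumptions, not the authors' model: the function `flock_step` and the parameters `radius`, `min_dist`, and `step` are hypothetical, the special role of labeled items is left out, and the all-pairs loop is quadratic, so it does not include the space partitioning the abstract mentions.

```python
import math

def flock_step(positions, directions, radius=2.0, min_dist=0.5, step=0.1):
    """One synchronous 2-D update: attraction, short-range repulsion, alignment.

    Hypothetical illustration; parameter names and values are assumptions.
    """
    new_positions, new_directions = [], []
    for i, (p, d) in enumerate(zip(positions, directions)):
        sx = sy = 0.0  # steering from attraction (+) or repulsion (-)
        ax = ay = 0.0  # summed directions of neighbours
        count = 0
        for j, q in enumerate(positions):
            dist = math.dist(p, q)
            if j == i or dist > radius or dist == 0.0:
                continue
            count += 1
            # Approach neighbours farther than min_dist, back away from closer ones.
            sign = 1.0 if dist > min_dist else -1.0
            sx += sign * (q[0] - p[0]) / dist
            sy += sign * (q[1] - p[1]) / dist
            ax += directions[j][0]
            ay += directions[j][1]
        if count:
            vx = d[0] + ax / count + sx / count
            vy = d[1] + ay / count + sy / count
            norm = math.hypot(vx, vy)
            if norm > 0.0:
                d = (vx / norm, vy / norm)  # keep directions unit length
        new_positions.append((p[0] + step * d[0], p[1] + step * d[1]))
        new_directions.append(d)
    return new_positions, new_directions
```

With two items one unit apart and `min_dist=0.5`, a single call brings them slightly closer; with the same items 0.2 apart, it pushes them apart.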

    Techniques for data pattern selection and abstraction

    This thesis concerns the problem of prototype reduction in instance-based learning. In order to deal with problems such as storage requirements, sensitivity to noise and computational complexity, various algorithms have been presented that reduce the number of stored prototypes while maintaining competitive classification accuracy. Instance selection, which recovers a smaller subset of the original training set, is the most widely used technique for instance reduction. However, prototype abstraction, which generates new prototypes to replace the initial ones, has also gained a lot of interest recently. The major contribution of this work is the proposal of four novel frameworks for performing prototype reduction: the Class Boundary Preserving algorithm (CBP), a hybrid method that uses both selection and generation of prototypes; Instance Seriation for Prototype Abstraction (ISPA), which is an abstraction algorithm; and two selective techniques, Spectral Instance Reduction (SIR) and Direct Weight Optimization (DWO). CBP is a multi-stage method based on a simple heuristic that is very effective in identifying samples close to class borders. Harmful instances are removed using a noise filter, while the heuristic determines the geometrical distribution of patterns around every instance. Together with the concepts of nearest enemy pairs and mean shift clustering, this algorithm decides on the final set of retained prototypes. DWO is a selection model whose output set of prototypes is decided by a set of binary weights. These weights are computed according to an objective function composed of the ratio between the nearest friend and nearest enemy of every sample. In order to obtain good quality results, DWO is optimized using a genetic algorithm. ISPA is an abstraction technique that employs the concept of data seriation to organize instances in an arrangement that favours merging between them. As a result, a new set of prototypes is created.
    Results show that CBP, SIR and DWO, the three major algorithms presented in this thesis, are competent and efficient in terms of at least one of the two basic objectives, classification accuracy and condensation ratio. The comparison against other successful condensation algorithms illustrates the competitiveness of the proposed models. The SIR algorithm presents a set of border discriminating features (BDFs) that depicts the local distribution of friends and enemies of all samples. These are then used along with spectral graph theory to partition the training set into border and internal instances.
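The per-sample quantity that the DWO description refers to, the ratio between the distance to a sample's nearest friend (same class) and its nearest enemy (different class), can be computed as in the sketch below. The function name `friend_enemy_ratios` is hypothetical, Euclidean distance is an assumption, and the abstract does not say how the thesis combines these ratios with the binary weights, so that part is not shown.

```python
import math

def friend_enemy_ratios(points, labels):
    """Return d(nearest friend) / d(nearest enemy) for every sample.

    Assumes Euclidean distance and at least two samples; the ratio is
    math.inf when the nearest enemy coincides with the sample or when
    the sample has no friend.
    """
    ratios = []
    for i, (p, y) in enumerate(zip(points, labels)):
        friend = math.inf  # distance to closest same-class sample
        enemy = math.inf   # distance to closest other-class sample
        for j, (q, z) in enumerate(zip(points, labels)):
            if i == j:
                continue
            d = math.dist(p, q)
            if z == y:
                friend = min(friend, d)
            else:
                enemy = min(enemy, d)
        ratios.append(friend / enemy if enemy > 0 else math.inf)
    return ratios

# Four points on a line, two per class.
ratios = friend_enemy_ratios([(0, 0), (1, 0), (4, 0), (5, 0)],
                             ["a", "a", "b", "b"])
```

In this toy example the two inner points, which sit nearest the class border, get a ratio of 1/3, while the two outer points get 1/4.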

    Network-based stochastic semisupervised learning

    Semisupervised learning is a machine learning approach that is able to employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. The particles of the same class cooperate among themselves, while the particles of different classes compete with each other to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature brought by the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting a low order of computational complexity in comparison to other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model. Funding: Sao Paulo State Research Foundation (FAPESP); Brazilian National Research Council (CNPq).
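One way to read "combined random-preferential walk" is as a convex blend of a uniform random step and a step biased toward nodes the particle's class already dominates. The sketch below assumes exactly that form; the function `transition_probs`, the mixing weight `lam`, and the `domination` mapping are hypothetical names, and the abstract itself gives no formula.

```python
def transition_probs(neighbors, domination, lam):
    """Blend a uniform random walk with a preferential walk.

    neighbors:  node ids adjacent to the particle's current node
    domination: node id -> non-negative domination level of the
                particle's class on that node (assumed representation)
    lam:        weight of the preferential term, in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n = len(neighbors)
    total = sum(domination[v] for v in neighbors)
    probs = {}
    for v in neighbors:
        p_rand = 1.0 / n
        # Fall back to the uniform step if no neighbour is dominated at all.
        p_pref = domination[v] / total if total > 0 else p_rand
        probs[v] = lam * p_pref + (1.0 - lam) * p_rand
    return probs

probs = transition_probs([1, 2, 3], {1: 0.6, 2: 0.3, 3: 0.1}, lam=0.5)
```

Setting `lam=0` reduces this to a plain random walk, and `lam=1` makes the particle follow its own class's domination levels only. The probabilities sum to 1 for any `lam` in between.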

    Network-Based Stochastic Semisupervised Learning
