
    Single-Pass Distributed Learning of Multi-Class SVMs using Core-Sets

    We explore a technique to learn Support Vector Machines (SVMs) when the training data is partitioned among several data sources. The basic idea is to consider SVMs which can be reduced to Minimal Enclosing Ball (MEB) problems in a feature space. Such SVMs can be computed efficiently by finding a core-set for the image of the data in the feature space. Our main result is that the union of local core-sets provides a close approximation to a global core-set from which the SVM can be recovered. The method hence requires a single pass through each data source to compute the local core-sets, after which the SVM is recovered from their union. Extensive simulations on small and large datasets evaluate the method's classification accuracy, transmission efficiency and overall complexity, comparing its results with a widely used single-pass heuristic for learning standard SVMs.
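
    A rough sketch of the union-of-local-core-sets idea, written in plain Euclidean space rather than the kernel feature space the paper works in; the Badoiu-Clarkson-style core-set routine and all names below are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def meb_coreset(X, epsilon=0.1):
        # Badoiu-Clarkson style (1+eps)-MEB approximation: O(1/eps^2) rounds,
        # each adding the point furthest from the current center.
        core = [0]
        center = X[0].astype(float)
        for t in range(1, int(np.ceil(1.0 / epsilon ** 2))):
            dists = np.linalg.norm(X - center, axis=1)
            far = int(np.argmax(dists))
            core.append(far)
            center += (X[far] - center) / (t + 1)   # step toward the new point
        idx = np.array(sorted(set(core)))
        radius = np.linalg.norm(X[idx] - center, axis=1).max()
        return idx, center, radius

    def distributed_coreset(partitions, epsilon=0.1):
        # Single pass over each source: local core-sets, then one MEB on the union.
        union = [Xp[meb_coreset(Xp, epsilon)[0]] for Xp in partitions]
        return meb_coreset(np.vstack(union), epsilon)

    rng = np.random.default_rng(0)
    parts = [rng.normal(size=(200, 5)) + shift for shift in range(3)]
    idx, center, radius = distributed_coreset(parts)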

    Two One-Pass Algorithms for Data Stream Classification Using Approximate MEBs

    It has recently been shown that the quadratic programming formulation underlying a number of kernel methods can be treated as a minimal enclosing ball (MEB) problem in a feature space into which the data has been embedded. Core Vector Machines (CVMs), in particular, exploit this equivalence to compute Support Vector Machines (SVMs) from very large datasets in the batch scenario. In this paper, we study two algorithms for online classification which extend this family of algorithms to handle large data streams. Both algorithms use analytical rules to adjust the model extracted from the stream instead of recomputing the entire solution on the augmented dataset. We show that these algorithms are more accurate than the current extension of CVMs to data streams, which likewise uses an analytical rule instead of solving large quadratic programs. Experiments also show that the online approaches are considerably more efficient than periodic recomputation of CVMs, even when warm starts are used.
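
    A minimal sketch of the kind of analytical adjustment rule such online algorithms rely on: when a streamed point falls outside the current ball, the ball is grown just enough to cover it instead of re-solving the quadratic program. This is the textbook one-pass MEB update, not necessarily the exact rules studied in the paper.

    import numpy as np

    def update_ball(center, radius, x):
        # If x lies outside the ball, return the smallest ball covering
        # both the old ball and x; otherwise leave the model unchanged.
        d = np.linalg.norm(x - center)
        if d <= radius:
            return center, radius
        new_center = center + ((d - radius) / (2.0 * d)) * (x - center)
        return new_center, 0.5 * (radius + d)

    rng = np.random.default_rng(0)
    stream = rng.normal(size=(1000, 3))
    center, radius = stream[0], 0.0          # initialise on the first point
    for x in stream[1:]:                     # single pass, no QP re-solve
        center, radius = update_ball(center, radius, x)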

    Two bagging algorithms with coupled learners to encourage diversity

    In this paper, we present two ensemble learning algorithms which make use of bootstrapping and out-of-bag estimation in an attempt to inherit the robustness of bagging to overfitting. In contrast to bagging, in these algorithms learners have visibility of the other learners and cooperate to achieve diversity, a characteristic that has proved to be a major concern for ensemble models. Experiments are provided on two regression problems obtained from the UCI repository.
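
    A minimal sketch of the bootstrap and out-of-bag machinery these algorithms build on; the coupling rule that lets learners see each other is specific to the paper and not reproduced here, and the regression-tree base learner is an illustrative choice.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def bagging_with_oob(X, y, n_learners=25, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        learners, oob_errors = [], []
        for _ in range(n_learners):
            boot = rng.integers(0, n, size=n)       # bootstrap sample (with replacement)
            oob = np.setdiff1d(np.arange(n), boot)  # observations this learner never saw
            model = DecisionTreeRegressor(random_state=0).fit(X[boot], y[boot])
            # out-of-bag error: a near-free generalization estimate that
            # coupled learners could inspect to steer the ensemble's diversity
            oob_errors.append(np.mean((model.predict(X[oob]) - y[oob]) ** 2))
            learners.append(model)
        return learners, np.array(oob_errors)

    def ensemble_predict(learners, X):
        return np.mean([m.predict(X) for m in learners], axis=0)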

    Local Negative Correlation with Resampling

    This paper deals with a learning algorithm which combines two well-known methods for generating ensemble diversity: negative correlation of errors and resampling. In this algorithm, a set of learners iteratively and synchronously improve their state using information about the performance of a fixed number of other learners in the ensemble, generating a form of local negative correlation. Resampling allows the base algorithm to control the impact of highly influential data points, which in turn can improve its generalization error. The resulting algorithm can be viewed as a generalization of bagging in which each learner is no longer independent but can be locally coupled with other learners. We demonstrate the technique on two real datasets using neural network ensembles.
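
    A minimal sketch of a negative-correlation penalty restricted to a local neighbourhood of learners, assuming linear base learners on a ring topology; the paper's exact update rule and its resampling step are not reproduced here.

    import numpy as np

    def local_nc_step(W, X, y, lam=0.5, lr=0.01, k=1):
        # One synchronous update of M linear learners W (shape M x d).
        # Each learner is penalised for correlating with the mean output of
        # its 2k ring neighbours, pushing the ensemble toward diverse errors.
        M = W.shape[0]
        F = X @ W.T                                  # (n, M) individual outputs
        for i in range(M):
            nbrs = [(i + s) % M for s in range(-k, k + 1)]
            f_bar = F[:, nbrs].mean(axis=1)          # local ensemble output
            # gradient of (f_i - y)^2 - lam * (f_i - f_bar)^2, i.e. the
            # classic NC penalty computed against the local mean only
            grad_out = 2 * (F[:, i] - y) - 2 * lam * (F[:, i] - f_bar)
            W[i] -= lr * (grad_out @ X) / len(y)
        return W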

    Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing

    Semantic hashing is a technique to represent high-dimensional data using similarity-preserving binary codes for efficient indexing and search. Recently, variational autoencoders with Bernoulli latent representations achieved remarkable success in learning such codes in supervised and unsupervised scenarios, outperforming traditional methods thanks to their ability to handle the binary constraints architecturally. In this paper, we propose a novel method of supervision (self-supervision) for variational autoencoders, in which the model uses its own predictions of the label distribution to implement the pairwise objective function. We also investigate the robustness of hashing methods based on variational autoencoders to the lack of supervision, focusing on two semi-supervised approaches currently in use. Our experiments on text and image retrieval tasks show that, as expected, both methods can significantly increase the quality of the hash codes as the number of labelled observations increases, but that quality deteriorates as the amount of labelled samples decreases. In this scenario, the proposed self-supervised approach outperforms the classical approaches and yields similar performance in fully-supervised settings.
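
    A minimal sketch of the self-supervision idea, assuming a PyTorch implementation: the pairwise term that would normally need ground-truth labels is driven by the model's own class predictions. The architecture, straight-through gradient estimator and loss weighting below are illustrative guesses, not the paper's configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfSupervisedBernoulliAE(nn.Module):
        def __init__(self, d_in, n_bits, n_classes):
            super().__init__()
            self.enc = nn.Linear(d_in, n_bits)       # logits of the Bernoulli code
            self.dec = nn.Linear(n_bits, d_in)
            self.cls = nn.Linear(n_bits, n_classes)  # label head used for self-supervision

        def forward(self, x):
            probs = torch.sigmoid(self.enc(x))
            b = torch.bernoulli(probs)
            code = b + probs - probs.detach()        # straight-through gradient
            return self.dec(code), self.cls(code), code

    def loss_fn(model, x, gamma=1.0):
        recon, logits, code = model(x)
        y_hat = logits.argmax(dim=1)                 # the model's own label guess
        same = (y_hat[:, None] == y_hat[None, :]).float()
        # pairwise objective: codes of same-predicted-label pairs are pulled
        # together, different-label pairs pushed apart, via normalised dot
        # products of the centered bits
        z = code * 2 - 1
        sim = z @ z.t() / z.shape[1]
        pairwise = (same * (1 - sim) + (1 - same) * F.relu(sim)).mean()
        return F.mse_loss(recon, x) + gamma * pairwise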

    A New Algorithm for Training SVMs using Approximate Minimal Enclosing Balls

    It has been shown that many kernel methods can be equivalently formulated as minimal enclosing ball (MEB) problems in a certain feature space. Exploiting this reduction, efficient algorithms to scale up Support Vector Machines (SVMs) and other kernel methods have been introduced under the name of Core Vector Machines (CVMs). In this paper, we study a new algorithm for training SVMs based on an instance of the Frank-Wolfe optimization method recently proposed to approximate the solution of the MEB problem. We show that, specialized to SVM training, this algorithm can scale better than CVMs at the price of slightly lower accuracy.
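
    A minimal sketch of a Frank-Wolfe iteration on the MEB dual, the kind of scheme the paper specializes to SVM training; the kernel matrix K is assumed to satisfy the MEB reduction, and the stopping rule and SVM-specific kernel construction are omitted.

    import numpy as np

    def frank_wolfe_meb(K, n_iter=200):
        # MEB dual: maximise alpha' diag(K) - alpha' K alpha over the simplex.
        n = K.shape[0]
        alpha = np.full(n, 1.0 / n)                  # start in the simplex interior
        for _ in range(n_iter):
            grad = np.diag(K) - 2.0 * K @ alpha      # gradient of the dual objective
            i = int(np.argmax(grad))                 # best vertex e_i (furthest point)
            d = -alpha
            d[i] += 1.0                              # direction e_i - alpha
            # exact line search for the quadratic objective along d
            den = 2.0 * d @ K @ d
            gamma = np.clip((grad @ d) / den, 0.0, 1.0) if den > 0 else 1.0
            alpha += gamma * d
        return alpha                                 # weights defining the ball's center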