    Linear algorithms for online multitask classification

    We design and analyze interacting online algorithms for multitask classification that perform better than independent learners whenever the tasks are related in a certain sense. We formalize task relatedness in different ways and derive formal guarantees on the performance advantage provided by interaction. Our online analysis gives new insights into previously known co-regularization techniques, such as the multitask kernels and the margin correlation analysis for multiview learning. In the last part, we apply our approach to spectral co-regularization: we introduce a natural matrix extension of the quasi-additive algorithm for classification and prove bounds depending on certain unitarily invariant norms of the matrix of task coefficients.
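
    For reference, the vector (single-task) p-norm Perceptron is a standard member of the quasi-additive family: it accumulates mistakes additively in a dual vector theta and predicts with the link w = grad(0.5 * ||theta||_p^2). The sketch below shows only this vector base algorithm, not the matrix extension described in the abstract; the function names are hypothetical and p >= 2 is assumed (p = 2 recovers the classical Perceptron).

```python
import numpy as np

def pnorm_weights(theta: np.ndarray, p: float) -> np.ndarray:
    """Link function w = grad(0.5 * ||theta||_p^2), assuming p >= 2.

    Component-wise: sign(theta_i) * |theta_i|^(p-1) / ||theta||_p^(p-2).
    """
    norm = np.linalg.norm(theta, ord=p)
    if norm == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

def pnorm_perceptron(X: np.ndarray, y: np.ndarray, p: float = 2.0):
    """One online pass of a (hypothetical) p-norm Perceptron; labels in {-1, +1}.

    Returns the final primal weights and the number of mistakes made.
    """
    theta = np.zeros(X.shape[1])  # dual vector, updated additively
    mistakes = 0
    for x_t, y_t in zip(X, y):
        w = pnorm_weights(theta, p)
        y_hat = 1.0 if w @ x_t >= 0 else -1.0
        if y_hat != y_t:
            theta += y_t * x_t  # mistake-driven additive update in the dual space
            mistakes += 1
    return pnorm_weights(theta, p), mistakes
```

    Larger values of p make the link emphasize the largest dual components, which is what moves the algorithm from Perceptron-like toward Winnow-like behavior.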

    Linear algorithms for online multitask classification

    We introduce new Perceptron-based algorithms for the online multitask binary classification problem. Under suitable regularity conditions, our algorithms are shown to improve on their baselines by a factor proportional to the number of tasks. We achieve these improvements using various types of regularization that bias our algorithms towards specific notions of task relatedness. More specifically, similarity among tasks is measured either in terms of the geometric closeness of the task reference vectors or as a function of the dimension of their spanned subspace. In addition to adapting a mix of known techniques, such as the multitask kernels of Evgeniou et al., to the online setting, our analysis also introduces a matrix-based multitask extension of the p-norm Perceptron, which is used to implement spectral co-regularization. Experiments on real-world data sets complement and support our theoretical findings.
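
    One way to picture how a multitask kernel couples the tasks is a kernel Perceptron run with a kernel of the assumed form K((x, i), (x', j)) = (A^-1)_{ij} <x, x'>, where A is a K x K symmetric positive definite interaction matrix. Written with one weight vector per task, a mistake on task i then moves every task j by y * (A^-1)_{ji} * x. The sketch below is illustrative only: the function name, the stream format, and the choice of A in the usage note are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def multitask_perceptron(stream, A: np.ndarray, d: int):
    """Hypothetical multitask Perceptron coupled through an interaction matrix A.

    stream yields (task_index, x, y) with x of length d and y in {-1, +1};
    A is assumed symmetric positive definite (K x K, one row per task).
    """
    A_inv = np.linalg.inv(A)
    K = A.shape[0]
    W = np.zeros((K, d))  # row j holds the weight vector of task j
    mistakes = 0
    for i, x, y in stream:
        y_hat = 1.0 if W[i] @ x >= 0 else -1.0
        if y_hat != y:
            # Every task moves toward y * x, scaled by its coupling (A^-1)_{j,i} with task i.
            W += y * np.outer(A_inv[:, i], x)
            mistakes += 1
    return W, mistakes
```

    With A equal to the identity the tasks are learned independently. With the assumed choice A = (1 + K) I - 11^T, its inverse is (I + 11^T) / (1 + K), so a mistake updates the current task with weight 2 / (1 + K) and every other task with weight 1 / (1 + K).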

    Tracking the Best Hyperplane with a Simple Budget Perceptron

    Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget. Second, we show that by applying to the Perceptron algorithm the simplest possible eviction policy, which discards a random support vector each time a new one comes in, we achieve a shifting bound close to the one we obtained with no budget restrictions. More importantly, we show that our randomized algorithm strikes the optimal trade-off $U = \Theta(\sqrt{B})$ between budget $B$ and norm $U$ of the largest classifier in the comparison sequence. Experiments are presented comparing several linear-threshold algorithms on chronologically-ordered textual datasets. These experiments support our theoretical findings in that they show to what extent randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
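
    The random eviction policy described above fits in a few lines. The sketch below assumes a kernel Perceptron that adds an example as a support vector only when it makes a mistake, and that evicts a uniformly random stored support vector only once the budget is full. It does not reproduce the shifting analysis; the function names and the default linear kernel are hypothetical choices for illustration.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]

def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product; any other kernel function could be passed instead."""
    return sum(a * b for a, b in zip(u, v))

def random_budget_perceptron(stream: Iterable[Tuple[Vector, int]], budget: int,
                             kernel=linear_kernel, seed: int = 0):
    """Kernel Perceptron storing at most `budget` support vectors (labels in {-1, +1}).

    On each mistake the current example becomes a support vector; if the budget
    is already full, one stored support vector chosen uniformly at random is evicted first.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    support: List[Tuple[Vector, int]] = []
    mistakes = 0
    for x, y in stream:
        score = sum(y_s * kernel(x_s, x) for x_s, y_s in support)
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:
            mistakes += 1
            if len(support) >= budget:
                support.pop(rng.randrange(len(support)))  # random eviction
            support.append((x, y))
    return support, mistakes
```

    Because the evicted vector is chosen at random rather than by a fixed rule, an adversarial stream cannot predict which part of the hypothesis will be forgotten.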

    Learning noisy linear classifiers via adaptive and selective sampling

    We introduce efficient margin-based algorithms for selective sampling and filtering in binary classification tasks. Experiments on real-world textual data reveal that our algorithms perform significantly better than popular and similarly efficient competitors. Using the so-called Mammen-Tsybakov low noise condition to parametrize the instance distribution, and assuming linear label noise, we show bounds on the convergence rate to the Bayes risk of a weaker adaptive variant of our selective sampler. Our analysis reveals that, excluding logarithmic factors, the average risk of this adaptive sampler converges to the Bayes risk at rate $N^{-(1+\alpha)(2+\alpha)/(2(3+\alpha))}$, where $N$ denotes the number of queried labels and $\alpha > 0$ is the exponent in the low noise condition. For all $\alpha > \sqrt{3} - 1 \approx 0.73$ this convergence rate is asymptotically faster than the rate $N^{-(1+\alpha)/(2+\alpha)}$ achieved by the fully supervised version of the base selective sampler, which queries all labels. Moreover, for $\alpha \to \infty$ (hard margin condition) the gap between the semi- and fully-supervised rates becomes exponential.
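
    The threshold on $\alpha$ follows from comparing the exponents of the two rates, $N^{-(1+\alpha)(2+\alpha)/(2(3+\alpha))}$ for the adaptive sampler and $N^{-(1+\alpha)/(2+\alpha)}$ for the fully supervised one. The first rate is faster exactly when its exponent is larger. Since $1+\alpha$, $2+\alpha$, and $3+\alpha$ are all positive for $\alpha > 0$, each step below is an equivalence, and the negative root $-1-\sqrt{3}$ is discarded:

```latex
\frac{(1+\alpha)(2+\alpha)}{2(3+\alpha)} > \frac{1+\alpha}{2+\alpha}
\iff (2+\alpha)^2 > 2(3+\alpha)
\iff \alpha^2 + 2\alpha - 2 > 0
\iff \alpha > \sqrt{3} - 1 \approx 0.732
```

    As $\alpha \to \infty$, the first exponent grows like $\alpha/2$ while the second tends to $1$, so the gap between the two rates keeps widening in the hard margin limit.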