44 research outputs found

    Learning from Distributions via Support Measure Machines

    Full text link
    This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.Comment: Advances in Neural Information Processing Systems 2

    Discriminative models for multi-instance problems with tree-structure

    Full text link
    Modeling network traffic is gaining importance in order to counter modern threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in model training phase. We propose a discriminative model that makes decisions based on all computer's traffic observed during predefined time window (5 minutes in our case). The model is trained on collected traffic samples over equally sized time window per large number of computers, where the only labels needed are human verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training the model itself recognizes discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. In the following we implement the discriminative model as a neural network with special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) which are typically visited by infected computers

    On Classification with Bags, Groups and Sets

    Full text link
    Many classification problems can be difficult to formulate directly in terms of the traditional supervised setting, where both training and test samples are individual feature vectors. There are cases in which samples are better described by sets of feature vectors, that labels are only available for sets rather than individual samples, or, if individual labels are available, that these are not independent. To better deal with such problems, several extensions of supervised learning have been proposed, where either training and/or test objects are sets of feature vectors. However, having been proposed rather independently of each other, their mutual similarities and differences have hitherto not been mapped out. In this work, we provide an overview of such learning scenarios, propose a taxonomy to illustrate the relationships between them, and discuss directions for further research in these areas

    Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations

    Full text link
    We describe a method to perform functional operations on probability distributions of random variables. The method uses reproducing kernel Hilbert space representations of probability distributions, and it is applicable to all operations which can be applied to points drawn from the respective distributions. We refer to our approach as {\em kernel probabilistic programming}. We illustrate it on synthetic data, and show how it can be used for nonparametric structural equation models, with an application to causal inference

    Testing and Learning on Distributions with Symmetric Noise Invariance

    Full text link
    Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely that all possible differences between samples are of interest -- discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise.Comment: 22 page
    corecore