Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in the
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in straightforward fashion. To accomplish
this, we construct a generalization of the support vector machine (SVM) called
a support measure machine (SMM). Our analyses of SMMs provide several insights
into their relationship to traditional SVMs. Based on such insights, we propose
a flexible SVM (Flex-SVM) that places different kernel functions on each
training example. Experimental results on both synthetic and real-world data
demonstrate the effectiveness of our proposed framework.
Comment: Advances in Neural Information Processing Systems 2
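As a rough illustration of learning on mean embeddings, the sketch below computes the inner product between the empirical mean embeddings of two samples, i.e. the average of a base kernel over all cross pairs. The Gaussian RBF base kernel, the `gamma` value, the function names, and the toy bags are assumptions made for this example; the abstract does not specify them.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel between two equal-length vectors (assumed base kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_embedding_kernel(xs, ys, gamma=1.0):
    """Inner product of the empirical mean embeddings of samples xs and ys:
    (1 / (n * m)) * sum_i sum_j k(x_i, y_j)."""
    total = sum(rbf(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

# Each training example is a sample drawn from one distribution (toy data).
bag_a = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]
bag_b = [(3.0, 3.0), (2.9, 3.1)]

k_aa = mean_embedding_kernel(bag_a, bag_a)  # similarity of a sample with itself
k_ab = mean_embedding_kernel(bag_a, bag_b)  # similarity of two distant samples
```

A Gram matrix built from such values could, in principle, be passed to any kernel method that accepts a precomputed kernel; whether that matches the SMM formulation in detail is not stated in the abstract.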
Discriminative models for multi-instance problems with tree-structure
Modeling network traffic is gaining importance in order to counter modern
threats of ever-increasing sophistication. It is, however, surprisingly difficult
and costly to construct reliable classifiers on top of telemetry data due to
the variety and complexity of signals that no human can manage to interpret in
full. Obtaining training data with a sufficiently large and variable body of
labels can thus be seen as a prohibitive problem. The goal of this work is to
detect infected computers by observing their HTTP(S) traffic collected from
network sensors, which are typically proxy servers or network firewalls, while
relying on only minimal human input in the model training phase. We propose a
discriminative model that makes decisions based on all of a computer's traffic
observed during a predefined time window (5 minutes in our case). The model is
trained on traffic samples collected over equally sized time windows for a large
number of computers, where the only labels needed are human verdicts about the
computer as a whole (presumed infected vs. presumed clean). As part of training,
the model itself recognizes discriminative patterns in traffic targeted to
individual servers and constructs the final high-level classifier on top of
them. We show the classifier to perform with very high precision, while the
learned traffic patterns can be interpreted as Indicators of Compromise. In the
following we implement the discriminative model as a neural network with
special structure reflecting two stacked multi-instance problems. The main
advantages of the proposed configuration include not only improved accuracy and
the ability to learn from gross labels, but also automatic learning of the server
types (together with their detectors) that are typically visited by infected
computers.
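The two stacked multi-instance problems described above (traffic grouped per server, servers grouped per computer) can be sketched as two rounds of "transform each instance, then pool". This is only a minimal illustration: mean pooling, the ReLU layers, the fixed weights, and the feature layout are assumptions for the example, not details taken from the paper, and no training is shown.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def relu_layer(x, weights, bias):
    """Dense layer followed by ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def embed_computer(traffic, w_inner, b_inner, w_outer, b_outer):
    """traffic maps a server name to the list of feature vectors observed
    for that server in one time window (hypothetical layout)."""
    server_vectors = []
    for flows in traffic.values():
        # Inner multi-instance problem: flows -> one vector per server.
        hidden = [relu_layer(f, w_inner, b_inner) for f in flows]
        server_vectors.append(mean_pool(hidden))
    # Outer multi-instance problem: servers -> one vector per computer.
    hidden = [relu_layer(s, w_outer, b_outer) for s in server_vectors]
    return mean_pool(hidden)

# Toy window with two servers; identity weights keep the arithmetic visible.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
window = {
    "a.example": [[1.0, 0.0], [3.0, 2.0]],
    "b.example": [[0.0, 4.0]],
}
computer_vector = embed_computer(window, identity, zero_bias, identity, zero_bias)
```

In a trained model the computer-level vector would feed a final classifier that is fitted only on per-computer labels.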
On Classification with Bags, Groups and Sets
Many classification problems can be difficult to formulate directly in terms
of the traditional supervised setting, where both training and test samples are
individual feature vectors. There are cases in which samples are better
described by sets of feature vectors, in which labels are only available for
sets rather than individual samples, or in which individual labels, if
available, are not independent. To better deal with such problems, several
extensions of supervised learning have been proposed, where training
and/or test objects are sets of feature vectors. However, having been proposed
rather independently of each other, their mutual similarities and differences
have hitherto not been mapped out. In this work, we provide an overview of such
learning scenarios, propose a taxonomy to illustrate the relationships between
them, and discuss directions for further research in these areas.
Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations
We describe a method to perform functional operations on probability
distributions of random variables. The method uses reproducing kernel Hilbert
space representations of probability distributions, and it is applicable to all
operations which can be applied to points drawn from the respective
distributions. We refer to our approach as kernel probabilistic
programming}. We illustrate it on synthetic data, and show how it can be used
for nonparametric structural equation models, with an application to causal
inference.
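One way to picture an operation "applied to points drawn from the respective distributions" is the sketch below: each distribution is represented by weighted expansion points, and a function of two variables is represented by applying the function to every pair of points with product weights. The product weighting assumes the two variables are independent; that assumption, the RBF kernel, and all names here are illustrative choices rather than details stated in the abstract.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian RBF kernel on scalars (assumed kernel)."""
    return math.exp(-gamma * (a - b) ** 2)

def apply_op(f, xs, alphas, ys, betas):
    """Represent f(X, Y) by f applied to all pairs of expansion points.
    Product weights assume X and Y are independent."""
    points = [f(x, y) for x in xs for y in ys]
    weights = [a * b for a in alphas for b in betas]
    return points, weights

def embedding_at(t, points, weights, gamma=1.0):
    """Evaluate the weighted kernel expansion sum_i w_i * k(p_i, t)."""
    return sum(w * rbf(p, t, gamma) for p, w in zip(points, weights))

# X is represented by points {0, 1}, Y by points {10, 20}, equal weights.
xs, alphas = [0.0, 1.0], [0.5, 0.5]
ys, betas = [10.0, 20.0], [0.5, 0.5]

points, weights = apply_op(lambda x, y: x + y, xs, alphas, ys, betas)
value_near_10 = embedding_at(10.0, points, weights)
```

The number of expansion points grows multiplicatively with each operation in this naive form, so a practical version would need some reduction step.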
Testing and Learning on Distributions with Symmetric Noise Invariance
Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD),
the resulting distance between distributions, are useful tools for fully
nonparametric two-sample testing and learning on distributions. However, it is
rare that all possible differences between samples are of interest:
discovered differences can be due to different types of measurement noise, data
collection artefacts or other irrelevant sources of variability. We propose
distances between distributions which encode invariance to additive symmetric
noise, aimed at testing whether the assumed true underlying processes differ.
Moreover, we construct invariant features of distributions, leading to learning
algorithms robust to the impairment of the input distributions with symmetric
additive noise.
Comment: 22 pages
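For orientation, the standard (non-invariant) MMD mentioned at the start of the abstract can be estimated from two samples as follows. This is the plain biased estimator of the squared MMD, not the noise-invariant distances the paper proposes; the RBF kernel, `gamma`, and the toy samples are assumptions for the example.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel between two equal-length vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs (x, y)."""
    return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2_biased(xs, ys, gamma=1.0):
    """Biased estimate of MMD^2: E k(x, x') + E k(y, y') - 2 E k(x, y)."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))

sample_p = [(0.0,), (0.2,), (-0.1,), (0.1,)]
sample_q = [(2.0,), (2.1,), (1.9,), (2.2,)]

same = mmd2_biased(sample_p, sample_p)       # identical samples
different = mmd2_biased(sample_p, sample_q)  # well-separated samples
```

The biased estimator includes the diagonal terms k(x_i, x_i), so it is non-negative and equals zero when both samples are identical.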