Provably efficient methods for large-scale learning
The scale of machine learning problems has grown rapidly in recent years, calling for efficient methods. In this dissertation, we propose simple and efficient methods for various large-scale learning problems. We start with the standard supervised problem of quadratic regression. In Chapter 2, we show that by exploiting the quadratic structure and a novel gradient estimation algorithm, we can solve sparse quadratic regression with sub-quadratic time complexity and near-optimal sample complexity. We then move to online learning problems. In Chapter 3, we identify a weak assumption under which the standard UCB algorithm provably and efficiently learns from inconsistent human preferences with nearly optimal regret; in Chapter 4, we propose an approximate maximum inner product search data structure for adaptive queries and present two efficient algorithms that achieve sublinear time complexity for linear bandits, which is especially desirable for extremely large and slowly changing action sets. In Chapter 5, we study how to efficiently use privileged features with deep learning models. We present an efficient learning algorithm that exploits privileged features which are not available at test time. We conduct comprehensive empirical evaluations and present a rigorous analysis for linear models to build theoretical insight. The approach provides a general algorithmic paradigm that can be integrated with many other machine learning methods.
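As a point of reference for the bandit results above, here is a minimal sketch of the classic UCB1 rule for multi-armed bandits: play the arm maximizing empirical mean plus a confidence bonus. This is the textbook algorithm only, not the dissertation's preference-based variant; the reward function, arm means, and constant `c` are toy assumptions.

```python
import math
import random

def ucb1(pull, n_arms, horizon, c=2.0):
    """UCB1: try each arm once, then always pick the arm maximizing
    empirical mean + sqrt(c * ln(t) / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        history.append(arm)
    return history

# toy usage: two Bernoulli arms with means 0.2 and 0.8
random.seed(0)
means = [0.2, 0.8]
plays = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0,
             n_arms=2, horizon=500)
```

Over 500 rounds the better arm ends up being played far more often, which is the behavior the regret bounds formalize.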
Multiclass latent locally linear support vector machines
Kernelized Support Vector Machines (SVMs) have gained the status of off-the-shelf classifiers, able to deliver state-of-the-art performance on almost any problem. Still, their practical use is constrained by their computational and memory complexity, which grows super-linearly with the number of training samples. To retain the low training and testing complexity of linear classifiers and the flexibility of non-linear ones, a growing and promising alternative is represented by methods that learn non-linear classifiers through local combinations of linear ones. In this paper we propose a new multi-class local classifier based on a latent SVM formulation. The proposed classifier makes use of a set of linear models that are linearly combined using sample- and class-specific weights. Thanks to the latent formulation, the combination coefficients are modeled as latent variables. We allow soft combinations and provide a closed-form solution for their estimation, resulting in an efficient prediction rule. This novel formulation allows learning, in a principled way, the sample-specific weights and the linear classifiers in a single optimization problem, using a CCCP optimization procedure. Extensive experiments on ten standard UCI machine learning datasets, one large binary dataset, three character and digit recognition databases, and a visual place categorization dataset show the power of the proposed approach.
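The prediction rule described above can be sketched roughly as follows: local linear models are evaluated on each sample and combined with sample-specific weights into per-class scores. The soft-assignment weights and the way responses are mixed here are illustrative stand-ins for the paper's latent variables, not the exact formulation.

```python
import numpy as np

def llsvm_predict(X, W, V):
    """Locally linear multi-class prediction (sketch).
    X: (n, d) samples; W: (m, d) local linear models;
    V: (c, m) class-specific combination weights.
    Sample-specific weights are approximated by a softmax over the
    local models' responses (an assumption for illustration)."""
    R = X @ W.T                               # (n, m) local model responses
    A = np.exp(R - R.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)         # soft sample-specific weights
    scores = (A * R) @ V.T                    # (n, c) per-class scores
    return scores.argmax(axis=1)              # predicted class per sample

# toy usage with random models
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(3, 4))   # 3 local linear models
V = rng.normal(size=(2, 3))   # 2 classes
labels = llsvm_predict(X, W, V)
```

Because prediction reduces to two matrix products, testing cost stays close to that of a plain linear classifier, which is the efficiency argument the abstract makes.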
Discriminative models for multi-instance problems with tree-structure
Modeling network traffic is gaining importance as a means to counter modern threats of ever-increasing sophistication. It is, though, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data, due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with a sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are human verdicts about each computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted at individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers.
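The two stacked multi-instance levels can be sketched as: flows to one server form an inner bag that is pooled into a server embedding, and the server embeddings form an outer bag that is pooled into a single computer-level verdict. The weights, layer sizes, and max/mean pooling choices below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def classify_computer(servers, W1, W2, w_out):
    """Two stacked multi-instance levels (sketch).
    servers: list of (k_i, d) arrays, one per contacted server,
    each row a flow feature vector observed in the time window."""
    server_embs = []
    for flows in servers:
        h = relu(flows @ W1)               # instance (flow) level transform
        server_embs.append(h.max(axis=0))  # pool flows -> server embedding
    H = relu(np.stack(server_embs) @ W2)   # server-level transform
    bag = H.mean(axis=0)                   # pool servers -> computer embedding
    return float(bag @ w_out)              # single infected/clean score

# toy usage: two servers with 3 and 5 flows of 6 features each
rng = np.random.default_rng(1)
servers = [rng.normal(size=(k, 6)) for k in (3, 5)]
W1 = rng.normal(size=(6, 8))
W2 = rng.normal(size=(8, 4))
w_out = rng.normal(size=4)
score = classify_computer(servers, W1, W2, w_out)
```

Only the final score needs a label, which is how the model learns from a single per-computer verdict while the inner level discovers per-server patterns on its own.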