Generalization bounds for averaged classifiers
We study a simple learning algorithm for binary classification. Instead of
predicting with the best hypothesis in the hypothesis class, that is, the
hypothesis that minimizes the training error, our algorithm predicts with a
weighted average of all hypotheses, weighted exponentially with respect to
their training error. We show that the prediction of this algorithm is much
more stable than the prediction of an algorithm that predicts with the best
hypothesis. By allowing the algorithm to abstain from predicting on some
examples, we show that the predictions it makes when it does not abstain are
very reliable. Finally, we show that the probability that the algorithm
abstains is comparable to the generalization error of the best hypothesis in
the class.
Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
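As a hedged illustration of the scheme described above (a minimal sketch for a finite hypothesis class, not the paper's exact estimator; `eta` and `margin` are illustrative tuning parameters), the exponential weighting and abstention rule might look like:

```python
import numpy as np

def averaged_predict(hyp_preds, train_errors, x_idx, eta=10.0, margin=0.1):
    """Predict with an exponentially weighted average of hypotheses.

    hyp_preds    : (num_hypotheses, num_examples) array of +/-1 predictions
    train_errors : per-hypothesis training error in [0, 1]
    Returns +1 or -1, or 0 to abstain when the weighted vote is too close.
    """
    # weight each hypothesis exponentially in its training error
    w = np.exp(-eta * np.asarray(train_errors))
    w /= w.sum()
    vote = w @ hyp_preds[:, x_idx]   # weighted average in [-1, 1]
    if abs(vote) < margin:           # too close to call: abstain
        return 0
    return 1 if vote > 0 else -1
```

When one hypothesis has a much lower training error it dominates the vote; when the weighted vote is nearly balanced, the rule abstains rather than guess.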
It's Simplex! Disaggregating Measures to Improve Certified Robustness
Certified robustness circumvents the fragility of defences against
adversarial attacks, by endowing model predictions with guarantees of class
invariance for attacks up to a calculated size. While there is value in these
certifications, the techniques through which we assess their performance do not
present a proper accounting of their strengths and weaknesses, as their
analysis has eschewed consideration of performance over individual samples in
favour of aggregated measures. By considering the potential output space of
certified models, this work presents two distinct approaches to improve the
analysis of certification mechanisms, that allow for both dataset-independent
and dataset-dependent measures of certification performance. Embracing such a
perspective uncovers new certification approaches, which have the potential to
more than double the achievable radius of certification, relative to current
state-of-the-art. Empirical evaluation verifies that our new approach can
certify more samples at a given noise scale, with greater relative
improvements observed as the difficulty of the predictive task increases.
Comment: IEEE S&P 2024 (IEEE Security & Privacy), 14 pages
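For context on what a "radius of certification" means here, the standard randomized-smoothing certificate (Cohen et al.) converts the smoothed classifier's class probabilities into a certified L2 radius; this is a hedged illustration of that baseline, not the paper's improved mechanism:

```python
from statistics import NormalDist

def certified_radius(p_top, p_runner_up, sigma):
    """Certified L2 radius of a smoothed classifier under Gaussian noise
    of scale sigma: r = (sigma / 2) * (Phi^-1(p_top) - Phi^-1(p_runner_up)),
    where p_top and p_runner_up are the top and runner-up class probabilities."""
    ppf = NormalDist().inv_cdf   # standard normal quantile function
    return 0.5 * sigma * (ppf(p_top) - ppf(p_runner_up))
```

Larger gaps between the top and runner-up probabilities, and larger noise scales, yield larger certified radii.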
Random Feature Maps for Dot Product Kernels
Approximating non-linear kernels using feature maps has gained a lot of
interest in recent years due to applications in reducing training and testing
times of SVM classifiers and other kernel based learning algorithms. We extend
this line of work and present low distortion embeddings for dot product kernels
into linear Euclidean spaces. We base our results on a classical result in
harmonic analysis characterizing all dot product kernels and use it to define
randomized feature maps into explicit low dimensional Euclidean spaces in which
the native dot product provides an approximation to the dot product kernel with
high confidence.
Comment: To appear in the proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012). This version corrects a minor error with Lemma 10. Acknowledgements: Devanshu Bhimwa
Improving Hoeffding Trees
Modern information technology allows information to be collected at a far greater rate than ever before. So fast, in fact, that the main problem is making sense of it all. Machine learning offers promise of a solution, but the field mainly focusses on achieving high accuracy when data supply is limited. While this has created sophisticated classification algorithms, many do not cope with increasing data set sizes. When the data set sizes get to a point where they could be considered to represent a continuous supply, or data stream, then incremental classification algorithms are required. In this setting, the effectiveness of an algorithm cannot simply be assessed by accuracy alone. Consideration needs to be given to the memory available to the algorithm and the speed at which data is processed in terms of both the time taken to predict the class of a new data sample and the time taken to include this sample in an incrementally updated classification model.
The Hoeffding tree algorithm is a state-of-the-art method for inducing decision trees from data streams. The aim of this thesis is to improve this algorithm. To measure improvement, a comprehensive framework for evaluating the performance of data stream algorithms is developed. Within the framework memory size is fixed in order to simulate realistic application scenarios. In order to simulate continuous operation, classes of synthetic data are generated providing an evaluation on a large scale. Improvements to many aspects of the Hoeffding tree algorithm are demonstrated. First, a number of methods for handling continuous numeric features are compared. Second, tree prediction strategy is investigated to evaluate the utility of various methods. Finally, the possibility of improving accuracy using ensemble methods is explored.
The experimental results provide meaningful comparisons of accuracy and processing speeds between different modifications of the Hoeffding tree algorithm under various memory limits. The study on numeric attributes demonstrates that sacrificing accuracy for space at the local level often results in improved global accuracy. The prediction strategy shown to perform best adaptively chooses between standard majority class and Naive Bayes prediction in the leaves. The ensemble method investigation shows that combining trees can be worthwhile, but only when sufficient memory is available, and improvement is less likely than in traditional machine learning. In particular, issues are encountered when applying the popular boosting method to streams.
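The split decision at the heart of the Hoeffding tree rests on the Hoeffding bound: split on the best attribute only once its observed advantage over the runner-up exceeds the bound, so the choice is correct with high probability. A minimal sketch (parameter names and defaults are illustrative, not taken from the thesis):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability >= 1 - delta, the true mean of n i.i.d. observations
    taking values in a range of width value_range lies within this epsilon
    of the sample mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range=1.0, delta=1e-7, n=1000):
    # Split when the observed gain advantage exceeds the Hoeffding bound,
    # i.e. the best attribute is very likely truly the best.
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)
```

As more examples arrive (larger n), the bound shrinks, so ties are eventually broken even between closely matched attributes.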
Resonant Anomaly Detection with Multiple Reference Datasets
An important class of techniques for resonant anomaly detection in high
energy physics builds models that can distinguish between reference and target
datasets, where only the latter has appreciable signal. Such techniques,
including Classification Without Labels (CWoLa) and Simulation Assisted
Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset.
They cannot take advantage of commonly-available multiple datasets and thus
cannot fully exploit available information. In this work, we propose
generalizations of CWoLa and SALAD for settings where multiple reference
datasets are available, building on weak supervision techniques. We demonstrate
improved performance in a number of settings with realistic and synthetic data.
As an added benefit, our generalizations enable us to provide finite-sample
guarantees, improving on existing asymptotic analyses.
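The single-reference idea underlying CWoLa can be sketched as follows: train a classifier to distinguish reference events from target events, and use its output as an anomaly score, since only signal-enriched regions let the classifier tell the two apart. This is a hedged toy illustration with synthetic Gaussians and a plain numpy logistic regression, not the paper's multi-reference method:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=200):
    # plain batch gradient descent on the logistic loss
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

rng = np.random.default_rng(0)
# reference: pure background; target: background plus a small shifted "signal"
background = rng.normal(0.0, 1.0, size=(2000, 2))
signal = rng.normal(2.5, 0.5, size=(200, 2))
reference = rng.normal(0.0, 1.0, size=(2000, 2))
target = np.vstack([background, signal])

X = np.vstack([reference, target])
y = np.concatenate([np.zeros(len(reference)), np.ones(len(target))])
w, b = train_logistic(X, y)

# score target events; signal-like events should score most "target-like"
scores = 1.0 / (1.0 + np.exp(-(target @ w + b)))
```

The generalizations proposed in the paper replace the single `reference` sample with several reference datasets combined through weak supervision.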
Learning discrete and Lipschitz representations
Learning to embed data into a low dimensional vector space that is more useful for some downstream task is one of the most common problems addressed in the representation learning literature. Conventional approaches to solving this problem typically rely on training neural networks using labelled training data. In order to construct an accurate embedding function that will generalise to data not seen during training, one must either gather a very large training dataset, or adequately bias the learning process. This thesis focuses on the task of incorporating new inductive biases into the representation learning paradigm by constraining the set of functions that a learned feature extractor can come from.
The first part of this thesis investigates how one can learn a mapping that changes slowly with respect to its input. This is first addressed by deriving the Lipschitz constant of common feed-forward neural network architectures, and subsequently demonstrating how this constant can be constrained during training. Following this, it is investigated how a similar goal can be accomplished when one assumes that the inputs of interest lie near a low dimensional manifold embedded in a high dimensional vector space. This results in an algorithm that takes advantage of an empirical analog to the Lipschitz constant. Experimental results show that these methods have favourable performance compared to other methods commonly used for imposing inductive biases on neural network learning algorithms.
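A standard upper bound on the Lipschitz constant of a feed-forward network with 1-Lipschitz activations (e.g. ReLU) is the product of the spectral norms of its weight matrices; a minimal sketch of that bound (an illustration of the general idea, not the thesis's exact derivation):

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Upper bound (w.r.t. the L2 norm) on the Lipschitz constant of a
    feed-forward network with 1-Lipschitz activations: the product of the
    spectral norms (largest singular values) of the weight matrices."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)   # largest singular value of W
    return bound
```

Constraining training so this product stays small is one way to impose the slowly-varying-mapping bias discussed above.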
In the second part of this thesis, methods for extracting representations using decision tree models are developed. The first method presented is a problem transformation approach that allows one to reuse existing tree induction techniques. The second approach shows how one can incrementally construct decision trees using gradient information as the source of supervision, allowing one to use an ensemble of decision trees as a layer in a neural network. The experimental results indicate that these approaches improve the performance of representation learning on tabular data across multiple tasks
Adaptive Algorithms For Classification On High-Frequency Data Streams: Application To Finance
International Mention in the doctoral degree.
In recent years, the problem of concept drift has gained importance in the financial
domain. The succession of manias, panics and crashes has stressed the
non-stationary nature of financial markets and the likelihood of drastic structural changes.
The most recent literature suggests the use of conventional machine learning and statistical
approaches for this problem. However, these techniques are unable or slow to adapt
to non-stationarities and may require re-training over time, which is computationally
expensive and carries financial risk.
This thesis proposes a set of adaptive algorithms to deal with high-frequency data
streams and applies these to the financial domain. We present approaches to handle
different types of concept drifts and perform predictions using up-to-date models.
These mechanisms are designed to provide fast reaction times and are thus applicable
to high-frequency data. The core experiments of this thesis are based on the prediction
of the price movement direction at different intraday resolutions in the SPDR S&P 500
exchange-traded fund. The proposed algorithms are benchmarked against other popular
methods from the data stream mining literature and achieve competitive results.
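The thesis does not list its algorithms in this abstract; as a hedged sketch in the same spirit, a classic DDM-style error-rate drift detector reacts when the classifier's running error rate rises significantly above its best observed level, signalling that the model should be retrained:

```python
import math

class DriftDetector:
    """DDM-style detector: track the stream error rate p and its std s,
    and flag drift when p + s exceeds the best-seen p_min + 3 * s_min."""

    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, mistake):
        """Feed one prediction outcome (True = mistake); return True on drift."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        # remember the best (lowest) error level seen after a warm-up period
        if self.n > 30 and p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        return self.n > 30 and p + s > self.p_min + 3.0 * self.s_min
```

Such detectors provide the fast reaction times needed for high-frequency streams, since they require only constant time and memory per example.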
We believe that this thesis opens good research prospects for financial forecasting
during market instability and structural breaks. Results have shown that our proposed
methods can improve prediction accuracy in many of these scenarios. Indeed, the
results obtained are consistent with arguments against the efficient market hypothesis.
However, we cannot claim to consistently beat buy-and-hold; therefore, we
cannot reject it.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. President: Gustavo Recio Isasi. Secretary: Pedro Isasi Viñuela. Committee member: Sandra García Rodrígue