A New Look at an Old Problem: A Universal Learning Approach to Linear Regression
Linear regression is a classical paradigm in statistics. A new look at it is
provided via the lens of universal learning. In applying universal learning to
linear regression, the hypothesis class represents the label $y$ as a linear
combination of the feature vector $x$, i.e., $y = \theta^\top x$ for a parameter vector $\theta$,
within a Gaussian error. The Predictive Normalized Maximum Likelihood (pNML)
solution for universal learning of individual data can be expressed
analytically in this case, as well as its associated learnability measure.
Interestingly, the situation where the number of parameters may even be
larger than the number of training samples can be examined. As expected, in
this case learnability cannot be attained in every situation; nevertheless, if
the test vector resides mostly in a subspace spanned by the eigenvectors
associated with the large eigenvalues of the empirical correlation matrix of
the training data, linear regression can generalize despite the fact that it
uses an ``over-parametrized'' model. We demonstrate the results with a
simulation of fitting a polynomial to data with a possibly large polynomial
degree.
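The over-parametrized regime the abstract describes can be sketched numerically. The snippet below (NumPy, with synthetic data; it does not compute the paper's pNML solution or learnability measure) fits a degree-15 polynomial to 8 training points with the minimum-norm least-squares solution, then checks how much of a test feature vector's energy lies in the subspace spanned by the eigenvectors of the large eigenvalues of the empirical correlation matrix, the condition under which the abstract says generalization remains possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Few training points, high polynomial degree: an "over-parametrized" model.
n, degree = 8, 15
x_train = rng.uniform(-1, 1, n)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(n)

def phi(x):
    """Polynomial feature map phi(x) = (1, x, ..., x^degree)."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

Phi = phi(x_train)                     # n x (degree+1), more columns than rows
theta = np.linalg.pinv(Phi) @ y_train  # minimum-norm least-squares fit

# Empirical correlation matrix of the training features and its eigenvectors.
C = Phi.T @ Phi / n
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalue order

def energy_in_top_subspace(x_test, k=n):
    """Fraction of the test feature vector's energy in the span of the
    eigenvectors associated with the k largest eigenvalues."""
    v = phi(x_test).ravel()
    top = eigvecs[:, -k:]
    return np.linalg.norm(top.T @ v) ** 2 / np.linalg.norm(v) ** 2

# Inside the training range the ratio is typically close to 1 (generalization
# plausible); far outside it, the test vector leaves the learned subspace.
print(energy_in_top_subspace(0.3))
print(energy_in_top_subspace(3.0))
```

The ratio plays the role of a rough learnability indicator here; the paper's actual measure comes from the analytic pNML solution.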
Clustering-based Source-aware Assessment of True Robustness for Learning Models
We introduce a novel validation framework to measure the true robustness of
learning models for real-world applications by creating source-inclusive and
source-exclusive partitions in a dataset via clustering. We develop a
robustness metric derived from source-aware lower and upper bounds of model
accuracy even when data source labels are not readily available. We clearly
demonstrate that even on a well-explored dataset like MNIST, challenging
training scenarios can be constructed under the proposed assessment framework
for two separate yet equally important applications: i) more rigorous learning
model comparison and ii) dataset adequacy evaluation. In addition, our findings
not only promise a more complete identification of trade-offs between model
complexity, accuracy and robustness but can also help researchers optimize
their efforts in data collection by identifying the less robust and more
challenging class labels.
Comment: Submitted to UAI 201
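As a hedged illustration of the idea, the sketch below builds source-inclusive and source-exclusive partitions on synthetic data (pure NumPy; the clustering routine, classifier, and data are stand-ins, not the paper's protocol). Pseudo-source labels come from k-means, the source-exclusive split holds out one whole recovered cluster, and the gap between the two accuracies plays the role of the robustness metric. On this easy synthetic data the gap is small; under real source shift it widens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes, each drawn from two latent "sources" (well-separated blobs).
centers = np.array([[0.0, 0.0], [0.0, 3.0], [4.0, 0.0], [4.0, 3.0]])
X = np.vstack([c + 0.6 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat([0, 0, 1, 1], 50)

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm returning cluster assignments."""
    cent = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - cent[None], axis=2).argmin(axis=1)
        cent = np.array([X[lab == j].mean(axis=0) if np.any(lab == j)
                         else cent[j] for j in range(k)])
    return lab

def accuracy(train_mask, test_mask):
    """Nearest-class-mean classifier trained on one part, scored on the other."""
    means = np.array([X[train_mask & (y == c)].mean(axis=0) for c in (0, 1)])
    pred = np.linalg.norm(X[test_mask][:, None] - means[None], axis=2).argmin(axis=1)
    return (pred == y[test_mask]).mean()

# Pseudo-source labels recovered by clustering (true source labels unknown).
sources = kmeans(X, k=4)

# Source-inclusive split: every cluster contributes to both train and test.
test_inc = np.zeros(len(X), dtype=bool)
test_inc[::4] = True
upper = accuracy(~test_inc, test_inc)   # optimistic, source-inclusive bound

# Source-exclusive split: one whole recovered cluster is held out.
test_exc = sources == sources[0]
lower = accuracy(~test_exc, test_exc)   # pessimistic, source-exclusive bound

print(f"inclusive={upper:.2f}  exclusive={lower:.2f}  gap={upper - lower:.2f}")
```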
Multiclass latent locally linear support vector machines
Kernelized Support Vector Machines (SVM) have gained the status of off-the-shelf classifiers, able to deliver state-of-the-art performance on almost any problem. Still, their practical use is constrained by their computational and memory complexity, which grows super-linearly with the number of training samples. In order to retain the low training and testing complexity of linear classifiers and the flexibility of non-linear ones, a growing, promising alternative is represented by methods that learn non-linear classifiers through local combinations of linear ones. In this paper we propose a new multi-class local classifier, based on a latent SVM formulation. The proposed classifier makes use of a set of linear models that are linearly combined using sample- and class-specific weights. Thanks to the latent formulation, the combination coefficients are modeled as latent variables. We allow soft combinations and we provide a closed-form solution for their estimation, resulting in an efficient prediction rule. This novel formulation allows learning, in a principled way, the sample-specific weights and the linear classifiers in a single optimization problem, using a CCCP optimization procedure. Extensive experiments on ten standard UCI machine learning datasets, one large binary dataset, three character and digit recognition databases, and a visual place categorization dataset show the power of the proposed approach.
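A minimal sketch of the kind of prediction rule described above, assuming a softmax gate for the latent combination coefficients (the paper estimates them in closed form and trains all parameters jointly with CCCP; the random parameters here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_classes, dim = 4, 3, 5

# Randomly initialized parameters, for illustration only.
W = rng.standard_normal((n_models, n_classes, dim))   # local linear models
V = rng.standard_normal((n_models, n_classes, dim))   # gating parameters

def predict(x):
    """Score each class by a soft, sample- and class-specific combination of
    local linear models, then pick the argmax."""
    gate = np.exp(np.einsum("kcd,d->kc", V, x))
    gate /= gate.sum(axis=0, keepdims=True)          # weights over models k
    scores = np.einsum("kc,kcd,d->c", gate, W, x)    # sum_k gate[k,c]*(W[k,c]@x)
    return int(scores.argmax())

x = rng.standard_normal(dim)
print(predict(x))   # a class index in {0, 1, 2}
```

Because each class score is a convex combination of linear scores with input-dependent weights, the overall decision boundary is non-linear while every individual model stays linear, which is the complexity trade-off the abstract targets.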
Connections Between Adaptive Control and Optimization in Machine Learning
This paper demonstrates many immediate connections between adaptive control
and optimization methods commonly employed in machine learning. Starting from
common output error formulations, similarities in update law modifications are
examined. Concepts in stability, performance, and learning, common to both
fields are then discussed. Building on the similarities in update laws and
common concepts, new intersections and opportunities for improved algorithm
analysis are provided. In particular, a specific problem related to higher
order learning is solved through insights obtained from these intersections.
Comment: 18 page
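One concrete point of contact can be sketched as follows, under assumptions not taken from the paper: online least-squares via plain SGD next to a normalized-gradient adaptive law of the kind common in adaptive control. The two updates share the same error-times-regressor structure and differ only in a normalization term that bounds the step size.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([1.5, -2.0, 0.5])   # unknown parameters to identify

def stream(n):
    """Noiseless regressor/output pairs with y = theta_true . x."""
    for _ in range(n):
        x = rng.standard_normal(3)
        yield x, theta_true @ x

def sgd(n=2000, lr=0.05):
    """Machine-learning view: SGD on the squared output error."""
    th = np.zeros(3)
    for x, y in stream(n):
        e = th @ x - y            # output error
        th -= lr * e * x          # gradient step: error times regressor
    return th

def adaptive(n=2000, gamma=0.5):
    """Adaptive-control view: the same error-times-regressor update, with a
    normalization that keeps the step bounded for large regressors."""
    th = np.zeros(3)
    for x, y in stream(n):
        e = th @ x - y
        th -= gamma * e * x / (1.0 + x @ x)
    return th

print(sgd())        # both estimates approach theta_true
print(adaptive())
```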
Reliable credence and the foundations of statistics
If the goal of statistical analysis is to form justified credences based on data, then an account
of the foundations of statistics should explain what makes credences justified. I present a
new account called statistical reliabilism (SR), on which credences resulting from a statistical
analysis are justified (relative to alternatives) when they are in a sense closest, on average, to
the corresponding objective probabilities. This places (SR) in the same vein as recent work on
the reliabilist justification of credences generally [Dunn, 2015, Tang, 2016, Pettigrew, 2018],
but it has the advantage of being action-guiding in that knowledge of objective probabilities
is not required to identify the best-justified available credences. The price is that justification
is relativized to a specific class of candidate objective probabilities, and to a particular choice
of reliability measure. On the other hand, I show that (SR) has welcome implications for
frequentist-Bayesian reconciliation, including a clarification of the use of priors; complementarity between probabilist and fallibilist [Gelman and Shalizi, 2013, Mayo, 2018] approaches
towards statistical foundations; and the justification of credences outside of formal statistical
settings. Regarding the latter, I demonstrate how the insights of statistics may be used to
amend other reliabilist accounts so as to render them action-guiding. I close by discussing
new possible research directions for epistemologists and statisticians (and other applied users
of probability) raised by the (SR) framework.
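A toy rendering of the reliability comparison, with every modeling choice (binomial data, the finite candidate grid, the Brier-type measure) an assumption for illustration rather than the paper's own construction: two credence-forming rules are scored by their average squared distance from the objective truth value of an event, the sense of "closest, on average, to the corresponding objective probabilities" used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: y ~ Binomial(n, p); each rule assigns a credence to
# the event "p > 0.5".  Justification is relativized to a finite class of
# candidate objective probabilities and to a Brier-type reliability measure.
n = 20
candidates = np.linspace(0.05, 0.95, 10)   # grid of candidate p's (no 0.5)

def rule_point(y):
    """Overconfident rule: credence 1 if the MLE of p exceeds 0.5, else 0."""
    return float(y / n > 0.5)

def rule_posterior(y):
    """Posterior probability of p > 0.5 under a uniform prior on the grid."""
    like = candidates ** y * (1 - candidates) ** (n - y)
    post = like / like.sum()
    return float(post[candidates > 0.5].sum())

def avg_brier(rule, trials=4000):
    """Average squared distance between the credence and the objective truth
    value, with p drawn uniformly from the candidate class."""
    total = 0.0
    for _ in range(trials):
        p = rng.choice(candidates)
        y = rng.binomial(n, p)
        total += (rule(y) - float(p > 0.5)) ** 2
    return total / trials

# The posterior rule is better justified under this reliability measure.
print(avg_brier(rule_point), avg_brier(rule_posterior))
```

Note that identifying the better rule required no knowledge of which candidate probability is the true one, which is the action-guiding feature claimed for (SR).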
Learning from networked examples
Many machine learning algorithms are based on the assumption that training
examples are drawn independently. However, this assumption does not hold
anymore when learning from a networked sample because two or more training
examples may share some common objects, and hence share the features of these
shared objects. We show that the classic approach of ignoring this problem
can potentially have a harmful effect on the accuracy of statistics, and then
consider alternatives. One of these is to only use independent examples,
discarding other information. However, this is clearly suboptimal. We analyze
sample error bounds in this networked setting, providing significantly improved
results. An important component of our approach is formed by efficient sample
weighting schemes, which leads to novel concentration inequalities
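A minimal sketch of why weighting can beat discarding, on a hypothetical toy network. The scheme below, weighting each example by one over the largest degree among its objects, is one simple feasible choice, not the paper's scheme; it guarantees that the total weight touching any object is at most 1, while retaining more effective samples than keeping only object-disjoint examples.

```python
from fractions import Fraction

# Hypothetical networked sample: each example lists the objects it involves
# (e.g., the two users of an interaction).
examples = [{"a", "b"}, {"b", "c"}, {"c", "a"}, {"d", "e"}]

def greedy_independent(examples):
    """Keep only object-disjoint examples, discarding the rest."""
    used, kept = set(), []
    for i, objs in enumerate(examples):
        if not objs & used:
            kept.append(i)
            used |= objs
    return kept

def fractional_weights(examples):
    """Weight example i by 1 / (largest degree among its objects), so that
    the total weight on any single object is at most 1."""
    degree = {}
    for objs in examples:
        for o in objs:
            degree[o] = degree.get(o, 0) + 1
    return [Fraction(1, max(degree[o] for o in objs)) for objs in examples]

kept = greedy_independent(examples)
w = fractional_weights(examples)
print("independent examples kept:", len(kept))   # 2
print("effective sample size:", float(sum(w)))   # 2.5, more than 2
```

The triangle of overlapping examples yields weight 1/2 each instead of a single surviving example, which is the kind of gain the improved bounds in the abstract formalize.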
- …