Maximum Margin Multiclass Nearest Neighbors
We develop a general framework for margin-based multicategory classification
in metric spaces. The basic work-horse is a margin-regularized version of the
nearest-neighbor classifier. We prove generalization bounds that match the
state of the art in sample size and significantly improve the dependence on
the number of classes k. Our point of departure is a nearly Bayes-optimal
finite-sample risk bound independent of k. Although k-free, this bound is
unregularized and non-adaptive, which motivates our main result: Rademacher and
scale-sensitive margin bounds with a logarithmic dependence on k. As the best
previous risk estimates in this setting were of order √k, our bound is
exponentially sharper. From the algorithmic standpoint, in doubling metric
spaces our classifier may be trained on n examples and evaluated on new points
efficiently.
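As a purely illustrative aside (not taken from the paper), the sketch below shows the two basic objects behind a margin-based nearest-neighbor rule: a plain 1-NN prediction and a per-point sample margin, here taken as the gap between the distances to the nearest opposite-label and nearest same-label training points. The class name MarginNN, the Euclidean metric, and this particular margin definition are assumptions made for the example; the paper's margin-regularized classifier and its guarantees are not reproduced here.

```python
# Minimal sketch of a margin-style multiclass nearest-neighbor rule.
# Illustrative only: this is NOT the regularized classifier from the paper.
import numpy as np

class MarginNN:
    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def _dists(self, x):
        # Euclidean distances from x to every stored training point.
        return np.linalg.norm(self.X - x, axis=1)

    def predict(self, X_new):
        # Plain 1-NN prediction: copy the label of the closest training point.
        return np.array([self.y[np.argmin(self._dists(x))]
                         for x in np.asarray(X_new, dtype=float)])

    def margins(self):
        # Sample margin of each training point: distance to the nearest
        # opposite-label point minus distance to the nearest same-label point
        # (assumes every class appears at least twice).
        out = []
        for i, x in enumerate(self.X):
            d = self._dists(x)
            same = d[(self.y == self.y[i]) & (np.arange(len(d)) != i)]
            diff = d[self.y != self.y[i]]
            out.append(diff.min() - same.min())
        return np.array(out)

# Toy usage with three classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [2.0, 0.0], [2.1, 0.1]])
y = np.array([0, 0, 1, 1, 2, 2])
clf = MarginNN().fit(X, y)
print(clf.predict([[0.05, 0.0], [1.9, 0.0]]))   # -> [0 2]
print(clf.margins())
```

A margin-regularized variant would, roughly speaking, trade training error against such margins; points with small or negative margins are the ones a regularizer must pay for.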
Le Cam meets LeCun: Deficiency and Generic Feature Learning
"Deep Learning" methods attempt to learn generic features in an unsupervised
fashion from a large unlabelled data set. These generic features should perform
as well as the best hand crafted features for any learning problem that makes
use of this data. We provide a definition of generic features, characterize
when it is possible to learn them and provide methods closely related to the
autoencoder and deep belief network of deep learning. In order to do so we use
the notion of deficiency and illustrate its value in studying certain general
learning problems.
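For readers unfamiliar with the autoencoder this abstract relates its definition to, here is a generic, hedged sketch of unsupervised feature learning by reconstruction. The architecture, synthetic data, and hyper-parameters are placeholder assumptions; it is not the paper's construction, and it says nothing about deficiency.

```python
# Generic autoencoder sketch: learn a low-dimensional feature map from
# unlabelled data by minimizing reconstruction error. Illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)                       # stand-in unlabelled data set

encoder = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 4))
decoder = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 20))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    z = encoder(X)                             # candidate "generic" features
    loss = loss_fn(decoder(z), X)              # reconstruction objective
    loss.backward()
    opt.step()

features = encoder(X).detach()                 # features reusable by downstream learners
print(features.shape)                          # torch.Size([512, 4])
```

Whether such features are "generic" in the abstract's sense, i.e. competitive with the best hand crafted features for any downstream problem using this data, is exactly what the paper's deficiency-based analysis is meant to characterize.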
Federated Learning You May Communicate Less Often!
We investigate the generalization error of statistical learning models in a
Federated Learning (FL) setting. Specifically, we study the evolution of the
generalization error with the number of communication rounds between the
clients and the parameter server, i.e., the effect on the generalization error
of how often the local models as computed by the clients are aggregated at the
parameter server. We establish PAC-Bayes and rate-distortion theoretic bounds
on the generalization error that account explicitly for the effect of the
number of rounds, say R, in addition to the number K of participating devices
and the size n of the individual datasets. The bounds, which
apply in their generality for a large class of loss functions and learning
algorithms, appear to be the first of their kind for the FL setting.
Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM) and
we derive (more) explicit bounds on the generalization error in this case. In
particular, we show that the generalization error of FSVM increases with R,
suggesting that more frequent communication with the parameter server
diminishes the generalization power of such learning algorithms. Combined with
that the empirical risk generally decreases for larger values of R, this
indicates that R might be a parameter to optimize in order to minimize the
population risk of FL algorithms. Moreover, specialized to the case R = 1
(sometimes referred to as "one-shot" FL or distributed learning) our bounds
suggest that the generalization error of the FL setting decreases faster than
that of centralized learning,
thereby generalizing recent findings in this direction to arbitrary loss
functions and algorithms. The results of this paper are also validated
experimentally.
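As a hedged illustration of the quantity being tuned, the following FedAvg-style sketch makes the number of communication rounds R an explicit knob. The linear model, synthetic client data, local update rule, and step sizes are placeholder assumptions; the sketch only shows where R enters the training loop, not the paper's bounds or experiments.

```python
# Illustrative FedAvg-style loop: R communication rounds, K clients,
# n samples per client. Placeholder model and data, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
K, n, d = 5, 50, 10                     # clients, samples per client, features
w_true = rng.normal(size=d)
clients = []
for _ in range(K):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    clients.append((X, y))

def local_update(w, X, y, steps=20, lr=0.01):
    # A few local gradient steps on the client's least-squares loss.
    for _ in range(steps):
        w = w - lr * (X.T @ (X @ w - y) / len(y))
    return w

def federated_run(R):
    # R rounds of: local training on every client, then averaging at the server.
    w = np.zeros(d)
    for _ in range(R):
        w = np.mean([local_update(w.copy(), X, y) for X, y in clients], axis=0)
    return w

for R in (1, 5, 20):                    # R = 1 is the "one-shot" regime
    w = federated_run(R)
    print(f"R = {R:2d}, parameter error = {np.linalg.norm(w - w_true):.3f}")
```

In this toy run larger R typically reduces the training-side error, which is the empirical-risk side of the trade-off; the abstract's point is that the generalization side moves in the opposite direction, so R becomes a quantity worth optimizing.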
PAC-Bayesian Bounds on Rate-Efficient Classifiers
We derive analytic bounds on the noise invariance of majority vote classifiers operating on compressed inputs. Specifically, starting from recent
bounds on the true risk of majority vote classifiers,
we extend the applicability of PAC-Bayesian theory to quantify the resilience of majority votes to
input noise stemming from compression. The derived bounds are intuitive in binary classification
settings, where they can be measured as expressions of voter differentials and voter pair agreement. By combining measures of input distortion
with analytic guarantees on noise invariance, we
prescribe rate-efficient machines to compress inputs without affecting subsequent classification.
Our validation shows how bounding noise invariance can inform the compression stage for any
majority vote classifier such that worst-case implications of bad input reconstructions are known,
and inputs can be compressed to the minimum
amount of information needed prior to inference.
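To make the setting concrete, here is a hedged sketch that pairs a simple majority vote with a crude compressor (uniform scalar quantization) and measures how often the vote changes when it operates on compressed inputs. The voters, the quantizer, and the agreement measure are illustrative assumptions, not the paper's PAC-Bayesian constructions or bounds.

```python
# Illustration: majority vote on original vs. uniformly quantized inputs.
# The agreement rate is a crude stand-in for "noise invariance under compression".
import numpy as np

rng = np.random.default_rng(1)
n, d, n_voters = 1000, 5, 11
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)                           # ground-truth binary labels

# Voters: noisy copies of one good linear direction (placeholder ensemble).
voters = np.array([w_true + 0.8 * rng.normal(size=d) for _ in range(n_voters)])

def majority_vote(X):
    votes = np.sign(X @ voters.T)                 # (n, n_voters) individual votes
    return np.sign(votes.sum(axis=1))             # odd n_voters, so no exact ties

def quantize(X, n_levels):
    # Uniform scalar quantization: a crude stand-in for a rate-limited encoder.
    lo, hi = X.min(), X.max()
    step = (hi - lo) / n_levels
    idx = np.clip(np.floor((X - lo) / step), 0, n_levels - 1)
    return lo + step * (idx + 0.5)

clean = majority_vote(X)
print("accuracy on uncompressed inputs:", np.mean(clean == y))
for n_levels in (2, 4, 16, 64):
    agree = np.mean(clean == majority_vote(quantize(X, n_levels)))
    print(f"{n_levels:3d} quantization levels: vote agreement = {agree:.3f}")
```

Raising the number of quantization levels (i.e. the rate) should push the vote agreement toward one; choosing the smallest rate at which the guaranteed agreement is acceptable is the kind of rate/invariance trade-off the derived bounds are meant to inform.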
Information Theory and Machine Learning
The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computation resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.
…