High-performance Kernel Machines with Implicit Distributed Optimization and Randomization
In order to fully utilize "big data", it is often necessary to use "big
models". Such models tend to grow with the complexity and size of the training
data, and do not make strong parametric assumptions upfront on the nature of
the underlying statistical dependencies. Kernel methods fit this need well, as
they constitute a versatile and principled statistical methodology for solving
a wide range of non-parametric modelling problems. However, their high
computational costs (in storage and time) pose a significant barrier to their
widespread adoption in big data applications.
We propose an algorithmic framework and high-performance implementation for
massive-scale training of kernel-based statistical models, based on combining
two key technical ingredients: (i) distributed general purpose convex
optimization, and (ii) the use of randomization to improve the scalability of
kernel methods. Our approach is based on a block-splitting variant of the
Alternating Direction Method of Multipliers (ADMM), carefully reconfigured to handle
very large random feature matrices, while exploiting hybrid parallelism
typically found in modern clusters of multicore machines. Our implementation
supports a variety of statistical learning tasks by enabling several loss
functions, regularization schemes, kernels, and layers of randomized
approximations for both dense and sparse datasets, in a highly extensible
framework. We evaluate the ability of our framework to learn models on data
from applications, and provide a comparison against existing sequential and
parallel libraries.
Comment: Work presented at MMDS 2014 (June 2014) and JSM 201
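The abstract combines two ingredients: randomized feature maps that replace the kernel matrix, and distributed ADMM over blocks of the data. The following is a minimal sketch of how they fit together, under loudly stated assumptions. It assumes a Gaussian kernel approximated with random Fourier features (the Rahimi-Recht construction) and a squared loss with ridge regularization. It also uses a plain row-wise consensus ADMM rather than the block-splitting variant the abstract describes, which additionally splits the random feature matrix by columns. All function names (`random_fourier_features`, `consensus_admm_ridge`, and the helpers) and all parameter values are hypothetical, chosen for illustration. Only the Python standard library is used.

```python
import math
import random


def random_fourier_features(X, num_features, sigma, rng):
    """Map rows of X to features z(x) with z(x).z(x') ~ exp(-||x - x'||^2 / (2 sigma^2))."""
    dim = len(X[0])
    # Frequencies ~ N(0, sigma^-2 I) and phases ~ U[0, 2*pi) (assumed Gaussian kernel).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(sum(w * x for w, x in zip(om, row)) + b)
             for om, b in zip(omegas, offsets)] for row in X]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def gram(Z, y, shift):
    """Return Z^T Z + shift * I and Z^T y."""
    d = len(Z[0])
    G = [[sum(r[i] * r[j] for r in Z) + (shift if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    h = [sum(r[i] * t for r, t in zip(Z, y)) for i in range(d)]
    return G, h


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def consensus_admm_ridge(blocks, lam, rho=1.0, iterations=500):
    """Consensus ADMM for min_w sum_i 0.5||Z_i w - y_i||^2 + 0.5 * lam * ||w||^2.

    Each (Z_i, y_i) row block stands in for one worker's share of the data.
    """
    d = len(blocks[0][0][0])
    n_blocks = len(blocks)
    # Each worker's system matrix Z_i^T Z_i + rho I never changes: factor it once.
    cached = [(invert(G), h) for G, h in (gram(Z, y, rho) for Z, y in blocks)]
    z = [0.0] * d
    us = [[0.0] * d for _ in blocks]
    for _ in range(iterations):
        # Local w-updates (independent, so they could run in parallel).
        ws = [matvec(Ginv, [hi + rho * (zj - uj) for hi, zj, uj in zip(h, z, u)])
              for (Ginv, h), u in zip(cached, us)]
        # Global z-update: closed-form proximal step for the ridge penalty.
        avg = [sum(w[j] + u[j] for w, u in zip(ws, us)) / n_blocks for j in range(d)]
        z = [n_blocks * rho * a / (lam + n_blocks * rho) for a in avg]
        # Scaled dual updates.
        us = [[uj + wj - zj for uj, wj, zj in zip(u, w, z)] for u, w in zip(us, ws)]
    return z


# Usage: noisy sin(x) data, 10 random features, 4 row blocks of 50 points.
rng = random.Random(0)
X = [[rng.uniform(-2.0, 2.0)] for _ in range(200)]
y = [math.sin(x[0]) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
Z = random_fourier_features(X, num_features=10, sigma=1.0, rng=rng)
blocks = [(Z[i:i + 50], y[i:i + 50]) for i in range(0, 200, 50)]
w = consensus_admm_ridge(blocks, lam=1.0, iterations=1000)
```

Because each worker's local system matrix is fixed across iterations, the sketch inverts it once up front. This mirrors the usual practice of caching a factorization in ADMM, so every iteration costs only matrix-vector products. The consensus result should agree with a centralized ridge solve on the full feature matrix.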
On the Inversion of High Energy Proton
Inversion of the K-fold stochastic autoconvolution integral equation is an
elementary nonlinear problem, yet there are no de facto methods to solve it
with finite statistics. To address this, we introduce a novel inverse
algorithm based on a combination of minimization of relative entropy, the Fast
Fourier Transform and a recursive version of Efron's bootstrap. This opens
new perspectives on non-perturbative high-energy QCD, such as
probing the ab initio principles underlying the approximately negative binomial
distributions of observed charged-particle final-state multiplicities, which are
related to multiparton interactions, the fluctuating structure and profile of
the proton, and diffraction. As a proof of concept, we apply the algorithm to
ALICE proton-proton charged-particle multiplicity measurements made at different
center-of-mass energies and fiducial pseudorapidity intervals at the LHC,
available on HEPData. A strong double-peak structure emerges from the
inversion; it is barely visible without it.
Comment: 29 pages, 10 figures, v2: extended analysis (re-projection ratios,
2D …
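Why FFT helps here: by the convolution theorem, the Fourier transform of a K-fold autoconvolution p * p * ... * p is the K-th power of the Fourier transform of p. The forward problem is therefore cheap, while the inverse (a K-th root, with noise) is the hard nonlinear part. The following minimal sketch shows only that forward model for a fixed K, plus a relative-entropy (Kullback-Leibler) criterion of the kind an inversion could minimize. It does not implement the authors' inverse algorithm or their recursive bootstrap. The names `fft`, `autoconvolve`, `relative_entropy`, and `neg_binomial_pmf` are hypothetical. As a check, it uses the standard fact that a sum of K independent NB(r, p) variables is NB(Kr, p).

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def autoconvolve(p, K):
    """K-fold autoconvolution of a pmf on {0, 1, ...} via the K-th power of its DFT."""
    length = K * (len(p) - 1) + 1      # support size of the exact result
    n = 1
    while n < length:                  # zero-pad so circular = linear convolution
        n *= 2
    spectrum = fft(list(p) + [0.0] * (n - len(p)))
    result = fft([c ** K for c in spectrum], invert=True)
    return [v.real / n for v in result[:length]]


def relative_entropy(q, r, eps=1e-300):
    """Kullback-Leibler divergence D(q || r) = sum_n q_n log(q_n / r_n)."""
    return sum(qn * math.log(qn / max(rn, eps)) for qn, rn in zip(q, r) if qn > 0)


def neg_binomial_pmf(n, r, p):
    """P(N = n) = C(n + r - 1, n) (1 - p)^r p^n, for integer r >= 1."""
    return math.comb(n + r - 1, n) * (1 - p) ** r * p ** n


# Usage: 3-fold autoconvolution of NB(r=2, p=0.4), truncated to n < 40.
p_single = [neg_binomial_pmf(n, 2, 0.4) for n in range(40)]
q = autoconvolve(p_single, 3)
```

For n below the truncation point, the K-fold autoconvolution of the truncated pmf is exact, so `q[:40]` should match the NB(6, 0.4) pmf to floating-point precision. An inversion would run the other way: given a noisy estimate of `q`, it would search for a `p_single` whose autoconvolution minimizes the relative entropy to the data.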