Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
For supervised and unsupervised learning, positive definite kernels make it
possible to use large and potentially infinite-dimensional feature spaces at a
computational cost that depends only on the number of observations. This is
usually done through the penalization of predictor functions by Euclidean or
Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing
norms such as the l1-norm or the block l1-norm. We assume that the kernel
decomposes into a large sum of individual basis kernels which can be embedded
in a directed acyclic graph; we show that it is then possible to perform kernel
selection through a hierarchical multiple kernel learning framework, in
polynomial time in the number of selected kernels. This framework is naturally
applied to nonlinear variable selection; our extensive simulations on
synthetic datasets and datasets from the UCI repository show that efficiently
exploring the large feature space through sparsity-inducing norms leads to
state-of-the-art predictive performance.
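As a rough illustration of the setup (not the paper's hierarchical DAG algorithm), the sketch below builds one basis kernel per input variable, combines them with a sparse weight vector, and feeds the resulting Gram matrix to an off-the-shelf SVM. The RBF basis kernels, the hand-set weights `eta`, and the toy data are all placeholders.

```python
# Hypothetical sketch: a kernel represented as a sum of per-variable basis
# kernels, as in multiple kernel learning. The sparse weight vector `eta`
# stands in for the selection performed by the hierarchical MKL algorithm.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # 200 samples, 10 variables
y = np.sign(np.sin(X[:, 0]) + X[:, 2] ** 2 - 1.0)

def rbf(a, b, gamma=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-gamma * d ** 2)

# One basis kernel per input variable.
basis = [rbf(X[:, j], X[:, j]) for j in range(X.shape[1])]

# A sparse combination: only variables 0 and 2 are "selected".
eta = np.zeros(len(basis)); eta[0] = eta[2] = 1.0
K = sum(w * Kj for w, Kj in zip(eta, basis))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

In the paper itself the weights are not fixed by hand; kernel selection is driven by a sparsity-inducing (block l1) penalty over the DAG of basis kernels.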
Error estimates for DeepOnets: A deep learning framework in infinite dimensions
DeepOnets have recently been proposed as a framework for learning nonlinear
operators mapping between infinite dimensional Banach spaces. We analyze
DeepOnets and prove estimates on the resulting approximation and generalization
errors. In particular, we extend the universal approximation property of
DeepOnets to include measurable mappings in non-compact spaces. By a
decomposition of the error into encoding, approximation and reconstruction
errors, we prove both lower and upper bounds on the total error, relating it to
the spectral decay properties of the covariance operators associated with the
underlying measures. We derive almost optimal error bounds with very general
affine reconstructors and with random sensor locations as well as bounds on the
generalization error, using covering number arguments. We illustrate our
general framework with four prototypical examples of nonlinear operators,
namely those arising in a nonlinear forced ODE, an elliptic PDE with variable
coefficients and nonlinear parabolic and hyperbolic PDEs. In all these
examples, we prove that DeepOnets break the curse of dimensionality, thus
demonstrating the efficient approximation of infinite-dimensional operators
with this machine learning framework.
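For readers unfamiliar with the architecture, the following minimal sketch shows the standard DeepONet forward pass: a branch network encodes the input function sampled at fixed sensor locations, a trunk network encodes the query point, and the operator value is their inner product. The network sizes, the random weights, and the example input function are placeholders; training and the paper's error analysis are not reproduced.

```python
# A minimal forward-pass sketch of the DeepONet architecture: a branch net
# encodes the input function at m fixed sensors, a trunk net encodes the
# query location y, and the operator output is their inner product.
# Weights here are random placeholders; training is omitted.
import numpy as np

rng = np.random.default_rng(0)
m, p = 100, 32                       # number of sensors, latent width

def mlp(x, sizes):
    """Tiny fully connected net with tanh activations and random weights."""
    for k, (din, dout) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.normal(scale=1 / np.sqrt(din), size=(din, dout))
        x = x @ W
        if k < len(sizes) - 2:
            x = np.tanh(x)
    return x

sensors = np.linspace(0, 1, m)
u = np.sin(2 * np.pi * sensors)      # input function sampled at the sensors
y = np.array([[0.3], [0.7]])         # query locations

b = mlp(u[None, :], [m, 64, p])      # branch output, shape (1, p)
t = mlp(y, [1, 64, p])               # trunk output, shape (2, p)
G_u_y = t @ b.T                      # approximation of G(u)(y) at each query
print(G_u_y.ravel())
```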
High-dimensional and Permutation Invariant Anomaly Detection
Methods for anomaly detection of new physics processes are often limited to
low-dimensional spaces due to the difficulty of learning high-dimensional
probability densities. Particularly at the constituent level, incorporating
desirable properties such as permutation invariance and variable-length inputs
becomes difficult within popular density estimation methods. In this work, we
introduce a permutation-invariant density estimator for particle physics data
based on diffusion models, specifically designed to handle variable-length
inputs. We demonstrate the efficacy of our methodology by utilizing the learned
density as a permutation-invariant anomaly detection score, effectively
identifying jets with low likelihood under the background-only hypothesis. To
validate our density estimation method, we investigate the ratio of learned
densities and compare it to the ratio obtained by a supervised classification
algorithm.
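A schematic of how such a learned density would be used for anomaly detection is sketched below: each jet receives the score -log p(jet), and jets in the upper tail of the score distribution are flagged. The `log_prob` function is a sum-pooled placeholder standing in for the trained diffusion-model density, so only the permutation-invariant scoring and thresholding logic is illustrated.

```python
# Hypothetical sketch of using a learned per-jet log-density as an anomaly
# score: jets that are unlikely under the background-only model score high.
import numpy as np

rng = np.random.default_rng(0)

def log_prob(jet):
    # Placeholder for the trained density: sum-pool over constituents so the
    # score does not depend on their ordering and handles variable lengths.
    pooled = jet.sum(axis=0)
    return -0.5 * np.dot(pooled, pooled)

# Toy jets: variable numbers of constituents, 4 features each.
jets = [rng.normal(size=(rng.integers(10, 60), 4)) for _ in range(1000)]
scores = np.array([-log_prob(j) for j in jets])   # anomaly score = -log p

threshold = np.quantile(scores, 0.99)             # keep the 1% most anomalous
candidates = np.where(scores > threshold)[0]
print(len(candidates), "anomaly candidates")
```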
Joint Probability Trees
We introduce Joint Probability Trees (JPT), a novel approach that makes
learning of and reasoning about joint probability distributions tractable for
practical applications. JPTs support both symbolic and subsymbolic variables in
a single hybrid model, and they do not rely on prior knowledge about variable
dependencies or families of distributions. JPT representations build on tree
structures that partition the problem space into relevant subregions that are
elicited from the training data instead of postulating a rigid dependency model
prior to learning. Learning and reasoning scale linearly in JPTs, and the tree
structure allows white-box reasoning about any posterior probability,
such that interpretable explanations can be provided for any inference result.
Our experiments showcase the practical applicability of JPTs in
high-dimensional heterogeneous probability spaces with millions of training
samples, making them a promising alternative to classic probabilistic graphical
models.
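The toy sketch below illustrates the tree idea on a single split: each leaf stores its prior mass together with simple per-leaf distributions for a numeric and a symbolic variable, and a posterior query is answered by mixing the leaves weighted by how much evidence mass they carry. The split point, the leaf distributions, and the query are placeholders; this is not the JPT learning algorithm itself.

```python
# Toy two-leaf "tree": per-leaf prior, Gaussian for the numeric variable,
# and a label histogram for the symbolic variable.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
x = rng.normal(size=2000)                      # subsymbolic variable
label = np.where(x + rng.normal(scale=0.5, size=2000) > 0, "hot", "cold")

split = 0.0                                    # split elicited from the data
leaves = []
for mask in (x <= split, x > split):
    leaves.append({
        "prior": mask.mean(),
        "mu": x[mask].mean(), "sigma": x[mask].std(),
        "p_label": {v: (label[mask] == v).mean() for v in ("hot", "cold")},
    })

def gauss_tail(mu, sigma, a):                  # P(X > a) under N(mu, sigma)
    return 0.5 * (1 - erf((a - mu) / (sigma * sqrt(2))))

# Posterior query P(label = "hot" | x > 1): weight leaves by the evidence mass.
w = np.array([lf["prior"] * gauss_tail(lf["mu"], lf["sigma"], 1.0) for lf in leaves])
post = sum(wi * lf["p_label"]["hot"] for wi, lf in zip(w, leaves)) / w.sum()
print("P(label=hot | x > 1) =", round(post, 3))
```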
Quantum kernels with squeezed-state encoding for machine learning
Kernel methods are powerful for machine learning, as they can represent data
in feature spaces in which similarities between samples can be faithfully
captured. Recently, it has been realized that machine learning enhanced by
quantum computing is closely related to kernel methods, where the exponentially
large Hilbert space turns out to be a feature space more expressive than
classical ones. In this paper, we generalize quantum kernel methods by encoding
data into continuous-variable quantum states, which can benefit from the
infinite-dimensional Hilbert space of continuous variables. Specifically, we
propose squeezed-state encoding, in which data is encoded in either the
amplitude or the phase. The kernels can be calculated on a quantum computer and
then combined with classical machine learning, e.g. a support vector machine,
for training and prediction tasks. Comparisons with other classical kernels are
also addressed. Lastly, we discuss physical implementations of squeezed-state
encoding for machine learning in quantum platforms such as trapped ions.
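As a hedged illustration of the pipeline (quantum kernel evaluation followed by a classical SVM), the sketch below evaluates a phase-encoded squeezed-state kernel classically using the textbook single-mode squeezed-vacuum overlap and feeds the Gram matrix to scikit-learn's SVC with a precomputed kernel. The squeezing strength r, the toy data, and the product-over-features form are assumptions, not the paper's exact construction.

```python
# Classical evaluation of a phase-encoded squeezed-state kernel (stand-in for
# a quantum evaluation), combined with a classical support vector machine.
import numpy as np
from sklearn.svm import SVC

r = 1.0  # assumed fixed squeezing strength

def squeezed_phase_kernel(X, Z):
    """k(x, z) = prod_i |<zeta(x_i)|zeta(z_i)>|^2 with data encoded as phases."""
    K = np.ones((len(X), len(Z)))
    for i, x in enumerate(X):
        for j, z in enumerate(Z):
            delta = z - x
            # Overlap of two squeezed vacua with equal r and phase difference delta.
            ov = 1.0 / np.sqrt(np.cosh(r) ** 2 - np.exp(1j * delta) * np.sinh(r) ** 2)
            K[i, j] = np.prod(np.abs(ov) ** 2)
    return K

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(100, 2))        # toy data mapped to phases
y = (np.sin(X[:, 0]) * np.sin(X[:, 1]) > 0).astype(int)

K = squeezed_phase_kernel(X, X)
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```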
Consistency of the group Lasso and multiple kernel learning
We consider the least-squares regression problem with regularization by a
block l1-norm, i.e., a sum of Euclidean norms over spaces of dimension larger
than one. This problem, referred to as the group Lasso, extends the usual
regularization by the l1-norm, where all spaces have dimension one and the problem is
commonly referred to as the Lasso. In this paper, we study the asymptotic model
consistency of the group Lasso. We derive necessary and sufficient conditions
for the consistency of group Lasso under practical assumptions, such as model
misspecification. When the linear predictors and Euclidean norms are replaced
by functions and reproducing kernel Hilbert norms, the problem is usually
referred to as multiple kernel learning and is commonly used for learning from
heterogeneous data sources and for nonlinear variable selection. Using tools
from functional analysis, and in particular covariance operators, we extend the
consistency results to this infinite dimensional case and also propose an
adaptive scheme to obtain a consistent model estimate, even when the necessary
condition required for the non-adaptive scheme is not satisfied.
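For concreteness, a standard way to write the group-Lasso estimator described above, with the design partitioned into G blocks, is:

```latex
\hat{w} \;\in\; \arg\min_{w}\;
\frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \sum_{g=1}^{G} x_{i,g}^{\top} w_g\Bigr)^{2}
\;+\; \lambda \sum_{g=1}^{G} \lVert w_g \rVert_{2}
```

where w_g is the coefficient subvector of group g; when every group has dimension one, the penalty reduces to the l1-norm and the usual Lasso is recovered. The notation and the single regularization parameter lambda are illustrative assumptions.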
Manifold Relevance Determination
In this paper we present a fully Bayesian latent variable model which
exploits conditional nonlinear (in)dependence structures to learn an efficient
latent representation. The latent space is factorized to represent shared and
private information from multiple views of the data. In contrast to previous
approaches, we introduce a relaxation to the discrete segmentation and allow
for a "softly" shared latent space. Further, Bayesian techniques allow us to
automatically estimate the dimensionality of the latent spaces. The model is
capable of capturing structure underlying extremely high dimensional spaces.
This is illustrated by modelling unprocessed images with tens of thousands of
pixels. This also allows us to directly generate novel images from the trained
model by sampling from the discovered latent spaces. We also demonstrate the
model by prediction of human pose in an ambiguous setting. Our Bayesian
framework allows us to perform disambiguation in a principled manner by
including latent space priors which incorporate the dynamic nature of the data.
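A toy reading of the "softly" shared factorization is sketched below: each view carries its own ARD-style relevance weight per latent dimension, and dimensions with appreciable relevance in both views are read off as shared while the rest are private to one view. The weights and the threshold are hand-set placeholders, not learned posteriors from the model.

```python
# Toy illustration of shared vs. private latent dimensions via per-view
# ARD-style relevance weights (placeholder values, not learned posteriors).
import numpy as np

relevance = {
    "view_A": np.array([2.1, 1.8, 0.02, 0.9, 0.01]),
    "view_B": np.array([1.9, 0.03, 1.5, 0.7, 0.02]),
}
active = {v: w > 0.1 for v, w in relevance.items()}   # threshold is an assumption

shared = np.where(active["view_A"] & active["view_B"])[0]
private_A = np.where(active["view_A"] & ~active["view_B"])[0]
private_B = np.where(~active["view_A"] & active["view_B"])[0]
print("shared:", shared, "| private to A:", private_A, "| private to B:", private_B)
```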