    Screening Sinkhorn Algorithm for Regularized Optimal Transport

    We introduce in this paper a novel strategy for efficiently approximating the Sinkhorn distance between two discrete measures. After identifying negligible components of the dual solution of the regularized Sinkhorn problem, we propose to screen those components by directly fixing them at a prescribed value before entering the Sinkhorn problem. This allows us to solve a smaller Sinkhorn problem while ensuring an approximation with provable guarantees. More formally, the approach is based on a new formulation of the dual of the Sinkhorn divergence problem and on the KKT optimality conditions of this problem, which enable the identification of the dual components to be screened. This analysis leads to the Screenkhorn algorithm. We illustrate the efficiency of Screenkhorn on complex tasks such as dimensionality reduction and domain adaptation involving regularized optimal transport.
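
    To make the screening idea concrete, here is a minimal numerical sketch. It does not implement Screenkhorn's actual test (which derives the screened set from the KKT conditions of the dual); as a hypothetical stand-in, it keeps only the heaviest atoms of each marginal (the `keep_frac` knob is ours, not the paper's) and runs vanilla Sinkhorn on the reduced problem:

```python
import numpy as np

def sinkhorn(a, b, C, reg, n_iter=500):
    """Vanilla Sinkhorn iterations for entropy-regularized OT."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                      # transport plan

def screened_sinkhorn(a, b, C, reg, keep_frac=0.5, n_iter=500):
    """Toy screening: restrict to the largest-mass atoms of each marginal
    and solve the smaller Sinkhorn problem on that support only.
    (Screenkhorn instead identifies negligible *dual* components via the
    KKT conditions and fixes them before solving; this is a crude proxy.)"""
    I = np.argsort(a)[-max(1, int(len(a) * keep_frac)):]    # kept rows
    J = np.argsort(b)[-max(1, int(len(b) * keep_frac)):]    # kept columns
    a_s, b_s = a[I] / a[I].sum(), b[J] / b[J].sum()         # renormalize mass
    P = sinkhorn(a_s, b_s, C[np.ix_(I, J)], reg, n_iter)
    return np.sum(P * C[np.ix_(I, J)])                      # approx. OT cost

rng = np.random.default_rng(0)
x, y = rng.normal(size=(200, 2)), rng.normal(1.0, 1.0, size=(200, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)          # squared distances
a, b = rng.dirichlet(np.ones(200)), rng.dirichlet(np.ones(200))
print(screened_sinkhorn(a, b, C, reg=0.5, keep_frac=0.6))
```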

    A New Family of Dual-norm Regularized $p$-Wasserstein Metrics

    We develop a novel family of metrics over measures, using a $p$-Wasserstein-style optimal transport (OT) formulation with dual-norm-based regularized marginal constraints. Our study is motivated by the observation that existing works have only explored $\phi$-divergence-regularized Wasserstein metrics, such as the Generalized Wasserstein metrics or the Gaussian-Hellinger-Kantorovich metrics. It is an open question whether Wasserstein-style metrics can be defined using regularizers that are not $\phi$-divergence based. Our work provides an affirmative answer by proving that the proposed formulation, under mild conditions, indeed induces valid metrics for any dual norm. The proposed regularized metrics seem to achieve the best of both worlds by inheriting useful properties from the parent metrics, viz., the $p$-Wasserstein metric and the dual norm involved. For example, when the dual norm is the Maximum Mean Discrepancy (MMD), we prove that the proposed regularized metrics inherit the dimension-free sample complexity of the MMD regularizer, while preserving or enhancing other useful properties of the $p$-Wasserstein metric. Further, when $p=1$, we derive a Fenchel dual, which enables proving that the proposed metrics actually induce novel norms over measures. Also, in this case, we show that the mixture geodesic, which is a common geodesic for the parent metrics, remains a geodesic. We empirically study various properties of the proposed metrics and show their utility in diverse applications.
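
    Schematically, and in our own notation rather than the paper's, the family replaces the hard marginal constraints of OT with dual-norm penalties on the marginal mismatch:

$$
W_{p,\lambda}(\mu,\nu) \;=\; \inf_{\pi \ge 0} \left( \int c(x,y)^p \,\mathrm{d}\pi(x,y) \right)^{1/p} + \lambda \left( \lVert \pi_1 - \mu \rVert_* + \lVert \pi_2 - \nu \rVert_* \right),
$$

    where $\pi_1, \pi_2$ are the marginals of the plan $\pi$, $\lVert\cdot\rVert_*$ is the chosen dual norm (e.g., an MMD), and $\lambda > 0$ trades transport cost against marginal fidelity. The paper's exact formulation may combine the exponent and the penalties differently; this display is only meant to convey the shape of the objective.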

    Efficient online learning with kernels for adversarial large scale problems

    We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. We study the computational and theoretical performance of online variants of kernel ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$. The algorithm we consider is based on approximating the kernel with the linear span of basis functions. Our contribution is two-fold: 1) For the Gaussian kernel, we propose to build the basis beforehand (independently of the data) through a Taylor expansion. For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time and space complexity $O((\log n)^{2d})$. This makes the algorithm a suitable choice as soon as $n \gg e^d$, which is likely to happen for small-dimensional, large-scale datasets; 2) For general kernels with low effective dimension, the basis functions are updated sequentially in a data-adaptive fashion by sampling Nyström points. In this case, our algorithm improves the computational trade-off known for online kernel regression.
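
    A rough sketch of contribution 1), under our own simplifications: build a truncated Taylor basis for the Gaussian kernel up front (the `degree` and `sigma` knobs below are hypothetical; the paper ties the truncation level to $\log n$), then run recursive least squares on the fixed basis, giving online ridge regression with one rank-one update per round. The paper's exact forecaster differs in its details; this only conveys the cost structure.

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement

def taylor_features(X, degree=4, sigma=1.0):
    """Truncated Taylor feature map phi for the Gaussian kernel, so that
    phi(x) . phi(y) ~ exp(-||x - y||^2 / (2 sigma^2)). Uses the identity
    k(x, y) = e^{-|x|^2/2} e^{-|y|^2/2} sum_alpha x^alpha y^alpha / alpha!
    (inputs rescaled by sigma), truncated at multi-index degree |alpha|."""
    n, d = X.shape
    Xs = X / sigma
    envelope = np.exp(-0.5 * np.sum(Xs ** 2, axis=1))
    cols = []
    for k in range(degree + 1):
        for alpha in combinations_with_replacement(range(d), k):
            counts = np.bincount(np.asarray(alpha, dtype=np.intp), minlength=d)
            coef = 1.0 / np.sqrt(np.prod([factorial(c) for c in counts]))
            cols.append(coef * np.prod(Xs[:, list(alpha)], axis=1))
    return envelope[:, None] * np.stack(cols, axis=1)

class OnlineRidge:
    """Online ridge regression on an explicit basis: one O(dim^2)
    Sherman-Morrison rank-one update per round."""
    def __init__(self, dim, lam=1.0):
        self.P = np.eye(dim) / lam          # running (Phi^T Phi + lam I)^{-1}
        self.w = np.zeros(dim)
    def predict(self, phi):
        return self.w @ phi
    def update(self, phi, y):
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)
        self.w = self.w + gain * (y - self.w @ phi)
        self.P = self.P - np.outer(gain, Pphi)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
y = np.sin(3.0 * X[:, 0]) * X[:, 1] + 0.05 * rng.normal(size=1000)
Phi = taylor_features(X, degree=5)
model = OnlineRidge(Phi.shape[1])
loss = 0.0
for phi_t, y_t in zip(Phi, y):
    loss += (model.predict(phi_t) - y_t) ** 2    # prequential squared loss
    model.update(phi_t, y_t)
print(loss / len(y))
```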

    Machine learning applications for noisy intermediate-scale quantum computers

    Quantum machine learning (QML) has proven to be a fruitful area in which to search for applications of quantum computers. This is particularly true for those available in the near term, so-called noisy intermediate-scale quantum (NISQ) devices. In this thesis, we develop and study QML algorithms in three application areas. We focus our attention on heuristic algorithms of a variational (meaning hybrid quantum-classical) nature, using parameterised quantum circuits as the underlying quantum machine learning model. The variational nature of these models makes them especially suited to NISQ computers. We order these applications in terms of the increasing complexity of the data presented to them. Firstly, we study a variational quantum classifier in supervised machine learning, and focus on how (classical) data, feature vectors, may be encoded in such models in a way that is robust to the inherent noise on NISQ computers. We provide a framework for studying the robustness of these classification models, prove theoretical results for some common noise channels, and demonstrate extensive numerical results reinforcing these findings. Secondly, we move to a variational generative model called the Born machine, where the data becomes a (classical or quantum) probability distribution. Now the problem falls into the category of unsupervised machine learning. Here, we develop new training methods for the Born machine which outperform the previous state of the art, discuss the possibility of quantum advantage in generative modelling, and perform a systematic comparison of the Born machine relative to a classical competitor, the restricted Boltzmann machine. We also demonstrate the largest-scale implementation (28 qubits) of such a model on real quantum hardware to date, using the Rigetti superconducting platform. Finally, for our third QML application, the data becomes purely quantum in nature. We focus on the problem of approximately cloning quantum states, an important primitive in the foundations of quantum mechanics. For this, we develop a variational quantum algorithm which can learn to clone such states, and show how this algorithm can be used to improve quantum cloning fidelities on NISQ hardware. Interestingly, this application can be viewed as either supervised or unsupervised in nature. Furthermore, we demonstrate how this algorithm can be used to discover novel implementable attacks on quantum cryptographic protocols, focusing on quantum coin flipping and key distribution as examples. For the algorithm, we derive differentiable cost functions, prove theoretical guarantees such as faithfulness, and incorporate state-of-the-art methods such as quantum architecture search.
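
    As a flavour of the Born machine (the second application), here is a tiny self-contained simulation, ours rather than the thesis's: a two-qubit parameterised circuit whose output distribution over bitstrings is given by the Born rule, trained by naive finite-difference descent. The thesis develops considerably better training methods and scales to 28 qubits on real hardware; this is only an illustration of the model class.

```python
import numpy as np

def ry(theta):
    """Single-qubit Y-rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def born_probs(thetas):
    """RY layer -> CNOT -> RY layer on |00>; Born rule gives P(bitstring)."""
    psi = np.kron(ry(thetas[0]) @ [1.0, 0.0], ry(thetas[1]) @ [1.0, 0.0])
    psi = np.kron(ry(thetas[2]), ry(thetas[3])) @ (CNOT @ psi)
    return np.abs(psi) ** 2                    # P(00), P(01), P(10), P(11)

target = np.array([0.5, 0.0, 0.0, 0.5])       # Bell-like target distribution
loss = lambda th: np.sum((born_probs(th) - target) ** 2)
thetas = np.full(4, 0.3)
for _ in range(300):                           # finite-difference descent
    grad = np.array([(loss(thetas + 1e-4 * e) - loss(thetas - 1e-4 * e)) / 2e-4
                     for e in np.eye(4)])
    thetas -= 0.5 * grad
print(born_probs(thetas).round(3))             # ideally approaches the target
```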

    Understanding Data Manipulation and How to Leverage It to Improve Generalization

    Augmentations and other transformations of data, either in the input or latent space, are a critical component of modern machine learning systems. While these techniques are widely used in practice and known to provide improved generalization in many cases, it is still unclear how data manipulation impacts learning and generalization. To take a step toward addressing this problem, this thesis focuses on understanding and leveraging data augmentation and alignment for improving machine learning performance and transfer. In the first part of the thesis, we establish a novel theoretical framework to understand how data augmentation (DA) impacts learning in linear regression and classification tasks. The results demonstrate how the spectrum of the augmented, transformed data plays a key role in characterizing the behavior of different augmentation strategies, especially in the overparameterized regime. The tools developed in this part provide simple guidelines for building new augmentation strategies and a simple framework for comparing the generalization of different types of DA. In the second part of the thesis, we demonstrate how latent data alignment can be used to tackle the domain transfer problem, where the training and testing datasets differ in distribution. Our algorithm builds upon joint clustering and data matching through optimal transport, and outperforms pure matching baselines on both synthetic and real datasets. Extensions of the generalization analysis and algorithm design for data augmentation and alignment to nonlinear models, such as artificial neural networks and random feature models, are discussed. This thesis provides tools and analyses for better data manipulation design, which benefit both supervised and unsupervised learning schemes.
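
    As one concrete, classical instance of the augmentation-as-regularization theme studied in the first part (a well-known special case, not the thesis's general framework): for linear regression, augmenting inputs with Gaussian noise is, in expectation, equivalent to ridge regression, so the augmentation's effect is governed by the data spectrum. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 10, 0.3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Augment: replicate every sample many times with Gaussian-perturbed inputs.
reps = 500
Xa = np.repeat(X, reps, axis=0) + sigma * rng.normal(size=(n * reps, d))
ya = np.repeat(y, reps)
w_aug = np.linalg.lstsq(Xa, ya, rcond=None)[0]

# In expectation E[Xa^T Xa] = reps * (X^T X + n sigma^2 I), so the augmented
# least-squares solution matches ridge regression with lambda = n * sigma^2.
w_ridge = np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(d), X.T @ y)
print(np.linalg.norm(w_aug - w_ridge))   # small, shrinking as reps grows
```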