518 research outputs found

    Kernel Method based on Non-Linear Coherent State

    In this paper, the process of encoding inputs in quantum states as a non-linear feature map is re-interpreted by mapping datasets to a set of non-linear coherent states. Since the Radial Basis Function kernel is recovered when data are mapped to a complex Hilbert space represented by canonical coherent states, non-linear coherent states can be regarded as a natural generalisation of the associated kernels. By considering the non-linear coherent states of a quantum oscillator with variable mass, we propose a kernel function based on generalized hypergeometric functions, viewed as orthogonal polynomial functions. The suggested kernel is implemented with a support vector machine on two well-known datasets (make_circles and make_moons) and outperforms the baselines, even in the presence of high noise. In addition, we study the impact of the geometrical properties of the feature space obtained from non-linear coherent states on the SVM classification task, by considering the Fubini-Study metric of the associated coherent states.
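The RBF recovery mentioned above follows from the canonical coherent-state overlap, whose squared modulus is exp(-|alpha - beta|^2), i.e. a Gaussian kernel. A minimal sketch of this baseline case (the standard coherent-state kernel, not the paper's hypergeometric generalisation) with a precomputed Gram matrix and scikit-learn's SVC on make_moons; the function name and the `scale` parameter are illustrative assumptions:

```python
# Minimal sketch: SVM with the coherent-state-induced (RBF) kernel on make_moons.
# For canonical coherent states, |<alpha|beta>|^2 = exp(-|alpha - beta|^2), so
# encoding each point as a coherent-state amplitude yields a Gaussian kernel.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def coherent_state_kernel(X, Y, scale=1.0):
    """Gram matrix K[i, j] = exp(-scale * ||x_i - y_j||^2), the squared overlap
    of the coherent states assigned to x_i and y_j."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-scale * sq_dists)

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="precomputed")
clf.fit(coherent_state_kernel(X_tr, X_tr), y_tr)
print("test accuracy:", clf.score(coherent_state_kernel(X_te, X_tr), y_te))
```

Swapping in the paper's hypergeometric kernel would amount to replacing the Gram-matrix function while keeping the precomputed-kernel SVC pipeline unchanged.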

    Construction of 'Support Vector' Machine Feature Spaces via Deformed Weyl-Heisenberg Algebra

    This paper uses deformed coherent states, based on a deformed Weyl-Heisenberg algebra that unifies the well-known SU(2), Weyl-Heisenberg, and SU(1,1) groups through a common parameter. We show that deformed coherent states provide the theoretical foundation of a meta-kernel function, that is, a kernel which in turn defines kernel functions. Kernel functions drive developments in the field of machine learning, and the meta-kernel function presented in this paper opens new theoretical avenues for the definition and exploration of kernel functions. The meta-kernel function employs associated surfaces of revolution as feature spaces identified with non-linear coherent states. An empirical investigation compares the deformed SU(2) and SU(1,1) kernels derived from the meta-kernel, which show performance similar to the Radial Basis Function kernel and offer new insights (based on the deformed Weyl-Heisenberg algebra).
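For orientation, non-linear (deformed) coherent states are standardly built from a deformation function f(n) acting on the Weyl-Heisenberg ladder operators. The expansion and induced kernel below are a sketch from the general non-linear coherent-state literature, not formulas quoted from this paper:

```latex
% Non-linear coherent state with deformation function f(n),
% where [f(n)]! = f(1) f(2) \cdots f(n) and [f(0)]! = 1:
\lvert \alpha, f \rangle
  = \mathcal{N}_f(\lvert\alpha\rvert^2)^{-1/2}
    \sum_{n=0}^{\infty} \frac{\alpha^{n}}{\sqrt{n!}\,[f(n)]!}\,\lvert n \rangle

% Kernel induced by the feature map x \mapsto \lvert \alpha_x, f \rangle;
% choosing f \equiv 1 recovers canonical coherent states and the RBF kernel:
k(x, y) = \bigl\lvert \langle \alpha_x, f \mid \alpha_y, f \rangle \bigr\rvert^{2}
```

In this picture, a meta-kernel is the map from the deformation function f (equivalently, the algebra's common parameter) to the kernel k.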

    Estimation of mutual information via quantum kernel method

    Recently, the importance of analysing data and extracting valuable insights efficiently has been increasing in various fields. Estimating mutual information (MI) plays a critical role in investigating the relationship among multiple random variables with a non-linear correlation. In particular, the task of determining whether variables are independent is called the independence test, whose core subroutine is estimating MI from given data. It is a fundamental tool in statistics and data analysis with a wide range of applications, such as hypothesis testing, causal discovery, and more. In this paper, we propose a method for estimating mutual information using the quantum kernel. We investigate the performance under various problem settings, such as different sample sizes and shapes of the probability distribution. As a result, the quantum kernel method showed higher performance than the classical one when the number of samples is small, the variance is large, or the variables possess highly non-linear relationships. We discuss this behavior in terms of the central limit theorem and the structure of the corresponding quantum reproducing kernel Hilbert space. (Comment: 20 pages, 10 figures)
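The paper's estimator is built on a quantum kernel; as a purely classical point of comparison for the independence-test subroutine it mentions, here is a minimal sketch of the biased HSIC statistic (a standard kernel dependence measure, not the proposed method; the bandwidth `gamma` is an assumed setting):

```python
# Minimal sketch of a classical kernel independence statistic (biased HSIC
# estimator, (1/n^2) * trace(K H L H)); larger values indicate dependence.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def hsic(x, y, gamma=1.0):
    """Biased HSIC estimate between scalar samples x and y."""
    n = len(x)
    K = rbf_kernel(x.reshape(-1, 1), gamma=gamma)   # Gram matrix of x
    L = rbf_kernel(y.reshape(-1, 1), gamma=gamma)   # Gram matrix of y
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print("independent:", hsic(x, rng.normal(size=200)))
print("dependent  :", hsic(x, np.sin(3 * x) + 0.1 * rng.normal(size=200)))
```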

    Exponentially Improved Efficient Machine Learning for Quantum Many-body States with Provable Guarantees

    Solving the ground state and the ground-state properties of quantum many-body systems is generically a hard task for classical algorithms. For a family of Hamiltonians defined on an $m$-dimensional space of physical parameters, the ground state and its properties at an arbitrary parameter configuration can be predicted via a machine learning protocol up to a prescribed prediction error $\varepsilon$, provided that a sample set (of size $N$) of the states can be efficiently prepared and measured. In a recent work [Huang et al., Science 377, eabk3333 (2022)], a rigorous guarantee for such a generalization was proved. Unfortunately, an exponential scaling, $N = m^{\mathcal{O}(1/\varepsilon)}$, was found to be universal for generic gapped Hamiltonians. This result applies to the situation where the dimension of the parameter space is large while the scaling with the accuracy is not an urgent factor, and does not enter the realm of more precise learning and prediction. In this work, we consider an alternative scenario, where $m$ is a finite, not necessarily large, constant while the scaling with the prediction error becomes the central concern. By exploiting physical constraints and positive good kernels for predicting the density matrix, we rigorously obtain an exponentially improved sample complexity, $N = \mathrm{poly}(\varepsilon^{-1}, n, \log\frac{1}{\delta})$, where $\mathrm{poly}$ denotes a polynomial function, $n$ is the number of qubits in the system, and $1-\delta$ is the probability of success. Moreover, if restricted to learning ground-state properties with strong locality assumptions, the number of samples can be further reduced to $N = \mathrm{poly}(\varepsilon^{-1}, \log\frac{n}{\delta})$. This provably rigorous result represents a significant improvement and an indispensable extension of the existing work. (Comment: 8 + 10 pages, 1 + 1 figures; with supplemental material)
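To see the size of the gap between the two scalings, one can evaluate them at a few accuracies. The constants and the polynomial degree below are assumptions chosen purely for illustration; only the asymptotic forms come from the abstract:

```python
# Illustrative comparison of N ~ m^{O(1/eps)} (prior work) against
# N ~ poly(1/eps) (this work); the O(.) constant is set to 1 and the
# polynomial degree to 3, both arbitrary choices for illustration.
m = 10  # dimension of the parameter space
for eps in (0.5, 0.1, 0.05):
    n_exponential = m ** (1.0 / eps)
    n_polynomial = (1.0 / eps) ** 3
    print(f"eps={eps:5.2f}  exponential: {n_exponential:12.3e}  "
          f"polynomial: {n_polynomial:10.1f}")
```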

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. (Comment: 232 pages)
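The tensor train decomposition emphasized above can be computed by the classical TT-SVD procedure of sequential truncated SVDs. A minimal NumPy sketch under simplifying assumptions (a fixed maximal TT rank rather than a singular-value tolerance; function names are illustrative):

```python
# Minimal sketch of TT-SVD: sweep over modes, unfold, truncate an SVD, and
# carry the remainder forward; each core has shape (r_prev, n_k, r_next).
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose `tensor` into a list of TT cores via sequential SVDs."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        mat = mat.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))                        # truncate the TT rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = S[:r, None] * Vt[:r]                       # remainder for next mode
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

# Usage: compress a low-rank 8x8x8x8 tensor and check the reconstruction.
rng = np.random.default_rng(0)
a = rng.normal(size=(8, 3))
full = np.einsum('ia,ja,ka,la->ijkl', a, a, a, a)  # CP rank 3, so TT ranks <= 3
cores = tt_svd(full, max_rank=3)
rec = cores[0]
for core in cores[1:]:
    rec = np.tensordot(rec, core, axes=(-1, 0))    # contract bond dimensions
rec = rec.reshape(full.shape)
print("relative error:", np.linalg.norm(rec - full) / np.linalg.norm(full))
```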

    Learning and Testing Powerful Hypotheses

    Progress in science is driven by formulating hypotheses about phenomena of interest and by collecting evidence for their validity or refuting them. While some hypotheses are amenable to deductive proofs, others can only be accessed in a data-driven manner. For most phenomena, scientists cannot control all degrees of freedom, and hence data is often inherently stochastic. This stochasticity makes it impossible to test hypotheses with absolute certainty. The field of statistical hypothesis testing formalizes the probabilistic assessment of hypotheses, enabling researchers to control error rates, for example the rate at which they reject a true hypothesis, while aiming to reject false hypotheses as often as possible. But how do we come up with promising hypotheses, and how can we test them efficiently? Can we use machine learning systems to automatically generate promising hypotheses? This thesis studies different aspects of this question.
    A simple rule for statistical hypothesis testing states that one should not peek at the data when formulating a hypothesis. This is indeed true if done naively, that is, when the hypothesis is then simply tested with the data as if one had not looked at it yet. However, we show that in principle using the same data for learning the hypothesis and testing it is feasible if we can correct for the selection of the hypothesis. We treat this in the case of the two-sample problem: given two samples, the hypothesis to be tested is whether the samples originate from the same distribution. We can reformulate this by testing whether the maximum mean discrepancy over the unit ball of a reproducing kernel Hilbert space is zero. We show that we can learn the kernel function, and hence the exact test we use, and perform the test with the same data, while still correctly controlling the Type-I error rate. Likewise, we demonstrate experimentally that taking all data into account can lead to more powerful testing procedures than the data-splitting approach. However, deriving the formulae that correct for the selection procedure requires strong assumptions, which are only valid for a specific estimate of the maximum mean discrepancy, namely the linear-time estimate. In more general settings it is difficult, if not impossible, to adjust for the selection. We therefore also analyze the case where we split the data and use part of it to learn a test statistic. The maximum mean discrepancy implicitly optimizes a mean discrepancy over the unit ball of a reproducing kernel Hilbert space, and often the kernel itself is optimized on held-out data. We instead propose to optimize a witness function directly on held-out data and use its mean discrepancy as a test statistic. This allows us to directly maximize the test power, simplifies the theoretical treatment, and makes testing more efficient. We provide and implement algorithms to learn the test statistics. Furthermore, we show analytically that the optimization objective for learning powerful tests for the two-sample problem is closely related to the objectives used in standard supervised learning tasks, namely the least-squares loss and the cross-entropy loss. This allows us to use existing machine learning tools when learning powerful hypotheses. Moreover, since we use held-out data for learning the test statistic, we can apply any kind of model-selection and cross-validation technique to maximize performance.
    To facilitate this for practitioners, we provide an open-source Python package, 'autotst', implementing an interface to existing libraries and running the whole testing pipeline, including the learning of the hypothesis. Our presented methods reach state-of-the-art performance on two-sample testing tasks. We also show how to trade off the computational resources required for the test by sacrificing some statistical power, which can be important in practice. Furthermore, our test easily allows interpreting the results. Having more computational power potentially allows extracting more information from data and thus obtaining more significant results. Hence, investigating whether quantum computers can help in machine learning tasks has gained popularity over the past years. We investigate this in light of the two-sample problem. We define the quantum mean embedding, mapping probability distributions onto quantum states, and analyze when this mapping is injective. While this is conceptually interesting on its own, we do not find a straightforward way of harnessing any speed-up. The main problem is that there is no known way to efficiently create the quantum mean embedding; on the contrary, fundamental results in quantum information theory show that this might generally be hard to do. For two-sample testing, the use of reproducing kernel Hilbert spaces has been established for many years and proven important both theoretically and practically. In this case, we thus focused on practically relevant aspects to make the tests as powerful and easy to use as possible. For other hypothesis testing tasks, the use of advanced machine learning tools still lags far behind. For specification tests based on conditional moment restrictions, popular in econometrics, we take the first steps by defining a consistent test based on kernel methods. Our test already shows promising performance, but optimizing it, potentially with the other insights gained in this thesis, remains an open task.
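The two-sample statistic at the heart of this abstract is the maximum mean discrepancy. A minimal sketch of the textbook quadratic-time biased MMD^2 with a Gaussian kernel and a permutation p-value; this is the standard estimator, not the thesis's learned-witness test or the autotst pipeline, and the bandwidth `gamma` is an assumed setting:

```python
# Minimal sketch of a kernel two-sample test: biased MMD^2 plus a permutation
# test under H0 (both samples drawn from the same distribution).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2_biased(x, y, gamma=1.0):
    """Biased MMD^2 estimate between samples x and y (rows are points)."""
    return (rbf_kernel(x, x, gamma=gamma).mean()
            + rbf_kernel(y, y, gamma=gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma=gamma).mean())

def permutation_test(x, y, n_perms=500, gamma=1.0, seed=0):
    """p-value from re-splitting the pooled sample at random."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, gamma)
    pooled, n = np.vstack([x, y]), len(x)
    count = 0
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))
        count += mmd2_biased(pooled[perm[:n]], pooled[perm[n:]], gamma) >= observed
    return (count + 1) / (n_perms + 1)   # smoothed to avoid p = 0

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 1))
y = rng.normal(0.5, 1.0, size=(100, 1))   # mean-shifted alternative
print("p-value:", permutation_test(x, y))
```

The witness-function approach described above replaces the implicit RKHS optimization with a function learned on held-out data whose mean difference between the two samples serves as the test statistic.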