55 research outputs found

    A Mutually-Dependent Hadamard Kernel for Modelling Latent Variable Couplings

    Full text link
    We introduce a novel kernel that models input-dependent couplings across multiple latent processes. The pairwise joint kernel measures covariance along the inputs and across the different latent signals in a mutually-dependent fashion. A latent correlation Gaussian process (LCGP) model combines these non-stationary latent components into multiple outputs through an input-dependent mixing matrix. Probit classification and support for multiple observation sets are derived via variational Bayesian inference. Results on several datasets indicate that the LCGP model can recover the correlations between latent signals while simultaneously achieving state-of-the-art performance. We highlight the latent covariances with an EEG classification dataset, where latent brain processes and their couplings emerge simultaneously from the model.
    Comment: 17 pages, 6 figures; accepted to ACML 2017
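    The core construction can be pictured as a Hadamard (elementwise) product of an input kernel with an input-dependent coupling between latent signals. Below is a minimal numpy sketch of that idea; the function names and the linear per-signal weight maps are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    """Squared-exponential kernel over input locations."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def hadamard_kernel(X, Y, idx_x, idx_y, W):
    """Joint covariance over (input, latent-signal) pairs.

    K[(x, i), (y, j)] = k_input(x, y) * (w_i(x) . w_j(y)), where the
    second factor is an input-dependent coupling between latent signals
    i and j (here w_i is a fixed linear map W[i], purely for illustration).
    """
    k_in = rbf(X, Y)                                     # covariance along inputs
    wx = np.stack([W[i] @ x for i, x in zip(idx_x, X)])  # w_i(x) per row of X
    wy = np.stack([W[j] @ y for j, y in zip(idx_y, Y)])  # w_j(y) per row of Y
    return k_in * (wx @ wy.T)                            # Hadamard product

# toy usage: two latent signals, 1-D inputs, rank-2 coupling weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))
idx = np.array([0, 1, 0, 1, 0])   # which latent signal each row belongs to
W = rng.normal(size=(2, 2, 1))    # one weight map per latent signal
K = hadamard_kernel(X, X, idx, idx, W)
print(K.shape)                    # (5, 5)
```

    Because both factors are positive semi-definite Gram matrices, their Schur (elementwise) product is again a valid covariance.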

    Large Scale Tensor Regression using Kernels and Variational Inference

    Full text link
    We outline an inherent weakness of tensor factorization models when latent factors are expressed as a function of side information, and propose a novel method to mitigate it. We coin our method Kernel Fried Tensor (KFT) and present it as a large-scale forecasting tool for high-dimensional data. Our results show superior performance against LightGBM and Field-Aware Factorization Machines (FFM), two algorithms with proven track records that are widely used in industrial forecasting. We also develop a variational inference framework for KFT and associate our forecasts with calibrated uncertainty estimates on three large-scale datasets. Furthermore, KFT is empirically shown to be robust against uninformative side information in the form of constants and Gaussian noise.
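    The central idea, latent factors expressed as functions of side information, can be sketched in a few lines. The snippet below is a hedged illustration (a CP-style factorization in which one factor matrix is generated from kernel features of the side information), not the paper's actual KFT formulation; random Fourier features stand in for the kernel to keep things scalable.

```python
import numpy as np

def random_fourier_features(S, n_feat=100, lengthscale=1.0, seed=0):
    """Approximate RBF-kernel feature map of side information S."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(S.shape[1], n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(S @ W + b)

def cp_predict(factors, index):
    """CP-style prediction: sum over ranks of the product of factor rows."""
    i, j, k = index
    return np.sum(factors[0][i] * factors[1][j] * factors[2][k])

# toy setup: a 30 x 20 x 10 tensor, rank-5 model, side information on mode 0
rng = np.random.default_rng(1)
rank = 5
side_info = rng.normal(size=(30, 3))       # covariates for the mode-0 entities
Phi = random_fourier_features(side_info)   # kernelised side-information features
B = rng.normal(size=(Phi.shape[1], rank))  # weights to be learned
factors = [
    Phi @ B,                      # mode-0 factors are a function of side info
    rng.normal(size=(20, rank)),  # free factors for the remaining modes
    rng.normal(size=(10, rank)),
]
print(cp_predict(factors, (3, 7, 2)))
```

    Here the mode-0 factors vary smoothly with the covariates; the paper's contribution concerns what happens, and what to do, when such side information is uninformative.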

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2: Applications and Future Perspectives

    Full text link
    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated across a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher-order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.
    Comment: 232 pages
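    To make the tensor train (TT) format concrete, here is a minimal numpy sketch of the standard TT-SVD procedure, which factorizes a dense tensor into a chain of third-order cores by sequential truncated SVDs. It is a simplified illustration (a fixed maximum rank rather than an error-tolerance-driven truncation), not code from the monograph.

```python
import numpy as np

def tt_svd(T, max_rank=8):
    """Tensor-train decomposition of a dense tensor via sequential SVDs."""
    dims = T.shape
    cores = []
    r_prev = 1
    M = T.reshape(dims[0], -1)
    for n in range(len(dims) - 1):
        M = M.reshape(r_prev * dims[n], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(s))                      # truncate the TT rank
        cores.append(U[:, :r].reshape(r_prev, dims[n], r))
        M = s[:r, None] * Vt[:r]                       # carry the remainder forward
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))       # final core
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor (for checking)."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

T = np.random.default_rng(2).normal(size=(4, 5, 6, 3))
cores = tt_svd(T, max_rank=30)  # ranks large enough for exact recovery
print(np.allclose(tt_reconstruct(cores), T))  # True
```

    Truncating the ranks trades accuracy for compression: storage drops from the exponential product of all mode sizes to a sum of small r x d x r cores, which is the "super-compression" the abstract refers to.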

    Quantum information outside quantum information

    Get PDF
    Quantum theory, as counter-intuitive as a theory can get, has turned out to make predictions of the physical world that match observations so precisely that it has been described as the most accurate physical theory ever devised. Viewing quantum entanglement, superposition and interference not as undesirable necessities but as interesting resources paved the way to the development of quantum information science. This area studies the processing, transmission and storage of information, taking into account that information is physical and subject to the laws of nature governing the systems in which it is encoded. The development of the consequences of this idea, along with the great advances in the control of individual quantum systems, has led to what is now known as the second quantum revolution, in which quantum information science has emerged as a fully-grown field. As such, ideas and tools developed within the framework of quantum information theory have begun to permeate other fields of research. This Ph.D. dissertation is devoted to the use of concepts and methods from quantum information science in other areas of research. Likewise, it considers how encoding information in quantum degrees of freedom may allow further development of well-established research fields and industries. That is, this thesis aims at the study of quantum information outside the field of quantum information. Four different areas are visited. A first question posed is that of the role of quantum information in quantum field theory, with a focus on the quantum vacuum. It is known that the quantum vacuum contains entanglement, but it remains unknown whether this entanglement can be accessed and exploited in experiments. We take crucial steps in this direction by studying the extraction of vacuum entanglement in realistic models of light-matter interaction, and by giving strict mathematical conditions of general applicability that must be fulfilled for extraction to be possible at all. Another field where quantum information methods can offer great insight is that of quantum thermodynamics, where the idealizations made in macroscopic thermodynamics break down. Making use of a quintessential framework of quantum information and quantum optics, we study the cyclic operation of a microscopic heat engine composed of a single particle reciprocating between two finite-size baths, focusing on the consequences of removing the macroscopic idealizations. Moving one step closer to applications in society, we analyze the impact that encoding information in quantum systems and processing it in quantum computers may have on the field of machine learning. A great desideratum in this area, largely obstructed by computational cost, is that of explainable models which not only make predictions but also provide information about the decision process behind them. We develop an algorithm to train neural networks using explainable techniques that exploits entanglement and superposition to execute efficiently on quantum computers, in contrast with its classical counterparts. Furthermore, we run it on state-of-the-art quantum computers with the aim of assessing the viability of realistic implementations. Lastly, and encompassing all of the above, we explore the notion of causality in quantum mechanics from an information-theoretic point of view. While it has been known since the work of John S. Bell in 1964 that, for the same causal pattern, quantum systems can generate correlations between variables that are impossible to obtain employing only classical systems, there is a notable lack of tools for studying complex causal effects whenever quantum behavior is expected. We fill this gap by providing general methods for the characterization of the quantum correlations achievable in complex causal patterns. Closing the circle, we make use of these tools to find phenomena of fundamental and experimental relevance back in quantum information.

    Statistical signal processing of nonstationary tensor-valued data

    Get PDF
    Real-world signals, such as the evolution of three-dimensional vector fields over time, can exhibit highly structured probabilistic interactions across their multiple constitutive dimensions. This calls for analysis tools capable of directly capturing the inherent multi-way couplings present in such data. Yet current analyses typically employ multivariate matrix models and their associated linear algebras, which are agnostic to the global data structure and can only describe local linear pairwise relationships between data entries. To address this issue, this thesis uses the property of linear separability -- a notion intrinsic to multi-dimensional data structures called tensors -- as a linchpin to consider probabilistic, statistical and spectral separability under one umbrella. This helps both to enhance physical meaning in the analysis and to reduce the dimensionality of tensor-valued problems. We first introduce a new identifiable probability distribution which appropriately models the interactions between random tensors, whereby linear relationships are considered between tensor fibres as opposed to between individual entries as in standard matrix analysis. Unlike existing models, the proposed tensor probability distribution formulation is shown to yield a unique maximum likelihood estimator, which is demonstrated to be statistically efficient. Both matrices and vectors are lower-order tensors, and this gives us a unique opportunity to consider some matrix signal processing models under the more powerful framework of multilinear tensor algebra. By introducing a model for the joint distribution of multiple random tensors, it is also possible to treat random tensor regression analyses and subspace methods within a unified separability framework. The practical utility of the proposed analysis is demonstrated through case studies on synthetic and real-world tensor-valued data, including the evolution over time of global atmospheric temperatures and international interest rates. Another overarching theme in this thesis is the nonstationarity inherent to real-world signals, which typically consist of both deterministic and stochastic components. This thesis aims to help bridge the gap between the formal probabilistic theory of stochastic processes and empirical signal processing methods for deterministic signals by providing a spectral model for a class of nonstationary signals, whereby the deterministic and stochastic time-domain signal properties are designated respectively by the first- and second-order moments of the signal in the frequency domain. By virtue of the assumed probabilistic model, novel tests for nonstationarity detection are devised and demonstrated to be effective in low-SNR environments. The proposed spectral analysis framework, which is intrinsically complex-valued, is facilitated by augmented complex algebra in order to fully capture the joint distribution of the real and imaginary parts of complex random variables, using a compact formulation. Finally, motivated by the need for signal processing algorithms which naturally cater for the nonstationarity inherent to real-world tensors, the above contributions are employed simultaneously to derive a general statistical signal processing framework for nonstationary tensors. This is achieved by introducing a new augmented complex multilinear algebra which allows for a concise description of the multilinear interactions between the real and imaginary parts of complex tensors. These contributions are further supported by new physically meaningful empirical results on the statistical analysis of nonstationary global atmospheric temperatures.
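    The augmented complex statistics invoked above reduce, at second order, to tracking both the covariance and the pseudo-covariance of a complex signal. The following numpy sketch, an illustration rather than code from the thesis, computes both and uses them to flag improper (noncircular) data.

```python
import numpy as np

def augmented_stats(Z):
    """Second-order augmented statistics of complex samples Z (n x dim).

    The Hermitian covariance C = E[z z^H] alone misses the correlation
    between real and imaginary parts; the pseudo-covariance P = E[z z^T]
    completes the picture. Together they form the covariance of the
    augmented vector [z, conj(z)]."""
    Zc = Z - Z.mean(axis=0)
    n = Zc.shape[0]
    C = Zc.conj().T @ Zc / n   # covariance, E[z z^H]
    P = Zc.T @ Zc / n          # pseudo-covariance, E[z z^T]
    return C, P

rng = np.random.default_rng(3)
a = rng.normal(size=(100_000, 2))
b = rng.normal(size=(100_000, 2))
z_proper = (a + 1j * b) / np.sqrt(2)  # circular: P is (nearly) zero
z_improper = a + 0j                   # real-valued in C: P equals C
for z in (z_proper, z_improper):
    C, P = augmented_stats(z)
    print(np.linalg.norm(P) / np.linalg.norm(C))  # ~0, then ~1
```

    The printed ratio is a simple impropriety measure; the augmented complex multilinear algebra described above extends this bookkeeping from vectors to tensors.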

    W Boson Polarization Studies for Vector Boson Scattering at LHC: from Classical Approaches to Quantum Computing

    Get PDF
    The Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) has, in recent years, delivered unprecedented high-energy proton-proton collisions that have been collected and studied by two multi-purpose experiments, ATLAS and CMS. In this thesis, we focus on one physics process in particular, Vector Boson Scattering (VBS), which is one of the keys to probing the ElectroWeak sector of the Standard Model in the TeV regime and to shedding light on the mechanism of ElectroWeak symmetry breaking. The VBS measurement is extremely challenging because of its low signal yields, complex final states and large backgrounds. Its understanding requires a coordinated effort of theorists and experimentalists to explore all possible information about inclusive observables, kinematics and background isolation. The present work aims to contribute to Vector Boson Scattering studies by exploring the possibility of disentangling the W boson polarizations when analyzing a pure VBS sample. This work is organized as follows. In Chapter 1, we overview the main concepts related to the Standard Model of particle physics. We introduce the VBS process from a theoretical perspective in Chapter 2, underlining its role with respect to the known mechanism of ElectroWeak symmetry breaking. We emphasize the importance of regularizing the VBS amplitude by canceling divergences arising from longitudinally polarized vector bosons at high energy. In the same chapter, we discuss strategies for identifying the contribution of longitudinally polarized W bosons in the VBS process. We investigate the possibility of reconstructing the event kinematics and thereby developing a technique that would efficiently discriminate between the longitudinal contribution and the rest of the participating processes in the VBS. In Chapter 3, we perform a Monte Carlo generator comparison at different orders in perturbation theory, to explore the state of the art of VBS Monte Carlo programs and to provide suggestions and limits to the experimental community. In the last part of the same chapter we provide an estimate of the PDF uncertainty contribution to VBS observables. Chapter 4 introduces the phenomenological study of this work. We perform an extensive study of polarization fraction extraction and of the reconstruction of the W boson reference frame. We first make use of traditional kinematic approaches, moving then to a Deep Learning strategy. Finally, in Chapter 5, we test a new technological paradigm, the quantum computer, to evaluate its potential in our case study and in the HEP sector overall. This work has been carried out in the framework of a PhD Executive project, in partnership between the University of Pavia and IBM Italia, and has therefore received support from both institutions. It has also been funded by the European Community via the COST Action VBSCan, created with the purpose of connecting all the main players involved in Vector Boson Scattering studies at hadron colliders, gathering a solid and multidisciplinary community and aiming to provide the worldwide phenomenological reference on this fundamental process.

    Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations

    Get PDF
    Non-parametric models and techniques enjoy growing popularity in the field of machine learning, and among these, Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning, in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite-sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design), and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide range of GP models far beyond the special cases studied here.
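    For orientation, one common statement of the PAC-Bayesian theorem for classification, in the binary-KL form often associated with this line of work, reads as follows; the notation is an assumption for this summary, not a quotation from the thesis.

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q over classifiers,
\[
  \mathrm{kl}\!\left( \hat{R}_n(Q) \,\middle\|\, R(Q) \right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n+1}{\delta}}{n},
\]
% where \hat{R}_n(Q) and R(Q) are the empirical and true Gibbs risks, P is
% the prior, and \mathrm{kl}(q \,\|\, p) = q \ln\frac{q}{p}
% + (1 - q) \ln\frac{1-q}{1-p} is the binary relative entropy.
```

    The greedy forward selection used for the sparse approximations can likewise be sketched in a few lines. Below is an illustrative numpy version that scores each candidate point by its entropy reduction under a GP regression model and grows the active set with incomplete-Cholesky-style rank-one updates; the function names and the specific criterion are assumptions for the sketch, not the thesis's exact schemes.

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    """Squared-exponential kernel over input locations."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def greedy_select(X, m, noise=0.1):
    """Greedily pick m active points, each maximising the entropy reduction
    0.5 * log(1 + var_i / noise^2), with var_i the residual variance of
    candidate i given the points already selected."""
    n = X.shape[0]
    K = rbf(X, X)
    var = np.diag(K).copy()        # residual (posterior) variances
    V = np.zeros((m, n))           # scaled residual covariances with active set
    active = []
    for t in range(m):
        scores = 0.5 * np.log1p(var / noise**2)
        scores[active] = -np.inf   # never pick the same point twice
        i = int(np.argmax(scores))
        # rank-one (incomplete-Cholesky-style) update of residual variances
        v = (K[i] - V[:t].T @ V[:t, i]) / np.sqrt(var[i] + noise**2)
        V[t] = v
        var = var - v**2
        active.append(i)
    return active

X = np.random.default_rng(4).normal(size=(200, 2))
print(greedy_select(X, 10))        # indices of the 10 selected points
```

    Each step touches a single kernel row and performs a rank-one update, so the active set can be grown without ever forming the full posterior; this is the source of the favourable scaling that makes such schemes attractive for large training sets.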