43 research outputs found

    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework

    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of (Helal, 2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, the various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented, comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveals a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of multiway analysis methods and their integration with various deep neural network models is presented using case studies in different application domains. Comment: 34 pages, 8 figures, 4 tables
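
As a rough illustration of the parameter-reduction claim in the abstract above, the following sketch builds a small third-order tensor from rank-R factor matrices (a CP, or canonical polyadic, model) and compares the number of stored values with the dense tensor. The sizes `I`, `J`, `K`, the rank `R`, and all variable names are assumptions chosen for illustration; this is not the paper's BSS example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mode sizes and CP rank, chosen only for illustration.
I, J, K, R = 6, 5, 4, 2

# One factor matrix per mode; together they define a rank-R CP model.
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Sum of R outer products: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum("ir,jr,kr->ijk", A, B, C)

dense_params = X.size          # I * J * K values stored densely
cp_params = R * (I + J + K)    # values stored by the three factor matrices

# Flattening two modes into one gives the matrix view used by 2-D methods.
X_unfolded = X.reshape(I, J * K)
unfolded_rank = np.linalg.matrix_rank(X_unfolded)

print(dense_params, cp_params, unfolded_rank)
```

Under these assumptions the factor matrices hold 30 values against 120 for the dense tensor, and the gap widens as the mode sizes or the number of modes grow.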

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 pages
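
To make the tensor train (TT) format mentioned in the abstract above more concrete, here is a minimal sketch of the standard TT-SVD procedure (sequential truncated SVDs of reshaped unfoldings) in NumPy. The function names `tt_svd` and `tt_to_full`, the tensor shape, and the rank cap are assumptions for illustration and are not taken from the monograph.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Split a dense tensor into TT cores of shape (r_prev, n_k, r_next)."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor
    for k in range(len(dims) - 1):
        # Group the incoming rank with the current mode, then factorise.
        mat = mat.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract neighbouring cores back into the dense tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

# Hypothetical test tensor with exact TT ranks of 2, built from random cores.
rng = np.random.default_rng(0)
shapes = [(1, 4, 2), (2, 5, 2), (2, 6, 2), (2, 7, 1)]
dense = tt_to_full([rng.standard_normal(s) for s in shapes])

cores = tt_svd(dense, max_rank=2)
tt_params = sum(core.size for core in cores)
max_error = np.max(np.abs(tt_to_full(cores) - dense))

print(dense.size, tt_params, max_error)
```

In this toy case the four cores store 66 values in place of the 840 entries of the dense tensor, and the reconstruction error is at the level of floating-point round-off because the test tensor has exact TT rank 2. For data without such structure, capping the rank gives an approximation rather than an exact representation.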

    Statistical signal processing of nonstationary tensor-valued data

    Real-world signals, such as the evolution of three-dimensional vector fields over time, can exhibit highly structured probabilistic interactions across their multiple constitutive dimensions. This calls for analysis tools capable of directly capturing the inherent multi-way couplings present in such data. Yet, current analyses typically employ multivariate matrix models and their associated linear algebras which are agnostic to the global data structure and can only describe local linear pairwise relationships between data entries. To address this issue, this thesis uses the property of linear separability -- a notion intrinsic to multi-dimensional data structures called tensors -- as a linchpin to consider the probabilistic, statistical and spectral separability under one umbrella. This helps to both enhance physical meaning in the analysis and reduce the dimensionality of tensor-valued problems. We first introduce a new identifiable probability distribution which appropriately models the interactions between random tensors, whereby linear relationships are considered between tensor fibres as opposed to between individual entries as in standard matrix analysis. Unlike existing models, the proposed tensor probability distribution formulation is shown to yield a unique maximum likelihood estimator which is demonstrated to be statistically efficient. Both matrices and vectors are lower-order tensors, and this gives us a unique opportunity to consider some matrix signal processing models under the more powerful framework of multilinear tensor algebra. By introducing a model for the joint distribution of multiple random tensors, it is also possible to treat random tensor regression analyses and subspace methods within a unified separability framework. 
Practical utility of the proposed analysis is demonstrated through case studies over synthetic and real-world tensor-valued data, including the evolution over time of global atmospheric temperatures and international interest rates. Another overarching theme in this thesis is the nonstationarity inherent to real-world signals, which typically consist of both deterministic and stochastic components. This thesis aims to help bridge the gap between formal probabilistic theory of stochastic processes and empirical signal processing methods for deterministic signals by providing a spectral model for a class of nonstationary signals, whereby the deterministic and stochastic time-domain signal properties are designated respectively by the first- and second-order moments of the signal in the frequency domain. By virtue of the assumed probabilistic model, novel tests for nonstationarity detection are devised and demonstrated to be effective in low-SNR environments. The proposed spectral analysis framework, which is intrinsically complex-valued, is facilitated by augmented complex algebra in order to fully capture the joint distribution of the real and imaginary parts of complex random variables, using a compact formulation. Finally, motivated by the need for signal processing algorithms which naturally cater for the nonstationarity inherent to real-world tensors, the above contributions are employed simultaneously to derive a general statistical signal processing framework for nonstationary tensors. This is achieved by introducing a new augmented complex multilinear algebra which allows for a concise description of the multilinear interactions between the real and imaginary parts of complex tensors. These contributions are further supported by new physically meaningful empirical results on the statistical analysis of nonstationary global atmospheric temperatures. Open Access

    Nonlinear Filtering based on Log-homotopy Particle Flow : Methodological Clarification and Numerical Evaluation

    The state estimation of dynamical systems based on measurements is a ubiquitous problem. It is relevant in applications such as robotics, industrial manufacturing, computer vision, and target tracking. Recursive Bayesian methodology can be used to estimate the hidden states of a dynamical system. The procedure consists of two steps: a process update based on solving the equations modelling the state evolution, and a measurement update in which the prior knowledge about the system is improved based on the measurements. For most real-world systems, both the evolution and the measurement models are nonlinear functions of the system states. Additionally, both models can be perturbed by random noise sources, which may be non-Gaussian in nature. Unlike for linear Gaussian models, no optimal estimation scheme exists for nonlinear/non-Gaussian scenarios. This thesis investigates a particular method for nonlinear and non-Gaussian data assimilation, termed the log-homotopy based particle flow. Practical filters based on such flows are known in the literature as Daum-Huang filters (DHF), named after their developers. The key concept behind such filters is the gradual inclusion of measurements to counter a major drawback of single-step update schemes such as particle filters, namely degeneracy. This refers to a situation where the likelihood function has its probability mass well separated from the prior density, and/or is sharply peaked in comparison. Conventional sampling or grid-based techniques do not perform well under such circumstances and can incur a high processing cost in order to achieve reasonable accuracy. The DHF is a sampling-based scheme which provides a unique way to tackle this challenge, thereby lowering the processing cost.
This is achieved by dividing the single measurement update step into multiple sub-steps, such that particles originating from their prior locations are moved incrementally until they reach their final locations. The motion is controlled by a differential equation, which is solved numerically to yield the updated states. DH filters, even though not new in the literature, have not yet been fully explored in detail. They lack the in-depth analysis that other contemporary filters have gone through. In particular, the implementation details for the DHF are very application specific. In this work, we have pursued four main objectives. The first objective is the exploration of the theoretical concepts behind the DHF. Secondly, we build an understanding of the existing implementation framework and highlight its potential shortcomings. As a sub-task, we carry out a detailed study of the important factors that affect the performance of a DHF, and suggest possible improvements for each of those factors. The third objective is to use the improved implementation to derive new filtering algorithms. Finally, we have extended the DHF theory and derived new flow equations and filters to cater for more general scenarios. Improvements in the implementation architecture of a standard DHF are one of the key contributions of this thesis. The scope of applicability of the DHF is expanded by combining it with other schemes such as Sequential Markov chain Monte Carlo and the tensor decomposition based solution of the Fokker-Planck equation, resulting in the development of new nonlinear filtering algorithms. The standard DHF with the improved implementation and the newly derived algorithms are tested in challenging simulated scenarios. Detailed analyses have been carried out, together with comparisons against more established filtering schemes. Estimation error and processing time are used as the main performance measures.
We show that our new filtering algorithms exhibit marked performance improvements over the traditional schemes.
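
The idea of splitting one measurement update into many sub-steps, described in the abstract above, can be checked in the simplest setting: a scalar linear Gaussian model, where raising the likelihood to the power 1/n corresponds to inflating the measurement noise variance by n. The sketch below is not a Daum-Huang particle flow and involves no particles or differential equation; it only illustrates the underlying homotopy, under the assumption of a model z = h*x + noise with hypothetical numbers and function names.

```python
import numpy as np

def kalman_update(mean, var, z, h, r):
    """One scalar Kalman measurement update for z = h * x + noise (variance r)."""
    gain = var * h / (h * var * h + r)
    new_mean = mean + gain * (z - h * mean)
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var

def gradual_update(mean, var, z, h, r, n_steps):
    """Include the same measurement in n_steps equal fractions."""
    for _ in range(n_steps):
        # A Gaussian likelihood raised to the power 1/n_steps is again
        # Gaussian, with its variance inflated to n_steps * r.
        mean, var = kalman_update(mean, var, z, h, r * n_steps)
    return mean, var

# Hypothetical prior and measurement, with the likelihood far from the prior.
prior_mean, prior_var = 0.0, 4.0
z, h, r = 3.0, 1.0, 0.25

single = kalman_update(prior_mean, prior_var, z, h, r)
gradual = gradual_update(prior_mean, prior_var, z, h, r, n_steps=50)

print(single, gradual)
```

In this linear Gaussian case the single-step and gradual updates agree up to round-off, since each sub-step adds the same fraction of the measurement information. The practical benefit discussed in the abstract arises for sample-based filters in nonlinear or non-Gaussian settings, where the small sub-steps let particles migrate towards the likelihood instead of being reweighted in one jump.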