43 research outputs found
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
The burgeoning growth of public domain data and the increasing complexity of
deep learning model architectures have underscored the need for more efficient
data representation and analysis techniques. This paper is motivated by the
work of (Helal, 2023) and aims to present a comprehensive overview of
tensorization. This transformative approach bridges the gap between the
inherently multidimensional nature of data and the simplified 2-dimensional
matrices commonly used in linear algebra-based machine learning algorithms.
This paper explores the steps involved in tensorization, multidimensional data
sources, various multiway analysis methods employed, and the benefits of these
approaches. A small example of Blind Source Separation (BSS) is presented
comparing 2-dimensional algorithms and a multiway algorithm in Python. Results
indicate that multiway analysis is more expressive. Contrary to the intuition
of the dimensionality curse, utilising multidimensional datasets in their
native form and applying multiway analysis methods grounded in multilinear
algebra reveal a profound capacity to capture intricate interrelationships
among various dimensions while, surprisingly, reducing the number of model
parameters and accelerating processing. A survey of the multi-away analysis
methods and integration with various Deep Neural Networks models is presented
using case studies in different application domains.Comment: 34 pages, 8 figures, 4 table
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Statistical signal processing of nonstationary tensor-valued data
Real-world signals, such as the evolution of three-dimensional vector fields over time, can exhibit highly structured probabilistic interactions across their multiple constitutive dimensions. This calls for analysis tools capable of directly capturing the inherent multi-way couplings present in such data. Yet, current analyses typically employ multivariate matrix models and their associated linear algebras which are agnostic to the global data structure and can only describe local linear pairwise relationships between data entries.
To address this issue, this thesis uses the property of linear separability -- a notion intrinsic to multi-dimensional data structures called tensors -- as a linchpin to consider the probabilistic, statistical and spectral separability under one umbrella. This helps to both enhance physical meaning in the analysis and reduce the dimensionality of tensor-valued problems. We first introduce a new identifiable probability distribution which appropriately models the interactions between random tensors, whereby linear relationships are considered between tensor fibres as opposed to between individual entries as in standard matrix analysis. Unlike existing models, the proposed tensor probability distribution formulation is shown to yield a unique maximum likelihood estimator which is demonstrated to be statistically efficient.
Both matrices and vectors are lower-order tensors, and this gives us a unique opportunity to consider some matrix signal processing models under the more powerful framework of multilinear tensor algebra. By introducing a model for the joint distribution of multiple random tensors, it is also possible to treat random tensor regression analyses and subspace methods within a unified separability framework. Practical utility of the proposed analysis is demonstrated through case studies over synthetic and real-world tensor-valued data, including the evolution over time of global atmospheric temperatures and international interest rates.
Another overarching theme in this thesis is the nonstationarity inherent to real-world signals, which typically consist of both deterministic and stochastic components. This thesis aims to help bridge the gap between formal probabilistic theory of stochastic processes and empirical signal processing methods for deterministic signals by providing a spectral model for a class of nonstationary signals, whereby the deterministic and stochastic time-domain signal properties are designated respectively by the first- and second-order moments of the signal in the frequency domain. By virtue of the assumed probabilistic model, novel tests for nonstationarity detection are devised and demonstrated to be effective in low-SNR environments. The proposed spectral analysis framework, which is intrinsically complex-valued, is facilitated by augmented complex algebra in order to fully capture the joint distribution of the real and imaginary parts of complex random variables, using a compact formulation.
Finally, motivated by the need for signal processing algorithms which naturally cater for the nonstationarity inherent to real-world tensors, the above contributions are employed simultaneously to derive a general statistical signal processing framework for nonstationary tensors. This is achieved by introducing a new augmented complex multilinear algebra which allows for a concise description of the multilinear interactions between the real and imaginary parts of complex tensors. These contributions are further supported by new physically meaningful empirical results on the statistical analysis of nonstationary global atmospheric temperatures.Open Acces
Nonlinear Filtering based on Log-homotopy Particle Flow : Methodological Clarification and Numerical Evaluation
The state estimation of dynamical systems based on measurements is an ubiquitous problem. This is relevant in applications like robotics, industrial manufacturing, computer vision, target tracking etc. Recursive Bayesian methodology can then be used to estimate the hidden states of a dynamical system. The procedure consists of two steps: a process update based on solving the equations modelling the state evolution, and a measurement update in which the prior knowledge about the system is improved based on the measurements. For most real world systems, both the evolution and the measurement models are nonlinear functions of the system states. Additionally, both models can also be perturbed by random noise sources, which could be non-Gaussian in their nature. Unlike linear Gaussian models, there does not exist any optimal estimation scheme for nonlinear/non-Gaussian scenarios. This thesis investigates a particular method for nonlinear and non-Gaussian data assimilation, termed as the log-homotopy based particle flow. Practical filters based on such flows have been known in the literature as Daum Huang filters (DHF), named after the developers. The key concept behind such filters is the gradual inclusion of measurements to counter a major drawback of single step update schemes like the particle filters i.e. namely the degeneracy. This could refer to a situation where the likelihood function has its probability mass well seperated from the prior density, and/or is peaked in comparison. Conventional sampling or grid based techniques do not perform well under such circumstances and in order to achieve a reasonable accuracy, could incur a high processing cost. DHF is a sampling based scheme, which provides a unique way to tackle this challenge thereby lowering the processing cost. This is achieved by dividing the single measurement update step into multiple sub steps, such that particles originating from their prior locations are graduated incrementally until they reach their final locations. The motion is controlled by a differential equation, which is numerically solved to yield the updated states. DH filters, even though not new in the literature, have not been fully explored in the detail yet. They lack the in-depth analysis that the other contemporary filters have gone through. Especially, the implementation details for the DHF are very application specific. In this work, we have pursued four main objectives. The first objective is the exploration of theoretical concepts behind DHF. Secondly, we build an understanding of the existing implementation framework and highlight its potential shortcomings. As a sub task to this, we carry out a detailed study of important factors that affect the performance of a DHF, and suggest possible improvements for each of those factors. The third objective is to use the improved implementation to derive new filtering algorithms. Finally, we have extended the DHF theory and derived new flow equations and filters to cater for more general scenarios. Improvements in the implementation architecture of a standard DHF is one of the key contributions of this thesis. The scope of the applicability of DHF is expanded by combining it with other schemes like the Sequential Markov chain Monte Carlo and the tensor decomposition based solution of the Fokker Planck equation, resulting in the development of new nonlinear filtering algorithms. The standard DHF, using improved implementation and the newly derived algorithms are tested in challenging simulated test scenarios. Detailed analysis have been carried out, together with the comparison against more established filtering schemes. Estimation error and the processing time are used as important performance parameters. We show that our new filtering algorithms exhibit marked performance improvements over the traditional schemes