2,314 research outputs found
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
A Simple Iterative Algorithm for Parsimonious Binary Kernel Fisher Discrimination
By applying recent results in optimization theory variously known as optimization transfer or majorize/minimize algorithms, an algorithm for binary, kernel, Fisher discriminant analysis is introduced that makes use of a non-smooth penalty on the coefficients to provide a parsimonious solution. The problem is converted into a smooth optimization that can be solved iteratively with no greater overhead than iteratively re-weighted least-squares. The result is simple, easily programmed and is shown to perform, in terms of both accuracy and parsimony, as well as or better than a number of leading machine learning algorithms on two well-studied and substantial benchmarks
The wavelet-NARMAX representation : a hybrid model structure combining polynomial models with multiresolution wavelet decompositions
A new hybrid model structure combing polynomial models with multiresolution wavelet decompositions is introduced for nonlinear system identification. Polynomial models play an important role in approximation theory, and have been extensively used in linear and nonlinear system identification. Wavelet decompositions, in which the basis functions have the property of localization in both time and frequency, outperform many other approximation schemes and offer a flexible solution for approximating arbitrary functions. Although wavelet representations can approximate even severe nonlinearities in a given signal very well, the advantage of these representations can be lost when wavelets are used to capture linear or low-order nonlinear behaviour in a signal. In order to sufficiently utilise the global property of polynomials and the local property of wavelet representations simultaneously, in this study polynomial models and wavelet decompositions are combined together in a parallel structure to represent nonlinear input-output systems. As a special form of the NARMAX model, this hybrid model structure will be referred to as the WAvelet-NARMAX model, or simply WANARMAX. Generally, such a WANARMAX representation for an input-output system might involve a large number of basis functions and therefore a great number of model terms. Experience reveals that only a small number of these model terms are significant to the system output. A new fast orthogonal least squares algorithm, called the matching pursuit orthogonal least squares (MPOLS) algorithm, is also introduced in this study to determine which terms should be included in the final model
Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations
Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the problem of the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design) and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide regime of GP models far beyond the special cases we studied here
A unified wavelet-based modelling framework for non-linear system identification: the WANARX model structure
A new unified modelling framework based on the superposition of additive submodels, functional components, and
wavelet decompositions is proposed for non-linear system identification. A non-linear model, which is often represented
using a multivariate non-linear function, is initially decomposed into a number of functional components via the wellknown
analysis of variance (ANOVA) expression, which can be viewed as a special form of the NARX (non-linear
autoregressive with exogenous inputs) model for representing dynamic input–output systems. By expanding each functional
component using wavelet decompositions including the regular lattice frame decomposition, wavelet series and
multiresolution wavelet decompositions, the multivariate non-linear model can then be converted into a linear-in-theparameters
problem, which can be solved using least-squares type methods. An efficient model structure determination
approach based upon a forward orthogonal least squares (OLS) algorithm, which involves a stepwise orthogonalization
of the regressors and a forward selection of the relevant model terms based on the error reduction ratio (ERR), is
employed to solve the linear-in-the-parameters problem in the present study. The new modelling structure is referred to
as a wavelet-based ANOVA decomposition of the NARX model or simply WANARX model, and can be applied to
represent high-order and high dimensional non-linear systems
Sparse machine learning methods with applications in multivariate signal processing
This thesis details theoretical and empirical work that draws from two main subject areas: Machine
Learning (ML) and Digital Signal Processing (DSP). A unified general framework is given for the application
of sparse machine learning methods to multivariate signal processing. In particular, methods that
enforce sparsity will be employed for reasons of computational efficiency, regularisation, and compressibility.
The methods presented can be seen as modular building blocks that can be applied to a variety
of applications. Application specific prior knowledge can be used in various ways, resulting in a flexible
and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set
of multivariate signals.
In addition to testing on benchmark datasets, a series of empirical evaluations on real world
datasets were carried out. These included: the classification of musical genre from polyphonic audio
files; a study of how the sampling rate in a digital radar can be reduced through the use of Compressed
Sensing (CS); analysis of human perception of different modulations of musical key from
Electroencephalography (EEG) recordings; classification of genre of musical pieces to which a listener
is attending from Magnetoencephalography (MEG) brain recordings. These applications demonstrate
the efficacy of the framework and highlight interesting directions of future research
- …