Maximum Entropy Production Principle for Stock Returns
In our previous studies we have investigated the structural complexity of
time series describing stock returns on New York's and Warsaw's stock
exchanges, by employing two estimators of Shannon's entropy rate based on
Lempel-Ziv and Context Tree Weighting algorithms, which were originally used
for data compression. Such structural complexity of the time series describing
logarithmic stock returns can be used as a measure of the inherent (model-free)
predictability of the underlying price formation processes, testing the
Efficient-Market Hypothesis in practice. We have also correlated the estimated
predictability with the profitability of standard trading algorithms, and found
that these do not use the structure inherent in the stock returns to any
significant degree. To find a way to use the structural complexity of the stock
returns for the purpose of prediction, we propose the Maximum Entropy
Production Principle as applied to stock returns, and test it on the two
markets mentioned above, asking whether this principle, combined with the
structural complexity of the returns, can enhance the prediction of stock returns.
Comment: 14 pages, 5 figures
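The entropy-rate estimation underlying this predictability measure can be illustrated with a small sketch. Below is a minimal Python example, assuming an LZ76-style phrase-counting estimator applied to sign-binarised log-returns; the function names and the random placeholder series are illustrative only and are not the authors' estimators (which also include Context Tree Weighting).

```python
import numpy as np

def lz76_complexity(symbols):
    """Number of phrases in a simple LZ76-style parsing of a symbol sequence."""
    s = ''.join(str(x) for x in symbols)
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # Extend the current phrase while it still occurs in the preceding text.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def lz_entropy_rate(symbols):
    """Crude LZ-based estimate of the entropy rate in bits per symbol."""
    n = len(symbols)
    return lz76_complexity(symbols) * np.log2(n) / n

# Toy usage: binarise daily log-returns by sign and estimate their predictability.
rng = np.random.default_rng(0)
log_returns = rng.normal(0.0, 0.01, size=2000)   # placeholder for real market returns
bits = (log_returns > 0).astype(int)
print(f"estimated entropy rate: {lz_entropy_rate(bits):.3f} bits/symbol")
```

For a binary alphabet the maximum is 1 bit per symbol, so values close to 1 indicate little exploitable structure in the return signs.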
OCReP: An Optimally Conditioned Regularization for Pseudoinversion Based Neural Training
In this paper we consider the training of single hidden layer neural networks
by pseudoinversion, which, in spite of its popularity, is sometimes affected by
numerical instability issues. Regularization is known to be effective in such
cases, so we introduce, in the framework of Tikhonov regularization, a
matricial reformulation of the problem which allows us to use the condition
number as a diagnostic tool for identification of instability. By imposing
well-conditioning requirements on the relevant matrices, our theoretical
analysis allows the identification of an optimal value for the regularization
parameter from the standpoint of stability. We compare it with the value derived
by cross-validation for overfitting control and optimisation of the
generalization performance. We test our method for both regression and
classification tasks. The proposed method is quite effective in terms of
predictivity, often with some improvement in performance with respect to the
reference cases considered. This approach, due to analytical determination of
the regularization parameter, dramatically reduces the computational load
required by many other techniques.
Comment: Published in Neural Networks
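As a rough illustration of pseudoinversion-based training with Tikhonov regularization, the sketch below trains a single-hidden-layer network with random hidden weights and solves for the output weights in closed form. The regularization parameter lam is a user-supplied constant and the condition number is merely reported as a diagnostic; the paper's contribution is to choose this parameter analytically from a well-conditioning requirement, which is not reproduced here.

```python
import numpy as np

def train_pseudoinverse(X, T, n_hidden=50, lam=1e-3, seed=0):
    """Train a single-hidden-layer network by regularized pseudoinversion.

    Hidden weights are random and fixed; only the output weights are learned
    by solving a Tikhonov-regularized least-squares problem in closed form.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights
    b = rng.normal(size=n_hidden)                 # hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    # Output weights: beta = (H^T H + lam I)^{-1} H^T T
    A = H.T @ H + lam * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    cond = np.linalg.cond(A)                      # diagnostic: conditioning of the system
    return W, b, beta, cond

def predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```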
Scale-invariant segmentation of dynamic contrast-enhanced perfusion MR-images with inherent scale selection
Selection of the best set of scales is problematic when developing signal-driven
approaches for pixel-based image segmentation. Often, different,
possibly conflicting criteria need to be fulfilled in order to obtain the best trade-off
between uncertainty (variance) and location accuracy. The optimal set of
scales depends on several factors: the noise level present in the image material,
the prior distribution of the different types of segments, the class-conditional
distributions associated with each type of segment as well as the actual size of
the (connected) segments. We analyse, theoretically and through experiments,
the possibility of using the overall and class-conditional error rates as criteria
for selecting the optimal sampling of the linear and morphological scale spaces.
It is shown that the overall error rate is optimised by taking the prior class
distribution in the image material into account. However, a uniform (ignorant)
prior distribution ensures constant class-conditional error rates. Consequently,
we advocate for a uniform prior class distribution when an uncommitted, scale-invariant
segmentation approach is desired.
Experiments with a neural net classifier developed for segmentation of
dynamic MR images, acquired with a paramagnetic tracer, support the
theoretical results. Furthermore, the experiments show that the addition of
spatial features to the classifier, extracted from the linear or morphological
scale spaces, improves the segmentation result compared to a signal-driven
approach based solely on the dynamic MR signal. The segmentation results
obtained from the two types of features are compared using two novel quality
measures that characterise spatial properties of labelled images.
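As a rough sketch of pixel-based segmentation from scale-space features, the example below stacks Gaussian-smoothed copies of an image over a few scales and feeds them to a generic per-pixel classifier. The scale values and the scikit-learn-style classifier interface are placeholder assumptions; the thesis additionally uses morphological scale spaces and a neural net classifier on the dynamic MR signal.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def linear_scale_space_features(image, scales=(1.0, 2.0, 4.0, 8.0)):
    """Stack Gaussian-smoothed versions of the image as per-pixel features."""
    feats = [gaussian_filter(image.astype(float), sigma=s) for s in scales]
    return np.stack(feats, axis=-1)                # shape: (H, W, n_scales)

def segment(image, classifier, scales=(1.0, 2.0, 4.0, 8.0)):
    """Pixel-wise segmentation from scale-space features with an already-fitted classifier."""
    F = linear_scale_space_features(image, scales)
    H, W, d = F.shape
    labels = classifier.predict(F.reshape(-1, d))  # any scikit-learn-style classifier
    return labels.reshape(H, W)
```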
Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization.
Maps of infectious disease (charting spatial variations in the force of infection, degree of endemicity and the burden on human health) provide an essential evidence base to support planning towards global health targets. Contemporary disease mapping efforts have embraced statistical modelling approaches to properly acknowledge uncertainties in both the available measurements and their spatial interpolation. The most common such approach is Gaussian process regression, a mathematical framework composed of two components: a mean function harnessing the predictive power of multiple independent variables, and a covariance function yielding spatio-temporal shrinkage against residual variation from the mean. Though many techniques have been developed to improve the flexibility and fitting of the covariance function, models for the mean function have typically been restricted to simple linear terms. For infectious diseases, known to be driven by complex interactions between environmental and socio-economic factors, improved modelling of the mean function can greatly boost predictive power. Here, we present an ensemble approach based on stacked generalization that allows for multiple nonlinear algorithmic mean functions to be jointly embedded within the Gaussian process framework. We apply this method to mapping Plasmodium falciparum prevalence data in sub-Saharan Africa and show that the generalized ensemble approach markedly outperforms any individual method.
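A generic, simplified sketch of the stacking idea is given below, assuming scikit-learn estimators: out-of-fold predictions from nonlinear base learners are supplied to a Gaussian process alongside spatial coordinates, so the base learners act as the flexible mean component while the Matern kernel provides spatial shrinkage. The actual study embeds the stacked means in a bespoke geostatistical model rather than this off-the-shelf regressor.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def stacked_gp(X_covariates, coords, y):
    """Stacked generalization: nonlinear base learners feed the GP via extra covariates."""
    learners = [RandomForestRegressor(n_estimators=200, random_state=0),
                GradientBoostingRegressor(random_state=0)]
    # Out-of-fold predictions avoid leaking the training labels into the stack.
    level1 = np.column_stack([cross_val_predict(m, X_covariates, y, cv=5) for m in learners])
    # The GP sees the base-learner predictions plus spatial coordinates;
    # the Matern kernel supplies the spatial shrinkage of the covariance function.
    Z = np.column_stack([level1, coords])
    gp = GaussianProcessRegressor(kernel=Matern(nu=1.5) + WhiteKernel(), normalize_y=True)
    gp.fit(Z, y)
    for m in learners:
        m.fit(X_covariates, y)   # refit on all data for use at prediction time
    return learners, gp
```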
Essays on distance metric learning
Many machine learning methods, such as the k-nearest neighbours algorithm, heavily depend on the distance measure between data points. As each task has its own notion of distance, distance metric learning has been proposed. It learns a distance metric to assign a small distance to semantically similar instances and a large distance to dissimilar instances by formulating an optimisation problem. While many loss functions and regularisation terms have been proposed to improve the discrimination and generalisation ability of the learned metric, the metric may be sensitive to a small perturbation in the input space. Moreover, these methods implicitly assume that features are numerical variables and labels are deterministic. However, categorical variables and probabilistic labels are common in real-world applications. This thesis develops three metric learning methods to enhance robustness against input perturbation and applicability for categorical variables and probabilistic labels. In Chapter 3, I identify that many existing methods maximise a margin in the feature space and such a margin is insufficient to withstand perturbation in the input space. To address this issue, a new loss function is designed to penalise the input-space margin for being small and hence improve the robustness of the learned metric. In Chapter 4, I propose a metric learning method for categorical data. Classifying categorical data is difficult due to high feature ambiguity, and to this end, the technique of adversarial training is employed. Moreover, the generalisation bound of the proposed method is established, which informs the choice of the regularisation term. In Chapter 5, I adapt a classical probabilistic approach for metric learning to utilise information on probabilistic labels. The loss function is modified for training stability, and new evaluation criteria are suggested to assess the effectiveness of different methods. At the end of this thesis, two publications on hyperspectral target detection are appended as additional work during my PhD.
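The basic mechanics of metric learning can be sketched as follows: learn a linear map L defining a Mahalanobis distance so that same-class pairs end up closer than different-class pairs by a margin. The triplet hinge loss and stochastic update below are a standard baseline for illustration, not the specific formulations developed in the thesis.

```python
import numpy as np

def learn_mahalanobis(X, y, n_iter=200, lr=0.01, margin=1.0, seed=0):
    """Learn L (so that d(x, x') = ||L(x - x')||) from random same/different-class
    triplets with a hinge loss: similar pairs should be closer than dissimilar
    ones by at least `margin` (in squared distance)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    L = np.eye(d)
    for _ in range(n_iter):
        i = rng.integers(len(X))
        same = np.flatnonzero((y == y[i]) & (np.arange(len(X)) != i))
        diff = np.flatnonzero(y != y[i])
        if len(same) == 0 or len(diff) == 0:
            continue
        j, k = rng.choice(same), rng.choice(diff)
        dp, dn = L @ (X[i] - X[j]), L @ (X[i] - X[k])
        loss = margin + dp @ dp - dn @ dn
        if loss > 0:
            # Gradient of the active hinge term with respect to L.
            grad = 2 * (np.outer(dp, X[i] - X[j]) - np.outer(dn, X[i] - X[k]))
            L -= lr * grad
    return L
```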
The Foundations of Infinite-Dimensional Spectral Computations
Spectral computations in infinite dimensions are ubiquitous in the sciences. However, their many applications and theoretical studies depend on computations which are infamously difficult. This thesis, therefore, addresses the broad question,
"What is computationally possible within the field of spectral theory of separable Hilbert spaces?"
The boundaries of what computers can achieve in computational spectral theory and mathematical physics are unknown, leaving many open questions that have been unsolved for decades. This thesis provides solutions to several such long-standing problems.
To determine these boundaries, we use the Solvability Complexity Index (SCI) hierarchy, an idea which has its roots in Smale's comprehensive programme on the foundations of computational mathematics. The Smale programme led to a real-number counterpart of the Turing machine, yet left a substantial gap between theory and practice. The SCI hierarchy encompasses both these models and provides universal bounds on what is computationally possible. What makes spectral problems particularly delicate is that many of the problems can only be computed by using several limits, a phenomenon also shared in the foundations of polynomial root-finding as shown by McMullen. We develop and extend the SCI hierarchy to prove optimality of algorithms and construct a myriad of different methods for infinite-dimensional spectral problems, solving many computational spectral problems for the first time.
For arguably almost any operator of applicable interest, we solve the long-standing computational spectral problem and construct algorithms that compute spectra with error control. This is done for partial differential operators with coefficients of locally bounded total variation and also for discrete infinite matrix operators. We also show how to compute spectral measures of normal operators (when the spectrum is a subset of a regular enough Jordan curve), including spectral measures of classes of self-adjoint operators with error control and the construction of high-order rational kernel methods. We classify the problems of computing measures, measure decompositions, types of spectra (pure point, absolutely continuous, singular continuous), functional calculus, and Radon–Nikodym derivatives in the SCI hierarchy. We construct algorithms for, and classify, fractal dimensions of spectra, Lebesgue measures of spectra, spectral gaps, discrete spectra, eigenvalue multiplicities, capacity, different spectral radii and the problem of detecting algorithmic failure of previous methods (the finite section method). The infinite-dimensional QR algorithm is also analysed, recovering extremal parts of spectra, corresponding eigenvectors, and invariant subspaces, with convergence rates and error control. Finally, we analyse pseudospectra of pseudoergodic operators (a generalisation of random operators) on vector-valued spaces.
All of the algorithms developed in this thesis are sharp in the sense of the SCI hierarchy. In other words, we prove that they are optimal, realising the boundaries of what digital computers can achieve. They are also implementable and practical, and the majority are parallelisable. Extensive numerical examples are given throughout, demonstrating efficiency and tackling difficult problems taken from mathematics and also physical applications.
In summary, this thesis allows scientists to rigorously and efficiently compute many spectral properties for the first time. The framework provided by this thesis also encompasses a vast number of areas in computational mathematics, including the classical problem of polynomial root-finding, as well as optimisation, neural networks, PDEs and computer-assisted proofs. This framework will be explored in the future work of the author within these settings.
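For context, the finite section method mentioned above (whose algorithmic failure the thesis shows how to detect) can be sketched in a few lines: truncate an infinite matrix operator to an n x n block and diagonalise it. The tridiagonal toy operator and truncation size below are arbitrary assumptions; the thesis's point is precisely that such truncations carry no error control and can produce spurious eigenvalues.

```python
import numpy as np

def finite_section_spectrum(diag_fn, offdiag=1.0, n=500):
    """Approximate the spectrum of an infinite tridiagonal (Jacobi-type) operator
    by diagonalising its n x n finite section."""
    d = np.array([diag_fn(k) for k in range(n)], dtype=float)
    A = np.diag(d) + offdiag * (np.eye(n, k=1) + np.eye(n, k=-1))
    return np.linalg.eigvalsh(A)

# Toy usage: a discrete Schroedinger-type operator with a decaying potential.
evals = finite_section_spectrum(lambda k: 2.0 + 1.0 / (1 + k), n=500)
print(evals.min(), evals.max())   # crude bracket of the approximate spectrum
```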
Laser Based Mid-Infrared Spectroscopic Imaging – Exploring a Novel Method for Application in Cancer Diagnosis
A number of biomedical studies have shown that mid-infrared spectroscopic images can provide
both morphological and biochemical information that can be used for the diagnosis of cancer. Whilst
this technique has shown great potential, it has yet to be employed by the medical profession. By
replacing the conventional broadband thermal source employed in modern FTIR spectrometers with
high-brightness, broadly tuneable laser-based sources (QCLs and OPGs), we aim to overcome one of the
main obstacles to the transfer of this technology to the medical arena, namely poor signal-to-noise
ratios at high spatial resolutions and short image acquisition times. In this thesis we take the first
steps towards developing the optimum experimental configuration, the data processing algorithms
and the spectroscopic image contrast and enhancement methods needed to utilise these high
intensity laser-based sources. We show that a QCL system is better suited to providing numerical
absorbance values (biochemical information) than an OPG system, primarily due to the pulse
stability of the QCL. We also discuss practical protocols for the application of spectroscopic imaging
to cancer diagnosis and present spectroscopic imaging results from our laser-based experiments
on oesophageal cancer tissue.
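The numerical absorbance values mentioned above are conventionally obtained from the Beer-Lambert relation A = -log10(I / I0). The sketch below, with hypothetical array inputs, shows this conversion for a single wavenumber channel; it is a generic illustration rather than the processing pipeline developed in the thesis.

```python
import numpy as np

def absorbance_image(sample_intensity, background_intensity, eps=1e-12):
    """Convert transmitted intensities to absorbance, A = -log10(I / I0), for one
    wavenumber channel; stacking channels gives the spectroscopic image cube."""
    I = np.asarray(sample_intensity, dtype=float)
    I0 = np.asarray(background_intensity, dtype=float)
    transmittance = np.clip(I / np.maximum(I0, eps), eps, None)
    return -np.log10(transmittance)
```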
Full Covariance Modelling for Speech Recognition
HMM-based systems for Automatic Speech Recognition typically model
the acoustic features using mixtures of multivariate Gaussians. In this
thesis, we consider the problem of learning a suitable covariance matrix
for each Gaussian. A variety of schemes have been proposed for
controlling the number of covariance parameters per Gaussian, and
studies have shown that in general, the greater the number of parameters
used in the models, the better the recognition performance. We
therefore investigate systems with full covariance Gaussians. However,
in this case, the obvious choice of parameters, given by the sample
covariance matrix, leads to matrices that are poorly conditioned and
do not generalise well to unseen test data. The problem is particularly
acute when the amount of training data is limited.
We propose two solutions to this problem: firstly, we impose the requirement
that each matrix should take the form of a Gaussian graphical
model, and introduce a method for learning the parameters and
the model structure simultaneously. Secondly, we explain how an
alternative estimator, the shrinkage estimator, is preferable to the
standard maximum likelihood estimator, and derive formulae for the
optimal shrinkage intensity within the context of a Gaussian mixture
model. We show how this relates to the use of a diagonal covariance
smoothing prior.
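The shrinkage estimator can be sketched as a convex combination of the sample covariance and its diagonal. In the example below the shrinkage intensity lam is a user-supplied constant, whereas the thesis derives an optimal intensity in closed form for Gaussian mixture models; the dimensions in the toy usage are arbitrary.

```python
import numpy as np

def shrink_covariance(X, lam):
    """Shrink the sample covariance of the rows of X towards its diagonal:
    Sigma = (1 - lam) * S + lam * diag(S), with lam in [0, 1] trading variance for bias."""
    S = np.cov(X, rowvar=False)
    return (1.0 - lam) * S + lam * np.diag(np.diag(S))

# Toy usage: few samples in high dimension, where the sample covariance is ill-conditioned.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 39))   # e.g. 39-dimensional acoustic feature vectors
for lam in (0.0, 0.3):
    cond = np.linalg.cond(shrink_covariance(X, lam))
    print(f"lambda={lam}: condition number {cond:.2e}")
```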
We compare the effectiveness of these techniques to standard methods
on a phone recognition task where the quantity of training data is
artificially constrained. We then investigate the performance of the
shrinkage estimator on a large-vocabulary conversational telephone
speech recognition task. Discriminative training techniques can be used to compensate for the
invalidity of the model correctness assumption underpinning maximum
likelihood estimation. On the large-vocabulary task, we use discriminative
training of the full covariance models and diagonal priors
to yield improved recognition performance.
- …