    A study on multi-scale kernel optimisation via centered kernel-target alignment

    Kernel mapping is one of the most widespread approaches to intrinsically deriving nonlinear classifiers. With the aim of better suiting a given dataset, different kernels have been proposed, and different bounds and methodologies have been studied to optimise them. We focus on the optimisation of a multi-scale kernel, where a different width is chosen for each feature. This idea has barely been studied in the literature, although it has been shown to achieve better performance in the presence of heterogeneous attributes. The large number of parameters in multi-scale kernels makes it computationally unaffordable to optimise them by traditional cross-validation. Instead, an analytical measure known as centered kernel-target alignment (CKTA) can be used to align the kernel to the so-called ideal kernel matrix. This paper analyses and compares this and other alternatives, providing a review of the literature on kernel optimisation and some insights into the usefulness of multi-scale kernel optimisation via CKTA. When applied to the binary support vector machine (SVM) paradigm, results on 24 datasets show that CKTA with a multi-scale kernel leads to a well-defined feature space and simpler SVM models, provides an implicit filtering of non-informative features, and achieves robust performance comparable to that of other methods even when using random initialisations. Finally, we derive some considerations about when a multi-scale approach could be useful in general and propose a distance-based initialisation technique for the gradient-ascent method, which shows promising results.
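
    The core quantity here is the centered alignment between a kernel matrix and the ideal kernel built from the labels. As a rough sketch (not the paper's code; the helper names multiscale_rbf and centered_kta are illustrative), the snippet below evaluates CKTA for a multi-scale RBF kernel with one width per feature, the score that a gradient-ascent optimiser would maximise over the widths.

```python
import numpy as np

def multiscale_rbf(X, widths):
    """Multi-scale (ARD) RBF kernel: one width per feature (illustrative helper)."""
    Xs = X / widths                                  # scale each feature by its own width
    sq = np.sum(Xs ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T
    return np.exp(-0.5 * np.maximum(d2, 0.0))

def centered_kta(K, y):
    """Centered kernel-target alignment between K and the ideal kernel y y^T."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    Kc, Yc = H @ K @ H, H @ np.outer(y, y) @ H
    return np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc))

# toy usage: random data with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
widths = np.ones(5)                                  # per-feature widths to be optimised
print(centered_kta(multiscale_rbf(X, widths), y))
```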

    Semi-supervised Learning for Ordinal Kernel Discriminant Analysis

    Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problem because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended to this setting by considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification on a battery of 30 datasets, showing 1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and 2) the advantage of computing distances in the feature space induced by the kernel function.
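
    A key ingredient above is measuring neighbourhoods of unlabelled points directly in the feature space induced by the kernel, via the identity ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij. A minimal sketch of that computation (function names are hypothetical, not the paper's):

```python
import numpy as np

def feature_space_sq_distances(K):
    """Pairwise squared distances in the kernel-induced feature space:
    ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K

def neighbours_in_feature_space(K, k=5):
    """Indices of the k nearest neighbours of each point, measured in feature space."""
    D = feature_space_sq_distances(K)
    np.fill_diagonal(D, np.inf)                      # exclude each point from its own neighbourhood
    return np.argsort(D, axis=1)[:, :k]

# toy usage with an RBF kernel matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
print(neighbours_in_feature_space(K, k=3)[:5])
```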

    Tensor Decomposition in Multiple Kernel Learning

    Modern data processing and analytic tasks often deal with high-dimensional matrices or tensors; for example, environmental sensors monitor (time, location, temperature, light) data. For large-scale tensors, efficient data representation plays a major role in reducing computational time and finding patterns. This thesis first studies fundamental matrix and tensor decomposition algorithms and their applications, in connection with the Tensor Train decomposition algorithm. The second objective is to apply this tensor perspective to Multiple Kernel Learning problems, where the stack of kernels can be seen as a tensor. Decomposing this kind of tensor leads to an efficient factorisation approach for finding the best linear combination of kernels through similarity alignment. Interestingly, thanks to the symmetry of the kernel matrix, a novel decomposition algorithm for multiple kernels is derived that reduces the computational complexity. In terms of applications, this new approach allows the manipulation of large-scale multiple kernel problems. For example, with P kernels and n samples, it reduces the memory complexity from O(P^2 n^2) to O(P^2 r^2 + 2rn), where r < n is the number of low-rank components. This compression is also valuable in the pairwise multiple kernel learning problem, which models relations among pairs of objects and whose complexity grows at double the scale. This study proposes AlignF_TT, a kernel alignment algorithm based on the novel decomposition algorithm for the tensor of kernels. Regarding predictive performance, the proposed algorithm gains an improvement on 18 artificially constructed datasets and achieves comparable performance on 13 real-world datasets in comparison with other multiple kernel learning algorithms. It also reveals that a small number of low-rank components is sufficient for approximating the tensor of kernels.
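
    To make the quoted saving concrete, the short calculation below simply evaluates the O(P^2 n^2) versus O(P^2 r^2 + 2rn) storage counts stated above for a few illustrative sizes; it is arithmetic only and does not implement the AlignF_TT decomposition itself.

```python
# Entry counts for storing P x P pairwise kernel combinations of n x n kernels
# explicitly versus with r low-rank components, following the complexities
# quoted in the abstract (illustrative arithmetic, not the decomposition).
def explicit_entries(P, n):
    return P ** 2 * n ** 2

def lowrank_entries(P, n, r):
    return P ** 2 * r ** 2 + 2 * r * n

for P, n, r in [(10, 1_000, 20), (10, 10_000, 50)]:
    full, compressed = explicit_entries(P, n), lowrank_entries(P, n, r)
    print(f"P={P}, n={n}, r={r}: {full:,} vs {compressed:,} entries "
          f"(~{full / compressed:,.0f}x smaller)")
```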

    Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles

    The interpretation of metabolic information is crucial to understanding the functioning of a biological system. Latent information about the metabolic state of a sample can be acquired using analytical chemistry methods, which generate spectroscopic profiles. Thus, nuclear magnetic resonance spectroscopy and mass spectrometry techniques can be employed to generate vast amounts of highly complex data on the metabolic content of biofluids and tissue, and this thesis discusses ways to process, analyse and interpret these data successfully. The evaluation of J-resolved spectroscopy in magnetic resonance profiling and the statistical techniques required to extract maximum information from the projections of these spectra are studied. In particular, data processing is evaluated, and correlation and regression methods are investigated with respect to enhanced model interpretation and biomarker identification. Additionally, it is shown that non-linearities in metabonomic data can be effectively modelled with kernel-based orthogonal partial least squares, for which an automated optimisation of the kernel parameter with nested cross-validation is implemented. The interpretation of orthogonal variation and the predictive ability enabled by this approach are demonstrated in regression and classification models for applications in toxicology and parasitology. Finally, the vast amount of data generated with mass spectrometry imaging is investigated in terms of data processing, and the benefits of applying multivariate techniques to these data are illustrated, especially in terms of interpretation and visualisation using colour-coding of images. The advantages of methods such as principal component analysis, self-organising maps and manifold learning over univariate analysis are highlighted. This body of work therefore demonstrates new means of increasing the amount of biochemical information that can be obtained from a given set of samples in biological applications using spectral profiling. Various analytical and statistical methods are investigated and illustrated with applications drawn from diverse biomedical areas.
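
    As a rough illustration of the nested cross-validation idea mentioned above, the sketch below tunes an RBF kernel parameter in an inner loop and estimates generalisation in an outer loop; kernel ridge regression from scikit-learn is used as a stand-in, since the kernel-based orthogonal PLS model itself is not implemented here.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))                     # e.g. binned spectral intensities
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)   # synthetic non-linear response

inner = GridSearchCV(                              # inner loop: choose kernel width and ridge penalty
    KernelRidge(kernel="rbf"),
    param_grid={"gamma": np.logspace(-3, 1, 9), "alpha": [1e-2, 1e-1, 1.0]},
    cv=3,
)
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer loop: unbiased performance estimate
print(outer_scores.mean())
```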

    Computationally-efficient initialisation of GPs: The generalised variogram method

    We present a computationally-efficient strategy to find the hyperparameters of a Gaussian process (GP) that avoids computing the likelihood function. The found hyperparameters can then be used directly for regression or passed as initial conditions to maximum-likelihood (ML) training. Motivated by the fact that training a GP via ML is equivalent (on average) to minimising the KL-divergence between the true and learnt models, we set out to explore different metrics/divergences among GPs that are computationally inexpensive and provide estimates close to those of ML. In particular, we identify the GP hyperparameters by projecting the empirical covariance or (Fourier) power spectrum onto a parametric family, thus proposing and studying various measures of discrepancy operating in the temporal or frequency domain. Our contribution extends the Variogram method developed in the geostatistics literature and, accordingly, is referred to as the Generalised Variogram method (GVM). In addition to the theoretical presentation of GVM, we provide experimental validation in terms of accuracy, consistency with ML and computational complexity for different kernels using synthetic and real-world data.
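
    A minimal sketch of the likelihood-free idea, assuming a regularly sampled signal and a squared-exponential kernel: project the empirical autocovariance onto the parametric covariance family by least squares. The names and the specific fitting choice are illustrative rather than the paper's implementation of GVM.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.optimize import curve_fit

def se_cov(lag, sigma2, ell):
    """Squared-exponential covariance k(tau) = sigma^2 * exp(-tau^2 / (2 ell^2))."""
    return sigma2 * np.exp(-0.5 * (lag / ell) ** 2)

# synthetic smooth signal on a regular grid (unit sampling interval)
rng = np.random.default_rng(0)
y = gaussian_filter1d(rng.normal(size=500), sigma=5.0)
y = y - y.mean()

max_lag = 50
lags = np.arange(max_lag, dtype=float)
emp_cov = np.array([np.mean(y[: y.size - h] * y[h:]) for h in range(max_lag)])

# fit the empirical covariance with the parametric family; use as-is or as an ML warm start
(sigma2, ell), _ = curve_fit(se_cov, lags, emp_cov, p0=[emp_cov[0], 5.0])
print(f"initial GP hyperparameters: variance={sigma2:.4f}, lengthscale={ell:.2f}")
```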

    Reference image selection for difference imaging analysis

    Difference image analysis (DIA) is an effective technique for obtaining photometry in crowded fields, relative to a chosen reference image. As yet, however, optimal reference image selection is an unsolved problem. We examine how this selection depends on the combination of seeing, background and detector pixel size. Our tests use a combination of simulated data and quality indicators from DIA of well-sampled optical data and under-sampled near-infrared data from the OGLE and VVV surveys, respectively. We search for a figure-of-merit (FoM) which could be used to select reference images for each survey. While we do not find a universally applicable FoM, survey-specific measures indicate that the effect of spatial under-sampling may require a change in strategy from the standard DIA approach, even though seeing remains the primary criterion. We find that background is not an important criterion for reference selection, at least for the dynamic range in the images we test. For our analysis of VVV data in particular, we find that spatial under-sampling is best handled by reversing the standard DIA procedure and convolving target images to a better-sampled (poor seeing) reference image.
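
    Purely as an illustration of the selection problem (the paper finds no universally applicable figure-of-merit), a toy rule consistent with the reported findings ranks candidate images by seeing and uses background only as a tie-breaker; the data structure and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ImageStats:
    name: str
    fwhm_pix: float      # seeing (PSF FWHM) in detector pixels
    background: float    # median sky level in counts

def pick_reference(images):
    """Choose the candidate with the best (smallest) seeing; background breaks ties."""
    return min(images, key=lambda im: (im.fwhm_pix, im.background))

candidates = [
    ImageStats("frame_001", fwhm_pix=2.8, background=310.0),
    ImageStats("frame_002", fwhm_pix=2.3, background=350.0),
    ImageStats("frame_003", fwhm_pix=2.3, background=295.0),
]
print(pick_reference(candidates).name)   # frame_003: equal-best seeing, lower background
```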

    A Statistical Perspective of the Empirical Mode Decomposition

    This research focuses on non-stationary basis decomposition methods in time-frequency analysis. Classical methodologies in this field, such as Fourier analysis and wavelet transforms, rely on strong assumptions about the underlying moment-generating process, which may not be valid in real data scenarios or modern applications of machine learning. The literature on non-stationary methods is still in its infancy, and the research contained in this thesis aims to address challenges arising in this area. Among several alternatives, this work is based on the method known as the Empirical Mode Decomposition (EMD). The EMD is a non-parametric time-series decomposition technique that produces a set of time-series functions denoted as Intrinsic Mode Functions (IMFs), which carry specific statistical properties. The main focus is on providing a general and flexible family of basis extraction methods with minimal requirements compared to those of Fourier or wavelet techniques. This is highly important for two main reasons: first, more universal applications can be taken into account; secondly, the EMD requires very little a priori knowledge of the process in order to be applied, and as such it can have greater generalisation properties in statistical applications across a wide array of applications and data types. The contributions of this work deal with several aspects of the decomposition. The first set regards the construction of an IMF from several perspectives: (1) achieving a semi-parametric representation of each basis; (2) extracting such semi-parametric functional forms in a computationally efficient and statistically robust framework. The EMD belongs to the class of path-based decompositions and is therefore often not treated as a stochastic representation. (3) A major contribution involves embedding the deterministic pathwise decomposition framework into a formal stochastic process setting. One of the assumptions inherent in the EMD construction is that the decomposition is applied to a continuous function; in general, this may not be the case in many applications. (4) Various multi-kernel Gaussian Process formulations of the EMD are proposed through the introduced stochastic embedding. In particular, two different models are proposed: one modelling the temporal mode of oscillations of the EMD and the other capturing the location of instantaneous frequencies in specific frequency regions or bandwidths. (5) The construction of the second stochastic embedding is achieved with an optimisation method called the cross-entropy method, for which two formulations are provided and explored. Applications to speech time series, which are non-stationary, are explored to study these methodological extensions.
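
    To make the IMF extraction concrete, here is a minimal single-pass sketch of the sifting step at the heart of the EMD, assuming a regularly sampled 1-D signal; it uses cubic-spline envelopes and omits the stopping criteria and boundary handling that a full implementation needs.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of the upper and lower spline envelopes."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return x                                     # too few extrema to build envelopes
    upper = CubicSpline(t[maxima], x[maxima])(t)     # upper envelope
    lower = CubicSpline(t[minima], x[minima])(t)     # lower envelope
    return x - 0.5 * (upper + lower)                 # remove the local mean

t = np.linspace(0.0, 1.0, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

imf_candidate = signal.copy()
for _ in range(10):                                  # a few sifting iterations
    imf_candidate = sift_once(imf_candidate, t)

# the candidate IMF should resemble the fastest oscillation in the mixture
print(np.corrcoef(imf_candidate, np.sin(2 * np.pi * 40 * t))[0, 1])
```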