42,788 research outputs found

    Convergence rates of Kernel Conjugate Gradient for random design regression

    We prove statistical rates of convergence for kernel-based least squares regression from i.i.d. data using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. Following the setting introduced in earlier related literature, we study so-called "fast convergence rates" depending on the regularity of the target regression function (measured by a source condition in terms of the kernel integral operator) and on the effective dimensionality of the data mapped into the kernel space. We obtain upper bounds, essentially matching known minimax lower bounds, for the L^2 (prediction) norm as well as for the stronger Hilbert norm, if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates for appropriate norms, provided additional unlabeled data are available.

    Kernel-Partial Least Squares regression coupled to pseudo-sample trajectories for the analysis of mixture designs of experiments

    This article explores the potential of Kernel-Partial Least Squares (K-PLS) regression for the analysis of data arising from mixture designs of experiments. Gower's idea of pseudo-sample trajectories is exploited for interpretation purposes. The results show that, when the datasets under study are affected by severe nonlinearities and comprise few observations, the proposed approach can represent a feasible alternative to classical methodologies (i.e. Scheffé polynomial fitting by means of Ordinary Least Squares, OLS, and Cox polynomial fitting by means of Partial Least Squares, PLS). Furthermore, a way of recovering the parameters of a Scheffé model (provided that it holds and has the same complexity as the K-PLS one) from the trend of the aforementioned pseudo-sample trajectories is illustrated via a simulated case study.

    This research work was partially supported by the Spanish Ministry of Economy and Competitiveness under the project DPI2014-55276-C5-1R and Shell Global Solutions International B.V. (Amsterdam, The Netherlands).

    Vitale, R.; Palací-López, D. G.; Kerkenaar, H.; Postma, G.; Buydens, L.; Ferrer, A. (2018). Kernel-Partial Least Squares regression coupled to pseudo-sample trajectories for the analysis of mixture designs of experiments. Chemometrics and Intelligent Laboratory Systems, 175:37-46. https://doi.org/10.1016/j.chemolab.2018.02.002
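The two ingredients of the abstract above can be sketched together: a standard NIPALS-style kernel PLS fit (in the Rosipal-Trejo dual formulation that K-PLS builds on), followed by Gower-style pseudo-sample trajectories for one mixture component. This is a simplified illustration, not the authors' procedure: kernel centring is omitted for brevity, and the kernel, `gamma`, the simulated mixture data, and all names are assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=2.0):
    # Gaussian kernel matrix between the rows of X and Z
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kpls_fit(K, y, n_comp):
    # NIPALS-style kernel PLS for a single response (centring omitted).
    n = len(y)
    Kd, yd = K.copy(), y.astype(float).copy()
    T = np.zeros((n, n_comp))
    U = np.zeros((n, n_comp))
    for i in range(n_comp):
        u = yd.copy()                  # for univariate y, NIPALS converges at once
        t = Kd @ u
        t /= np.linalg.norm(t)
        T[:, i], U[:, i] = t, u
        P = np.eye(n) - np.outer(t, t) # deflate the extracted score direction
        Kd = P @ Kd @ P
        yd = yd - t * (t @ yd)
    # dual regression coefficients: predictions are K_new_vs_train @ B
    B = U @ np.linalg.solve(T.T @ K @ U, T.T @ y)
    return B

# Simulated three-component mixture design (rows sum to 1), nonlinear response
rng = np.random.default_rng(1)
W = rng.uniform(size=(30, 3))
X = W / W.sum(axis=1, keepdims=True)
y = X[:, 0] * X[:, 1] + np.sin(3.0 * X[:, 2])

K = rbf_kernel(X, X)
B = kpls_fit(K, y, n_comp=3)

# Gower-style pseudo-sample trajectory for the first mixture component:
# artificial samples varying only x1, the other variables held at zero,
# projected through the fitted K-PLS model.
levels = np.linspace(0.0, 1.0, 20)
pseudo = np.zeros((20, 3))
pseudo[:, 0] = levels
trajectory = rbf_kernel(pseudo, X) @ B  # predicted response along x1
```

Plotting `trajectory` against `levels` gives the kind of pseudo-sample trajectory the article uses to interpret the otherwise opaque kernel model.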

    Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression

    With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from these data are becoming increasingly important. Data on functional associations between genes or proteins, such as co-expression, are often represented as gene or protein networks. Several methods of predicting gene function from such networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to the non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm, Compass, which uses a commute-time kernel and partial least squares regression. We compare Compass to GeneMANIA, a leading network-based prediction algorithm, on a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction.
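The two building blocks named in the abstract above can be sketched on a toy network: the commute-time kernel is the Moore-Penrose pseudoinverse of the graph Laplacian, and a minimal PLS regression then scores genes from the kernel rows. This is an illustration of the ingredients, not the Compass algorithm itself; the network, labels, and all names are assumptions.

```python
import numpy as np

# Toy functional-association network: two triangles joined by a bridge.
# Nodes 0-2 form one cluster, nodes 3-5 the other.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Commute-time kernel: pseudoinverse of the graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A
K = np.linalg.pinv(L)

# Hypothetical annotation: genes 0 and 1 carry the function of interest.
y = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

def pls1_scores(X, y, n_comp=1):
    # Minimal NIPALS PLS1 (no centring) returning fitted response scores.
    Xd, yd = X.copy(), y.copy()
    y_hat = np.zeros_like(y)
    for _ in range(n_comp):
        w = Xd.T @ yd                  # weight vector from covariance with y
        w /= np.linalg.norm(w)
        t = Xd @ w                     # score vector
        q = (yd @ t) / (t @ t)
        p = (Xd.T @ t) / (t @ t)
        y_hat += q * t
        Xd = Xd - np.outer(t, p)       # deflation
        yd = yd - q * t
    return y_hat

# Each gene's kernel row serves as its feature vector.
scores = pls1_scores(K, y, n_comp=1)
```

Gene 2, which sits in the same cluster as the two annotated genes, receives a higher score than gene 5 in the other cluster, which is the label-propagation behaviour a commute-time kernel is chosen for.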