965 research outputs found

    Kernel Methods for Machine Learning with Life Science Applications

    Get PDF

    Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

    Full text link
    Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    Get PDF
    The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group - a renowned Portuguese hotel chain. An efficiency ranking is established from these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology allows to discriminate between measurement error and systematic inefficiencies in the estimation process enabling to investigate the main inefficiency causes. Several suggestions concerning efficiency improvement are undertaken for each hotel studied.info:eu-repo/semantics/publishedVersio

    Density forecasts of inflation using Gaussian process regression models

    Full text link
    The present study uses Gaussian Process regression models for generating density forecasts of inflation within the New Keynesian Phillips curve (NKPC) framework. The NKPC is a structural model of inflation dynamics in which we include the output gap, inflation expectations, fuel world prices and money market interest rates as predictors. We estimate country-specific time series models for the 19 Euro Area (EA) countries. As opposed to other machine learning models, Gaussian Process regression allows estimating confidence intervals for the predictions. The performance of the proposed model is assessed in a one-step-ahead forecasting exercise. The results obtained point out the recent inflationary pressures and show the potential of Gaussian Process regression for forecasting purposes

    Supervised and Ensemble Classification of Multivariate Functional Data: Applications to Lupus Diagnosis

    Get PDF
    abstract: This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain that are critical for population discrimination. The undertaken analyses suggest that derivate-based information contributes significantly in improved classification performance relative to recently published studies on SLE plasma thermograms.Dissertation/ThesisDoctoral Dissertation Applied Mathematics 201

    Explaining the Negative Coefficient Associated with Human Capital in Augmented Solow Growth Regressions

    Get PDF
    In this paper we consider different explanations for why the coefficient associated with human capital is often negative in growth regressions once country-specific effects are controlled for whereas the coefficient in question is strongly positive in cross-sectional or panel results based on the pooling estimator. In turn, we explore: (i) additional sources of unobserved heterogeneity stemming from country-specific rates of labor-augmenting technological change, (ii) measurement error in the human capital series being used, and (iii) the lack of variability in the human capital series once the usual covariance transformations are implemented. Remaining unobserved country-specific heterogeneity and measurement error alone are shown to be inadequate explanations. The lack of variability in the human capital series is tackled using a new GMM-based estimator that combines the Hausman-Taylor (1981) approach, in which the impact of time-invariant covariates can be identified through use of covariance transformations of the variables themselves as instruments, with the orthogonality conditions of the Arellano-Bond (1991) estimator.panel estimation, measurement error, human capital, economic growth

    The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond

    Get PDF
    Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, as well as epigenetic mechanisms, and it has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer, developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution, biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, in particular, with increasing amounts of large-scale, biological experiments and datasets being collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey maybe helpful to both computational and biological scientists. It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic, regulatory events will be one fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases

    Large-Scale Galaxy Bias

    Full text link
    This review presents a comprehensive overview of galaxy bias, that is, the statistical relation between the distribution of galaxies and matter. We focus on large scales where cosmic density fields are quasi-linear. On these scales, the clustering of galaxies can be described by a perturbative bias expansion, and the complicated physics of galaxy formation is absorbed by a finite set of coefficients of the expansion, called bias parameters. The review begins with a detailed derivation of this very important result, which forms the basis of the rigorous perturbative description of galaxy clustering, under the assumptions of General Relativity and Gaussian, adiabatic initial conditions. Key components of the bias expansion are all leading local gravitational observables, which include the matter density but also tidal fields and their time derivatives. We hence expand the definition of local bias to encompass all these contributions. This derivation is followed by a presentation of the peak-background split in its general form, which elucidates the physical meaning of the bias parameters, and a detailed description of the connection between bias parameters and galaxy statistics. We then review the excursion-set formalism and peak theory which provide predictions for the values of the bias parameters. In the remainder of the review, we consider the generalizations of galaxy bias required in the presence of various types of cosmological physics that go beyond pressureless matter with adiabatic, Gaussian initial conditions: primordial non-Gaussianity, massive neutrinos, baryon-CDM isocurvature perturbations, dark energy, and modified gravity. Finally, we discuss how the description of galaxy bias in the galaxies' rest frame is related to clustering statistics measured from the observed angular positions and redshifts in actual galaxy catalogs.Comment: 259 pages, 39 figures, 15 tables; published in Physics Reports; v2: minor corrections and clarifications, references added; v3: substantially revised and improved version; v4: minor edits and clarifications reflecting published version, corrected mistakes in Eqs. (7.57)-(7.58); v5: minor corrections [Eq. (5.5)] and updated reference
    • …
    corecore