6,658 research outputs found

    The Degrees of Freedom of Partial Least Squares Regression

    The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom, defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity, the higher the Degrees of Freedom. In particular, they are typically higher than the naive estimate, which equates the Degrees of Freedom with the number of components. Further, we illustrate how the Degrees of Freedom approach can be used to compare different regression methods. In the experimental section, we show that our Degrees of Freedom estimate, in combination with information criteria, is useful for model selection. Comment: to appear in the Journal of the American Statistical Association
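
    The Degrees of Freedom definition above can be checked numerically. This is not the authors' estimator: the paper derives representations based on matrix decompositions and Krylov subspaces, whereas this sketch simply approximates the trace of the Jacobian of the fitted values by central finite differences. It assumes univariate PLS fitted with the NIPALS algorithm on centered data, counts the intercept (so the full-component model, which coincides with ordinary least squares, should give p + 1), and uses hypothetical helper names (`pls1_fitted`, `pls_degrees_of_freedom`) and synthetic data.

```python
import numpy as np


def pls1_fitted(X, y, n_components):
    """Fitted values of univariate PLS (NIPALS with deflation), intercept included."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean          # centered predictors, deflated after each component
    yk = y - y_mean          # centered response, deflated after each component
    fitted = np.full(y.shape, y_mean, dtype=float)
    for _ in range(n_components):
        w = Xk.T @ yk                      # weights depend on the response itself
        w /= np.linalg.norm(w)
        t = Xk @ w                         # latent component (score vector)
        tt = t @ t
        q = (yk @ t) / tt                  # regress the response on the score
        fitted += q * t
        Xk = Xk - np.outer(t, Xk.T @ t / tt)   # deflate X by its loading on t
        yk = yk - q * t                        # deflate y
    return fitted


def pls_degrees_of_freedom(X, y, n_components, h=1e-6):
    """Approximate trace of d(fitted)/d(y) by central finite differences."""
    dof = 0.0
    for i in range(len(y)):
        e = np.zeros_like(y)
        e[i] = h
        plus = pls1_fitted(X, y + e, n_components)[i]
        minus = pls1_fitted(X, y - e, n_components)[i]
        dof += (plus - minus) / (2 * h)
    return dof


# Synthetic example (assumed data, not from the paper).
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
for m in range(1, p + 1):
    print(m, round(pls_degrees_of_freedom(X, y, m), 2))
```

    Finite differences require 2n refits per estimate, so this check only suits small samples; its value is that it makes no assumption about how the derivative is computed.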

    Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    Background: The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels, i.e. 3K or 7K, to a high-density panel with 50K SNPs. No pedigree information was used. Methods: Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results: In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, the computing time required by the partial least squares regression method was on average around one tenth of that required by Beagle. In the multi-breed approach, partial least squares regression yielded lower imputation accuracies than in the single-breed approach. The impact of SNP genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions: The results of the present work suggest that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.

    On the derivation of the GKLS equation for weakly coupled systems

    We consider the reduced dynamics of a small quantum system in interaction with a reservoir when the initial state is factorized. We present a rigorous derivation of a GKLS master equation in the weak-coupling limit for a generic bath, which is not assumed to have a bosonic or fermionic nature, and whose reference state is not necessarily thermal. The crucial assumption is a reservoir state endowed with a mixing property: the n-point connected correlation function of the interaction must be asymptotically bounded by the product of two-point functions (clustering property). Comment: 26 pages, 2 figures
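
    For reference, a master equation in GKLS (Gorini-Kossakowski-Sudarshan-Lindblad) form generates the reduced dynamics of the system density matrix ρ as below, in units with ħ = 1. The Hamiltonian H, the jump operators L_k and the rates γ_k are generic placeholders here; the abstract does not state which ones the paper's weak-coupling derivation produces.

```latex
\frac{d\rho}{dt} = -i\,[H, \rho]
  + \sum_k \gamma_k \left( L_k \rho L_k^\dagger
  - \tfrac{1}{2} \left\{ L_k^\dagger L_k, \rho \right\} \right),
\qquad \gamma_k \ge 0
```

    The non-negativity of the rates γ_k is what guarantees that the resulting evolution is completely positive and trace preserving.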

    Autumn Storms Trigger Enhanced Export of Iron, Phosphorus, and Carbon from a Forested Vermont Catchment

    Autumn leaf fall may be an important driver of annual stream loading in forested catchments because it introduces large amounts of labile organic matter. In light of climate change projections for an intensification of the autumnal hydrological cycle in northern temperate forests, there is a growing need to understand the leaf-fall period and the extent to which it drives water quality. In this study, we examine the export and biogeochemical coupling of dissolved organic carbon (DOC), iron (Fe), aluminum (Al), and phosphorus (P) during autumn and summer storms to understand the effects of seasonality and of storm timing and magnitude on stream loading dynamics. We use in situ spectrophotometric sensors to measure UV-Vis light absorbance with high temporal resolution in order to quantify rapid changes in stream chemistry during storm events. We also explore the potential to predict concentrations of the aforementioned parameters using partial least squares regression (PLSR) and high-frequency absorbance data. Post-leaf-fall autumn storms resulted in the export of 23% of total study DOC in a 2-week period, as well as the largest fluxes of Fe and Al observed over the study period. These results may have important implications for nutrient loading in the receiving water body, Lake Champlain.

    Learning to predict distributions of words across domains

    Although the distributional hypothesis has been applied successfully in many natural language processing tasks, systems using distributional information have been limited to a single domain, because the distribution of a word can vary between domains as the word's predominant meaning changes. However, if it were possible to predict how the distribution of a word changes from one domain to another, the predictions could be used to adapt a system trained in one domain to work in another. We propose an unsupervised method to predict the distribution of a word in one domain, given its distribution in another domain. We evaluate our method on two tasks: cross-domain part-of-speech tagging and cross-domain sentiment classification. In both tasks, our method significantly outperforms competitive baselines and yields results that are statistically comparable to current state-of-the-art methods, while requiring no task-specific customisations.

    REMOTE SENSING OF FOLIAR NITROGEN IN CULTIVATED GRASSLANDS OF HUMAN DOMINATED LANDSCAPES

    Foliar nitrogen (N) concentration of plant canopies plays a central role in a number of important ecosystem processes and continues to be an active subject in the field of remote sensing. Previous efforts to estimate foliar N at the landscape scale have primarily focused on intact forests and grasslands using aircraft imaging spectrometry and various techniques of statistical calibration and modeling. The present study was designed to extend this work by examining the potential to estimate the foliar N concentration of residential, agricultural and other cultivated grassland areas within a suburbanizing watershed. In conjunction with ground-based vegetation sampling, we developed Partial Least Squares (PLS) models for predicting mass-based foliar N across management types using input from airborne and field-based imaging spectrometers. Results yielded strong predictive relationships for both ground- and aircraft-based sensors across sites that included turf grass, grazed pasture, hayfields and fallow fields. We also report on relationships between imaging spectrometer data and other important variables such as canopy height, biomass, and water content; these results show strong promise for detection with high-quality imaging spectrometry data and suggest that cultivated grasslands offer an opportunity for empirical study of canopy light dynamics. Finally, we discuss the potential for applying our results, and the potential challenges, with data from the planned HyspIRI satellite, which will provide global coverage of data useful for vegetation N estimation.