The Degrees of Freedom of Partial Least Squares Regression
The derivation of statistical properties for Partial Least Squares regression
can be a challenging task. The reason is that the construction of latent
components from the predictor variables also depends on the response variable.
While this typically leads to good performance and interpretable models in
practice, it makes the statistical analysis more involved. In this work, we
study the intrinsic complexity of Partial Least Squares Regression. Our
contribution is an unbiased estimate of its Degrees of Freedom. It is defined
as the trace of the first derivative of the fitted values, seen as a function
of the response. We establish two equivalent representations that rely on the
close connection of Partial Least Squares to matrix decompositions and Krylov
subspace techniques. We show that the Degrees of Freedom depend on the
collinearity of the predictor variables: The lower the collinearity is, the
higher the Degrees of Freedom are. In particular, they are typically higher
than the naive approach that defines the Degrees of Freedom as the number of
components. Further, we illustrate how the Degrees of Freedom approach can be
used for the comparison of different regression methods. In the experimental
section, we show that our Degrees of Freedom estimate in combination with
information criteria is useful for model selection.
Comment: to appear in the Journal of the American Statistical Association
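The definition in this abstract (Degrees of Freedom as the trace of the derivative of the fitted values with respect to the response) can be illustrated numerically. The following is a minimal sketch, not the unbiased estimator or the closed-form representations from the paper: it assumes a single-response PLS fit computed as least squares restricted to a Krylov subspace, and approximates the trace by central finite differences. The function names `pls_fitted_values` and `dof_finite_difference`, and the step size `eps`, are hypothetical choices made for illustration.

```python
import numpy as np


def pls_fitted_values(X, y, m):
    """Fitted values of single-response PLS with m components.

    Assumption: PLS is computed as least squares restricted to the Krylov
    subspace spanned by s, A s, ..., A^(m-1) s, with A = Xc'Xc and s = Xc'yc.
    """
    Xc = X - X.mean(axis=0)           # center predictors
    y_mean = y.mean()
    yc = y - y_mean                   # center response
    A = Xc.T @ Xc
    s = Xc.T @ yc
    Q = np.zeros((X.shape[1], m))     # orthonormal basis of the Krylov subspace
    v = s
    for j in range(m):
        for _ in range(2):            # orthogonalize twice for numerical stability
            v = v - Q[:, :j] @ (Q[:, :j].T @ v)
        Q[:, j] = v / np.linalg.norm(v)
        v = A @ Q[:, j]
    # Least-squares coefficients restricted to the subspace spanned by Q
    beta = Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ s)
    return y_mean + Xc @ beta


def dof_finite_difference(X, y, m, eps=1e-5):
    """Approximate trace(d y_hat / d y) by central finite differences."""
    total = 0.0
    for i in range(len(y)):
        step = np.zeros(len(y))
        step[i] = eps
        upper = pls_fitted_values(X, y + step, m)[i]
        lower = pls_fitted_values(X, y - step, m)[i]
        total += (upper - lower) / (2.0 * eps)
    return total
```

Because the fit includes an intercept, the trace counts one degree of freedom for the mean. When `m` equals the number of predictors and the predictors have full column rank, the PLS fit coincides with ordinary least squares, so the result should be close to the number of predictors plus one. For smaller `m`, the value can be compared against the naive count based on the number of components.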
Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds
Background
The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels (i.e., 3K or 7K) to a high-density panel with 50K SNPs. No pedigree information was used.
Methods
Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content.
Results
In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, the computing time required by the partial least squares regression method was on average around 10 times lower than that required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of SNP genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip.
Conclusions
Results of the present work suggest that the partial least squares regression imputation method could be useful for imputing SNP genotypes when pedigree information is not available.
On the derivation of the GKLS equation for weakly coupled systems
We consider the reduced dynamics of a small quantum system in interaction
with a reservoir when the initial state is factorized. We present a rigorous
derivation of a GKLS master equation in the weak-coupling limit for a generic
bath, which is not assumed to have a bosonic or fermionic nature, and whose
reference state is not necessarily thermal. The crucial assumption is a
reservoir state endowed with a mixing property: the n-point connected
correlation function of the interaction must be asymptotically bounded by the
product of two-point functions (clustering property).
Comment: 26 pages, 2 figures
Autumn Storms Trigger Enhanced Export of Iron, Phosphorus, and Carbon from a Forested Vermont Catchment
Autumn leaf fall may be an important driver of annual stream loading in forested catchments due to the introduction of large amounts of labile organic matter. In light of climate change projections for an intensification of the autumnal hydrological cycle for northern temperate forests, there is an increasing demand to understand this leaf fall period, and the extent to which it may drive water quality. In this study, we examine the export and biogeochemical coupling of dissolved organic carbon (DOC), iron (Fe), aluminum (Al), and phosphorus (P) during autumn and summer storms to understand the effects of seasonality and storm timing and magnitude on stream loading dynamics. We utilize in situ spectrophotometric sensors to measure UV-Vis light absorbance with high temporal resolution in order to quantify rapid changes in stream chemistry during storm events. We also explore the potential to project concentrations of the aforementioned parameters using partial least squares regression (PLSR) and high-frequency absorbance data. Post-leaf-fall autumn storms resulted in the export of 23% of total study DOC in a 2-week period, as well as the largest fluxes of Fe and Al observed over the study period. These results may have important implications for nutrient loading in the receiving water body, Lake Champlain.
Learning to predict distributions of words across domains
Although the distributional hypothesis has been applied successfully in many natural language processing tasks, systems using distributional information have been limited to a single domain because the distribution of a word can vary between domains as the word’s predominant meaning changes. However, if it were possible to predict how the distribution of a word changes from one domain to another, the predictions could be used to adapt a system trained in one domain to work in another. We propose an unsupervised method to predict the distribution of a word in one domain, given its distribution in another domain. We evaluate our method on two tasks: cross-domain part-of-speech tagging and cross-domain sentiment classification. In both tasks, our method significantly outperforms competitive baselines and returns results that are statistically comparable to current state-of-the-art methods, while requiring no task-specific customisations.
REMOTE SENSING OF FOLIAR NITROGEN IN CULTIVATED GRASSLANDS OF HUMAN DOMINATED LANDSCAPES
Foliar nitrogen (N) concentration of plant canopies plays a central role in a number of important ecosystem processes and continues to be an active subject in the field of remote sensing. Previous efforts to estimate foliar N at the landscape scale have primarily focused on intact forests and grasslands using aircraft imaging spectrometry and various techniques of statistical calibration and modeling. The present study was designed to extend this work by examining the potential to estimate the foliar N concentration of residential, agricultural and other cultivated grassland areas within a suburbanizing watershed. In conjunction with ground-based vegetation sampling, we developed Partial Least Squares (PLS) models for predicting mass-based foliar N across management types using input from airborne and field-based imaging spectrometers. Results yielded strong predictive relationships for both ground- and aircraft-based sensors across sites that included turf grass, grazed pasture, hayfields and fallow fields. We also report on relationships between imaging spectrometer data and other important variables such as canopy height, biomass, and water content. These results show strong promise for detection with high-quality imaging spectrometry data and suggest that cultivated grasslands offer opportunities for empirical study of canopy light dynamics. Finally, we discuss the potential for application of our results, and potential challenges, with data from the planned HyspIRI satellite, which will provide global coverage of data useful for vegetation N estimation.