The Degrees of Freedom of Partial Least Squares Regression
The derivation of statistical properties for Partial Least Squares regression
can be a challenging task. The reason is that the construction of latent
components from the predictor variables also depends on the response variable.
While this typically leads to good performance and interpretable models in
practice, it makes the statistical analysis more involved. In this work, we
study the intrinsic complexity of Partial Least Squares Regression. Our
contribution is an unbiased estimate of its Degrees of Freedom. It is defined
as the trace of the first derivative of the fitted values, seen as a function
of the response. We establish two equivalent representations that rely on the
close connection of Partial Least Squares to matrix decompositions and Krylov
subspace techniques. We show that the Degrees of Freedom depend on the
collinearity of the predictor variables: The lower the collinearity is, the
higher the Degrees of Freedom are. In particular, they are typically higher
than the number of components, which is the naive definition of the Degrees
of Freedom. Further, we illustrate how the Degrees of Freedom approach can be
used for the comparison of different regression methods. In the experimental
section, we show that our Degrees of Freedom estimate in combination with
information criteria is useful for model selection. Comment: to appear in the Journal of the American Statistical Association
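The Degrees of Freedom are defined above as the trace of the derivative of the fitted values with respect to the response. The sketch below is a minimal NumPy illustration of that definition, not the authors' unbiased estimator or their closed-form representations: it approximates the trace by central finite differences. It assumes centered data with no intercept and computes PLS1 fitted values through the Krylov-subspace characterization, in which the coefficient vector minimizes the residual over K_m(XᵀX, Xᵀy). The names `pls1_fitted` and `dof_finite_difference` are hypothetical.

```python
import numpy as np

def pls1_fitted(X, y, m):
    """PLS1 fitted values with m components via the Krylov view:
    beta minimizes ||y - X beta|| over beta in K_m(X^T X, X^T y).
    Assumes X and y are already centered (no intercept is fitted)."""
    A = X.T @ X
    Q = np.zeros((X.shape[1], m))
    v = X.T @ y
    for j in range(m):
        # Gram-Schmidt against earlier basis vectors, repeated once for stability
        for _ in range(2):
            v = v - Q[:, :j] @ (Q[:, :j].T @ v)
        Q[:, j] = v / np.linalg.norm(v)
        v = A @ Q[:, j]  # next Krylov direction
    T = X @ Q  # scores spanning the same space as the m PLS components
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    return T @ coef

def dof_finite_difference(X, y, m, eps=1e-5):
    """Degrees of Freedom = trace of d(yhat)/dy, by central differences."""
    total = 0.0
    for i in range(len(y)):
        e = np.zeros_like(y)
        e[i] = eps
        up = pls1_fitted(X, y + e, m)[i]
        down = pls1_fitted(X, y - e, m)[i]
        total += (up - down) / (2 * eps)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    X -= X.mean(axis=0)
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + rng.standard_normal(50)
    y -= y.mean()
    for m in range(1, 6):
        print(m, round(dof_finite_difference(X, y, m), 3))
```

When m equals the number of predictors, the Krylov space is generically all of R^p, the fit reduces to ordinary least squares, and the trace equals p. For smaller m, the printed values can be compared directly with m, the naive count of components.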
Fermi level pinning induced electrostatic fields and band bending at organic heterojunctions
The energy level alignment at interfaces between organic semiconductors is of direct relevance to understanding charge carrier generation and recombination in organic electronic devices. Commonly, work function changes observed upon interface formation are interpreted as interface dipoles. In this study, using ultraviolet and X-ray photoelectron spectroscopy complemented by electrostatic calculations, we find a huge work function decrease of up to 1.4 eV at the interface between C60 (bottom layer) and zinc phthalocyanine (ZnPc, top layer) prepared on a molybdenum trioxide (MoO3) substrate. However, detailed measurements of the energy level shifts and electrostatic calculations reveal that no interface dipole occurs. Instead, upon ZnPc deposition, a linear electrostatic potential gradient is generated across the C60 layer due to Fermi level pinning of ZnPc on the high-work-function C60/MoO3 substrate and the associated band bending within the ZnPc layer. This finding is of general importance for understanding organic heterojunctions when Fermi level pinning is involved, as the induced electrostatic fields alter the energy level alignment significantly.
GABA(A) receptor phospho-dependent modulation is regulated by phospholipase C-related inactive protein type 1, a novel protein phosphatase 1 anchoring protein
GABA(A) receptors are critical in controlling neuronal activity. Here, we examined the role of phospholipase C-related inactive protein type 1 (PRIP-1), which binds and inactivates protein phosphatase 1alpha (PP1alpha), in facilitating phospho-dependent regulation of GABA(A) receptors, using PRIP-1(-/-) mice. In wild-type animals, cAMP-dependent protein kinase produced robust phosphorylation and functional modulation of GABA(A) receptors containing beta3 subunits; this was diminished in PRIP-1(-/-) mice. PRIP-1(-/-) mice exhibited enhanced PP1alpha activity compared with controls. Furthermore, PRIP-1 was able to interact directly with GABA(A) receptor beta subunits, and these proteins were found to be PP1alpha substrates. Finally, phosphorylation of PRIP-1 on threonine 94 facilitated the dissociation of PP1alpha-PRIP-1 complexes, providing a local mechanism for the activation of PP1alpha. Together, these results suggest an essential role for PRIP-1 in controlling GABA(A) receptor activity by regulating subunit phosphorylation and thereby the efficacy of neuronal inhibition mediated by these receptors.
Probability Models for Degree Distributions of Protein Interaction Networks
The degree distribution of many biological and technological networks has
been described as a power-law distribution. While the degree distribution does
not capture all aspects of a network, it has often been suggested that its
functional form contains important clues as to underlying evolutionary
processes that have shaped the network. Generally, the functional form for the
degree distribution has been determined in an ad hoc fashion, with clear
power-law-like behaviour often extending only over a limited range of
connectivities. Here we apply formal model selection techniques to decide which
probability distribution best describes the degree distributions of protein
interaction networks. Contrary to previous studies, this well-defined approach
suggests that the degree distribution of many molecular networks is often
better described by distributions other than the popular power-law
distribution. This, in turn, suggests that simple, if elegant, models may not
necessarily help in the quantitative understanding of complex biological
processes.
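Formal model selection of this kind can be sketched by fitting each candidate distribution by maximum likelihood and ranking the fits by the Akaike information criterion, AIC = 2k − 2 ln L. The three candidates below (a discrete power law, a geometric distribution, and a Poisson distribution shifted to start at 1) are illustrative choices, not necessarily the set used in the study. The power-law exponent is found by grid search, and the Riemann zeta normalizer is approximated by a partial sum plus an integral tail; the function names are hypothetical.

```python
import math
import numpy as np

S_GRID = np.arange(1.05, 5.0, 0.01)  # assumed search range for the power-law exponent

def _zeta(s, kmax=100_000):
    """Riemann zeta(s), s > 1: partial sum plus a midpoint-rule integral tail."""
    k = np.arange(1, kmax + 1, dtype=float)
    return np.sum(k ** -s) + (kmax + 0.5) ** (1.0 - s) / (s - 1.0)

def loglik_power_law(k):
    # P(k) = k^-s / zeta(s) for k >= 1; MLE of s by grid search
    n, sum_log = len(k), float(np.sum(np.log(k)))
    return max(-s * sum_log - n * math.log(_zeta(s)) for s in S_GRID)

def loglik_geometric(k):
    # P(k) = p (1 - p)^(k - 1) for k >= 1; closed-form MLE p = 1 / mean(k)
    p = 1.0 / float(np.mean(k))
    extra = int(np.sum(k - 1))
    return len(k) * math.log(p) + (extra * math.log1p(-p) if extra > 0 else 0.0)

def loglik_shifted_poisson(k):
    # k - 1 ~ Poisson(lam); closed-form MLE lam = mean(k - 1)
    j = k - 1
    lam = float(np.mean(j))
    if lam == 0.0:
        return 0.0
    return (float(np.sum(j)) * math.log(lam) - len(j) * lam
            - sum(math.lgamma(int(x) + 1) for x in j))

def aic_table(degrees):
    """AIC for each candidate; every candidate has one free parameter."""
    k = np.asarray(degrees, dtype=np.int64)
    if np.any(k < 1):
        raise ValueError("degrees must be >= 1")
    fits = {"power_law": loglik_power_law(k),
            "geometric": loglik_geometric(k),
            "shifted_poisson": loglik_shifted_poisson(k)}
    return {name: 2 * 1 - 2 * ll for name, ll in fits.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    table = aic_table(rng.geometric(0.3, size=2000))
    print(table, "best:", min(table, key=table.get))
```

The model with the smallest AIC is preferred; with equal parameter counts, as here, this reduces to comparing maximized likelihoods, whereas richer candidates would pay the 2k penalty.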
Detecting periodicity in experimental data using linear modeling techniques
Fourier spectral estimates and, to a lesser extent, the autocorrelation
function are the primary tools to detect periodicities in experimental data in
the physical and biological sciences. We propose a new method that is more
reliable than traditional techniques and can clearly identify periodic
behavior when traditional techniques cannot. This technique is
based on an information-theoretic reduction of linear (autoregressive) models
so that only the essential features of an autoregressive model are retained.
We call these models reduced autoregressive models (RARM). The essential
features of reduced autoregressive models include any periodicity present in
the data. We provide theoretical and numerical evidence from both experimental
and artificial data to demonstrate that this technique will reliably detect
periodicities if and only if they are present in the data. There are strong
information-theoretic arguments to support the statement that RARM detects
periodicities if they are present. Surrogate data techniques are used to ensure
the converse. Furthermore, our calculations demonstrate that RARM is more
robust, more accurate, and more sensitive than traditional spectral
techniques. Comment: 10 pages (revtex) and 6 figures. To appear in Phys Rev E. Modified
style
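The reduction step, keeping only the essential lags of an autoregressive model, can be sketched with greedy forward selection scored by the Schwarz (BIC) criterion. BIC is used here as a stand-in for the paper's information-theoretic criterion, whose exact form and search strategy may differ. The function name `reduced_ar_lags` and the default maximum lag of 20 are assumptions for illustration. A periodicity appears as a retained lag at the period.

```python
import numpy as np

def lagged_design(x, max_lag):
    """Rows are times t = max_lag..n-1; column j-1 holds x[t - j]."""
    n = len(x)
    X = np.column_stack([x[max_lag - j:n - j] for j in range(1, max_lag + 1)])
    return X, x[max_lag:]

def bic(rss, n, k):
    """Schwarz criterion for a Gaussian linear model with k coefficients."""
    return n * np.log(rss / n) + k * np.log(n)

def reduced_ar_lags(x, max_lag=20):
    """Greedily add the lag that lowers BIC most; stop when no lag helps."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    X, y = lagged_design(x, max_lag)
    n = len(y)
    selected = []
    best = bic(np.sum(y ** 2), n, 0)  # empty model predicts the (zero) mean
    while len(selected) < max_lag:
        trials = []
        for j in range(1, max_lag + 1):
            if j in selected:
                continue
            cols = [lag - 1 for lag in selected + [j]]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ coef) ** 2)
            trials.append((bic(rss, n, len(cols)), j))
        score, lag = min(trials)
        if score >= best:
            break
        best = score
        selected.append(lag)
    return sorted(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    e = rng.standard_normal(1200)
    x = np.zeros(1200)
    for t in range(12, 1200):
        x[t] = 0.8 * x[t - 12] + e[t]  # seasonal AR process with period 12
    print(reduced_ar_lags(x[200:]))  # discard the first 200 samples as burn-in
```

Greedy selection is cheap but not guaranteed to find the globally best subset of lags; an exhaustive or stochastic search over subsets is the more thorough, and more expensive, alternative.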
Analyzing the House Fly's Exploratory Behavior with Autoregression Methods
This paper presents a detailed characterization of the trajectory of a single
housefly moving freely in a square cage. The trajectory of the fly was
recorded and transformed into a time series, which was fully analyzed using an
autoregressive model; such a model describes a stationary time series as a linear
regression on its prior values plus white noise. The main discovery was
that the fly switched its style of motion from a low-dimensional regular pattern
to a higher-dimensional disordered pattern. This exploratory
behavior is, irrespective of the presence of food, characterized by anomalous
diffusion. Comment: 20 pages, 9 figures, 1 table, full paper
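Anomalous diffusion is commonly diagnosed from the mean squared displacement, MSD(τ) ∝ τ^α, with α = 1 for ordinary diffusion and α ≠ 1 otherwise (α = 2 for straight-line, ballistic motion). The sketch below estimates α from the log-log slope of the time-averaged MSD of a trajectory. This is an illustrative assumption, not necessarily the procedure used in the paper, and the function names and default lag range are hypothetical.

```python
import numpy as np

def msd(traj, max_lag):
    """Time-averaged mean squared displacement of a (n_steps, dim) trajectory."""
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1)) for lag in lags
    ])
    return lags, values

def diffusion_exponent(traj, max_lag=50):
    """Slope alpha of log MSD against log lag (least-squares line fit)."""
    lags, values = msd(traj, max_lag)
    slope, _intercept = np.polyfit(np.log(lags), np.log(values), 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    walk = np.cumsum(rng.standard_normal((20000, 2)), axis=0)  # ordinary diffusion
    print(round(diffusion_exponent(walk), 2))
```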
Inferring Social Ties in Academic Networks Using Short-Range Wireless Communications
WiFi base stations are increasingly deployed in both public spaces and private companies, and their increasing density poses a significant threat to the privacy of connected users. Prior studies have provided evidence that it is possible to infer the social ties of users from their location and co-location traces, but they lack one important component: a comparison of the inference accuracy of an internal attacker (e.g., a curious application running on a mobile device) and a realistic external eavesdropper in the same field trial. In this paper, we show experimentally that such an eavesdropper is able to infer the type of social relationship between mobile users better than an internal attacker. Moreover, our results indicate that exploiting the underlying social community structure of mobile users doubles the accuracy of the inference attacks. Based on our findings, we propose countermeasures to help users protect their privacy against eavesdroppers.
Statistical methods in cosmology
The advent of large data sets in cosmology has meant that, in the past 10 or 20
years, our knowledge and understanding of the Universe has changed not only
quantitatively but also, and most importantly, qualitatively. Cosmologists rely
on data that contain a host of useful information, encoded in a
non-trivial way. The challenges in extracting this information must be overcome
to make the most of a large experimental effort. Even after having converged on
a standard cosmological model (the LCDM model), we should keep in mind that this
model is described by 10 or more physical parameters and if we want to study
deviations from it, the number of parameters is even larger. Dealing with such
a high-dimensional parameter space and finding parameter constraints is a
challenge in itself. Cosmologists want to be able to compare and combine
different data sets both for testing for possible disagreements (which could
indicate new physics) and for improving parameter determinations. Finally,
cosmologists in many cases want to find out, before actually doing the
experiment, how much one would be able to learn from it. For all these reasons,
sophisticated statistical techniques are being employed in cosmology, and it
has become crucial to know some statistical background to understand recent
literature in the field. I will introduce some statistical tools that any
cosmologist should know about in order to be able to understand recently
published results from the analysis of cosmological data sets. I will not
present a complete and rigorous introduction to statistics as there are several
good books which are reported in the references. The reader should refer to
those. Comment: 31 pages, 6 figures, notes from 2nd Trans-Regio Winter school in
Passo del Tonale. To appear in Lecture Notes in Physics, "Lectures on
cosmology: Accelerated expansion of the universe" Feb 201
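Forecasting what an experiment could teach us, and combining independent data sets, are often handled with the Fisher information matrix. For Gaussian data with parameter-independent covariance C, F_ij = (∂μ/∂θ_i)ᵀ C⁻¹ (∂μ/∂θ_j), the forecast marginalized error on θ_i is √((F⁻¹)_ii), and Fisher matrices of independent experiments add. The sketch below illustrates this with a straight-line model μ = a + b·x standing in for a cosmological model; the model, the two experiments, and the noise level are hypothetical, and the notes may present these tools differently.

```python
import numpy as np

def fisher_matrix(dmu_dtheta, cov):
    """F_ij = (dmu/dtheta_i)^T C^{-1} (dmu/dtheta_j) for Gaussian data whose
    covariance C does not depend on the parameters."""
    return dmu_dtheta.T @ np.linalg.solve(cov, dmu_dtheta)

def marginal_errors(F):
    """Forecast 1-sigma errors after marginalizing over the other parameters."""
    return np.sqrt(np.diag(np.linalg.inv(F)))

def line_derivatives(x):
    """Toy model mu(x) = a + b*x; columns are dmu/da and dmu/db."""
    return np.column_stack([np.ones_like(x), x])

if __name__ == "__main__":
    sigma = 0.1
    x1 = np.linspace(0.0, 1.0, 20)  # hypothetical experiment 1
    x2 = np.linspace(1.0, 2.0, 20)  # hypothetical experiment 2
    F1 = fisher_matrix(line_derivatives(x1), sigma**2 * np.eye(20))
    F2 = fisher_matrix(line_derivatives(x2), sigma**2 * np.eye(20))
    # Independent experiments: their Fisher matrices simply add
    for name, F in [("exp 1", F1), ("exp 2", F2), ("combined", F1 + F2)]:
        print(name, marginal_errors(F))
```

Because the forecast needs only the model derivatives and the expected noise, it can be evaluated before any data are taken, which is what makes it useful for experiment design.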
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
Empirical substitution matrices represent the average tendencies of
substitutions over various protein families at the expense of gene-level
resolution. We develop a codon-based model in which the mutational tendencies of
codons, the genetic code, and the strength of selective constraints against amino
acid replacements can be tailored to a given gene. First, selective constraints
averaged over proteins are estimated by maximizing the likelihood of each 1-PAM
matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution
matrices. Then, selective constraints specific to given proteins are
approximated as a linear function of those estimated from the empirical
substitution matrices.
Akaike information criterion (AIC) values indicate that a model allowing
multiple nucleotide changes fits the empirical substitution matrices
significantly better. Also, the ML estimates of the transition-transversion bias
obtained from these empirical matrices are not as large as previously
estimated. The selective constraints are characteristic of proteins rather than
species. However, their relative strengths among amino acid pairs can be
approximated as depending mainly on the amino acid pair rather than on the
protein family, because the present model, in which selective constraints are
approximated as a linear function of those estimated from the JTT/WAG/LG/KHG
matrices, can provide a good fit to other empirical substitution matrices, including
cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
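The distinction between single and multiple nucleotide changes, and between transitions (purine to purine or pyrimidine to pyrimidine) and transversions, can be made concrete with a small helper that compares two codons position by position. This is an illustrative sketch only, not the paper's substitution model; the function names are hypothetical, and the enumeration includes stop codons.

```python
from itertools import product

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def classify_codon_change(c1, c2):
    """Return (number of differing positions, per-position change labels)."""
    c1, c2 = c1.upper(), c2.upper()
    if len(c1) != 3 or len(c2) != 3 or not set(c1 + c2) <= PURINES | PYRIMIDINES:
        raise ValueError("expected two DNA codons over A, C, G, T")
    kinds = []
    for a, b in zip(c1, c2):
        if a == b:
            continue
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        kinds.append("transition" if same_class else "transversion")
    return len(kinds), kinds

def count_by_changes():
    """Number of ordered pairs of distinct codons differing at 1, 2 or 3 positions."""
    codons = ["".join(c) for c in product("ACGT", repeat=3)]
    counts = {1: 0, 2: 0, 3: 0}
    for c1 in codons:
        for c2 in codons:
            if c1 != c2:
                counts[classify_codon_change(c1, c2)[0]] += 1
    return counts

if __name__ == "__main__":
    print(classify_codon_change("TTT", "TTC"))
    print(count_by_changes())
```

Most codon pairs differ at more than one position, so a model restricted to single nucleotide changes must route such replacements through intermediate codons, whereas a model allowing multiple changes can assign them direct rates.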
The present codon-based model, with the ML estimates of selective constraints
and with adjustable nucleotide mutation rates, would be useful as a simple
substitution model in ML and Bayesian inference of molecular phylogenetic
trees, and would enable us to obtain biologically meaningful information at both
nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in Table
9 published in 10.1371/journal.pone.0017244. Supporting information is
attached at the end of the article, and a computer-readable dataset of the ML
estimates of selective constraints is available from
10.1371/journal.pone.001724