
    The Degrees of Freedom of Partial Least Squares Regression

    The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: The lower the collinearity is, the higher the Degrees of Freedom are. In particular, they are typically higher than the naive approach that defines the Degrees of Freedom as the number of components. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate in combination with information criteria is useful for model selection. Comment: to appear in the Journal of the American Statistical Association
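The definition in this abstract, the Degrees of Freedom as the trace of the derivative of the fitted values with respect to the response, can be checked numerically. What follows is a minimal sketch and not the paper's two representations of the estimate: it approximates the trace by central finite differences, and it assumes the textbook view of PLS with m components as least squares restricted to the Krylov subspace spanned by s, As, ..., A^(m-1)s, with A = X^T X and s = X^T y. The function names (`ols_fit`, `pls_fit`, `dof_trace`) and the synthetic data are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Fitted values of ordinary least squares (no intercept, X centered)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def pls_fit(X, y, m):
    """Fitted values of PLS with m components via the Krylov-subspace view."""
    A = X.T @ X
    v = X.T @ y
    cols = []
    for _ in range(m):
        v = v / np.linalg.norm(v)   # rescaling keeps the span, helps conditioning
        cols.append(v)
        v = A @ v
    K = np.column_stack(cols)
    # Least squares restricted to span(K): minimise ||y - X K c||.
    c, *_ = np.linalg.lstsq(X @ K, y, rcond=None)
    return X @ (K @ c)

def dof_trace(fit, y, h=1e-5):
    """Trace of d(fitted values)/dy, approximated by central differences."""
    total = 0.0
    for i in range(y.size):
        e = np.zeros(y.size)
        e[i] = h
        total += (fit(y + e)[i] - fit(y - e)[i]) / (2.0 * h)
    return total

# Synthetic example (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.standard_normal(40)

dof_ols = dof_trace(lambda v: ols_fit(X, v), y)
dof_pls2 = dof_trace(lambda v: pls_fit(X, v, 2), y)
print("OLS:", dof_ols, "PLS with 2 components:", dof_pls2)
```

For OLS the fit is linear in y, so the trace equals the number of predictors. For PLS the Krylov basis itself depends on y, which is why the trace need not equal the number of components.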

    Fermi level pinning induced electrostatic fields and band bending at organic heterojunctions

    The energy level alignment at interfaces between organic semiconductors is of direct relevance to understanding charge carrier generation and recombination in organic electronic devices. Commonly, work function changes observed upon interface formation are interpreted as interface dipoles. In this study, using ultraviolet and X-ray photoelectron spectroscopy, complemented by electrostatic calculations, we find a huge work function decrease of up to 1.4 eV at the C60 (bottom layer)/zinc phthalocyanine (ZnPc, top layer) interface prepared on a molybdenum trioxide (MoO3) substrate. However, detailed measurements of the energy level shifts and electrostatic calculations reveal that no interface dipole occurs. Instead, upon ZnPc deposition, a linear electrostatic potential gradient is generated across the C60 layer due to Fermi level pinning of ZnPc on the high work function C60/MoO3 substrate, and associated band bending within the ZnPc layer. This finding is generally of importance for understanding organic heterojunctions when Fermi level pinning is involved, as induced electrostatic fields alter the energy level alignment significantly.

    GABA(A) receptor phospho-dependent modulation is regulated by phospholipase C-related inactive protein type 1, a novel protein phosphatase 1 anchoring protein

    GABA(A) receptors are critical in controlling neuronal activity. Here, we examined the role of phospholipase C-related inactive protein type 1 (PRIP-1), which binds and inactivates protein phosphatase 1alpha (PP1alpha), in facilitating GABA(A) receptor phospho-dependent regulation using PRIP-1(-/-) mice. In wild-type animals, robust phosphorylation and functional modulation of GABA(A) receptors containing beta3 subunits by cAMP-dependent protein kinase was evident, which was diminished in PRIP-1(-/-) mice. PRIP-1(-/-) mice exhibited enhanced PP1alpha activity compared with controls. Furthermore, PRIP-1 was able to interact directly with GABA(A) receptor beta subunits, and moreover, these proteins were found to be PP1alpha substrates. Finally, phosphorylation of PRIP-1 on threonine 94 facilitated the dissociation of PP1alpha-PRIP-1 complexes, providing a local mechanism for the activation of PP1alpha. Together, these results suggest an essential role for PRIP-1 in controlling GABA(A) receptor activity via regulating subunit phosphorylation and thereby the efficacy of neuronal inhibition mediated by these receptors.

    Probability Models for Degree Distributions of Protein Interaction Networks

    The degree distribution of many biological and technological networks has been described as a power-law distribution. While the degree distribution does not capture all aspects of a network, it has often been suggested that its functional form contains important clues as to the underlying evolutionary processes that have shaped the network. Generally, the functional form for the degree distribution has been determined in an ad hoc fashion, with clear power-law-like behaviour often extending only over a limited range of connectivities. Here we apply formal model selection techniques to decide which probability distribution best describes the degree distributions of protein interaction networks. Contrary to previous studies, this well-defined approach suggests that the degree distribution of many molecular networks is often better described by distributions other than the popular power-law distribution. This, in turn, suggests that simple, if elegant, models may not necessarily help in the quantitative understanding of complex biological processes.
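Choosing among candidate degree distributions by a formal criterion can be illustrated as follows. This is a minimal sketch under stated assumptions: the abstract names neither the criterion nor the candidate families, so the Akaike information criterion and the two candidates (Poisson and geometric) are choices made here for illustration, and the degree sequence is synthetic. All function names are hypothetical.

```python
import math
import numpy as np

def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

def poisson_loglik(k):
    """Maximised Poisson log-likelihood (MLE: lambda = sample mean, assumed > 0)."""
    lam = k.mean()
    log_factorials = sum(math.lgamma(int(v) + 1) for v in k)
    return float(np.sum(k * math.log(lam) - lam) - log_factorials)

def geometric_loglik(k):
    """Maximised geometric log-likelihood on {0, 1, 2, ...} (MLE: p = 1/(1+mean))."""
    p = 1.0 / (1.0 + k.mean())
    return float(np.sum(math.log(p) + k * math.log(1.0 - p)))

# Synthetic degree sequence (assumed data): geometric on {0, 1, 2, ...}.
rng = np.random.default_rng(1)
degrees = rng.geometric(0.3, size=2000) - 1

scores = {
    "poisson": aic(poisson_loglik(degrees), 1),
    "geometric": aic(geometric_loglik(degrees), 1),
}
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

Both candidates have one free parameter, so here the comparison reduces to the maximised likelihoods; the penalty term matters once families with different numbers of parameters are compared.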

    Detecting periodicity in experimental data using linear modeling techniques

    Fourier spectral estimates and, to a lesser extent, the autocorrelation function are the primary tools to detect periodicities in experimental data in the physical and biological sciences. We propose a new method which is more reliable than traditional techniques, and is able to make clear identification of periodic behavior when traditional techniques do not. This technique is based on an information theoretic reduction of linear (autoregressive) models so that only the essential features of an autoregressive model are retained. These models we call reduced autoregressive models (RARM). The essential features of reduced autoregressive models include any periodicity present in the data. We provide theoretical and numerical evidence from both experimental and artificial data, to demonstrate that this technique will reliably detect periodicities if and only if they are present in the data. There are strong information theoretic arguments to support the statement that RARM detects periodicities if they are present. Surrogate data techniques are used to ensure the converse. Furthermore, our calculations demonstrate that RARM is more robust, more accurate, and more sensitive than traditional spectral techniques. Comment: 10 pages (revtex) and 6 figures. To appear in Phys Rev E. Modified styl

    Analyzing the House Fly's Exploratory Behavior with Autoregression Methods

    This paper presents a detailed characterization of the trajectory of a single housefly with free range of a square cage. The trajectory of the fly was recorded and transformed into a time series, which was analyzed using an autoregressive model, which describes a stationary time series as a linear regression on prior state values plus white noise. The main discovery was that the fly switched styles of motion from a low dimensional regular pattern to a higher dimensional disordered pattern. This discovered exploratory behavior is, irrespective of the presence of food, characterized by anomalous diffusion. Comment: 20 pages, 9 figures, 1 table, full paper
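The autoregressive model mentioned in this abstract, a linear regression of the current value on prior values with a white-noise term, can be fitted by ordinary least squares. The sketch below is an illustration only, not the authors' analysis: the function name `fit_ar`, the model order, and the simulated series are assumptions.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.

    Returns the coefficient vector (a_1..a_p) and the residual variance.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Column j-1 holds x_{t-j} for t = p, ..., n-1.
    lags = np.column_stack([x[p - j : n - j] for j in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
    resid = target - lags @ coef
    return coef, resid.var()

# Simulated AR(1) series with coefficient 0.7 (assumed data).
rng = np.random.default_rng(2)
noise = rng.standard_normal(5000)
series = np.zeros(5000)
for t in range(1, 5000):
    series[t] = 0.7 * series[t - 1] + noise[t]

coef, noise_var = fit_ar(series, 1)
print("estimated coefficient:", coef[0], "residual variance:", noise_var)
```

In practice the order p would be chosen by a model selection criterion rather than fixed in advance.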

    Inferring Social Ties in Academic Networks Using Short-Range Wireless Communications

    WiFi base stations are increasingly deployed in both public spaces and private companies, and the increase in their density poses a significant threat to the privacy of connected users. Prior studies have provided evidence that it is possible to infer the social ties of users from their location and co-location traces but they lack one important component: the comparison of the inference accuracy between an internal attacker (e.g., a curious application running on a mobile device) and a realistic external eavesdropper in the same field trial. In this paper, we experimentally show that such an eavesdropper is able to infer the type of social relationships between mobile users better than an internal attacker. Moreover, our results indicate that by exploiting the underlying social community structure of mobile users, the accuracy of the inference attacks doubles. Based on our findings, we propose countermeasures to help users protect their privacy against eavesdroppers.

    Statistical methods in cosmology

    The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data where a host of useful information is enclosed, but is encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets both for testing for possible disagreements (which could indicate new physics) and for improving parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics as there are several good books which are reported in the references. The reader should refer to those. Comment: 31 pages, 6 figures, notes from 2nd Trans-Regio Winter school in Passo del Tonale. To appear in Lecture Notes in Physics, "Lectures on cosmology: Accelerated expansion of the universe" Feb 201

    Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

    Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codons, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pairs rather than on protein families, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints and with adjustable nucleotide mutation rates would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724