Consistent Estimation of Low-Dimensional Latent Structure in High-Dimensional Data
We consider the problem of extracting a low-dimensional, linear latent
variable structure from high-dimensional random variables. Specifically, we
show that under mild conditions and when this structure manifests itself as a
linear space that spans the conditional means, it is possible to consistently
recover the structure using only information up to the second moments of these
random variables. This finding, specialized to one-parameter exponential
families whose variance function is quadratic in their means, allows for the
derivation of an explicit estimator of such latent structure. This approach
serves as a latent variable model estimator and as a tool for dimension
reduction for a high-dimensional matrix of data composed of many related
variables. Our theoretical results are verified by simulation studies and an
application to genomic data.
Statistical significance of variables driving systematic variation
There are a number of well-established methods such as principal components
analysis (PCA) for automatically capturing systematic variation due to latent
variables in large-scale genomic data. PCA and related methods may directly
provide a quantitative characterization of a complex biological variable that
is otherwise difficult to precisely define or model. An unsolved problem in
this context is how to systematically identify the genomic variables that are
drivers of systematic variation captured by PCA. Principal components (and
other estimates of systematic variation) are directly constructed from the
genomic variables themselves, making measures of statistical significance
artificially inflated when using conventional methods due to over-fitting. We
introduce a new approach called the jackstraw that allows one to accurately
identify genomic variables that are statistically significantly associated with
any subset or linear combination of principal components (PCs). The proposed
method can greatly simplify complex significance testing problems encountered
in genomics and can be utilized to identify the genomic variables significantly
associated with latent variables. Using simulation, we demonstrate that our
method attains accurate measures of statistical significance over a range of
relevant scenarios. We consider yeast cell-cycle gene expression data, and show
that the proposed method can be used to straightforwardly identify
statistically significant genes that are cell-cycle regulated. We also analyze
gene expression data from post-trauma patients, allowing the gene expression
data to provide a molecularly-driven phenotype. We find a greater enrichment
for inflammatory-related gene sets compared to using a clinically defined
phenotype. The proposed method provides a useful bridge between large-scale
quantifications of systematic variation and gene-level significance analyses. Comment: 35 pages, 1 table, 6 main figures, 7 supplementary figures
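The abstract does not spell out how the jackstraw works, so the following is only a minimal sketch of the idea as we understand it: permute a small number of variables to break their association with the latent structure, recompute the PCs, and use the statistics of those permuted variables as an empirical null. The function names (`pc_f_stats`, `jackstraw_pvalues`), the parameters `s` and `B`, the F-statistic, and the `+1` smoothing in the p-value are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pc_f_stats(Y, r):
    """F-statistic for each row of Y (variables x samples) against the top r PCs."""
    m, n = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)        # center each variable
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt[:r].T                                  # n x r matrix of PCs (orthonormal)
    fitted = Yc @ V @ V.T                         # least-squares fit on the PCs
    rss1 = ((Yc - fitted) ** 2).sum(axis=1)       # residuals: intercept + PCs
    rss0 = (Yc ** 2).sum(axis=1)                  # residuals: intercept only
    return ((rss0 - rss1) / r) / (rss1 / (n - r - 1))

def jackstraw_pvalues(Y, r=1, s=10, B=50, seed=0):
    """Empirical p-values from a permutation null (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    observed = pc_f_stats(Y, r)
    null = []
    for _ in range(B):
        Ystar = Y.copy()
        rows = rng.choice(m, size=s, replace=False)
        for i in rows:
            # Permuting a row removes its dependence on the latent variables
            Ystar[i] = rng.permutation(Y[i])
        # PCs are re-estimated from Ystar, so the null inherits the over-fitting
        null.append(pc_f_stats(Ystar, r)[rows])
    null_sorted = np.sort(np.concatenate(null))
    n_ge = null_sorted.size - np.searchsorted(null_sorted, observed, side="left")
    return (n_ge + 1) / (null_sorted.size + 1)    # smoothed so p is never zero
```

Because the PCs are recomputed with the permuted rows included, the null statistics carry the same over-fitting bias as the observed ones, which is the point of the construction. Keeping `s` small relative to the number of variables leaves the estimated PCs nearly unchanged.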
Effects of electrostatic correlations on electrokinetic phenomena
Classical theory of the electric double layer is based on the fundamental
assumption of a dilute solution of point ions. There are a number of situations
such as high applied voltages, high concentration of electrolytes, systems with
multivalent ions, or solvent-free ionic liquids where the classical theory is
often applied but the fundamental assumptions cannot be justified. Perhaps the
most basic assumption underlying continuum models in electrokinetics is the
mean-field approximation, that the electric field acting on each discrete ion
is self-consistently determined by the local mean charge density. This paper
considers situations where the mean-field approximation breaks down and
electrostatic correlations become important. A fourth-order modified Poisson
equation is developed that accounts for electrostatic correlations and captures
the essential features in a simple continuum framework. The theory is derived
variationally as a gradient approximation for non-local electrostatics, in
which the dielectric permittivity becomes a differential operator. The only new
parameter is a characteristic length scale for correlated ion pairs. The model
is able to capture subtle aspects of more detailed simulations based on Monte
Carlo, molecular dynamics, or density functional theory and allows for the
straightforward calculation of electrokinetic flows in correlated liquids, for
the first time. Departures from classical Helmholtz-Smoluchowski theory are
controlled by the dimensionless ratio of the correlation length to the Debye
screening length. Charge-density oscillations tend to reduce electro-osmotic
flow and streaming current, and over-screening of the surface charge can lead
to flow reversal. These effects also help to explain the apparent
charge-induced thickening of double layers in induced-charge electrokinetic
phenomena.
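For orientation, a fourth-order modified Poisson equation of the kind described can be written as follows. This is a sketch under assumed notation, since the abstract introduces no symbols: \phi is the mean electrostatic potential, \rho the mean charge density, \varepsilon the permittivity, \ell_c the correlation length, \lambda_D the Debye screening length, and \delta_c the dimensionless ratio that controls departures from the classical theory.

```latex
\hat{\varepsilon} = \varepsilon\left(1 - \ell_c^{2}\nabla^{2}\right),
\qquad
-\nabla\cdot\left(\hat{\varepsilon}\,\nabla\phi\right)
  = \varepsilon\left(\ell_c^{2}\nabla^{2} - 1\right)\nabla^{2}\phi = \rho,
\qquad
\delta_c = \frac{\ell_c}{\lambda_D}
```

Setting \ell_c = 0 recovers the classical Poisson equation, -\varepsilon\nabla^{2}\phi = \rho.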
Multiple locus linkage analysis of genomewide expression in yeast.
With the ability to measure thousands of related phenotypes from a single biological sample, it is now feasible to genetically dissect systems-level biological phenomena. The genetics of transcriptional regulation and protein abundance are likely to be complex, meaning that genetic variation at multiple loci will influence these phenotypes. Several recent studies have investigated the role of genetic variation in transcription by applying traditional linkage analysis methods to genomewide expression data, where each gene expression level was treated as a quantitative trait and analyzed separately from one another. Here, we develop a new, computationally efficient method for simultaneously mapping multiple gene expression quantitative trait loci that directly uses all of the available data. Information shared across gene expression traits is captured in a way that makes minimal assumptions about the statistical properties of the data. The method produces easy-to-interpret measures of statistical significance for both individual loci and the overall joint significance of multiple loci selected for a given expression trait. We apply the new method to a cross between two strains of the budding yeast Saccharomyces cerevisiae, and estimate that at least 37% of all gene expression traits show two simultaneous linkages, where we have allowed for epistatic interactions. Pairs of jointly linking quantitative trait loci are identified with high confidence for 170 gene expression traits, where it is expected that both loci are true positives for at least 153 traits. In addition, we are able to show that epistatic interactions contribute to gene expression variation for at least 14% of all traits. We compare the proposed approach to an exhaustive two-dimensional scan over all pairs of loci. Surprisingly, we demonstrate that an exhaustive two-dimensional scan is less powerful than the sequential search used here. 
In addition, we show that a two-dimensional scan does not truly allow one to test for simultaneous linkage, and the statistical significance measured from this existing method cannot be interpreted among many traits.
The Effect of Business Regulations on Nascent and Young Business Entrepreneurship
We examine the relationship, across 39 countries, between regulation and entrepreneurship using a new two-equation model. We find that the minimum capital requirement for starting a business lowers entrepreneurship rates across countries, as do labour market regulations. However, the administrative considerations of starting a business – such as the time, the cost, or the number of procedures required – are unrelated to the formation rate of either nascent or young businesses. Given the explicit link made by Djankov et al. (2002) between the speed and ease with which businesses may be established in a country and its economic performance – and the enthusiasm with which this link has been grasped by European Union policy makers – our findings imply this link needs reconsidering. Keywords: Global Entrepreneurship Monitor; Nascent Entrepreneurship; Business Regulations; World Bank Doing Business; Young Businesses
Tau-aggregation inhibitor therapy for Alzheimer's disease
Article Accepted Date: 9 December 2013. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved. Peer reviewed. Publisher PD
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing
Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As shrinkage estimation borrows strength across point estimates to improve their overall performance, I show here that borrowing strength across multiple significance tests can improve their performance as well. The optimal discovery procedure (ODP) is introduced, which shows how to maximize the number of expected true positives for each fixed number of expected false positives. The optimality achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The ODP motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several Normal means is defined; surprisingly, this is demonstrated to outperform the optimal single test procedure, showing that an optimal method for single tests may no longer be optimal in the multiple test setting. Connections to other concepts in statistics are discussed, including Stein's paradox, shrinkage estimation, and Bayesian classification theory.
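To make the idea of borrowing strength across tests concrete, here is a minimal sketch for the Normal-means example. It assumes the ODP statistic takes the form of a ratio: the sum of the alternative densities of the truly alternative tests over the sum of the null densities of the truly null tests, all evaluated at the same data point. That form, the unit variance, the oracle knowledge of the true means, and the names `normal_pdf` and `odp_statistic` are assumptions for illustration; the abstract itself gives no formula.

```python
import numpy as np

def normal_pdf(x, mu):
    """Density of a Normal(mu, 1) distribution at x."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def odp_statistic(x, alt_means, n_null):
    """Oracle ODP-style statistic for testing Normal means against zero.

    alt_means: true means of the tests that are truly alternative (assumed known).
    n_null:    number of tests whose true mean is zero.
    """
    x = np.asarray(x, dtype=float)
    alt = sum(normal_pdf(x, mu) for mu in alt_means)   # pooled alternative evidence
    null = n_null * normal_pdf(x, 0.0)                 # pooled null evidence
    return alt / null

# Most true signals are positive, so an observation of +1.5 is ranked as more
# significant than an observation of -1.5, although both are equally far from zero.
s_pos = odp_statistic(1.5, alt_means=[2.0, 2.0, 2.0, -1.0], n_null=6)
s_neg = odp_statistic(-1.5, alt_means=[2.0, 2.0, 2.0, -1.0], n_null=6)
print(s_pos > s_neg)
```

A single-test rule based on |x| would rank the two observations equally; the pooled statistic does not, which is one way to see how a procedure that is optimal for one test need not remain optimal across many related tests. In practice the true means are unknown and would have to be estimated.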