Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach
Estimation of the allele frequency at genetic markers is a key ingredient in
biological and biomedical research, such as studies of human genetic variation
or of the genetic etiology of heritable traits. As genetic data becomes
increasingly available, investigators face a dilemma: when should data from
other studies and population subgroups be pooled with the primary data? Pooling
additional samples will generally reduce the variance of the frequency
estimates; however, used inappropriately, pooled estimates can be severely
biased due to population stratification. Because of this potential bias, most
investigators avoid pooling, even for samples with the same ethnic background
and residing on the same continent. Here, we propose an empirical Bayes
approach for estimating allele frequencies of single nucleotide polymorphisms.
This procedure adaptively incorporates genotypes from related samples, so that
more similar samples have a greater influence on the estimates. In every
example we have considered, our estimator achieves a mean squared error (MSE)
that is smaller than that of either pooling or not pooling, and sometimes
substantially improves over both extremes. The bias introduced is small, as is shown by a
simulation study that is carefully matched to a real data example. Our method
is particularly useful when small groups of individuals are genotyped at a
large number of markers, a situation we are likely to encounter in a
genome-wide association study.
Comment: Published in at http://dx.doi.org/10.1214/07-AOAS121 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
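The idea of letting supplemental samples influence the estimate by an adaptive amount can be illustrated with a minimal shrinkage sketch. This is not the paper's estimator: it assumes a Beta prior centered on the supplemental frequency, and the prior strength `m` is a hypothetical tuning knob that an empirical Bayes procedure would estimate from the data across markers. The function name and the example counts are also illustrative.

```python
def shrunken_frequency(x1, n1, x2, n2, m):
    """Posterior-mean allele frequency for the primary sample.

    x1, n1: allele count and number of chromosomes in the primary sample.
    x2, n2: the same quantities for the supplemental sample.
    m: hypothetical prior strength, i.e. how many pseudo-chromosomes
       are borrowed from the supplemental sample (assumption).
    """
    if n1 <= 0 or n2 <= 0 or m < 0:
        raise ValueError("n1 and n2 must be positive and m non-negative")
    p2 = x2 / n2  # supplemental frequency, used as the prior mean
    return (x1 + m * p2) / (n1 + m)


# Primary sample: 3 of 10 chromosomes; supplemental sample: 40 of 100.
no_pooling = shrunken_frequency(3, 10, 40, 100, m=0)      # primary data only
partial = shrunken_frequency(3, 10, 40, 100, m=10)        # between the extremes
full_pooling = shrunken_frequency(3, 10, 40, 100, m=100)  # m = n2 gives (x1 + x2) / (n1 + n2)
```

Setting `m = 0` reproduces the unpooled estimate and `m = n2` reproduces the fully pooled one, so intermediate values interpolate between the two extremes the abstract compares against.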
Economic Impacts of Planned Transportation Investments in New Jersey
This report demonstrates that New Jersey's plans to invest in transportation infrastructure over the next decade will result in nearly 27,000 full-time jobs per year. It also shows that the state's transportation investments will generate economic impacts in the form of employment, income, gross domestic product, and state and local tax revenues. The report is the result of a joint study conducted by the Heldrich Center and the Center for Urban Policy Research at Rutgers University's Edward J. Bloustein School of Planning and Public Policy
Network estimation in State Space Model with L1-regularization constraint
Biological networks have arisen as an attractive paradigm of genomic science
ever since the introduction of large-scale genomic technologies, which carried
the promise of elucidating the relationships in functional genomics. Microarray
technologies coupled with appropriate mathematical or statistical models have
made it possible to identify dynamic regulatory networks or to measure the time
course of the expression levels of many genes simultaneously. However, one of the
few limitations lies in the high-dimensional nature of such data, coupled with
the fact that these gene expression data are known to include some hidden
process. In that regard, we are concerned with deriving a method for inferring
a sparse dynamic network in a high dimensional data setting. We assume that the
observations are noisy measurements of gene expression in the form of mRNAs,
whose dynamics can be described by some unknown or hidden process. We build an
input-dependent linear state space model from these hidden states and
demonstrate how an incorporated regularization constraint in an
Expectation-Maximization (EM) algorithm can be used to reverse engineer
transcriptional networks from gene expression profiling data. This corresponds
to estimating the model interaction parameters. The proposed method is
illustrated on time-course microarray data obtained from a well-established
T-cell data set. At the optimum tuning parameters, we found genes TRAF5, JUND, CDK4,
CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and
FYB, CCNA2, AKT1, and CASP8 to be genes with a higher number of outward-directed
connections. We recommend these genes as objects for further investigation.
Caspase 4 is also found to activate the expression of JunD, which in turn
represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
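How an L1 penalty produces a sparse interaction matrix, and how inward and outward connections are then counted, can be sketched as follows. This is a generic soft-thresholding illustration, not the paper's EM update: the matrix values, the threshold `lam`, and the convention that entry `[i][j]` is the effect of gene `j` on gene `i` are all assumptions made for the example.

```python
import math


def soft_threshold(a, lam):
    """Shrink a toward zero by lam and set it to zero if |a| <= lam."""
    return math.copysign(max(abs(a) - lam, 0.0), a)


def sparsify(matrix, lam):
    """Apply soft-thresholding to every entry of an interaction matrix."""
    return [[soft_threshold(a, lam) for a in row] for row in matrix]


def in_degrees(matrix):
    """Number of nonzero entries per row (connections directed into each gene)."""
    return [sum(1 for a in row if a != 0) for row in matrix]


def out_degrees(matrix):
    """Number of nonzero entries per column (connections directed out of each gene)."""
    return [sum(1 for row in matrix if row[j] != 0) for j in range(len(matrix[0]))]


# Hypothetical dense estimate of a 3-gene interaction matrix.
dense = [
    [0.90, 0.05, -0.40],
    [0.02, 0.80, 0.00],
    [-0.60, 0.10, 0.70],
]
sparse = sparsify(dense, lam=0.15)
inward = in_degrees(sparse)
outward = out_degrees(sparse)
```

Small entries are set exactly to zero while large ones are only shrunk, which is what makes the resulting network sparse enough to read off highly connected genes. Diagonal entries (self-regulation) are counted here; excluding them is an equally reasonable convention.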
The Distribution of High Redshift Galaxy Colors: Line of Sight Variations in Neutral Hydrogen Absorption
We model, via Monte Carlo simulations, the distribution of observed U-B, B-V,
V-I galaxy colors in the range 1.75<z<5 caused by variations in the
line-of-sight opacity due to neutral hydrogen (HI). We also include HI internal
to the source galaxies. Even without internal HI absorption, comparison of the
distribution of simulated colors to the analytic approximations of Madau (1995)
and Madau et al (1996) reveals systematically different mean colors and
scatter. Differences arise in part because we use more realistic distributions
of column densities and Doppler parameters. However, there are also
mathematical problems with applying mean and standard deviation opacities, and
such application yields unphysical results. These problems are corrected using
our Monte Carlo approach. Including HI absorption internal to the galaxies
generally diminishes the scatter in the observed colors at a given redshift, but
for redshifts of interest this diminution only occurs in the colors using the
bluest band-pass. Internal column densities < 10^17 cm^-2 do not affect the
observed colors, while column densities > 10^18 cm^-2 yield a limiting
distribution of high redshift galaxy colors. As one application of our
analysis, we consider the sample completeness as a function of redshift for a
single spectral energy distribution (SED) given the multi-color selection
boundaries for the Hubble Deep Field proposed by Madau et al (1996). We argue
that the only correct procedure for estimating the z>3 galaxy luminosity
function from color-selected samples is to measure the (observed) distribution
of redshifts and intrinsic SED types, and then consider the variation in color
for each SED and redshift. A similar argument applies to the estimation of the
luminosity function of color-selected, high redshift QSOs.
Comment: accepted for publication in ApJ; 25 pages text, 14 embedded figure
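One reason a mean opacity can mislead is that transmission is a nonlinear function of optical depth, so averaging the opacity first and averaging the transmission over sightlines give different answers. The toy Monte Carlo below illustrates only that point. It is not the paper's simulation: the exponential distribution of optical depths, the function name, and the parameter values are assumptions chosen for simplicity.

```python
import math
import random


def simulate_transmission(n_sightlines, mean_tau, seed=0):
    """Compare the mean transmission with the transmission at the mean opacity.

    Optical depths are drawn from an exponential distribution with the given
    mean, a stand-in (assumption) for a realistic line-of-sight distribution.
    """
    rng = random.Random(seed)
    taus = [rng.expovariate(1.0 / mean_tau) for _ in range(n_sightlines)]
    mean_of_transmission = sum(math.exp(-t) for t in taus) / n_sightlines
    transmission_of_mean = math.exp(-sum(taus) / n_sightlines)
    return mean_of_transmission, transmission_of_mean


mc_value, naive_value = simulate_transmission(10_000, mean_tau=1.0)
```

For this toy distribution the expected transmission is 1 / (1 + mean_tau) = 0.5, whereas applying the mean opacity gives exp(-1), about 0.37. By Jensen's inequality the sightline average is never smaller than the value computed from the mean opacity.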