Search CORE


Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach

Author: Coram Marc
Tang Hua
Publication venue: Institute of Mathematical Statistics
Publication date: 12/12/2007

Estimation of allele frequencies at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data become increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples generally reduces the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased because of population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms (SNPs). This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) smaller than that of either pooling or not pooling, and it sometimes improves substantially over both extremes. The bias introduced is small, as shown by a simulation study carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation likely to arise in a genome-wide association study. Comment: Published at http://dx.doi.org/10.1214/07-AOAS121 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

Crossref
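As a rough illustration of the empirical Bayes idea in the Coram and Tang abstract, the Python sketch below shrinks each SNP's primary-sample allele frequency toward the frequency observed in one related sample. It is an assumption-laden simplification, not the authors' estimator: it uses a single supplemental group, a beta-binomial model, a shared prior strength m chosen by maximizing the marginal likelihood over a grid, and hypothetical helper names and counts. The adaptive element is m: when the related sample agrees with the primary sample across many SNPs, m grows and the estimates borrow more strength; when it disagrees, m falls and the estimates stay close to the unpooled ones.

```python
import math


def supplemental_freq(y, k):
    # Related-sample allele frequency with a 0.5 pseudo-count, kept inside (0, 1).
    return (y + 0.5) / (k + 1.0)


def log_beta(a, b):
    # log B(a, b) via log-gamma, stable for large counts.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_loglik(m, x, n, q):
    # Beta-binomial marginal log-likelihood of the primary counts under
    # Beta(m*q, m*(1-q)) priors; binomial coefficients are dropped because
    # they do not depend on m.
    total = 0.0
    for xj, nj, qj in zip(x, n, q):
        a, b = m * qj, m * (1.0 - qj)
        total += log_beta(a + xj, b + nj - xj) - log_beta(a, b)
    return total


def fit_prior_strength(x, n, q):
    # Empirical Bayes step: pick the shared prior strength m that maximizes
    # the marginal likelihood on a log-spaced grid from 0.1 to 10,000.
    grid = [10 ** (i / 10) for i in range(-10, 41)]
    return max(grid, key=lambda m: marginal_loglik(m, x, n, q))


def posterior_means(x, n, q, m):
    # Posterior mean (m*q + x) / (m + n): a weighted average of the primary
    # estimate x/n and the related-sample estimate q.
    return [(m * qj + xj) / (m + nj) for xj, nj, qj in zip(x, n, q)]


# Hypothetical counts for 4 SNPs: 20 primary chromosomes, 200 related ones.
x, n = [3, 10, 0, 7], [20, 20, 20, 20]
y, k = [30, 120, 5, 80], [200, 200, 200, 200]
q = [supplemental_freq(yj, kj) for yj, kj in zip(y, k)]
m = fit_prior_strength(x, n, q)
estimates = posterior_means(x, n, q, m)
print(f"prior strength m = {m:.2f}")
print("estimates:", [round(e, 3) for e in estimates])
```

Each estimate lies between the unpooled frequency x/n and the related-sample frequency q, with weight n / (m + n) on the primary data, so the two extremes contrasted in the abstract correspond to m near 0 and m very large.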

Economic Impacts of Planned Transportation Investments in New Jersey

Author: Aaron R. Fichtner
Michael L. Lahr
Will Irving
Publication venue: John J. Heldrich Center for Workforce Development
Publication date: 04/04/2008

This report demonstrates that New Jersey's plans to invest in transportation infrastructure over the next decade will result in nearly 27,000 full-time jobs per year. It also shows that the state's transportation investments will generate economic impacts in the form of employment, income, gross domestic product, and state and local tax revenues. The report is the result of a joint study conducted by the Heldrich Center and the Center for Urban Policy Research at Rutgers University's Edward J. Bloustein School of Planning and Public Policy.

IssueLab

Network estimation in State Space Model with L1-regularization constraint

Author: Lotsi Anani
Wit Ernst
Publication date: 01/01/2013

Biological networks have emerged as an attractive paradigm in genomic science since the introduction of large-scale genomic technologies, which carried the promise of elucidating relationships in functional genomics. Microarray technologies, coupled with appropriate mathematical or statistical models, have made it possible to identify dynamic regulatory networks and to measure the time course of the expression levels of many genes simultaneously. However, one of the few limitations lies in the high-dimensional nature of such data, together with the fact that gene expression data are known to involve hidden processes. In this regard, we derive a method for inferring a sparse dynamic network in a high-dimensional setting. We assume that the observations are noisy measurements of gene expression in the form of mRNA levels, whose dynamics can be described by an unknown, hidden process. We build an input-dependent linear state space model from these hidden states and demonstrate how an L1 regularization constraint incorporated into an Expectation-Maximization (EM) algorithm can be used to reverse-engineer transcriptional networks from gene expression profiling data. This corresponds to estimating the model's interaction parameters. The proposed method is illustrated on time-course microarray data from a well-established T-cell dataset. At the optimal tuning parameters, we found the genes TRAF5, JUND, CDK4, CASP4, CD69 and C3X1 to have a higher number of inward-directed connections, and FYB, CCNA2, AKT1 and CASP8 to have a higher number of outward-directed connections. We recommend these genes as candidates for further investigation. Caspase 4 is also found to activate the expression of JunD, which in turn represses the cell cycle regulator CDC2. Comment: arXiv admin note: substantial text overlap with arXiv:1308.359

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
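To make the role of the L1 constraint inside an EM algorithm concrete for the Lotsi and Wit abstract, the sketch below shows one possible M-step update for the transition (interaction) matrix A of a linear state space model x_{t+1} = A x_t + w_t, solved by proximal gradient descent (ISTA) with soft-thresholding. This is a simplified stand-in, not the authors' implementation: it assumes identity process-noise covariance, omits the input term, and takes as given the expected sufficient statistics S0 and S10 that an E-step (for example a Kalman smoother) would supply. The function names and toy numbers are hypothetical. It also counts inward and outward connections from the sparsity pattern of A, the kind of quantity the abstract uses to rank genes.

```python
def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_transition_mstep(S0, S10, lam, n_iter=500):
    """Minimize 0.5*tr(A S0 A^T) - tr(A S10^T) + lam*sum|A_ij| by ISTA.

    S0  = sum_t E[x_t x_t^T]      (p x p, from the E-step)
    S10 = sum_t E[x_{t+1} x_t^T]  (p x p, from the E-step)
    Assumes identity process-noise covariance and no input term.
    """
    p = len(S0)
    # trace(S0) bounds the largest eigenvalue of S0, i.e. the Lipschitz
    # constant of the gradient, so 1/trace(S0) is a safe step size.
    step = 1.0 / sum(S0[i][i] for i in range(p))
    A = [[0.0] * p for _ in range(p)]
    for _ in range(n_iter):
        # Gradient of the smooth part: A S0 - S10.
        grad = [[sum(A[i][l] * S0[l][j] for l in range(p)) - S10[i][j]
                 for j in range(p)] for i in range(p)]
        # Gradient step followed by elementwise soft-thresholding.
        A = [[soft_threshold(A[i][j] - step * grad[i][j], step * lam)
              for j in range(p)] for i in range(p)]
    return A


def edge_degrees(A, tol=1e-8):
    # A[i][j] != 0 is read as a directed edge j -> i (gene j at time t
    # affects gene i at time t+1). Self-loops are ignored.
    p = len(A)
    in_deg = [sum(1 for j in range(p) if j != i and abs(A[i][j]) > tol)
              for i in range(p)]
    out_deg = [sum(1 for i in range(p) if i != j and abs(A[i][j]) > tol)
               for j in range(p)]
    return in_deg, out_deg


# Toy statistics: with S0 = identity the exact minimizer is
# soft_threshold(S10, lam) applied entrywise.
S0 = [[1.0, 0.0], [0.0, 1.0]]
S10 = [[0.9, 0.05], [0.6, 0.2]]
A = l1_transition_mstep(S0, S10, lam=0.1)
in_deg, out_deg = edge_degrees(A)
print(A, in_deg, out_deg)
```

In a full EM loop, this update would alternate with the E-step; larger lam drives more entries of A exactly to zero, which is what makes the inferred network sparse.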

Secondary mathematics guidance papers: summer 2008

Publication venue: Department for Education (DFE)
Publication date: 01/01/2008

Digital Education Resource Archive

The Distribution of High Redshift Galaxy Colors: Line of Sight Variations in Neutral Hydrogen Absorption

Author: Deharveng J.-M.
Giallongo E.
Jane C. Charlton
Janet M. Geoffroy
Madau P.
Matthew A. Bershady
Patel K.
Steidel C. C.
Publication venue: University of Chicago Press
Publication date: 01/01/1999

We model, via Monte Carlo simulations, the distribution of observed U-B, B-V and V-I galaxy colors in the redshift range 1.75 < z < 5 caused by variations in the line-of-sight opacity due to neutral hydrogen (HI). We also include HI internal to the source galaxies. Even without internal HI absorption, comparison of the distribution of simulated colors with the analytic approximations of Madau (1995) and Madau et al. (1996) reveals systematically different mean colors and scatter. The differences arise in part because we use more realistic distributions of column densities and Doppler parameters. However, there are also mathematical problems in applying mean and standard-deviation opacities, and such an application yields unphysical results. These problems are corrected by our Monte Carlo approach. Including HI absorption internal to the galaxies generally diminishes the scatter in the observed colors at a given redshift, but for redshifts of interest this diminution occurs only in the colors using the bluest bandpass. Internal column densities < 10^17 cm^-2 do not affect the observed colors, while column densities > 10^18 cm^-2 yield a limiting distribution of high-redshift galaxy colors. As one application of our analysis, we consider the sample completeness as a function of redshift for a single spectral energy distribution (SED), given the multi-color selection boundaries for the Hubble Deep Field proposed by Madau et al. (1996). We argue that the only correct procedure for estimating the z > 3 galaxy luminosity function from color-selected samples is to measure the (observed) distribution of redshifts and intrinsic SED types, and then consider the variation in color for each SED and redshift. A similar argument applies to estimating the luminosity function of color-selected, high-redshift QSOs. Comment: accepted for publication in ApJ; 25 pages text, 14 embedded figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server
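The point in the high-redshift galaxy colors abstract that applying mean opacities is mathematically problematic can be illustrated with a toy Monte Carlo of Lyman-limit optical depth along random sightlines. In the sketch below, the number of absorbers per sightline is Poisson, column densities follow a power law f(N) proportional to N^-beta, and tau = sigma * N with sigma of about 6.3 x 10^-18 cm^2, the HI photoionization cross-section at the Lyman limit. The mean absorber count, beta = 1.5 and the column-density limits are hypothetical illustrative values, not the authors' distributions, and Doppler parameters and redshift evolution are ignored. Because exp(-tau) is convex, the average transmission <exp(-tau)> is never below exp(-<tau>), so plugging a mean optical depth into the exponential overstates the mean attenuation.

```python
import math
import random

SIGMA_LL = 6.3e-18  # HI photoionization cross-section at the Lyman limit, cm^2


def sample_poisson(mu, rng):
    # Knuth's multiplication method; adequate for small mu.
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_column_density(rng, n_min=1e13, n_max=1e19, beta=1.5):
    # Inverse-CDF draw from f(N) ~ N^-beta on [n_min, n_max] (beta != 1).
    # Hypothetical limits and slope, chosen only for illustration.
    e = 1.0 - beta
    u = rng.random()
    return (n_min ** e + u * (n_max ** e - n_min ** e)) ** (1.0 / e)


def simulate_sightlines(n_sightlines=20000, mean_absorbers=3.0, seed=1):
    # Total Lyman-limit optical depth for each simulated sightline.
    rng = random.Random(seed)
    taus = []
    for _ in range(n_sightlines):
        k = sample_poisson(mean_absorbers, rng)
        taus.append(sum(SIGMA_LL * sample_column_density(rng) for _ in range(k)))
    return taus


taus = simulate_sightlines()
mean_tau = sum(taus) / len(taus)
mean_transmission = sum(math.exp(-t) for t in taus) / len(taus)
print(f"exp(-<tau>) = {math.exp(-mean_tau):.3f}")
print(f"<exp(-tau)> = {mean_transmission:.3f}")
```

The gap between the two printed numbers grows with the scatter in tau, which is dominated here by rare high-column-density absorbers; averaging transmissions over simulated sightlines, rather than exponentiating an averaged opacity, avoids this bias.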