103 research outputs found

    Genome-wide estimation of transcript concentrations from spotted cDNA microarray data

    Get PDF
    A method providing absolute transcript concentrations from spotted microarray intensity data is presented. Number of transcripts per µg total RNA, mRNA or per cell, are obtained for each gene, enabling comparisons of transcript levels within and between tissues. The method is based on Bayesian statistical modelling incorporating available information about the experiment from target preparation to image analysis, leading to realistically large confidence intervals for estimated concentrations. The method was validated in experiments using transcripts at known concentrations, showing accuracy and reproducibility of estimated concentrations, which were also in excellent agreement with results from quantitative real-time PCR. We determined the concentration for 10 157 genes in cervix cancers and a pool of cancer cell lines and found values in the range of 10(5)–10(10) transcripts per µg total RNA. The precision of our estimates was sufficiently high to detect significant concentration differences between two tumours and between different genes within the same tumour, comparisons that are not possible with standard intensity ratios. Our method can be used to explore the regulation of pathways and to develop individualized therapies, based on absolute transcript concentrations. It can be applied broadly, facilitating the construction of the transcriptome, continuously updating it by integrating future data

    Extended Generalised Pareto Models for Tail Estimation

    Full text link
    The most popular approach in extreme value statistics is the modelling of threshold exceedances using the asymptotically motivated generalised Pareto distribution. This approach involves the selection of a high threshold above which the model fits the data well. Sometimes, few observations of a measurement process might be recorded in applications and so selecting a high quantile of the sample as the threshold leads to almost no exceedances. In this paper we propose extensions of the generalised Pareto distribution that incorporate an additional shape parameter while keeping the tail behaviour unaffected. The inclusion of this parameter offers additional structure for the main body of the distribution, improves the stability of the modified scale, tail index and return level estimates to threshold choice and allows a lower threshold to be selected. We illustrate the benefits of the proposed models with a simulation study and two case studies.Comment: 18 pages, 7 figure

    Validation of oligoarrays for quantitative exploration of the transcriptome

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Oligoarrays have become an accessible technique for exploring the transcriptome, but it is presently unclear how absolute transcript data from this technique compare to the data achieved with tag-based quantitative techniques, such as massively parallel signature sequencing (MPSS) and serial analysis of gene expression (SAGE). By use of the TransCount method we calculated absolute transcript concentrations from spotted oligoarray intensities, enabling direct comparisons with tag counts obtained with MPSS and SAGE. The tag counts were converted to number of transcripts per cell by assuming that the sum of all transcripts in a single cell was 5·10<sup>5</sup>. Our aim was to investigate whether the less resource demanding and more widespread oligoarray technique could provide data that were correlated to and had the same absolute scale as those obtained with MPSS and SAGE.</p> <p>Results</p> <p>A number of 1,777 unique transcripts were detected in common for the three technologies and served as the basis for our analyses. The correlations involving the oligoarray data were not weaker than, but, similar to the correlation between the MPSS and SAGE data, both when the entire concentration range was considered and at high concentrations. The data sets were more strongly correlated at high transcript concentrations than at low concentrations. On an absolute scale, the number of transcripts per cell and gene was generally higher based on oligoarrays than on MPSS and SAGE, and ranged from 1.6 to 9,705 for the 1,777 overlapping genes. The MPSS data were on same scale as the SAGE data, ranging from 0.5 to 3,180 (MPSS) and 9 to1,268 (SAGE) transcripts per cell and gene. The sum of all transcripts per cell for these genes was 3.8·10<sup>5 </sup>(oligoarrays), 1.1·10<sup>5 </sup>(MPSS) and 7.6·10<sup>4 </sup>(SAGE), whereas the corresponding sum for all detected transcripts was 1.1·10<sup>6 </sup>(oligoarrays), 2.8·10<sup>5 </sup>(MPSS) and 3.8·10<sup>5 </sup>(SAGE).</p> <p>Conclusion</p> <p>The oligoarrays and TransCount provide quantitative transcript concentrations that are correlated to MPSS and SAGE data, but, the absolute scale of the measurements differs across the technologies. The discrepancy questions whether the sum of all transcripts within a single cell might be higher than the number of 5·10<sup>5 </sup>suggested in the literature and used to convert tag counts to transcripts per cell. If so, this may explain the apparent higher transcript detection efficiency of the oligoarrays, and has to be clarified before absolute transcript concentrations can be interchanged across the technologies. The ability to obtain transcript concentrations from oligoarrays opens up the possibility of efficient generation of universal transcript databases with low resource demands.</p

    A Bayesian method for calculating real-time quantitative PCR calibration curves using absolute plasmid DNA standards

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In real-time quantitative PCR studies using absolute plasmid DNA standards, a calibration curve is developed to estimate an unknown DNA concentration. However, potential differences in the amplification performance of plasmid DNA compared to genomic DNA standards are often ignored in calibration calculations and in some cases impossible to characterize. A flexible statistical method that can account for uncertainty between plasmid and genomic DNA targets, replicate testing, and experiment-to-experiment variability is needed to estimate calibration curve parameters such as intercept and slope. Here we report the use of a Bayesian approach to generate calibration curves for the enumeration of target DNA from genomic DNA samples using absolute plasmid DNA standards.</p> <p>Results</p> <p>Instead of the two traditional methods (classical and inverse), a Monte Carlo Markov Chain (MCMC) estimation was used to generate single, master, and modified calibration curves. The mean and the percentiles of the posterior distribution were used as point and interval estimates of unknown parameters such as intercepts, slopes and DNA concentrations. The software WinBUGS was used to perform all simulations and to generate the posterior distributions of all the unknown parameters of interest.</p> <p>Conclusion</p> <p>The Bayesian approach defined in this study allowed for the estimation of DNA concentrations from environmental samples using absolute standard curves generated by real-time qPCR. The approach accounted for uncertainty from multiple sources such as experiment-to-experiment variation, variability between replicate measurements, as well as uncertainty introduced when employing calibration curves generated from absolute plasmid DNA standards.</p

    Regularity Properties and Pathologies of Position-Space Renormalization-Group Transformations

    Full text link
    We reconsider the conceptual foundations of the renormalization-group (RG) formalism, and prove some rigorous theorems on the regularity properties and possible pathologies of the RG map. Regarding regularity, we show that the RG map, defined on a suitable space of interactions (= formal Hamiltonians), is always single-valued and Lipschitz continuous on its domain of definition. This rules out a recently proposed scenario for the RG description of first-order phase transitions. On the pathological side, we make rigorous some arguments of Griffiths, Pearce and Israel, and prove in several cases that the renormalized measure is not a Gibbs measure for any reasonable interaction. This means that the RG map is ill-defined, and that the conventional RG description of first-order phase transitions is not universally valid. For decimation or Kadanoff transformations applied to the Ising model in dimension d3d \ge 3, these pathologies occur in a full neighborhood {β>β0,h<ϵ(β)}\{ \beta > \beta_0 ,\, |h| < \epsilon(\beta) \} of the low-temperature part of the first-order phase-transition surface. For block-averaging transformations applied to the Ising model in dimension d2d \ge 2, the pathologies occur at low temperatures for arbitrary magnetic-field strength. Pathologies may also occur in the critical region for Ising models in dimension d4d \ge 4. We discuss in detail the distinction between Gibbsian and non-Gibbsian measures, and give a rather complete catalogue of the known examples. Finally, we discuss the heuristic and numerical evidence on RG pathologies in the light of our rigorous theorems.Comment: 273 pages including 14 figures, Postscript, See also ftp.scri.fsu.edu:hep-lat/papers/9210/9210032.ps.

    Estimated Comparative Integration Hotspots Identify Different Behaviors of Retroviral Gene Transfer Vectors

    Get PDF
    Integration of retroviral vectors in the human genome follows non random patterns that favor insertional deregulation of gene expression and may cause risks of insertional mutagenesis when used in clinical gene therapy. Understanding how viral vectors integrate into the human genome is a key issue in predicting these risks. We provide a new statistical method to compare retroviral integration patterns. We identified the positions where vectors derived from the Human Immunodeficiency Virus (HIV) and the Moloney Murine Leukemia Virus (MLV) show different integration behaviors in human hematopoietic progenitor cells. Non-parametric density estimation was used to identify candidate comparative hotspots, which were then tested and ranked. We found 100 significative comparative hotspots, distributed throughout the chromosomes. HIV hotspots were wider and contained more genes than MLV ones. A Gene Ontology analysis of HIV targets showed enrichment of genes involved in antigen processing and presentation, reflecting the high HIV integration frequency observed at the MHC locus on chromosome 6. Four histone modifications/variants had a different mean density in comparative hotspots (H2AZ, H3K4me1, H3K4me3, H3K9me1), while gene expression within the comparative hotspots did not differ from background. These findings suggest the existence of epigenetic or nuclear three-dimensional topology contexts guiding retroviral integration to specific chromosome areas

    Computation of Steady-State Probability Distributions in Stochastic Models of Cellular Networks

    Get PDF
    Cellular processes are “noisy”. In each cell, concentrations of molecules are subject to random fluctuations due to the small numbers of these molecules and to environmental perturbations. While noise varies with time, it is often measured at steady state, for example by flow cytometry. When interrogating aspects of a cellular network by such steady-state measurements of network components, a key need is to develop efficient methods to simulate and compute these distributions. We describe innovations in stochastic modeling coupled with approaches to this computational challenge: first, an approach to modeling intrinsic noise via solution of the chemical master equation, and second, a convolution technique to account for contributions of extrinsic noise. We show how these techniques can be combined in a streamlined procedure for evaluation of different sources of variability in a biochemical network. Evaluation and illustrations are given in analysis of two well-characterized synthetic gene circuits, as well as a signaling network underlying the mammalian cell cycle entry

    A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework.

    Get PDF
    BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the signficance of a second, such as the covariate-modulated false discovery rate (cmfdr). RESULTS: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. CONCLUSIONS: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS
    corecore