46 research outputs found

    ConReg-R: Extrapolative recalibration of the empirical distribution of p-values to improve false discovery rate estimates

    Background: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the p-values estimated for each test and on the validity of the underlying distributional assumptions. However, in many practical testing problems, such as in genomics, the p-values may be under-estimated or over-estimated for many known or unknown reasons. Consequently, the FDR estimates become unreliable. Results: We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution and that the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and the FDR are estimated after the recalibration. Conclusions: ConReg-R provides an efficient way to improve the FDR estimates. It requires only the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments. Reviewers: The manuscript was reviewed by Prof. Vladimir Kuznetsov, Prof. Philippe Broet, and Prof. Hongfang Liu (nominated by Prof. Yuriy Gusev).
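
For readers unfamiliar with the two quantities named in this abstract (π0 and FDR), the following is a minimal sketch of how they are commonly estimated from a list of p-values. It assumes a Storey-style π0 estimate at a fixed threshold and Benjamini-Hochberg q-values. It is not the ConReg-R recalibration itself; the function names and the default `lam=0.5` are illustrative assumptions.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Assumption: null p-values are uniform, so the fraction of p-values
    above `lam` is roughly pi0 * (1 - lam).
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (len(pvalues) * (1.0 - lam)))


def bh_qvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by pi0."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # q-values are monotone in the p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

A typical use would be `bh_qvalues(p, pi0=estimate_pi0(p))`. If the input p-values are miscalibrated, which is the situation the abstract addresses, both estimates inherit that bias.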

    Genome-wide estimation of firing efficiencies of origins of DNA replication from time-course copy number variation data

    Background: DNA replication is a fundamental biological process during the S phase of cell division. It is initiated from several hundred origins along whole chromosomes, with different firing efficiencies (or frequencies of usage). Direct measurement of origin firing efficiency by techniques such as DNA combing is time-consuming and cannot measure all origins. Recent genome-wide studies of DNA replication approximated origin firing efficiency by indirectly measuring other quantities related to replication. However, these approximation methods do not reflect properties of origin firing and may lead to inappropriate estimates. Results: In this paper, we develop a probabilistic model, the Spanned Firing Time Model (SFTM), to characterize the DNA replication process. The proposed model reflects the current understanding of DNA replication. Origins in an individual cell may initiate replication randomly within a time window, but the population average exhibits a temporal program with some origins replicated early and the others late. By estimating DNA origin firing time and fork moving velocity from genome-wide time-course S-phase copy number variation data, we could estimate the firing efficiency of all origins. The estimated firing efficiency correlates well with previous studies in fission and budding yeasts. Conclusions: The new probabilistic model enables sensitive identification of origins as well as genome-wide estimation of origin firing efficiency. We have successfully estimated firing efficiencies of all origins in S. cerevisiae, S. pombe and human chromosomes 21 and 22.

    Adaptive expression responses in the Pol-γ null strain of S. pombe depleted of mitochondrial genome

    Background: DNA polymerase γ (Pol-γ) has been shown to be essential for maintenance of the mitochondrial genome (mtDNA) in the petite-positive budding yeast Saccharomyces cerevisiae. Budding yeast cells lacking mitochondria exhibit a slow-growing or petite-colony phenotype. Petite strains fail to grow on non-fermentable carbon sources. However, it is not clear whether Pol-γ is required for mtDNA maintenance in the petite-negative fission yeast Schizosaccharomyces pombe. Results: We show that disruption of the nuclear gene pog1+ that encodes Pol-γ is sufficient to deplete mtDNA in S. pombe. Cells bearing the pog1Δ allele require substantial growth periods to form petite colonies. Mitotracker assays indicate that pog1Δ cells are defective in mitochondrial function, and EM analyses suggest that pog1Δ cells lack normal mitochondrial structures. Depletion of mtDNA in pog1Δ cells is evident from quantitative real-time PCR assays. Genome-wide expression profiles of pog1Δ and other mtDNA-less cells reveal that many genes involved in response to stimulus, energy derivation by oxidation of organic compounds, cellular carbohydrate metabolism, and energy reserve metabolism are induced. Conversely, many genes encoding proteins involved in amino acid metabolism and oxidative phosphorylation are repressed. Conclusion: By showing that Pol-γ is essential for mtDNA maintenance and that disruption of pog1+ alters the genome-wide expression profiles, we demonstrated that cells lacking mtDNA exhibit adaptive nuclear gene expression responses in the petite-negative S. pombe.

    Transcriptome Analysis of Zebrafish Embryogenesis Using Microarrays

    Zebrafish (Danio rerio) is a well-recognized model for the study of vertebrate developmental genetics, yet at the same time little is known about the transcriptional events that underlie zebrafish embryogenesis. Here we have employed microarray analysis to study the temporal activity of developmentally regulated genes during zebrafish embryogenesis. Transcriptome analysis at 12 different embryonic time points covering five different developmental stages (maternal, blastula, gastrula, segmentation, and pharyngula) revealed a highly dynamic transcriptional profile. Hierarchical clustering, stage-specific clustering, and algorithms to detect onset and peak of gene expression revealed clearly demarcated transcript clusters with maximum gene activity at distinct developmental stages as well as co-regulated expression of gene groups involved in dedicated functions such as organogenesis. Our study also revealed a previously unidentified cohort of genes that are transcribed prior to the mid-blastula transition, a time point earlier than when the zygotic genome was traditionally thought to become active. Here we provide, for the first time to our knowledge, a comprehensive list of developmentally regulated zebrafish genes and their expression profiles during embryogenesis, including novel information on the temporal expression of several thousand previously uncharacterized genes. The expression data generated from this study are accessible to all interested scientists from our institute resource database (http://giscompute.gis.a-star.edu.sg/~govind/zebrafish/data_download.html)

    Systems consequences of amplicon formation in human breast cancer

    Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers albeit using approaches with lesser resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented the cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer. Funding: Singapore. Agency for Science, Technology and Research; National Science Foundation (U.S.) (East Asia and Pacific Summer Institutes, OISE-1108282).

    Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms

    Background: Biclustering is an important analysis procedure for understanding biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort has been made to compare the performance of different algorithms on real datasets and to combine the resultant biclusters into one unified ranking. Results: In this paper we propose a differential co-expression framework and a differential co-expression scoring function to objectively quantify the quality or goodness of a bicluster of genes, based on the observation that genes in a bicluster are co-expressed in the conditions belonging to the bicluster and not co-expressed in the other conditions. Furthermore, we propose a scoring function to stratify biclusters into three types of co-expression. We used the proposed scoring functions to understand the performance and behavior of four well-established biclustering algorithms on six real datasets from different domains by combining their output into one unified ranking. Conclusions: The differential co-expression framework provides a quantitative and objective assessment of the goodness of biclusters of co-expressed genes and of the performance of biclustering algorithms in identifying co-expression biclusters. It also helps to combine the biclusters output by different algorithms into one unified ranking, i.e. meta-biclustering.

    REST Regulates Distinct Transcriptional Networks in Embryonic and Neural Stem Cells

    The maintenance of pluripotency and specification of cellular lineages during embryonic development are controlled by transcriptional regulatory networks, which coordinate specific sets of genes through both activation and repression. The transcriptional repressor RE1-silencing transcription factor (REST) plays important but distinct regulatory roles in embryonic (ESC) and neural (NSC) stem cells. We investigated how these distinct biological roles are effected at a genomic level. We present integrated, comparative genome- and transcriptome-wide analyses of transcriptional networks governed by REST in mouse ESC and NSC. The REST recruitment profile has dual components: a developmentally independent core that is common to ESC, NSC, and differentiated cells; and a large, ESC-specific set of target genes. In ESC, the REST regulatory network is highly integrated into that of pluripotency factors Oct4-Sox2-Nanog. We propose that an extensive, pluripotency-specific recruitment profile lends REST a key role in the maintenance of the ESC phenotype

    Zebrafish Whole-Adult-Organism Chemogenomics for Large-Scale Predictive and Discovery Chemical Biology

    The ability to perform large-scale, expression-based chemogenomics on whole adult organisms, as in invertebrate models (worm and fly), is highly desirable for a vertebrate model but its feasibility and potential have not been demonstrated. We performed expression-based chemogenomics on the whole adult organism of a vertebrate model, the zebrafish, and demonstrated its potential for large-scale predictive and discovery chemical biology. Focusing on two classes of compounds with wide implications to human health, polycyclic (halogenated) aromatic hydrocarbons [P(H)AHs] and estrogenic compounds (ECs), we generated robust prediction models that can discriminate compounds of the same class from those of different classes in two large independent experiments. The robust expression signatures led to the identification of biomarkers for potent aryl hydrocarbon receptor (AHR) and estrogen receptor (ER) agonists, respectively, and were validated in multiple targeted tissues. Knowledge-based data mining of human homologs of zebrafish genes revealed highly conserved chemical-induced biological responses/effects, health risks, and novel biological insights associated with AHR and ER that could be inferred to humans. Thus, our study presents an effective, high-throughput strategy of capturing molecular snapshots of chemical-induced biological states of a whole adult vertebrate that provides information on biomarkers of effects, deregulated signaling pathways, and possible affected biological functions, perturbed physiological systems, and increased health risks. These findings place zebrafish in a strategic position to bridge the wide gap between cell-based and rodent models in chemogenomics research and applications, especially in preclinical drug discovery and toxicology.