115 research outputs found

    Integrating Prior Knowledge in Multiple Testing under Dependence with Applications to Detecting Differential DNA Methylation

    DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms, including tiling arrays and next-generation sequencing, and experimental protocols are available for profiling DNA methylation. As with other tiling array data, DNA methylation data exhibit an inherent correlation structure among nearby probes. However, unlike gene expression or protein-DNA binding data, the varying CpG density, which gives rise to the CpG island, shore and shelf definitions, provides exogenous information for detecting differential methylation. This paper introduces a robust testing and probe ranking procedure based on a non-homogeneous hidden Markov model that incorporates these features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, J. R. Stat. Soc. B. 71, 393-424) and propose modeling the non-null using a non-parametric symmetric distribution in two-sided hypothesis testing. Based on extensive simulation studies, we show that this model improves probe ranking and is robust to model misspecification. We further illustrate that our proposed framework achieves good operating characteristics compared with commonly used methods on real DNA methylation data, where the aim is to detect differentially methylated sites.

    Determining Physical Constraints in Transcriptional Initiation Complexes Using DNA Sequence Analysis

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the “rules” that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information-theory-based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4-regulated genes. Interestingly, we found that the major determinant of Met4 regulation was the sum of the strengths of the Cbf1 and Met31 binding sites and that the energetic costs associated with spacing appeared to be minimal.

    Position specific variation in the rate of evolution in transcription factor binding sites

    BACKGROUND: The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. RESULTS: Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. CONCLUSION: As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA.

    MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model

    We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding-site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.

    Conservation and Evolution of Cis-Regulatory Systems in Ascomycete Fungi

    Relatively little is known about the mechanisms through which gene expression regulation evolves. To investigate this, we systematically explored the conservation of regulatory networks in fungi by examining the cis-regulatory elements that govern the expression of coregulated genes. We first identified groups of coregulated Saccharomyces cerevisiae genes enriched for genes with known upstream or downstream cis-regulatory sequences. Reasoning that many of these gene groups are coregulated in related species as well, we performed similar analyses on orthologs of coregulated S. cerevisiae genes in 13 other ascomycete species. We find that many species-specific gene groups are enriched for the same flanking regulatory sequences as those found in the orthologous gene groups from S. cerevisiae, indicating that those regulatory systems have been conserved in multiple ascomycete species. In addition to these clear cases of regulatory conservation, we find examples of cis-element evolution that suggest multiple modes of regulatory diversification, including alterations in transcription factor-binding specificity, incorporation of new gene targets into an existing regulatory system, and co-option of regulatory systems to control a different set of genes. We investigated one example in greater detail by measuring the in vitro activity of the S. cerevisiae transcription factor Rpn4p and its orthologs from Candida albicans and Neurospora crassa. Our results suggest that the DNA binding specificity of these proteins has coevolved with the sequences found upstream of the Rpn4p target genes and suggest that Rpn4p has a different function in N. crassa.

    Cancer gene discovery in hepatocellular carcinoma

    Hepatocellular carcinoma (HCC) is a deadly cancer whose incidence is increasing worldwide. Although the main risk factors for HCC development, such as hepatitis B and C virus infection and alcohol abuse, have been clearly identified, understanding of the key drivers of this malignancy is still preliminary. Recent data suggest that genomic analysis of cirrhotic tissue - the pre-neoplastic carcinogenic field - may provide a read-out to identify at-risk populations for cancer development. Given this contextual complexity, it is of utmost importance to characterize the molecular pathogenesis of this disease, and pinpoint the dominant pathways/drivers by integrative oncogenomic approaches and/or sophisticated experimental models. Identification of the dominant proliferative signals and key aberrations will allow for a more personalized therapy.

    Lack of benefits for prevention of cardiovascular disease with aspirin therapy in type 2 diabetic patients - a longitudinal observational study

    BACKGROUND: The risk-benefit ratio of aspirin therapy in prevention of cardiovascular disease (CVD) remains contentious, especially in type 2 diabetes. This study examined the benefit and harm of low-dose aspirin (daily dose < 300 mg) in patients with type 2 diabetes. METHODS: This is a longitudinal observational study with primary and secondary prevention cohorts based on history of CVD at enrolment. We compared the occurrence of primary composite (non-fatal myocardial infarction or stroke and vascular death) and secondary endpoints (upper GI bleeding and haemorrhagic stroke) between aspirin users and non-users from January 1995 to July 2005. RESULTS: Of the 6,454 patients (mean follow-up: median [IQR]: 4.7 [4.4] years), usage of aspirin was 18% (n = 1,034) in the primary prevention cohort (n = 5,731) and 81% (n = 585) in the secondary prevention cohort (n = 723). After adjustment for covariates, in the primary prevention cohort, aspirin use was associated with a hazard ratio of 2.07 (95% CI: 1.66, 2.59, p < 0.001) for the primary endpoint. There was no difference in CVD event rate in the secondary prevention cohort. Overall, aspirin use was associated with a hazard ratio of 2.2 (1.53, 3.15, p < 0.001) for GI bleeding and 1.71 (1.00, 2.95, p = 0.051) for haemorrhagic stroke. The absolute risk of aspirin-related GI bleeding was 10.7 events per 1,000 person-years of treatment. CONCLUSION: In Chinese type 2 diabetic patients, low-dose aspirin was associated with a paradoxical increase in CVD risk in primary prevention and did not confer benefits in secondary prevention. In addition, the risk of GI bleeding in aspirin users was rather high.
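
    The cohort figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are ours, and the counts are copied from the abstract rather than taken from the study data.

```python
# Counts as quoted in the abstract (not taken from the underlying dataset).
primary_total, primary_aspirin = 5731, 1034
secondary_total, secondary_aspirin = 723, 585

# The two cohorts should add up to the reported overall sample size.
total_patients = primary_total + secondary_total

# Share of aspirin users within each cohort.
primary_usage = primary_aspirin / primary_total
secondary_usage = secondary_aspirin / secondary_total

print(total_patients)             # 6454
print(f"{primary_usage:.0%}")     # 18%
print(f"{secondary_usage:.0%}")   # 81%
```

    The recomputed values agree with the reported total of 6,454 patients and the 18% and 81% usage rates.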

    A Robust Method for Transcript Quantification with RNA-Seq Data

    The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique transcript solution even if the read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy and better inference of the dominant set of transcripts than existing methods. The application of our method to real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.
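
    To make the idea of resolving a linear system with LASSO concrete, here is a toy sketch. It is not the authors' implementation: the design matrix, the coverage values, the non-negativity constraint, the coordinate-descent solver and the function name `nonneg_lasso` are all assumptions chosen for illustration. Rows are observed features (exons and one junction), and columns are candidate isoforms.

```python
def nonneg_lasso(A, y, lam, sweeps=500):
    """Minimize 0.5*||y - A*theta||^2 + lam*sum(theta) subject to theta >= 0.

    Plain cyclic coordinate descent; an illustrative solver, not a tuned one.
    """
    n_rows, n_cols = len(A), len(A[0])
    theta = [0.0] * n_cols
    residual = list(y)  # equals y - A*theta, since theta starts at zero
    for _ in range(sweeps):
        for i in range(n_cols):
            col = [A[j][i] for j in range(n_rows)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue
            # Correlation of column i with the residual that excludes isoform i.
            rho = sum(c * r for c, r in zip(col, residual)) + norm_sq * theta[i]
            # Soft-threshold, then clip at zero to keep abundances non-negative.
            new_value = max(0.0, (rho - lam) / norm_sq)
            delta = new_value - theta[i]
            residual = [r - c * delta for c, r in zip(col, residual)]
            theta[i] = new_value
    return theta


# Hypothetical gene with three candidate isoforms (columns):
#   isoform 0 uses exons 1, 2 and 3; isoform 1 skips exon 2 (so it produces
#   the exon 1-3 junction); isoform 2 uses exons 1 and 2 only.
design = [
    [1.0, 1.0, 1.0],  # exon 1
    [1.0, 0.0, 1.0],  # exon 2
    [1.0, 1.0, 0.0],  # exon 3
    [0.0, 1.0, 0.0],  # exon 1-3 junction
]
# Made-up coverage generated by abundances 3 (isoform 0) and 2 (isoform 2).
coverage = [5.0, 5.0, 3.0, 0.0]

theta = nonneg_lasso(design, coverage, lam=0.1)
print([round(t, 3) for t in theta])
```

    In this toy case the exon-skipping isoform receives an abundance of exactly zero because no junction reads support it, while the other two estimates stay close to the values used to generate the coverage (the penalty shrinks one of them slightly). That is the sparsity behaviour the abstract attributes to LASSO.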

    MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery

    The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥75 bp). MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice