29 research outputs found

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Get PDF
    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and most intergenic miRNAs transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and promoter organization is currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al

    A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons

    Get PDF
    BACKGROUND: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies when alternative splicing only occurs under specific biological conditions. Non-expression based computational methods support identification of rarely expressed transcripts. RESULTS: A non-expression based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The computational method is implemented in the program ExAlt and an analysis of prediction accuracy is given for Drosophila melanogaster. CONCLUSION: ExAlt identifies the structure of most alternatively spliced exons in the test set and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on Drosophila genomes to search for new cases of alternative splicing

    RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

    Get PDF
    Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high identification accuracy of TSS. However, models providing more accurate modeling of promoters and TSS are needed. A novel identification method for identifying transcription start sites that improves the accuracy of TSS recognition for recently published methods is proposed. This method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is proposed and employed as a classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operator Characteristic curve (auROC) of 94.75% and 95.08% respectively, providing increased performance over existing TSS prediction methods

    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Get PDF
    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity of among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions

    The effects of multiple features of alternatively spliced exons on the K(A)/K(S )ratio test

    Get PDF
    BACKGROUND: The evolution of alternatively spliced exons (ASEs) is of primary interest because these exons are suggested to be a major source of functional diversity of proteins. Many exon features have been suggested to affect the evolution of ASEs. However, previous studies have relied on the K(A)/K(S )ratio test without taking into consideration information sufficiency (i.e., exon length > 75 bp, cross-species divergence > 5%) of the studied exons, leading to potentially biased interpretations. Furthermore, which exon feature dominates the results of the K(A)/K(S )ratio test and whether multiple exon features have additive effects have remained unexplored. RESULTS: In this study, we collect two different datasets for analysis – the ASE dataset (which includes lineage-specific ASEs and conserved ASEs) and the ACE dataset (which includes only conserved ASEs). We first show that information sufficiency can significantly affect the interpretation of relationship between exons features and the K(A)/K(S )ratio test results. After discarding exons with insufficient information, we use a Boolean method to analyze the relationship between test results and four exon features (namely length, protein domain overlapping, inclusion level, and exonic splicing enhancer (ESE) frequency) for the ASE dataset. We demonstrate that length and protein domain overlapping are dominant factors, and they have similar impacts on test results of ASEs. In addition, despite the weak impacts of inclusion level and ESE motif frequency when considered individually, combination of these two factors still have minor additive effects on test results. However, the ACE dataset shows a slightly different result in that inclusion level has a marginally significant effect on test results. Lineage-specific ASEs may have contributed to the difference. Overall, in both ASEs and ACEs, protein domain overlapping is the most dominant exon feature while ESE frequency is the weakest one in affecting test results. CONCLUSION: The proposed method can easily find additive effects of individual or multiple factors on the K(A)/K(S )ratio test results of exons. Therefore, the system can analyze complex conditions in evolution where multiple features are involved. More factors can also be added into the system to extend the scope of evolutionary analysis of exons. In addition, our method may be useful when orthologous exons can not be found for the K(A)/K(S )ratio test

    Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs

    Get PDF
    The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation

    Quantitative Epistasis Analysis and Pathway Inference from Genetic Interaction Data

    Get PDF
    Inferring regulatory and metabolic network models from quantitative genetic interaction data remains a major challenge in systems biology. Here, we present a novel quantitative model for interpreting epistasis within pathways responding to an external signal. The model provides the basis of an experimental method to determine the architecture of such pathways, and establishes a new set of rules to infer the order of genes within them. The method also allows the extraction of quantitative parameters enabling a new level of information to be added to genetic network models. It is applicable to any system where the impact of combinatorial loss-of-function mutations can be quantified with sufficient accuracy. We test the method by conducting a systematic analysis of a thoroughly characterized eukaryotic gene network, the galactose utilization pathway in Saccharomyces cerevisiae. For this purpose, we quantify the effects of single and double gene deletions on two phenotypic traits, fitness and reporter gene expression. We show that applying our method to fitness traits reveals the order of metabolic enzymes and the effects of accumulating metabolic intermediates. Conversely, the analysis of expression traits reveals the order of transcriptional regulatory genes, secondary regulatory signals and their relative strength. Strikingly, when the analyses of the two traits are combined, the method correctly infers ∌80% of the known relationships without any false positives

    The Viral and Cellular MicroRNA Targetome in Lymphoblastoid Cell Lines

    Get PDF
    Epstein-Barr virus (EBV) is a ubiquitous human herpesvirus linked to a number of B cell cancers and lymphoproliferative disorders. During latent infection, EBV expresses 25 viral pre-microRNAs (miRNAs) and induces the expression of specific host miRNAs, such as miR-155 and miR-21, which potentially play a role in viral oncogenesis. To date, only a limited number of EBV miRNA targets have been identified; thus, the role of EBV miRNAs in viral pathogenesis and/or lymphomagenesis is not well defined. Here, we used photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) combined with deep sequencing and computational analysis to comprehensively examine the viral and cellular miRNA targetome in EBV strain B95-8-infected lymphoblastoid cell lines (LCLs). We identified 7,827 miRNA-interaction sites in 3,492 cellular 3â€ČUTRs. 531 of these sites contained seed matches to viral miRNAs. 24 PAR-CLIP-identified miRNA:3â€ČUTR interactions were confirmed by reporter assays. Our results reveal that EBV miRNAs predominantly target cellular transcripts during latent infection, thereby manipulating the host environment. Furthermore, targets of EBV miRNAs are involved in multiple cellular processes that are directly relevant to viral infection, including innate immunity, cell survival, and cell proliferation. Finally, we present evidence that myc-regulated host miRNAs from the miR-17/92 cluster can regulate latent viral gene expression. This comprehensive survey of the miRNA targetome in EBV-infected B cells represents a key step towards defining the functions of EBV-encoded miRNAs, and potentially, identifying novel therapeutic targets for EBV-associated malignancies

    Measurement of the cross-section for producing a W boson in association with a single top quark in pp collisions at √s = 13 TeV with ATLAS

    Get PDF
    The inclusive cross-section for the associated production of a W boson and top quark is measured using data from proton-proton collisions at √ s = 13 TeV. The dataset corresponds to an integrated luminosity of 3.2 fb−1 , and was collected in 2015 by the ATLAS detector at the Large Hadron Collider at CERN. Events are selected requiring two opposite sign isolated leptons and at least one jet; they are separated into signal and control regions based on their jet multiplicity and the number of jets that are identified as containing b hadrons. The W t signal is then separated from the ttÂŻ background using boosted decision tree discriminants in two regions. The cross-section is extracted by fitting templates to the data distributions, and is measured to be σW t = 94±10 (stat.) +28 −22 (syst.)±2 (lumi.) pb. The measured value is in good agreement with the SM prediction of σtheory = 71.7±1.8 (scale)± 3.4 (PDF) pb [1]
    corecore