
    Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

    Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models, such as position weight matrix (PWM) motifs, are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to the complexity of the sample space, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations, and we demonstrate the advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)
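
The quantity the abstract refers to, the size of the intersection of the tops of two rankings, can be illustrated with a minimal sketch. For fixed cutoffs `n1` and `n2`, the overlap under two independent uniform permutations of `N` elements follows a hypergeometric distribution, so its tail can be computed directly. The function names below are hypothetical and this is not the mmHG-Finder implementation. In particular, the minimum over all cutoffs computed by `min_tail_score` is a score, not a p-value; correcting for that minimization is the kind of problem the paper's bounds address.

```python
from math import comb


def top_intersection(rank_a, rank_b, n1, n2):
    """Size of the overlap between the top n1 of rank_a and the top n2 of rank_b."""
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b) for X ~ Hypergeometric(N, n1, n2).

    X is the overlap between a fixed set of n1 elements and a uniformly
    random subset of n2 elements drawn from N elements.
    """
    upper = min(n1, n2)
    favorable = sum(
        comb(n1, k) * comb(N - n1, n2 - k) for k in range(max(b, 0), upper + 1)
    )
    return favorable / comb(N, n2)


def min_tail_score(rank_a, rank_b):
    """Smallest hypergeometric tail over all cutoff pairs (illustrative only).

    Assumption: both rankings are permutations of the same N elements.
    Returns (score, n1, n2). The score is NOT a p-value, because it is
    minimized over many cutoff pairs.
    """
    N = len(rank_a)
    best = (1.0, N, N)
    for n1 in range(1, N + 1):
        for n2 in range(1, N + 1):
            b = top_intersection(rank_a, rank_b, n1, n2)
            tail = hypergeom_tail(N, n1, n2, b)
            if tail < best[0]:
                best = (tail, n1, n2)
    return best


if __name__ == "__main__":
    # Toy rankings that agree on their top three elements only.
    rank_a = list(range(10))
    rank_b = [0, 1, 2, 9, 8, 7, 6, 5, 4, 3]
    print(top_intersection(rank_a, rank_b, 3, 3))
    print(min_tail_score(rank_a, rank_b))
```

The brute-force double loop is cubic in `N` and is only meant for toy inputs.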

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein-coding genes. miRNAs play an important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcriptional regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein-coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes, we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein-coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein-coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

    Oyster RNA-seq data support the development of Malacoherpesviridae genomics

    The family of double-stranded DNA (dsDNA) Malacoherpesviridae includes viruses that infect marine mollusks and are detrimental to worldwide aquaculture production. Due to fast-occurring mortality and a lack of permissive cell lines, the available data on the few known Malacoherpesviridae provide only partial support for the study of molecular virus features, life cycle, and evolutionary history. Following thorough data mining of bivalve and gastropod RNA-seq experiments, we used more than five million Malacoherpesviridae reads to improve the annotation of viral genomes and to characterize viral InDels, nucleotide stretches, and SNPs. Both genome and protein domain analyses confirmed the evolutionary diversification and gene uniqueness of known Malacoherpesviridae. However, the presence of Malacoherpesviridae-like sequences integrated within genomes of phylogenetically distant invertebrates indicates broad diffusion of these viruses and calls for confirmatory investigations. The manifest co-occurrence of OsHV-1 genotype variants in single RNA-seq samples of Crassostrea gigas provides further support for the diversification of Malacoherpesviridae. In addition to simple sequence motifs inter-punctuating viral ORFs, recombination-inducing sequences were found to be enriched in the OsHV-1 and AbHV1-AUS genomes. Finally, the highly correlated expression of most viral ORFs in multiple oyster samples is consistent with the burst of viral proteins during the lytic phase.