174 research outputs found

    Sequence determinants in human polyadenylation site selection

    BACKGROUND: Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong and weak polyadenylation sites, or true sites from randomly occurring signals. RESULTS: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are indifferently associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%. CONCLUSION: The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. The U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
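
    The kind of scan this abstract describes can be illustrated with a minimal sketch: locate every A(A/U)UAAA hexamer and measure the U content on either side of it. This is not the authors' pipeline and not ERPIN; the function names, the 30 nt default window, and the choice to start the downstream window directly after the hexamer are assumptions made for illustration only.

```python
import re

# A(A/U)UAAA covers the canonical AAUAAA signal and its AUUAAA variant.
# The lookahead lets overlapping matches be reported as well.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def to_rna(seq):
    """Upper-case the sequence and read any DNA 'T' as RNA 'U'."""
    return seq.upper().replace("T", "U")


def find_hexamers(seq):
    """Return the 0-based start position of every A(A/U)UAAA hexamer."""
    return [m.start() for m in HEXAMER.finditer(to_rna(seq))]


def u_fraction(rna, start, end):
    """Fraction of U residues in rna[start:end], clipped to the sequence."""
    window = rna[max(start, 0):min(end, len(rna))]
    return window.count("U") / len(window) if window else 0.0


def flank_u_content(seq, pos, width=30):
    """U fraction upstream and downstream of the hexamer starting at pos.

    The window width is a hypothetical parameter, not a value taken
    from the abstract.
    """
    rna = to_rna(seq)
    upstream = u_fraction(rna, pos - width, pos)
    downstream = u_fraction(rna, pos + 6, pos + 6 + width)
    return upstream, downstream
```

    A site with a U-rich window on both sides would be a candidate "true" site under the abstract's description; any threshold on these fractions would have to be chosen empirically.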

    Predicting RNA secondary structure by the comparative approach: how to select the homologous sequences

    BACKGROUND: The secondary structure of an RNA must be known before the relationship between its structure and function can be determined. One way to predict the secondary structure of an RNA is to identify covarying residues that maintain the pairings (Watson-Crick, Wobble and non-canonical pairings). This "comparative approach" consists of identifying mutations from homologous sequence alignments. The sequences must covary enough for compensatory mutations to be revealed, but comparison is difficult if they are too different. Thus the choice of homologous sequences is critical. While many possible combinations of homologous sequences may be used for prediction, only a few will give good structure predictions. This can be due to poor quality alignment in stems or to the variability of certain sequences. This problem of sequence selection is currently unsolved. RESULTS: This paper describes an algorithm, SSCA, which measures the suitability of sequences for the comparative approach. It is based on evolutionary models with structure constraints, particularly those on sequence variations and stem alignment. We propose three models, based on different constraints on sequence alignments. We show the results of the SSCA algorithm for predicting the secondary structure of several RNAs. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences. CONCLUSION: SSCA is an algorithm for selecting combinations of RNA homologous sequences suitable for secondary structure predictions with the comparative approach.

    Evaluation of Glycine max mRNA clusters

    BACKGROUND: Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but currently it is not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max, we applied our algorithm to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) the submitting authors' conclusions, to determine whether members of a single cluster represented the same gene or not. RESULTS: Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters. CONCLUSION: Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.

    High incidence of Epstein-Barr virus, cytomegalovirus and human herpesvirus 6 infections in children with cancer

    BACKGROUND: A prospective single-center study was performed to study infection with the lymphotropic herpesviruses (LH) Epstein-Barr virus (EBV), cytomegalovirus (CMV) and human herpesvirus 6 (HHV-6) in children with cancer. METHODS: A group of 186 children was examined for the presence of LH before, during and 2 months after the end of anticancer treatment. Serology of EBV and CMV was monitored in all children; serology of HHV-6 and DNA analysis of all three LH were monitored in 70 children. RESULTS: At the time of cancer diagnosis (pre-treatment), there was no difference between cancer patients and age-matched healthy controls in overall IgG seropositivity for EBV (68.8% vs. 72.0%; p = 0.47) and CMV (37.6% vs. 41.7%; p = 0.36). During anticancer therapy, primary or reactivated EBV and CMV infection was present in 65 (34.9%) and 66 (35.4%) of 186 patients, respectively, leading to increased overall post-treatment IgG seropositivity that was significantly different from controls for EBV (86.6% vs. 72.0%; p = 0.0004) and CMV (67.7% vs. 41.7%; p < 0.0001). Overall pre-treatment IgG seropositivity for HHV-6 was significantly lower in patients than in controls (80.6% vs. 91.3%; p = 0.0231), which may be in agreement with Greaves' hypothesis of a protective effect of common infections in infancy against cancer development. Primary or reactivated HHV-6 infection was present in 23 (32.9%) of 70 patients during anticancer therapy, leading to post-treatment IgG seropositivity that was not significantly different from controls (94.3% vs. 91.3%; p = 0.58). LH infection occurred independently of the leukodepleted blood transfusions given. The combination of serology and DNA analysis was superior to serology alone in detecting symptomatic EBV or CMV infection. CONCLUSION: EBV, CMV and HHV-6 infections are frequently present during therapy of pediatric malignancy.

    Structural Constraints Identified with Covariation Analysis in Ribosomal RNA

    Covariation analysis is used to identify those positions with similar patterns of sequence variation in an alignment of RNA sequences. These constraints on the evolution of two positions are usually associated with a base pair in a helix. While mutual information (MI) has been used to accurately predict an RNA secondary structure and a few of its tertiary interactions, early studies revealed that phylogenetic event counting methods are more sensitive and provide extra confidence in the prediction of base pairs. We developed a novel and powerful phylogenetic events counting method (PEC) for quantifying positional covariation with the Gutell lab’s new RNA Comparative Analysis Database (rCAD). The PEC and MI-based methods each identify unique base pairs, and jointly identify many other base pairs. In total, both methods in combination with an N-best and helix-extension strategy identify the maximal number of base pairs. While covariation methods have effectively and accurately predicted RNA secondary structure, only a few tertiary structure base pairs have been identified. Analyses presented herein and at the Gutell lab’s Comparative RNA Web (CRW) Site reveal that the majority of these latter base pairs do not covary with one another. However, covariation analysis does reveal a weaker although significant covariation between sets of nucleotides that are in proximity in the three-dimensional RNA structure. This reveals that covariation analysis identifies other types of structural constraints beyond the two nucleotides that form a base pair.
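
    For readers unfamiliar with the mutual information (MI) measure mentioned in this abstract, the sketch below computes the textbook MI, in bits, between two alignment columns. It is not the PEC method and not necessarily the exact MI variant used by the authors; the function name is hypothetical, and gap characters are treated as ordinary symbols, which is a simplifying assumption.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (bits) between two equal-length alignment columns.

    Each column is a string or list holding one residue per aligned sequence.
    MI = sum over observed pairs (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_x)
    if n == 0:
        return 0.0
    freq_x = Counter(col_x)
    freq_y = Counter(col_y)
    freq_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in freq_xy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_x[a] / n) * (freq_y[b] / n)))
    return mi
```

    Two columns that switch together between G-C and C-G pairs score high, while a column paired with an invariant position scores zero, which is why invariant base pairs cannot be detected by MI alone.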

    Herpes-Virus Infection in Patients with Langerhans Cell Histiocytosis: A Case-Controlled Sero-Epidemiological Study, and In Situ Analysis

    BACKGROUND: Langerhans cell histiocytosis (LCH) is a rare disease that affects mainly young children, and which features granulomas containing Langerhans-type dendritic cells. The role of several human herpesviruses (HHV) in the pathogenesis of LCH was suggested by numerous reports but remains debated. Epstein-Barr virus (EBV, HHV-4) and cytomegalovirus (CMV, HHV-5) can infect Langerhans cells, and EBV, CMV and HHV-6 have been proposed to be associated with LCH based on the detection of these viruses in clinical samples. METHODOLOGY: We have investigated the prevalence of EBV, CMV and HHV-6 infection, the characteristics of the antibody response and the plasma viral load in a cohort of 83 patients and 236 age-matched controls, and the presence and cellular localization of the viruses in LCH tissue samples from 19 patients. PRINCIPAL FINDINGS: The results show that prevalence, serological titers, and viral load for EBV, CMV and HHV-6 did not differ between patients and controls. EBV was found by PCR in tumoral samples from 3/19 patients; however, EBV small RNAs (EBERs), when positive, were detected by in situ double staining in bystander B CD20+ CD79a+ lymphocytes and not in CD1a+ LC. HHV-6 genome was detected in the biopsies of 5/19 patients with low copy number, and viral Ag could not be detected in biopsies. CMV was not detected by PCR in this series. CONCLUSIONS/SIGNIFICANCE: Therefore, our findings do not support the hypothesis of a role of EBV, CMV, or HHV-6 in the pathogenesis of LCH, and indicate that the frequent detection of Epstein-Barr virus (EBV) in Langerhans cell histiocytosis is accounted for by the infection of bystander B lymphocytes in LCH granuloma. The latter observation can be attributed to the immunosuppressive microenvironment found in LCH granuloma.

    Phylogenetic Analysis of the Complete Mitochondrial Genome of Madurella mycetomatis Confirms Its Taxonomic Position within the Order Sordariales

    Background: Madurella mycetomatis is the most common cause of human eumycetoma. The genus Madurella has been characterized by overall sterility on mycological media. Due to this sterility and the absence of other reliable morphological and ultrastructural characters, the taxonomic classification of Madurella has long been a challenge. Mitochondria are of monophyletic origin and mitochondrial genomes have been proven to be useful in phylogenetic analyses. Results: The first complete mitochondrial DNA genome of a mycetoma-causative agent was sequenced using 454 sequencing. The mitochondrial genome of M. mycetomatis is a circular DNA molecule with a size of 45,590 bp, encoding the small and the large subunit rRNAs, 27 tRNAs, 11 genes encoding subunits of respiratory chain complexes, 2 ATP synthase subunits, 5 hypothetical proteins, and 6 intronic proteins including the ribosomal protein rps3. In phylogenetic analyses using amino acid sequences of the proteins involved in respiratory chain complexes and the 2 ATP synthases, it appeared that M. mycetomatis clustered together with members of the order Sordariales and that it was most closely related to Chaetomium thermophilum. Analyses of the gene order showed that a similar gene order is found within the order Sordariales. Furthermore, the tRNA order also seemed mostly conserved. Conclusion: Phylogenetic analyses of fungal mitochondrial genomes confirmed that M. mycetomatis belongs to the order Sordariales and that it was most closely related to Chaetomium thermophilum, with which it also shared a comparable gene and tRNA order.

    Systematic variation in mRNA 3′-processing signals during mouse spermatogenesis

    Gene expression and processing during mouse male germ cell maturation (spermatogenesis) are highly specialized. Previous reports have suggested that there is a high incidence of alternative 3′-processing in male germ cell mRNAs, including reduced usage of the canonical polyadenylation signal, AAUAAA. We used EST libraries generated from mouse testicular cells to identify 3′-processing sites used at various stages of spermatogenesis (spermatogonia, spermatocytes and round spermatids) and testicular somatic Sertoli cells. We assessed differences in 3′-processing characteristics in the testicular samples, compared to control sets of widely used 3′-processing sites. Using a new method for comparison of degenerate regulatory elements between sequence samples, we identified significant changes in the use of putative 3′-processing regulatory sequence elements in all spermatogenic cell types. In addition, we observed a trend towards truncated 3′-untranslated regions (3′-UTRs), with the most significant differences apparent in round spermatids. In contrast, Sertoli cells displayed a much smaller trend towards 3′-UTR truncation and no significant difference in 3′-processing regulatory sequences. Finally, we identified a number of genes encoding mRNAs that were specifically subject to alternative 3′-processing during meiosis and postmeiotic development. Our results highlight developmental differences in polyadenylation site choice and in the elements that likely control it during spermatogenesis.

    nocoRNAc: Characterization of non-coding RNAs in prokaryotes

    BACKGROUND: Interest in non-coding RNAs (ncRNAs) has risen constantly during the past few years because of the wide spectrum of biological processes in which they are involved. This led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome still remains unexplored to a great extent. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication whether the element is transcribed or not. RESULTS: We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of which are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner. CONCLUSIONS: We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. nocoRNAc is not restricted to intergenic regions, but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software as well as a user guide and example data is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.

    Current tools for the identification of miRNA genes and their targets

    The discovery of microRNAs (miRNAs), almost 10 years ago, dramatically changed our perspective on eukaryotic gene expression regulation. However, the broad and important functions of these regulators are only now becoming apparent. The expansion of our catalogue of miRNA genes and the identification of the genes they regulate owe much to the development of sophisticated computational tools that have helped either to focus or interpret experimental assays. In this article, we review the methods for miRNA gene finding and target identification that have been proposed in the last few years. We identify some problems that current approaches have not yet been able to overcome and we offer some perspectives on the next generation of computational methods.