152 research outputs found

    Phylogeny and Classification of the Trapdoor Spider Genus Myrmekiaphila: An Integrative Approach to Evaluating Taxonomic Hypotheses

    Background: Revised by Bond and Platnick in 2007, the trapdoor spider genus Myrmekiaphila comprises 11 species. Species delimitation and placement within one of three species groups was based on modifications of the male copulatory device. Because a phylogeny of the group was not available these species groups might not represent monophyletic lineages; species definitions likewise were untested hypotheses. The purpose of this study is to reconstruct the phylogeny of Myrmekiaphila species using molecular data to formally test the delimitation of species and species-groups. We seek to refine a set of established systematic hypotheses by integrating across molecular and morphological data sets. Methods and Findings: Phylogenetic analyses comprising Bayesian searches were conducted for a mtDNA matrix composed of contiguous 12S rRNA, tRNA-val, and 16S rRNA genes and a nuclear DNA matrix comprising the glutamyl and prolyl tRNA synthetase gene each consisting of 1348 and 481 bp, respectively. Separate analyses of the mitochondrial and nuclear genome data and a concatenated data set yield M. torreya and M. millerae paraphyletic with respect to M. coreyi and M. howelli and polyphyletic fluviatilis and foliata species groups. Conclusions: Despite the perception that molecular data present a solution to a crisis in taxonomy, studies like this demonstrate the efficacy of an approach that considers data from multiple sources. A DNA barcoding approach during the species discovery process would fail to recognize at least two species (M. coreyi and M. howelli) whereas a combine

    Using ESTs to improve the accuracy of de novo gene prediction

    BACKGROUND: ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction. RESULTS: TWINSCAN_EST is a new system that successfully combines EST alignments with TWINSCAN. On the whole C. elegans genome TWINSCAN_EST shows 14% improvement in sensitivity and 13% in specificity in predicting exact gene structures compared to TWINSCAN without EST alignments. Not only are the structures revealed by EST alignments predicted correctly, but these also constrain the predictions without alignments, improving their accuracy. For the human genome, we used the same approach with N-SCAN, creating N-SCAN_EST. On the whole genome, N-SCAN_EST produced a 6% improvement in sensitivity and 1% in specificity of exact gene structure predictions compared to N-SCAN. CONCLUSION: TWINSCAN_EST and N-SCAN_EST are more accurate than TWINSCAN and N-SCAN, while retaining their ability to discover novel genes to which no ESTs align. Thus, we recommend using the EST versions of these programs to annotate any genome for which EST information is available. TWINSCAN_EST and N-SCAN_EST are part of the TWINSCAN open source software package

    Short clones or long clones? A simulation study on the use of paired reads in metagenomics

    Background: Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies. Results: This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs. Conclusion: This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.

    A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons

    BACKGROUND: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies when alternative splicing only occurs under specific biological conditions. Non-expression based computational methods support identification of rarely expressed transcripts. RESULTS: A non-expression based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The computational method is implemented in the program ExAlt and an analysis of prediction accuracy is given for Drosophila melanogaster. CONCLUSION: ExAlt identifies the structure of most alternatively spliced exons in the test set and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on Drosophila genomes to search for new cases of alternative splicing

    The C. elegans H3K27 Demethylase UTX-1 Is Essential for Normal Development, Independent of Its Enzymatic Activity

    Epigenetic modifications influence gene expression and provide a unique mechanism for fine-tuning cellular differentiation and development in multicellular organisms. Here we report on the biological functions of UTX-1, the Caenorhabditis elegans homologue of mammalian UTX, a histone demethylase specific for H3K27me2/3. We demonstrate that utx-1 is an essential gene that is required for correct embryonic and postembryonic development. Consistent with its homology to UTX, UTX-1 regulates global levels of H3K27me2/3 in C. elegans. Surprisingly, we found that the catalytic activity is not required for the developmental function of this protein. Biochemical analysis identified UTX-1 as a component of a complex that includes SET-16(MLL), and genetic analysis indicates that the defects associated with loss of UTX-1 are likely mediated by compromised SET-16/UTX-1 complex activity. Taken together, these results demonstrate that UTX-1 is required for many aspects of nematode development; but, unexpectedly, this function is independent of its enzymatic activity

    Modulation of Transcriptional and Inflammatory Responses in Murine Macrophages by the Mycobacterium tuberculosis Mammalian Cell Entry (Mce) 1 Complex

    The outcome of many infections depends on the initial interactions between agent and host. Aiming at elucidating the effect of the M. tuberculosis Mce1 protein complex on host transcriptional and immunological responses to infection with M. tuberculosis, RNA from murine macrophages at 15, 30, 60 min, 4 and 10 hrs post-infection with M. tuberculosis H37Rv or Δ-mce1 H37Rv was analyzed by whole-genome microarrays and RT-QPCR. Immunological responses were measured using a 23-plex cytokine assay. Compared to uninfected controls, 524 versus 64 genes were up-regulated by 15 min post H37Rv- and Δ-mce1 H37Rv-infection, respectively. By 15 min post-H37Rv infection, a decline of 17 cytokines combined with up-regulation of Ccl24 (26.5-fold), Clec4a2 (23.2-fold) and Pparγ (10.5-fold) indicated an anti-inflammatory response initiated by IL-13. Down-regulation of Il13ra1 combined with up-regulation of Il12b (30.2-fold) suggested a switch to a pro-inflammatory response by 4 hrs post H37Rv-infection. Whereas no significant change in cytokine concentration or transcription was observed during the first hour post Δ-mce1 H37Rv-infection, a significant decline of IL-1b, IL-9, IL-13, Eotaxin and GM-CSF combined with increased transcription of Il12b (25.1-fold) and Inb1 (17.9-fold) by 4 hrs indicated a pro-inflammatory response. The balance between pro- and anti-inflammatory responses during the early stages of infection may have significant bearing on outcome

    CodingQuarry: Highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts

    Background: The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. Results: CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. Conclusions: We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available (https://sourceforge.net/projects/codingquarry/), and suitable for incorporation into genome annotation pipelines

    University-level practical activities in bioinformatics benefit voluntary groups of pupils in the last 2 years of school

    This work was supported in part by the Science and Technology Facilities Council under grant ST/M000435/1 to Daniel Barker. Background: Bioinformatics, the use of computers in biology, is of major and increasing importance to biological sciences and medicine. We conducted a preliminary investigation of the value of bringing practical, university-level bioinformatics education to the school level. We conducted voluntary activities for pupils at two schools in Scotland (years S5 and S6; pupils aged 15–17). We used material originally developed for an optional final-year undergraduate module and now incorporated into 4273π, a resource for teaching and learning bioinformatics on the low-cost Raspberry Pi computer. Results: Pupils’ feedback forms suggested our activities were beneficial. The forms provide strong evidence of an increase, during the course of the activity, in the following: pupils’ perception of the value of computers within biology; their knowledge of the Linux operating system and the Raspberry Pi; their willingness to use computers rather than phones or tablets; their ability to program a computer; and their ability to analyse DNA sequences with a computer. We found no strong evidence of negative effects. Conclusions: Our preliminary study supports the feasibility of bringing university-level, practical bioinformatics activities to school pupils.