137 research outputs found

    Snpdat: easy and rapid annotation of results from de novo snp discovery projects for model and non-model organisms

    Get PDF
    peer-reviewedBackground: Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many tools available are optimised for use with organisms densely sampled for SNPs, such as humans. There are currently few tools available that are species non-specific or support non-model organism data. Results: Here we present SNPdat, a high throughput analysis tool that can provide a comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. Using a dataset of 4,566 SNPs identified in cattle using high-throughput DNA sequencing we demonstrate the annotations performed and the statistics that can be generated by SNPdat. Conclusions: SNPdat provides users with a simple tool for annotation of genomes that are either not supported by other tools or have a small number of annotated SNPs available. SNPdat can also be used to analyse datasets from organisms which are densely sampled for SNPs. As a command line tool it can easily be incorporated into existing SNP discovery pipelines and fills a niche for analyses involving non-model organisms that are not supported by many available SNP annotation tools. SNPdat will be of great interest to scientists involved in SNP discovery and analysis projects, particularly those with limited bioinformatics experience

    Spherical:an iterative workflow for assembling metagenomic datasets

    Get PDF
    BACKGROUND: The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or even the majority in some cases, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. RESULTS: We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. CONCLUSIONS: We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the amount of reads utilized in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in a significant (P?<?0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation is hosted publically at: https://github.com/thh32/Spherical .publishersversionPeer reviewe

    Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle

    Get PDF
    peer-reviewedBackground Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits, however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation. Results Following adjustment for false discovery (q-value  0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500kbs of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway. Conclusions A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth such as glucagon and leptin. Several biological pathways, including PPAR signaling, were shown to be involved in various aspects of bovine carcass performance. These core genes and biological processes may form the foundation for further investigation to identify causative mutations involved in each trait. Results reported here support previous findings suggesting conservation of key biological processes involved in growth and metabolism

    Genome Phylogenies Indicate a Meaningful a-Proteobacterial Phylogeny and Support a Grouping of the Mitochondria with the Rickettsiales

    Get PDF
    Placement of the mitochondrial branch on the tree of life has been problematic. Sparse sampling, the uncertainty of how lateral gene transfer might overwrite phylogenetic signals, and the uncertainty of phylogenetic inference have all contributed to the issue. Here we address this issue using a supertree approach and completed genomic sequences. We first determine that a sensible a-proteobacterial phylogenetic tree exists and that it can confidently be inferred using orthologous genes. We show that congruence across these orthologous gene trees is significantly better than might be expected by random chance. There is some evidence of horizontal gene transfer within the a-proteobacteria, but it appears to be restricted to a minority of genes (;23%) most of whom (;74%) can be categorized as operational. This means that placement of the mitochondrion should not be excessively hampered by interspecies gene transfer. We then show that there is a consistently strong signal for placement of the mitochondrion on this tree and that this placement is relatively insensitive to methodological approach or data set. A concatenated alignment was created consisting of 15 mitochondrion-encoded proteins that are unlikely to have undergone any lateral gene transfer in the timeline under consideration. This alignment infers that the sister group of the mitochondria, for the taxa that have been sampled, is the order Rickettsiales

    The Opisthokonta and the Ecdysozoa May Not Be Clades: Stronger Support for the Grouping of Plant and Animal than for Animal and Fungi and Stronger Support for the Coelomata than Ecdysozoa

    Get PDF
    In considering the best possible solutions for answering phylogenetic questions from genomic sequences, we have chosen a strategy that we suggest is superior to others that have gone previously. We have ignored multigene families and instead have used single-gene families. This minimizes the inadvertent analysis of paralogs. We have employed strict data controls and have reasoned that if a protein is not capable of recovering the uncontroversial parts of a phylogenetic tree, then why should we use it for the more controversial parts? We have sliced and diced the data in as many ways as possible in order to uncover the signals in that data. Using this strategy, we have tested two controversial hypotheses concerning eukaryotic phylogenetic relationships: the placement of arthropoda and nematodes and the relationships of animals, plants, and fungi. We have constructed phylogenetic trees from 780 single-gene families from 10 completed genomes and amalgamated these into a single supertree. We have also carried out a total evidence analysis on the only universally distributed protein families that can accurately reconstruct the uncontroversial parts of the phylogenetic tree: a total of five families. In doing so, we ignore the majority of single-gene families that are universally distributed as they do not have the appropriate signals to recover the uncontroversial parts of the tree. We have also ignored every protein that has ever been used previously to address this issue, simply because none of them meet our strict criteria. Using these data controls, site stripping, and multiple analyses, 24 out of 26 analyses strongly support the grouping of vertebrates with arthropods (Coelomata hypothesis) and plants with animals. In the other two analyses, the data were ambivalent. The latter finding overturns an 11-year theory of Eukaryotic evolution; the first confirms what has already been said by others. In the light of this new tree, we reanalyze the evolution of intron gain and loss in the rpL14 gene and find that it is much more compatible with the hypothesis presented here than with the Opisthokonta hypothesis

    Determining the culturability of the rumen bacterial microbiome

    Get PDF
    The goal of the Hungate1000 project is to generate a reference set of rumen microbial genome sequences. Toward this goal we have carried out a meta-analysis using information from culture collections, scientific literature, and the NCBI and RDP databases and linked this with a comparative study of several rumen 16S rRNA gene-based surveys. In this way we have attempted to capture a snapshot of rumen bacterial diversity to examine the culturable fraction of the rumen bacterial microbiome. Our analyses have revealed that for cultured rumen bacteria, there are many genera without a reference genome sequence. Our examination of culture-independent studies highlights that there are few novel but many uncultured taxa within the rumen bacterial microbiome. Taken together these results have allowed us to compile a list of cultured rumen isolates that are representative of abundant, novel and core bacterial species in the rumen. In addition, we have identified taxa, particularly within the phylum Bacteroidetes, where further cultivation efforts are clearly required. This information is being used to guide the isolation efforts and selection of bacteria from the rumen microbiota for sequencing through the Hungate1000
    corecore