124 research outputs found

    Recovering complete and draft population genomes from metagenome datasets.

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.

    Genome Assembly: Novel Applications by Harnessing Emerging Sequencing Technologies and Graph Algorithms

    Genome assembly is a critical first step for biological discovery. All current sequencing technologies share the fundamental limitation that segments read from a genome are much shorter than even the smallest genomes. Traditionally, whole-genome shotgun (WGS) sequencing over-samples a single clonal (or inbred) target chromosome with segments from random positions. The amount of over-sampling is known as the coverage. Assembly software then reconstructs the target. So-called next-generation (or second-generation) sequencing has reduced the cost and increased throughput exponentially over first-generation sequencing. Unfortunately, next-generation sequences present their own challenges to genome assembly: (1) they require amplification of source DNA prior to sequencing, leading to artifacts and biased coverage of the genome; (2) they produce relatively short reads of 100-700 bp; (3) the sizeable runtime of most second-generation instruments is prohibitive for applications requiring rapid analysis, with an Illumina HiSeq 2000 instrument requiring 11 days for the sequencing reaction. Recently, successors to the second-generation instruments (third-generation) have become available. These instruments promise to alleviate many of the downsides of second-generation sequencing and can generate multi-kilobase sequences. The long sequences have the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of these reads is challenging and has limited their use. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. Our approach achieves over 99% read accuracy and produces substantially better assemblies than current sequencing strategies. The availability of cheaper sequencing has made new sequencing targets, such as multiple displacement amplified (MDA) single-cells and metagenomes, popular. 
Current algorithms assume assembly of a single clonal target, an assumption that is violated in these sequencing projects. We developed Bambus 2, a new scaffolder that works for metagenomics and single-cell datasets. It can accurately detect repeats without assumptions about the taxonomic composition of a dataset. It can also identify biological variations present in a sample. We have developed a novel end-to-end analysis pipeline leveraging Bambus 2. Due to its modular nature, it is applicable to clonal, metagenomic, and MDA single-cell targets and allows a user to rapidly go from sequences to assembly, annotation, genes, and taxonomic info. We have incorporated a novel viewer, allowing a user to interactively explore the variation present in a genomic project on a laptop. Together, these developments make genome assembly applicable to novel targets while utilizing emerging sequencing technologies. As genome assembly is critical for all aspects of bioinformatics, these developments will enable novel biological discovery.

    De novo sequencing, annotation, and characterization of the genome of Lavandula angustifolia (Lavender)

    Lavender (Lavandula angustifolia) is a perennial plant native to the Mediterranean region, best known for its essential oils (EOs), which have numerous applications in the pharmaceutical, cosmetic and perfume industries. We performed sequencing of the L. angustifolia genome and report a detailed analysis of the assembled genome, focusing on genome size, ploidy, and repeat content. The lavender genome was estimated to be around 870 Mbp (1C=0.96 pg) using a quantitative PCR method. Genome size was further validated through analysis of raw genome sequences using Kmergenie, providing a conclusive end to the lavender genome size dispute. The repeat element composition of the genome was analyzed using de novo (RepeatModeler) and library-based (RepeatMasker) methods and was estimated to be around 45% of the full genome or ~57% of the non-gap genome sequences. Further characterization revealed long terminal repeat (LTR) retrotransposons as the major repeat type, contributing ~18% of the genome, followed by DNA transposons at ~8.5% of the genome. Interestingly, unlike most other plant genomes, the lavender genome has many more Copia than Gypsy elements, both showing a trend of recent increasing activity. Furthermore, these LTRs, especially Copia elements, have shown active participation in gene function, including genes for essential oil production, with Copia elements contributing to ~30% of the coding DNA sequence (CDS) regions, in addition to promoter, intron and untranslated (UTR) regions. The lavender genome also has an unusually high number of miniature inverted-repeat transposable elements (MITEs) compared to other model plant genomes, with the number being ~88,000, which is close to that (~90,000) of the much larger maize genome. 
Analysis also revealed that a high proportion of the lavender genome is at the polyploidy level, with a strong bias towards regions containing essential oil genes, and that polyploidization events in the lavender genome occurred between 16 and 41 Mya. In conclusion, our results reveal the lavender genome to be highly duplicated, with past and ongoing active retrotransposition, making the genome optimized for EO production.

    Comparative genomics of the skin Staphylococci

    The human skin is a complex ecosystem which supports a diverse population of bacteria. Comparative genomic analyses are increasingly being used to explore the functional potential of this bacterial population. The ubiquity of Staphylococcus on human skin means this genus represents the most well-studied of the microbial skin residents; however, most analysis has focussed on the clinically significant pathogenic species S. epidermidis and S. aureus. To investigate the biology of S. hominis, the second most frequent Staphylococcus species isolated from human skin after S. epidermidis, seven isolates were sequenced using Illumina and PacBio technologies. An intraspecies comparative genomic analysis was performed with these and several publicly available S. hominis genomes to identify core and accessory genes. The complement of encoded cell wall-anchored proteins was studied using bioinformatics to describe the range of surface-attached proteins and revealed a unique species set. Investigation also revealed the presence of S. hominis genes described as virulence factors in S. aureus and S. epidermidis. This further highlights non-pathogenic staphylococci as a reservoir of genes, which can be exchanged with pathogenic S. aureus, and the potential for recruitment of these genes into virulence pathways. Interspecies comparative analysis of twenty Staphylococcus species, based on clusters of orthologous genes, confirmed the designation of staphylococcal species groups previously established by DNA-DNA hybridisation and single gene analysis methods. The bioinformatic algorithm randomForest was used to identify drivers forming species groups based on the orthologous gene cluster analysis, leading to a subset of orthologous clusters defined as being contributory. This interspecies analysis also revealed diversity between the staphylococcal species groups with respect to their response mechanisms for antimicrobial peptide (AMP) resistance. 
Specifically, the presence or absence of the BraRS two-component system (TCS) was identified to be one of the important drivers differentiating a nine-species group that included S. aureus, S. hominis and S. epidermidis. Experimental evolution in the presence of the lantibiotic nisin was used to dissect differences in the global response of the BraRS-positive species S. hominis and S. aureus from the BraRS-negative species S. saprophyticus. Identified SNPs from the resistance evolution revealed complex relationships between the regulons of staphylococcal TCSs and identified that YurK should be investigated for a potential role in AMP resistance of S. aureus and S. hominis.

    Peto's Paradox and the Evolution of Cancer Suppression

    In order to successfully build and maintain a multicellular body, somatic cells must be constrained from proliferating uncontrollably and destroying the organism. If all mammalian cells were equally susceptible to oncogenic mutations and had identical tumor suppressor mechanisms, one would expect that the risk of cancer would be proportional to the body size and lifespan of a species. This is because a greater number of cells and cell divisions over a lifetime would increase the chance of accumulating mutations that result in malignant transformation. Peto’s paradox is the clash between the theory that cancer incidence should increase with body size and lifespan, and the observation that it does not. In this thesis, I present the first comprehensive survey of empirical evidence across mammals in support of Peto’s paradox in addition to computational models that explore the numerous hypotheses that may help resolve the paradox. I provide a detailed examination of tumor suppression in African elephants (Loxodonta africana) and show that the genome contains redundant copies of the tumor suppressor gene TP53. I give evidence that these redundant copies are actively transcribed and also observe an increased apoptotic response after exposure to ionizing radiation, which may be linked to the expression of these genes. Few genomes of large, long-lived organisms are currently available, which motivated my work to provide the sequence and de novo assembly of the humpback whale (Megaptera novaeangliae) genome. In this genome, I discovered a set of tumor suppressor genes that have evolved at an accelerated rate along the whale lineage, which is suggestive of adaptation. Additionally, I find one gene that has undergone convergent evolution between the African elephant and the humpback whale. 
The overarching goal of my research is to gain a better understanding of how evolution has suppressed cancer in large, long-lived organisms, in the hope of ultimately developing improved cancer prevention in humans.

    A genomic perspective on the potential of Actinobacillus succinogenes for industrial succinate production

    Background: Succinate is produced petrochemically from maleic anhydride to satisfy a small specialty chemical market. If succinate could be produced fermentatively at a price competitive with that of maleic anhydride, though, it could replace maleic anhydride as the precursor of many bulk chemicals, transforming a multi-billion dollar petrochemical market into one based on renewable resources. Actinobacillus succinogenes naturally converts sugars and CO2 into high concentrations of succinic acid as part of a mixed-acid fermentation. Efforts are ongoing to maximize carbon flux to succinate to achieve an industrial process. Results: Described here is the 2.3 Mb A. succinogenes genome sequence with emphasis on A. succinogenes's potential for genetic engineering, its metabolic attributes and capabilities, and its lack of pathogenicity. The genome sequence contains 1,690 DNA uptake signal sequence repeats and a nearly complete set of natural competence proteins, suggesting that A. succinogenes is capable of natural transformation. A. succinogenes lacks a complete tricarboxylic acid cycle as well as a glyoxylate pathway, and it appears to be able to transport and degrade about twenty different carbohydrates. The genomes of A. succinogenes and its closest known relative, Mannheimia succiniciproducens, were compared for the presence of known Pasteurellaceae virulence factors. Both species appear to lack the virulence traits of toxin production, sialic acid and choline incorporation into lipopolysaccharide, and utilization of hemoglobin and transferrin as iron sources. Perspectives are also given on the conservation of A. succinogenes genomic features in other sequenced Pasteurellaceae. Conclusions: Both A. succinogenes and M. succiniciproducens genome sequences lack many of the virulence genes used by their pathogenic Pasteurellaceae relatives. The lack of pathogenicity of these two succinogens is an exciting prospect, because comparisons with pathogenic Pasteurellaceae could lead to a better understanding of Pasteurellaceae virulence. The fact that the A. succinogenes genome encodes uptake and degradation pathways for a variety of carbohydrates reflects the variety of carbohydrate substrates available in the rumen, A. succinogenes's natural habitat. It also suggests that many different carbon sources can be used as feedstock for succinate production by A. succinogenes.

    Multiple Displacement Amplification and Whole Genome Sequencing for the Diagnosis of Infectious Diseases

    This project was funded by Roche Diagnostics as a scientific studentship award and completed at Public Health England Centre for Infection. Next-generation sequencing technologies are revolutionising our ability to characterise and investigate infectious diseases. Utilising the power of high-throughput sequencing, this study reports the development of a sensitive, non-PCR-based, unbiased amplification method, which allows the rapid and accurate sequencing of multiple microbial pathogens directly from clinical samples. The method employs φ29 DNA polymerase, a highly efficient enzyme able to produce strand displacement during the polymerisation process with high fidelity. Problems with DNA secondary structure were overcome and the method optimised to produce sufficient DNA to sequence from a single bacterial cell in two hours. Evidence was also found that the enzyme requires at least six bases of single-stranded DNA to initiate replication, and is not capable of amplification from nicks. φ29 multiple displacement amplification was shown to be suitable for a range of GC contents and bacterial cell wall types as well as for viral pathogens. The method was shown to be able to provide relative quantification of mixed cells, and a method for quantification of viruses using a known standard was developed. To complement the novel molecular biology workflow, a data analysis pipeline was developed to allow pathogen identification and characterisation without prior knowledge of input. The use of de novo assemblies for annotation was shown to be equivalent to the use of polished reference genomes. Single-cell φ29 MDA samples had better assembly and annotation than non-amplification controls, a novel finding which, when combined with the very long DNA fragments produced, has interesting implications for a variety of analytical procedures. 
A sampling process was developed to allow isolation and amplification of pathogens directly from clinical samples, with good concordance shown between this method and traditional testing. The process was tested on a variety of modelled and real clinical samples, showing good application to sterile site infections, particularly bacteraemia models. Within these samples, multiple bacterial, viral and parasitic pathogens were identified, showing good application across multiple infection types. Emerging pathogens were identified, including Onchocerca volvulus within a CSF sample and Sneathia sanguinegens within an STI sample. Use of φ29 MDA allows rapid and accurate amplification of whole pathogen genomes. When this is coupled with the sample processing developed here, it is possible to detect the presence of pathogens in sterile sites with a sensitivity of a single genome copy.