Special features of RAD Sequencing data: implications for genotyping
Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases and that most RAD loci will be accurately genotyped by existing tools
Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing
BACKGROUND: Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. RESULTS: Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. CONCLUSIONS: We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes
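The N50 statistic reported above can be illustrated with a short sketch. It assumes the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length); the abstract does not say how the authors computed it, and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running total first reaches half
    of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Illustrative lengths only, not data from the study.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Under this definition, joining contigs into longer scaffolds raises N50 even when the total assembled length is unchanged, which is why it is often used to summarize scaffolding improvements.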
Deconvoluting simulated metagenomes: The performance of hard- and soft-clustering algorithms applied to metagenomic chromosome conformation capture (3C)
© 2016 DeMaere and Darling. Background. Chromosome conformation capture, coupled with high-throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods. We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results. When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion. Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development.
Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control
BACKGROUND: Ichthyophthirius multifiliis, commonly known as Ich, is a highly pathogenic ciliate responsible for 'white spot', a disease causing significant economic losses to the global aquaculture industry. Options for disease control are extremely limited, and Ich's obligate parasitic lifestyle makes experimental studies challenging. Unlike most well-studied protozoan parasites, Ich belongs to a phylum composed primarily of free-living members. Indeed, it is closely related to the model organism Tetrahymena thermophila. Genomic studies represent a promising strategy to reduce the impact of this disease and to understand the evolutionary transition to parasitism.
RESULTS: We report the sequencing, assembly and annotation of the Ich macronuclear genome. Compared with its free-living relative T. thermophila, the Ich genome is reduced approximately two-fold in length and gene density and three-fold in gene content. We analyzed in detail several gene classes with diverse functions in behavior, cellular function and host immunogenicity, including protein kinases, membrane transporters, proteases, surface antigens and cytoskeletal components and regulators. We also mapped by orthology Ich's metabolic pathways in comparison with other ciliates and a potential host organism, the zebrafish Danio rerio.
CONCLUSIONS: Knowledge of the complete protein-coding and metabolic potential of Ich opens avenues for rational testing of therapeutic drugs that target functions essential to this parasite but not to its fish hosts. Also, a catalog of surface protein-encoding genes will facilitate development of more effective vaccines. The potential to use T. thermophila as a surrogate model offers promise toward controlling 'white spot' disease and understanding the adaptation to a parasitic lifestyle
A Genomic Investigation of Divergence Between Tuna Species
Effective management and conservation of marine pelagic fishes is heavily dependent on a robust understanding of their population structure, their evolutionary history, and the delineation of appropriate management units. The Yellowfin tuna (Thunnus albacares) and the Blackfin tuna (Thunnus atlanticus) are two exploited epipelagic marine species with overlapping ranges in the tropical and sub-tropical Atlantic Ocean. This work analyzed genome-wide genetic variation of both species in the Atlantic basin to investigate the occurrence of population subdivision and adaptive variation. A de novo assembly of the Blackfin tuna genome was generated using Illumina paired-end sequencing data and applied as a reference for population genomic analysis of specimens from 9 localities spanning most of the Blackfin tuna range. Analysis suggested the presence of four weakly differentiated units corresponding to the northwestern Atlantic Ocean, Gulf of Mexico, Caribbean Sea, and southwestern Atlantic Ocean, respectively. Significant spatial autocorrelation of genotypes was observed for specimens collected within 800 km of each other. A high-quality genome assembly generated for the Yellowfin tuna using PacBio and Illumina sequences was scaffolded by a linkage map developed through analysis of the segregation of genome-wide Single Nucleotide Polymorphisms in 164 larval offspring from a single pair produced by controlled breeding. The genome assembly was used as a reference for population genomic analysis of juvenile specimens from the 4 main nursery areas hypothesized in the Atlantic Ocean basin. Analyses corroborated previously reported population subdivision between the east and west Atlantic Ocean, but also suggested subdivision associated with individual nursery areas within the east and west regions.
Draft reference assemblies were generated for Albacore, Bigeye and Longtail tunas and used in combination with the Yellowfin and Blackfin tuna genomes obtained in this work and existing assemblies for bluefin tunas in preliminary analyses of genome-wide variation between species of the Thunnus genus. Whole-genome derived SNP-based phylogenetic analysis of the Thunnus genus suggests phylogenetic relationships may be more complex than suggested in earlier work based on Restriction-site Associated DNA sequencing or muscle transcriptome sequencing, and prompts further analysis of the genus using a more comprehensive sampling of taxa in each oceanic basin.
Genomic characterizations of Xanthomonas cucurbitae and using comparative genomics to predict novel microbe-associated molecular patterns in Xanthomonas
Bacterial spot is a major plant disease caused by many plant-pathogenic members of the genus Xanthomonas. While each species is narrow in host range, bacterial spot Xanthomonads infect a large variety of plant hosts, leading to large economic losses for farmers around the world. Although Xanthomonas utilizes a wide array of virulence and pathogenicity factors to infect its hosts, plants have a range of methods to recognize invaders and prevent infection. Understanding the genomic and molecular interactions between Xanthomonas and its hosts is an important part of developing effective crop protection strategies and breeding plants for resistance.
While X. cucurbitae has been identified as the causal agent of bacterial spot on cucurbits, no genomic-level analyses have been carried out regarding the pathogen. Using the first reference quality X. cucurbitae genome assembly, an RNA-seq analysis was carried out to assess virulence characteristics of the pathogen. By analyzing the X. cucurbitae transcriptome, we observed behavioral changes between nutrient-sufficient and host-mimicking conditions, as well as the upregulation of genes related to virulence and pathogenicity. We also identified virulence genes likely to be essential in successful bacterial spot infection. In addition, a RAD-seq analysis was performed to characterize population clusters of X. cucurbitae isolated throughout the Midwestern United States. We revealed multiple populations of X. cucurbitae present throughout the region and demonstrated clear genetic differences between these populations using population genetics analyses. These studies demonstrate clear value in future genomic studies regarding X. cucurbitae.
X. euvesicatoria and X. perforans are two bacterial spot Xanthomonads affecting tomatoes and peppers. We conducted a comparative genomics study in X. euvesicatoria and X. perforans populations to identify genes under selection pressure, and to characterize potential genes involved in plant-pathogen interactions. By calculating the test statistic Tajima’s D, we found evidence of purifying selection throughout the genomes of both bacterial spot Xanthomonads. In addition, Tajima’s D successfully detected known microbe-associated molecular patterns (MAMPs), and we were able to characterize the recognition of these MAMPs between species in luminol-based reactive oxygen species (ROS) assays. While this study was not yet able to identify novel MAMPs, we show that Tajima’s D is a powerful tool in detecting genes that are important to plant-pathogen interactions.
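For readers unfamiliar with the statistic, the following is a minimal sketch of Tajima's D using the standard published formula, which compares mean pairwise diversity with the number of segregating sites. It is not the authors' pipeline; the function names and the toy alignment are hypothetical, and the input is assumed to be a gap-free alignment of at least four equal-length sequences.

```python
from collections import Counter
from math import sqrt


def tajimas_d_from_summary(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and mean
    pairwise differences pi (standard formula; requires n >= 4)."""
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        return float("nan")  # D is undefined without variation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have equal length")
    seg_sites = 0
    diff_pairs = 0.0
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            diff_pairs += (n * n - sum(c * c for c in counts.values())) / 2
    pi = diff_pairs / (n * (n - 1) / 2)
    return tajimas_d_from_summary(n, seg_sites, pi)


if __name__ == "__main__":
    # Toy alignment (hypothetical): every variant is a singleton,
    # so D comes out negative.
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
```

An excess of rare variants pushes D below zero, which is the pattern usually read as consistent with purifying selection or population expansion; an excess of intermediate-frequency variants pushes it above zero.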
Computational methods for single cell RNA and genome assembly resolution using genetic variation
Genetic variation and natural selection have driven the evolutionary history on this planet and are responsible for creating us and all other life as we know it. Over the past several decades, the genomic revolution has allowed us to assess population variation across humans and other species and use that to link genotypes with phenotypes and infer evolutionary histories. In this thesis, I explore computational methods for using genetic variation to demultiplex and disambiguate complex data.
In single cell RNAseq, problems of batch effects, doublets, and ambient RNA are each sources of noise that impede our ability to infer the functional states of cells and compare them between experiments. One popular new experimental design promising to solve each of these while also reducing experimental costs is mixing multiple individuals' cells into a single experiment. In chapter 2, I present a method for clustering cells by genotype, calling doublets, and using the cross-genotype signal in singletons to estimate and remove ambient RNA. I compare this method to other existing methods including one that requires a priori information about the genotypes, and two which do not. I find that my method outperforms each of these methods across a wide range of data parameters and sample types.
In genome assembly, the recent higher throughput and lower cost of long read sequencing has revolutionized our ability to create reference quality genomes and has revitalized the assembly community. Now, massive efforts are taking place in the Darwin Tree of Life project and the Earth Biogenome project to create reference genomes for all multicellular eukaryotic life. This will create a scientific resource for the next generation of biological science, will serve as a conservation of data that could otherwise be lost in this time of mass extinction, and will allow for a much broader understanding of evolution and the evolutionary history of life on Earth. While much progress has been made in data quality and assembly algorithms, some problems still exist. Until recently, the DNA input requirements for long read sequencing technologies made it impossible to sequence single individuals of these species with long reads. Also, high heterozygosity makes assembly more difficult due to the inherent ambiguity between heterozygous sequence versus paralogous sequence when confronted with inexact homology. One solution to the DNA input requirements would be to pool individuals, but this only increases the heterozygosity of the sample and reduces assembly quality. In chapter 3, we present the first high quality assembly of a single mosquito using new library preparation methods with reduced DNA requirements. This reduces the number of haplotypes to two, improving the assembly quality. In chapter 4, we further address the problems brought on by heterozygosity in assembly. I present a suite of tools that use the phasing consistency of multiple heterozygous sequences as a signal for physical linkage, thus using genetic variation to our advantage rather than as a challenge to overcome. This tool creates phased, linked assemblies and phasing aware scaffolding. Further, I provide a tool for phasing aware scaffolding on existing assemblies.
This includes a novel haplotype phasing algorithm with some unique beneficial properties. It is robust to non-heterozygous variants as input and can detect and correct those genotypes. It also extends naturally to polyploid genomes. Wellcome Trus
Prevalence and relationship of endosymbiotic Wolbachia in the butterfly genus Erebia
Wolbachia is an endosymbiont common to most invertebrates, which can have significant evolutionary implications for its host species by acting as a barrier to gene flow. Despite the importance of Wolbachia, still little is known about its prevalence and diversification pattern among closely related host species. Wolbachia strains may phylogenetically coevolve with their hosts, unless horizontal host-switches are particularly common. We address these issues in the genus Erebia, one of the most diverse Palearctic butterfly genera. We sequenced the Wolbachia genome from a strain infecting Erebia cassioides and showed that it belongs to the Wolbachia supergroup B, capable of infecting arthropods from different taxonomic orders. The prevalence of Wolbachia across 13 closely related Erebia host species based on extensive population-level genetic data revealed that multiple Wolbachia strains jointly infect all investigated taxa, but with varying prevalence. Finally, the phylogenetic relationships of Wolbachia strains are in some cases significantly associated with those of their hosts, especially among the most closely related Erebia species, demonstrating mixed evidence for phylogenetic coevolution. Closely related host species can be infected by closely related Wolbachia strains, evidencing some phylogenetic coevolution, but the actual pattern of infection more often reflects historical or contemporary geographic proximity among host species. Multiple processes, including survival in distinct glacial refugia, recent host shifts in sympatry, and a loss of Wolbachia during postglacial range expansion seem to have jointly shaped the complex interactions between Wolbachia evolution and the diversification of its host among our studied Erebia species.
Initial sequencing and analysis of the human genome
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/62798/1/409860a0.pd