    Pleistocene range dynamics in the eastern Greater Cape Floristic Region: A case study of the Little Karoo endemic Berkheya cuneata (Asteraceae)

    The glacial–interglacial climate cycles of the Pleistocene played a significant role in altering species distributions across the globe. However, the climate of the Greater Cape Floristic Region is thought to have been decoupled from global fluctuations, and its current Mediterranean climate remained relatively buffered during this period. Here we explore the roles of climate stability and the region's topographic complexity in the range history of an endemic Little Karoo plant, Berkheya cuneata, using ensemble species distribution modelling and multi-locus phylogeography. Species distribution models projected onto downscaled climate simulations of the Last Glacial Maximum showed a considerable range contraction and fragmentation into western and eastern Little Karoo populations, separated by the Rooiberg inselberg. This population fragmentation is mirrored in the phylogeographic structure of both chloroplast and nuclear DNA. These results suggest that sufficient climatic buffering, coupled with regionally complex topography, ensured localised population persistence during Pleistocene climate cycles, but that these features have also promoted population vicariance in this, and likely other, Little Karoo lowland species.
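
    The abstract does not say which algorithms, weights, or thresholds make up the ensemble, so the following is only a minimal sketch of a generic ensemble step, assuming a weighted mean of per-cell suitability scores, a single presence threshold, and hypothetical grid-cell IDs. It shows one way a range contraction between present-day and Last Glacial Maximum projections could be quantified.

```python
from typing import Dict, List, Set

# Hypothetical layout: one dict per model, mapping grid-cell ID -> suitability in [0, 1].
Suitability = Dict[str, float]


def ensemble_mean(predictions: List[Suitability], weights: List[float]) -> Suitability:
    """Weighted mean suitability per grid cell across all models (assumed scheme)."""
    total = sum(weights)
    cells = predictions[0].keys()
    return {
        cell: sum(w * model[cell] for model, w in zip(predictions, weights)) / total
        for cell in cells
    }


def suitable_cells(ensemble: Suitability, threshold: float) -> Set[str]:
    """Binarize the ensemble: keep cells at or above the (hypothetical) threshold."""
    return {cell for cell, score in ensemble.items() if score >= threshold}


def range_change(present: Set[str], past: Set[str]) -> float:
    """Relative change in suitable area from the present to the past projection."""
    return (len(past) - len(present)) / len(present)


if __name__ == "__main__":
    # Two toy models, each projected onto present-day and LGM climate layers.
    present = [{"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2},
               {"a": 0.7, "b": 0.6, "c": 0.5, "d": 0.4}]
    lgm = [{"a": 0.9, "b": 0.3, "c": 0.6, "d": 0.1},
           {"a": 0.7, "b": 0.3, "c": 0.2, "d": 0.1}]
    weights = [1.0, 1.0]
    now = suitable_cells(ensemble_mean(present, weights), 0.5)
    then = suitable_cells(ensemble_mean(lgm, weights), 0.5)
    print(sorted(now), sorted(then), range_change(now, then))
```

    Weighting each model by an evaluation score is one common choice; equal weights reduce this to an unweighted mean. Detecting fragmentation (for example, a split on either side of a barrier) would additionally require checking connectivity between suitable cells, which this sketch omits.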

    Direct maximum parsimony phylogeny reconstruction from genotype data

    Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics, with many practical applications in population genetics, whole-genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstructing maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which conflate the pair of haplotypes carried on homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data, so phylogenetic applications for autosomal data must rely on first inferring haplotypes computationally from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree-size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
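
    To make the phasing ambiguity concrete, the sketch below enumerates every unordered haplotype pair that conflates to a given genotype over biallelic sites. The encoding (0 or 1 for a homozygous site, 2 for a heterozygous site) is an assumption for illustration, and this is not the paper's parsimony algorithm: it only shows that a genotype with h heterozygous sites admits 2^(h-1) distinct phasings (for h of at least 1), one of which a two-stage pipeline must commit to before building the tree.

```python
from itertools import product
from typing import List, Set, Tuple

Haplotype = Tuple[int, ...]


def conflate(h1: Haplotype, h2: Haplotype) -> List[int]:
    """Genotype of a haplotype pair: the shared allele if homozygous, 2 if heterozygous."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def explaining_pairs(genotype: List[int]) -> Set[Tuple[Haplotype, Haplotype]]:
    """All unordered haplotype pairs whose conflation equals `genotype`."""
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for site, allele in zip(het_sites, phase):
            h1[site] = allele        # one chromosome carries `allele`
            h2[site] = 1 - allele    # its homolog carries the other allele
        a, b = tuple(h1), tuple(h2)
        pairs.add((a, b) if a <= b else (b, a))  # canonical order for an unordered pair
    return pairs


if __name__ == "__main__":
    for g in ([0, 2, 2], [2, 1, 2, 2]):
        print(g, len(explaining_pairs(g)))
```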

    Computational Approaches To Anti-Toxin Therapies And Biomarker Identification

    This work describes the fundamental study of two bacterial toxins with computational methods, the rational design of a potent inhibitor using molecular dynamics, and the development of two bioinformatic methods for mining genomic data. Clostridium difficile is an opportunistic bacillus that produces two large glucosylating toxins. These toxins, TcdA and TcdB, cause severe intestinal damage. Because Clostridium difficile harbors considerable antibiotic resistance, one treatment strategy is to prevent the tissue damage that the toxins cause. The catalytic glucosyltransferase domain of TcdA and TcdB was studied using molecular dynamics in the presence of both a protein-protein binding partner and several substrates. These experiments were combined with lead optimization techniques to create a potent irreversible inhibitor that protects 95% of cells in vitro. Dynamics studies on a TcdB cysteine protease domain were performed to identify an allosteric communication pathway. A comparative analysis of the static and dynamic properties of the TcdA and TcdB glucosyltransferase domains was carried out to determine the basis for the differential lethality of these toxins. Large-scale biological data are readily available in the post-genomic era, but they can be difficult to use effectively. Two bioinformatic methods were developed to process whole-genome data. The first is software that returns all genes containing a given motif in a single genome, providing a list of genes that may belong to the same regulatory network or be targeted by a specific DNA-binding factor. The second method links data from genome-wide association studies (GWAS) to specific genes. GWAS results are frequently subjected to statistical analysis, but the associated mutations are rarely investigated structurally. HyDn-SNP-S allows a researcher to find mutations in a gene that correlate with a phenotype studied by GWAS. Applied across human DNA polymerases, this approach yielded strongly predictive haplotypes for breast and prostate cancer. Molecular dynamics applied to DNA Polymerase Lambda suggested a structural explanation for the decrease in polymerase fidelity in that mutant. When the method was applied to histone deacetylases, mutations were found that alter substrate binding and post-translational modification.
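
    The motif-search step can be illustrated with a minimal sketch. The actual software's interface, input format, and motif syntax are not described here, so the gene-to-sequence dictionary, the IUPAC subset, and the choice to search both strands are assumptions. The function returns the IDs of all genes whose sequence (for example, a promoter region) contains the motif or its reverse complement.

```python
import re
from typing import Dict, List

# Subset of IUPAC nucleotide codes mapped to regex character classes (assumed sufficient).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement of each code: R<->Y, K<->M, while S, W, N are self-complementary.
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def motif_regex(motif: str) -> re.Pattern:
    """Compile a regex matching the motif on either strand (KeyError on unknown codes)."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    variants = dict.fromkeys([motif, revcomp])  # de-duplicate palindromic motifs
    return re.compile("|".join("".join(IUPAC[b] for b in m) for m in variants))


def genes_with_motif(genes: Dict[str, str], motif: str) -> List[str]:
    """Sorted IDs of genes whose sequence contains at least one motif occurrence."""
    regex = motif_regex(motif)
    return sorted(gid for gid, seq in genes.items() if regex.search(seq.upper()))


if __name__ == "__main__":
    genome = {"geneA": "TTGACGTCAA", "geneB": "CCCCCC", "geneC": "aagtcat"}
    print(genes_with_motif(genome, "TGAC"))
```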

    Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads

    Background: Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping. Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integrating LD information yields genotype calling accuracy from low-coverage sequencing data comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/. Conclusions: Integration of LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, making low-coverage sequencing a viable alternative to microarrays for conducting large-scale genome-wide association studies.
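
    For contrast with the multi-locus model described above, the sketch below shows an LD-oblivious building block: a single-site genotype posterior computed from read counts at a biallelic SNP. It assumes a uniform per-read error rate and a Hardy–Weinberg prior derived from a reference-panel allele frequency; both are illustrative assumptions rather than the paper's model, which additionally exploits LD between nearby sites.

```python
from typing import List


def genotype_posterior(ref_reads: int, alt_reads: int, alt_freq: float,
                       error: float = 0.01) -> List[float]:
    """Posterior P(g | reads) for g = 0, 1, 2 copies of the alternate allele.

    Assumes independent reads, a uniform sequencing error rate `error`,
    and a Hardy-Weinberg prior from the panel allele frequency `alt_freq`.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        # The binomial coefficient is identical across genotypes, so it cancels.
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    prior = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    joint = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    post = genotype_posterior(ref_reads=1, alt_reads=1, alt_freq=0.3)
    call = max(range(3), key=post.__getitem__)  # most probable genotype
    print(post, call)
```

    With only one or two reads per site, the prior term strongly influences the call; this is the term that multi-locus LD information from a reference panel can sharpen.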

    Genome-wide inference of ancestral recombination graphs

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution. Comment: 88 pages, 7 main figures, 22 supplementary figures. This version contains a substantially expanded genomic data analysis.
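
    Threading samples a hidden path conditional on observed data using hidden Markov model techniques. As a generic illustration only, the sketch below implements forward filtering with stochastic traceback on a toy HMM: it computes rescaled forward probabilities and then draws one state path from its posterior. The states, transition matrix, and emission table are hypothetical stand-ins; ARGweaver's actual state space and its sequentially Markov coalescent probabilities are not reproduced here.

```python
import random
from typing import List, Sequence


def forward(init: Sequence[float], trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]], obs: Sequence[int]) -> List[List[float]]:
    """Forward table; row t is proportional to P(obs[0..t], state_t = k), rescaled to sum to 1."""
    n = len(init)
    alpha: List[List[float]] = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[k] * emit[k][o] for k in range(n)]
        else:
            prev = alpha[-1]
            row = [emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n))
                   for k in range(n)]
        total = sum(row)  # assumes obs has nonzero probability under the model
        alpha.append([x / total for x in row])
    return alpha


def sample_path(init, trans, emit, obs, rng: random.Random) -> List[int]:
    """Draw one hidden-state path from P(path | obs) by stochastic traceback."""
    alpha = forward(init, trans, emit, obs)
    n = len(init)
    # Last state is drawn from the final (normalized) forward row.
    path = [rng.choices(range(n), weights=alpha[-1])[0]]
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        # P(state_t = j | state_{t+1} = nxt, obs) is proportional to alpha[t][j] * trans[j][nxt].
        weights = [alpha[t][j] * trans[j][nxt] for j in range(n)]
        path.append(rng.choices(range(n), weights=weights)[0])
    path.reverse()
    return path


if __name__ == "__main__":
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [[0.8, 0.2], [0.3, 0.7]]
    print(sample_path(init, trans, emit, [0, 0, 1, 1, 0], random.Random(42)))
```

    Repeating the draw with fresh randomness gives independent samples from the same conditional distribution, which is the property an MCMC sampler built on threading relies on.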

    Special features of RAD Sequencing data: implications for genotyping

    Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity, and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe that RAD loci affected by them can be excluded or processed with relative ease in most cases, and that most RAD loci will be accurately genotyped by existing tools.
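
    One simple diagnostic for the PCR GC content bias named above is to check whether per-locus read depth tracks locus GC content. The sketch below computes GC fraction per RAD locus and its Pearson correlation with depth. The input layout (locus ID mapped to a consensus sequence and read depth) is hypothetical, and a correlation coefficient is only one possible diagnostic, not the analysis performed in the paper.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a locus sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def gc_depth_correlation(loci: Dict[str, Tuple[str, int]]) -> float:
    """Correlation between per-locus GC fraction and read depth across loci."""
    gcs = [gc_fraction(seq) for seq, _ in loci.values()]
    depths = [depth for _, depth in loci.values()]
    return pearson(gcs, depths)


if __name__ == "__main__":
    loci = {"L1": ("ATATATAT", 10), "L2": ("ATGCATGC", 20), "L3": ("GCGCGCGC", 30)}
    print(gc_depth_correlation(loci))
```

    A strong positive or negative correlation suggests that depth-based quality filters or genotype likelihoods may need to account for GC content before loci are called.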