111 research outputs found

    Direct maximum parsimony phylogeny reconstruction from genotype data

    Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole-genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstructing maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods that first computationally infer haplotypes from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.

    Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data

    Next-generation sequencing (NGS) technologies have advanced rapidly in recent years. Short-read samples now reach millions of reads, and the number of samples has grown enormously in the wake of the COVID-19 pandemic. These data can expose essential aspects of disease transmission and development and may reveal the key to its treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods for a wide range of problems, from epidemiology to cancer studies. The first part of this work focuses on processing virus NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, and a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology. The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development.

    A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

    An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.


    Phylogenetic origin of primary and secondary metabolic pathway genes revealed by C. maxima and C. reticulata diagnostic SNPs

    Modern cultivated Citrus species and varieties result from interspecific hybridization between four ancestral taxa. Among them, Citrus maxima and Citrus reticulata, closely associated with the pummelo and mandarin horticultural groups, respectively, were particularly important as the progenitors of sour and sweet oranges (Citrus aurantium and Citrus sinensis), grapefruits (Citrus paradisi), and hybrid types resulting from modern breeding programs (tangors, tangelos, and orangelos). The differentiation between the four ancestral taxa and the phylogenomic structure of modern varieties largely drive the organization of phenotypic diversity. In particular, strong phenotypic differences exist in coloration and sweetness, which represent important criteria for breeders. In this context, focusing on the genes of the sugar, carotenoid, and chlorophyll biosynthesis pathways, the aim of this work was to develop a set of diagnostic single-nucleotide polymorphism (SNP) markers to distinguish the ancestral haplotypes of C. maxima and C. reticulata and to provide information at the intraspecific diversity level (within C. reticulata or C. maxima). In silico analysis allowed the identification of 3,347 SNPs from selected genes. Among them, 1,024 were detected as potential differentiation markers between C. reticulata and C. maxima. A total of 115 SNPs were successfully developed using a competitive PCR technology. Their transferability among all Citrus species and the true citrus genera was very good, with only 0.87% missing data. The ancestral alleles of the SNPs were identified, and we validated the usefulness of the developed markers for tracing the ancestral haplotype in large germplasm collections and in sexually recombined progeny derived from the C. reticulata/C. maxima admixture gene pool. These markers will pave the way for targeted association studies based on ancestral haplotypes.