111 research outputs found

Direct maximum parsimony phylogeny reconstruction from genotype data

Author: Fumei Lam
Guy E Blelloch
R Ravi
Russell Schwartz
Sridhar Srinath
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Hence, phylogenetic applications for autosomal data must rely on other methods for first computationally inferring haplotypes from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
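To illustrate what "conflated combinations of pairs of haplotypes" means in the abstract above, here is a minimal sketch, not the authors' method. It assumes biallelic sites, haplotypes coded as 0/1 tuples, and genotypes coded as the per-site count of 1-alleles (0, 1, or 2). The function names `conflate` and `compatible_pairs` are hypothetical and do not come from the paper.

```python
from itertools import product


def conflate(h1, h2):
    """Combine two binary haplotypes into one unphased genotype.

    Assumed encoding: each site holds the number of 1-alleles (0, 1, or 2).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return tuple(a + b for a, b in zip(h1, h2))


def compatible_pairs(genotype):
    """List every unordered haplotype pair that conflates to `genotype`."""
    # Heterozygous sites (value 1) are the only ambiguous ones.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        # Sort so (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    g = conflate((0, 1, 0), (1, 0, 0))
    print(g)
    for pair in compatible_pairs(g):
        print(pair)
```

Under this encoding, a genotype with k ≥ 1 heterozygous sites is consistent with 2^(k-1) unordered haplotype pairs. This is why inferring haplotypes first and building the tree afterwards can commit to a phasing that does not minimize the tree size.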

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data

Author: Tsyvina Viachaslau
Publication venue: ScholarWorks @ Georgia State University
Publication date: 13/12/2021

Next-generation sequencing (NGS) technologies have made giant leaps in recent years. Short-read samples now reach millions of reads, and the number of samples has grown enormously in the wake of the COVID-19 pandemic. These data can expose essential aspects of disease transmission and development and reveal the key to treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods to deal with a wide range of problems, from epidemiology to cancer studies. The first part of this work focuses on processing virus NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, and a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology. The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development.

ScholarWorks @ Georgia State University

A human genome-wide library of local phylogeny predictions for whole-genome inference problems

Author: Schwartz Russell
Sridhar Srinath
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results: To facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.

Author: Saeed Qamar
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2007

Scholarship at UWindsor

Phylogenetic origin of primary and secondary metabolic pathway genes revealed by C. maxima and C. reticulata diagnostic SNPs

Author: Barbosa de Paula Márcia Fabiana
Da Silva Gesteira Abelmon
De Andrade Silva Edson Mario
Do Amaral Santos Milena
Garcia Dominique
Luro François
Micheli Fabienne
Ollitrault Frédérique
Ollitrault Patrick
Rivallan Ronan
Publication venue: Frontiers Media SA
Publication date: 01/01/2019

Modern cultivated Citrus species and varieties result from interspecific hybridization between four ancestral taxa. Among them, Citrus maxima and Citrus reticulata, closely associated with the pummelo and mandarin horticultural groups, respectively, were particularly important as the progenitors of sour and sweet oranges (Citrus aurantium and Citrus sinensis), grapefruits (Citrus paradisi), and hybrid types resulting from modern breeding programs (tangors, tangelos, and orangelos). The differentiation between the four ancestral taxa and the phylogenomic structure of modern varieties largely drive the organization of phenotypic diversity. In particular, strong phenotypic differences exist in coloration and sweetness, which are important criteria for breeders. In this context, focusing on the genes of the sugar, carotenoid, and chlorophyll biosynthesis pathways, the aim of this work was to develop a set of diagnostic single-nucleotide polymorphism (SNP) markers to distinguish the ancestral haplotypes of C. maxima and C. reticulata and to provide information at the intraspecific diversity level (within C. reticulata or C. maxima). In silico analysis allowed the identification of 3,347 SNPs from selected genes. Among them, 1,024 were detected as potential differentiation markers between C. reticulata and C. maxima. A total of 115 SNPs were successfully developed using a competitive PCR technology. Their transferability among all Citrus species and the true citrus genera was very good, with only 0.87% of missing data. The ancestral alleles of the SNPs were identified, and we validated the usefulness of the developed markers for tracing the ancestral haplotype in large germplasm collections and in sexually recombined progeny derived from the C. reticulata/C. maxima admixture gene pool. These markers will pave the way for targeted association studies based on ancestral haplotypes.

Agritrop

ProdInra

HAL-CIRAD