
    Direct maximum parsimony phylogeny reconstruction from genotype data

    Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics. It has many practical applications in population genetics, whole-genome analysis, and the search for genetic predictors of disease. Efficient methods exist for reconstructing maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available as genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. There are currently no general algorithms for reconstructing maximum parsimony phylogenies directly from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods to first infer haplotypes computationally from genotypes.

    Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes.

    Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances. It can yield substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
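    To show what "conflated combinations of pairs of haplotypes" means in practice, the sketch below enumerates every haplotype pair consistent with a single genotype. This ambiguity is what a two-stage approach resolves before building a tree. It assumes biallelic sites and a 0/1/2 allele-count encoding, in which 0 and 2 are the two homozygous states and 1 is heterozygous. The paper's own encoding may differ. The function name `resolutions` is hypothetical, and this is not the paper's parsimony algorithm. It is only an illustration of phasing ambiguity.

```python
from itertools import product
from typing import List, Tuple

Haplotype = Tuple[int, ...]


def resolutions(genotype: List[int]) -> List[Tuple[Haplotype, Haplotype]]:
    """Return every unordered pair of binary haplotypes that sums to `genotype`.

    Assumed (hypothetical) encoding per site: 0 = homozygous 0/0,
    2 = homozygous 1/1, 1 = heterozygous 0/1.
    """
    # Homozygous sites are fixed: both haplotypes carry allele g // 2.
    base = [g // 2 for g in genotype]
    het = [i for i, g in enumerate(genotype) if g == 1]
    if not het:
        h = tuple(base)
        return [(h, h)]

    pairs = []
    # Fix the first heterozygous site to 0 on haplotype `a` so that
    # (a, b) and (b, a) are not both reported.
    for bits in product((0, 1), repeat=len(het) - 1):
        a = list(base)
        for site, bit in zip(het, (0,) + bits):
            a[site] = bit
        # The partner haplotype carries the remaining allele at every site.
        b = [g - x for g, x in zip(genotype, a)]
        pairs.append((tuple(a), tuple(b)))
    return pairs
```

    For example, `resolutions([1, 2, 1])` returns `[((0, 1, 0), (1, 1, 1)), ((0, 1, 1), (1, 1, 0))]`. A genotype with k heterozygous sites has 2^(k-1) such resolutions, so the number of candidate phasings grows exponentially. This growth is one reason that choosing phasings and tree jointly, rather than fixing a phasing first, is a hard search problem.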

    TrAp: a Tree Approach for Fingerprinting Subclonal Tumor Composition

    Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors, or with resistance to therapies in metastatic tumors. Sequencing technologies provide an aggregate readout over numerous cells rather than subclone-specific quantification of aberrations such as single nucleotide variants (SNVs). Computational approaches that de-mix the single collective signal from a tumor sample's mixed cell population into its individual components are not currently available. Herein we propose a framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of a tumor's underlying cell subpopulations. The method rests on the plausible biological assumption that tumor progression is an evolutionary process in which each individual aberration event arises in a unique subclone and is present in all of its descendant subclones. We have developed an efficient algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation datasets. We applied TrAp to the SNV frequency profile from an Exome-Seq experiment on a renal cell carcinoma tumor sample. We then compared the mutational profiles of the inferred subpopulations with those of twenty single cells from the same tumor. Despite the large experimental noise, specific co-occurring mutations found in clones inferred by TrAp are also present in some of these single cells. Finally, we deconvolve Exome-Seq data from three distinct metastases in different body compartments of one melanoma patient and reveal the evolutionary relationships of their subpopulations.
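    The assumption that each aberration arises once and is inherited by all descendant subclones defines a simple forward model. It shows how subclone abundances on a tree combine into the observed aggregate frequencies, and TrAp is described as solving the inverse of this mixture problem. The sketch below computes only the forward direction and is not TrAp itself. The tree encoding (a `parent` list with -1 for the root), the names `mutation_frequencies`, `origin` and `abundance`, and the treatment of frequencies as fractions of cells are all illustrative assumptions. Observed SNV allele frequencies would additionally depend on ploidy and zygosity.

```python
from typing import Dict, List


def mutation_frequencies(
    parent: List[int],
    origin: Dict[str, int],
    abundance: List[float],
) -> Dict[str, float]:
    """Expected fraction of cells carrying each aberration (hypothetical model).

    parent[i]    -- parent subclone of subclone i, or -1 for the root
    origin[m]    -- subclone in which aberration m first arose
    abundance[i] -- fraction of all cells belonging to subclone i (sums to 1)
    """
    n = len(parent)
    # subtree[j] = total abundance of subclone j and all of its descendants.
    subtree = [0.0] * n
    for i in range(n):
        # Every aberration present in subclone i is also carried by its
        # ancestors' lineage, so i's cells count toward each ancestor's subtree.
        j = i
        while j != -1:
            subtree[j] += abundance[i]
            j = parent[j]
    # An aberration is carried by exactly the cells of its origin's subtree.
    return {m: subtree[c] for m, c in origin.items()}
```

    For a tree with root 0, child 1 (which has child 2) and child 3, and abundances `[0.1, 0.3, 0.2, 0.4]`, aberrations arising in subclones 0, 1, 2 and 3 get frequencies of about 1.0, 0.5, 0.2 and 0.4. The nesting is visible directly: an aberration's frequency is never lower than that of any aberration arising in one of its descendants. Constraints of this kind are what make recovering the tree and abundances from aggregate data plausible.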

    Congruence of chloroplast- and nuclear-encoded DNA sequence variations used to assess species boundaries in the soil microalga Heterococcus (Stramenopiles, Xanthophyceae).

    Background: Heterococcus is a microalgal genus of Xanthophyceae (Stramenopiles) that is common and widespread in soils, especially in cold regions. Species are characterized by extensively branched filaments produced when grown on agarized culture medium. Many species have been described using light-microscopic morphology alone, but extensive morphological plasticity hampers the assessment of species diversity.

    Results: Two independent types of molecular data congruently recovered a robust phylogenetic structure. These were the chloroplast-encoded psbA/rbcL spacer, complemented by the rbcL gene, and the internal transcribed spacer 2 of the nuclear rDNA cistron (ITS2). ITS2 showed considerable sequence and secondary-structure divergence among the eight species. A combined sequence and secondary-structure phylogenetic analysis confined to helix II of ITS2 nonetheless corroborated the relationships inferred from the rbcL gene phylogeny. Intra-genomic divergence of ITS2 sequences was revealed in many strains. The 'monophyletic species concept', appropriate for microalgae without known sexual reproduction, revealed eight different species. Species boundaries established using this molecular monophyletic species concept were more conservative than those of the traditional morphological species concept. Within a species, almost identical chloroplast marker sequences (genotypes) were repeatedly recovered from strains of different origins. At least two species had widespread geographical distributions. Within a given species, however, genotypes recovered from Antarctic strains were distinct from those in temperate habitats. The sequence diversity may therefore correspond to adaptation to different types of habitats or climates.

    Conclusions: We established a method and a reference database for the unambiguous identification of species of the common soil microalgal genus Heterococcus, using DNA sequence variation in markers from the plastid and nuclear genomes. The molecular data were more reliable and more conservative than the morphological data.

    Phylodynamics of Hepatitis C Virus Subtype 2c in the Province of Córdoba, Argentina

    Hepatitis C virus genotype 2, subtype 2c (HCV-2c) is detected at low prevalence in many countries, except in Southern Europe and Western Africa. The current epidemiology of HCV in Argentina, a low-prevalence country, shows the expected low prevalence for this subtype. However, this subtype is the most prevalent one in the central province of Córdoba. Cruz del Eje (CdE), a small rural city in this province, has an HCV prevalence of 5%, with 90% of samples classified as HCV-2c. In other locations of Córdoba Province (OLC), where HCV prevalence is lower, HCV-2c was recorded in about 50% of samples. In the phylogenetic analysis, samples from Córdoba Province consistently formed a monophyletic group with HCV-2c sequences from all countries where HCV-2c has been sequenced. The phylogeographic analysis showed an overall association between geographical traits and phylogeny. These associations were significant (α = 0.05) for Italy, France, Argentina (places other than Córdoba), Martinique, CdE and OLC. The coalescent analysis of samples from CdE, OLC and France yielded a time to the most recent common ancestor (tMRCA) of about 140 years. The demographic reconstruction showed a “lag” phase in the viral population until 1880, followed by exponential growth until 1940. The same results were obtained when each geographical area was analyzed separately. This suggests that HCV-2c entered Córdoba Province during the migration process, mainly from Europe, which is consistent with the history of Argentina in the early 20th century. It also suggests that HCV-2c spread almost simultaneously in Europe and South America, possibly as a result of advances in medical technology in the first half of the 20th century.

    The Evolution of the Major Hepatitis C Genotypes Correlates with Clinical Response to Interferon Therapy

    Patients chronically infected with hepatitis C virus (HCV) require significantly different durations of therapy, depending on the HCV genotype with which they are infected. They also achieve substantially different sustained virologic response rates to interferon-based therapies. There is currently no systematic framework that explains these genotype-specific response rates. Humans are the only known natural hosts for HCV, a virus that is at least hundreds of years old. One possibility is therefore that, over the time frame of this relationship, HCV accumulated adaptive mutations that confer increasing resistance to the human immune system. Given that interferon therapy functions by triggering an immune response, we hypothesized that clinical response rates reflect viral evolutionary adaptations to the immune system. We have performed the first phylogenetic analysis to include all available full-length HCV genomic sequences (n = 345). This resulted in a new cladogram of HCV that establishes, for the first time, the relative evolutionary ages of the major HCV genotypes. Outcome data from prospective clinical trials of interferon and ribavirin therapy were then mapped onto this new tree. This mapping revealed a correlation between genotype-specific responses to therapy and the respective genotype ages. Based on this correlation, we predict that genotypes 5 and 6 will likely have intermediate response rates, similar to genotype 3; there are currently no published prospective trials for these two genotypes. Ancestral protein sequence reconstruction identified the HCV proteins E2 and NS5A as potential determinants of genotype-specific clinical outcome. Biochemical studies have independently identified these same two proteins as having genotype-specific abilities to inhibit the innate immune factor double-stranded RNA-dependent protein kinase (PKR). An evolutionary analysis of all available HCV genomes supports the hypothesis that immune selection was a significant driving force in the divergence of the major HCV genotypes. Viral factors that acquired the ability to inhibit the immune response may play a role in determining genotype-specific response rates to interferon therapy.

    An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.

    New avian paramyxoviruses type I strains identified in Africa provide new outcomes for phylogeny reconstruction and genotype classification

    Newcastle disease (ND) is one of the most lethal diseases of poultry worldwide. It is caused by avian paramyxovirus 1, which has high genomic diversity. As part of an international surveillance program launched in 2007, several thousand samples from domestic and wild birds in Africa were collected and analyzed. ND viruses (NDV) were detected and isolated in apparently healthy fowls and wild birds. However, two thirds of the isolates collected in this study were classified as virulent NDV strains. This classification was based on molecular analysis of the fusion protein and on experimental in vivo challenges with two representative isolates. Phylogenetic analysis based on the F and HN genes showed that isolates recovered from poultry in Mali and Ethiopia form new groups. We propose these as genotype XIV and sub-genotype VIf, with reference to the new nomenclature described by Diel's group. In Madagascar, the circulation of NDV strains of genotype XI, originally reported elsewhere, is also confirmed. Full genomes of five African isolates were sequenced, and an extensive phylogeny reconstruction was carried out based on the nucleotide sequences. The evolutionary distances between groups and the specific amino acid signatures of each cluster allowed us to refine the genotype nomenclature. (Author's abstract)