
    Maximum likelihood models and algorithms for gene tree evolution with duplications and losses

    Background: The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes. Results: We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm, efficient in practice, that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data. Conclusions: In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.

    Evolution through segmental duplications and losses : A Super-Reconciliation approach

    The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

    Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade

    RSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. of all algorithms dramatically increased in these traps.) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant

    progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve

    Simultaneous Bayesian gene tree reconstruction and reconciliation analysis

    We present GSR, a probabilistic model integrating gene duplication, sequence evolution, and a relaxed molecular clock for substitution rates, that enables genomewide analysis of gene families. The gene duplication and loss process is a major cause for incongruence between gene and species tree, and deterministic methods have been developed to explain such differences through tree reconciliations. Although probabilistic methods for phylogenetic inference have been around for decades, probabilistic reconciliation methods are far less established. Based on our model, we have implemented a Bayesian analysis tool, PrIME-GSR, for gene tree inference that takes a known species tree into account. Our implementation is sound and we demonstrate its utility for genomewide gene-family analysis by applying it to recently presented yeast data. We validate PrIME-GSR by comparing with previous analyses of these data that take advantage of gene order information. In a case study we apply our method to the ADH gene family and are able to draw biologically relevant conclusions concerning gene duplications creating key yeast phenotypes. On a higher level this shows the biological relevance of our method. The obtained results demonstrate the value of a relaxed molecular clock. Our good performance will extend to species where gene order conservation is insufficient

    What constitutes best supportive care in the treatment of advanced non-small cell lung cancer patients?-Results from the lung cancer economics and outcomes research (LUCEOR) study

    WOS: 000325833600020. PubMed ID: 23910909. Background: A significant proportion of advanced non-small cell lung cancer (NSCLC) patients receive supportive treatments to manage disease-related symptoms either separately or combined with systemic anti-cancer therapy (SACT). This supportive treatment is commonly referred to as best supportive care (BSC). Definition of BSC in clinical trials and its description in published comparative and real-life NSCLC studies is limited. The lack of a consensus BSC definition makes detailed evaluations of clinical trials and comparisons between clinical trials problematic. Methods: Data were collected as part of the lung cancer economics and outcomes research (LUCEOR) study. Information on treatment and treatment outcomes from deceased stage IIIb/IV NSCLC patients across ten countries was retrospectively collected from medical records. BSC was defined as the best care available as judged by the attending physicians. Results: A total of 1327 patients' data were analyzed. Of those, 774/1327 (58%), 316/631 (50%), 123/259 (47%), 25/56 (45%) and 15/26 (58%) were administered treatment defined as BSC with first, second, third, fourth and fifth-line SACT respectively. In total, 346/678 (51%), 149/335 (45%), 86/176 (49%), 11/28 (39%) and 13/25 (52%) of patients were administered treatment defined as BSC in the end-of-life setting after finishing first, second, third, fourth and fifth-line SACT respectively. BSC therapies could be grouped into 24 different categories. The most common elements did not vary substantially whether given with SACT (irrespective of treatment line), in the end-of-life setting, or between countries. The commonest categories of BSC were narcotic and non-narcotic analgesics, corticosteroids and gastrointestinal medication. Conclusion: There were no major differences in what constituted BSC. BSC included in all instances narcotic and non-narcotic analgesics, corticosteroids and gastrointestinal medication. To our knowledge this is the first study attempting to describe BSC in routine clinical practice. This study's results could help define a practical, up to date, evidence-based definition of BSC. (C) 2013 Elsevier Ireland Ltd. All rights reserved. Boehringer Ingelheim Pharma GmbH. Boehringer Ingelheim. Juliane Lungershausen is employed at Boehringer Ingelheim Pharma GmbH and Henrik Finnern at Boehringer Ingelheim Pharmaceuticals Inc. Orjan Akerborg are, and Anna De Geer were at the time of study conduct, employed at OptumInsight, a research organization acting as consultants to the pharmaceutical industry. The study was funded by an unrestricted research grant from Boehringer Ingelheim Pharma GmbH