46 research outputs found

    Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch-length differences and model violation

    BACKGROUND: Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets. We have investigated the performance of Bayesian inference with empirical and simulated protein-sequence data under conditions of relative branch-length differences and model violation. RESULTS: With empirical protein-sequence data, Bayesian posterior probabilities provide more-generous estimates of subtree reliability than does the nonparametric bootstrap combined with maximum likelihood inference, reaching 100% posterior probability at bootstrap proportions around 80%. With simulated 7-taxon protein-sequence datasets, Bayesian posterior probabilities are somewhat more generous than bootstrap proportions, but do not saturate. Compared with likelihood, Bayesian phylogenetic inference can be as or more robust to relative branch-length differences for datasets of this size, particularly when among-sites rate variation is modeled using a gamma distribution. When the (known) correct model was used to infer trees, Bayesian inference recovered the (known) correct tree in 100% of instances in which one or two branches were up to 20-fold longer than the others. At ratios more extreme than 20-fold, topological accuracy of reconstruction degraded only slowly when only one branch was of relatively greater length, but more rapidly when there were two such branches. Under an incorrect model of sequence change, inaccurate trees were sometimes observed at less extreme branch-length ratios, and (particularly for trees with single long branches) such trees tended to be more inaccurate. The effect of model violation on accuracy of reconstruction for trees with two long branches was more variable, but gamma-corrected Bayesian inference nonetheless yielded more-accurate trees than did either maximum likelihood or uncorrected Bayesian inference across the range of conditions we examined. 
Assuming an exponential Bayesian prior on branch lengths did not improve, and under certain extreme conditions significantly diminished, performance. The two topology-comparison metrics we employed, edit distance and Robinson-Foulds symmetric distance, yielded different but highly complementary measures of performance. CONCLUSIONS: Our results demonstrate that Bayesian inference can be relatively robust against biologically reasonable levels of relative branch-length differences and model violation, and thus may provide a promising alternative to maximum likelihood for inference of phylogenetic trees from protein-sequence data.
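
The Robinson-Foulds symmetric distance mentioned in this abstract counts the bipartitions (splits) that appear in exactly one of two trees. The following is a minimal sketch of that idea, not the authors' implementation: it assumes unrooted trees written as nested Python tuples of leaf names, and the function names and example trees are hypothetical. The edit-distance metric from the abstract is not covered.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that excludes a fixed reference leaf,
    so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if reference in below else below
        # Skip trivial splits (a single leaf on one side, or an empty side).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Symmetric-difference distance: splits present in exactly one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical 5-taxon trees used only for illustration.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
tree_1_rerooted = (("A", "B"), ("C", ("D", "E")))

print(robinson_foulds(tree_1, tree_2))
print(robinson_foulds(tree_1, tree_1_rerooted))
```

Because splits are normalized against a reference leaf, rerooting a tree does not change its distance to another tree, which is the behaviour expected of a metric on unrooted topologies.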

    Identification and single-base gene-editing functional validation of a cis-EPO variant as a genetic predictor for EPO-increasing therapies

    Hypoxia-inducible factor prolyl hydroxylase inhibitors (HIF-PHIs) are currently under clinical development for treating anemia in chronic kidney disease (CKD), but it is important to monitor their cardiovascular safety. Genetic variants can be used as predictors to help inform the potential risk of adverse effects associated with drug treatments. We therefore aimed to use human genetics to help assess the risk of adverse cardiovascular events associated with therapeutically altered EPO levels to help inform clinical trials studying the safety of HIF-PHIs. By performing a genome-wide association meta-analysis of EPO (n = 6,127), we identified a cis-EPO variant (rs1617640) lying in the EPO promoter region. We validated this variant as most likely causal in controlling EPO levels by using genetic and functional approaches, including single-base gene editing. Using this variant as a partial predictor for therapeutic modulation of EPO and large genome-wide association data in Mendelian randomization tests, we found no evidence (at p < 0.05) that genetically predicted long-term rises in endogenous EPO, equivalent to a 2.2-unit increase, increased risk of coronary artery disease (CAD, OR [95% CI] = 1.01 [0.93, 1.07]), myocardial infarction (MI, OR [95% CI] = 0.99 [0.87, 1.15]), or stroke (OR [95% CI] = 0.97 [0.87, 1.07]). We could exclude increased odds of 1.15 for cardiovascular disease for a 2.2-unit EPO increase. A combination of genetic and functional studies provides a powerful approach to investigate the potential therapeutic profile of EPO-increasing therapies for treating anemia in CKD.

    Clinical Characteristics and Predictors of Outcomes of Hospitalized Patients With Coronavirus Disease 2019 in a Multiethnic London National Health Service Trust: A Retrospective Cohort Study

    BACKGROUND: Emerging evidence suggests ethnic minorities are disproportionately affected by coronavirus disease 2019 (COVID-19). Detailed clinical analyses of multicultural hospitalized patient cohorts remain largely undescribed. METHODS: We performed regression, survival, and cumulative competing risk analyses to evaluate factors associated with mortality in patients admitted for COVID-19 in 3 large London hospitals between 25 February and 5 April, censored as of 1 May 2020. RESULTS: Of 614 patients (median age, 69 [interquartile range, 25] years; 62% male), 381 (62%) were discharged alive, 178 (29%) died, and 55 (9%) remained hospitalized at censoring. Severe hypoxemia (adjusted odds ratio [aOR], 4.25 [95% confidence interval {CI}, 2.36-7.64]), leukocytosis (aOR, 2.35 [95% CI, 1.35-4.11]), thrombocytopenia (aOR, 1.01 [95% CI, 1.00-1.01], increase per 10^9 decrease), severe renal impairment (aOR, 5.14 [95% CI, 2.65-9.97]), and low albumin (aOR, 1.06 [95% CI, 1.02-1.09], increase per gram decrease) were associated with death. Forty percent (n = 244) were from black, Asian, and other minority ethnic (BAME) groups, 38% (n = 235) were white, and ethnicity was unknown for 22% (n = 135). BAME patients were younger and had fewer comorbidities. Although the unadjusted odds of death did not differ by ethnicity, when adjusting for age, sex, and comorbidities, black patients were at higher odds of death compared to whites (aOR, 1.69 [95% CI, 1.00-2.86]). This association was stronger when further adjusting for admission severity (aOR, 1.85 [95% CI, 1.06-3.24]). CONCLUSIONS: BAME patients were overrepresented in our cohort; when accounting for demographic and clinical profile of admission, black patients were at increased odds of death. Further research is needed into biologic drivers of differences in COVID-19 outcomes by ethnicity.

    A hybrid clustering approach to recognition of protein families in 114 microbial genomes

    Background: Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g., as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families.
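
The chaining problem described in this abstract can be illustrated with a small sketch of threshold-based single-linkage clustering. This is an illustrative toy under stated assumptions, not the paper's pipeline: the protein names and similarity scores are invented, and the Markov Cluster step is not implemented.

```python
def single_linkage(items, scores, threshold):
    """Group items connected by any chain of pairwise scores >= threshold."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the cluster representative (path halving).
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical proteins: "multidomain" shares one domain with each family.
proteins = ["kinase1", "kinase2", "multidomain", "protease1", "protease2"]
similarity = {
    ("kinase1", "kinase2"): 0.9,
    ("kinase2", "multidomain"): 0.5,
    ("multidomain", "protease1"): 0.5,
    ("protease1", "protease2"): 0.9,
}

strict = single_linkage(proteins, similarity, threshold=0.8)
relaxed = single_linkage(proteins, similarity, threshold=0.4)
print(strict)
print(relaxed)
```

At the strict threshold the two families stay separate; at the relaxed threshold the single multidomain protein links them into one cluster, which is the non-specific merging the abstract attributes to promiscuous domains.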

    Do different surrogate methods detect lateral genetic transfer events of different relative ages?

    Non-tree-based ('surrogate') methods have been used to identify instances of lateral genetic transfer in microbial genomes, but agreement among predictions of different methods can be poor. It has been proposed that this disagreement arises because different surrogate methods are biased towards the detection of certain types of transfer events. This conjecture is supported by a rigorous phylogenetic analysis of 3776 proteins in Escherichia coli K12 MG1655 to map the ages of transfer events relative to one another.

    Ancient origin of the divergent forms of leucyl-tRNA synthetases in the Halobacteriales

    Background: Horizontal gene transfer (HGT) has greatly impacted the genealogical history of many lineages, particularly for prokaryotes, with genes frequently moving in and out of a line of descent. Many genes that were acquired by a lineage in the past likely originated from ancestral relatives that have since gone extinct. During the course of evolution, HGT has played an essential role in the origin and dissemination of genetic and metabolic novelty. Results: Three divergent forms of leucyl-tRNA synthetase (LeuRS) exist in the archaeal order Halobacteriales, commonly known as haloarchaea. Few haloarchaeal genomes have the typical archaeal form of this enzyme and phylogenetic analysis indicates it clusters within the Euryarchaeota as expected. The majority of sequenced halobacterial genomes possess a bacterial form of LeuRS. Phylogenetic reconstruction puts this larger group of haloarchaea at the base of the bacterial domain. The most parsimonious explanation is that an ancient transfer of LeuRS took place from an organism related to the ancestor of the bacterial domain to the haloarchaea. The bacterial form of LeuRS further underwent gene duplications and/or gene transfers within the haloarchaea, with some genomes possessing two distinct types of bacterial LeuRS. The cognate tRNALeu also reveals two distinct clusters for the haloarchaea; however, these tRNALeu clusters do not coincide with the groupings found in the LeuRS tree, revealing that LeuRS evolved independently of its cognate tRNA. Conclusions: The study of leucyl-tRNA synthetase in haloarchaea illustrates the importance of gene transfer originating in lineages that went extinct since the transfer occurred. The haloarchaeal LeuRS and tRNALeu did not co-evolve.

    Numbers of maximally representative clusters (≥ 4) as a function of number of bacterial "phyla" (second-order NCBI classifications, Aquificales, Bacteroidetes, etc.) represented in each

    Taken from "A hybrid clustering approach to recognition of protein families in 114 microbial genomes", BMC Bioinformatics 2004;5:45. Published online 29 Apr 2004. PMCID: PMC420232. Copyright © 2004 Harlow et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

    (a) Number of clusters of ≥ 4 members each produced by hybrid (Markov followed by single-linkage) clustering of proteins in 114 microbial genomes, as a function of S' threshold

    Taken from "A hybrid clustering approach to recognition of protein families in 114 microbial genomes" (BMC Bioinformatics 2004;5:45). Compare the value at the right-most point on the distribution (S'0.01) with that in Figure to see the effect of the prior Markov clustering step; (b) number of proteins in hybrid clusters (≥ 4), as a function of threshold; (c) number of proteins in the largest hybrid cluster, as a function of threshold. Note that the vertical axis is scaled differently than in Figure