226 research outputs found

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/mnf//mate/jc/software/brat.html

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE

    A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci.

    We conducted a multi-stage, genome-wide association study of bladder cancer with a primary scan of 591,637 SNPs in 3,532 affected individuals (cases) and 5,120 controls of European descent from five studies followed by a replication strategy, which included 8,382 cases and 48,275 controls from 16 studies. In a combined analysis, we identified three new regions associated with bladder cancer on chromosomes 22q13.1, 19q12 and 2q37.1: rs1014971 (P = 8 × 10⁻¹²) maps to a non-genic region of chromosome 22q13.1, rs8102137 (P = 2 × 10⁻¹¹) on 19q12 maps to CCNE1 and rs11892031 (P = 1 × 10⁻⁷) maps to the UGT1A cluster on 2q37.1. We confirmed four previously identified genome-wide associations on chromosomes 3q28, 4p16.3, 8q24.21 and 8q24.3, validated previous candidate associations for the GSTM1 deletion (P = 4 × 10⁻¹¹) and a tag SNP for NAT2 acetylation status (P = 4 × 10⁻¹¹), and found interactions with smoking in both regions. Our findings on common variants associated with bladder cancer risk should provide new insights into the mechanisms of carcinogenesis.

    MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    Until now, the most efficient solution for aligning nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment.

    Incorporating progesterone receptor expression into the PREDICT breast prognostic model

    Background: Predict Breast (www.predict.nhs.uk) is an online prognostication and treatment benefit tool for early invasive breast cancer. The aim of this study was to incorporate the prognostic effect of progesterone receptor (PR) status into a new version of PREDICT and to compare its performance to the current version (2.2). Method: The prognostic effect of PR status was based on the analysis of data from 45,088 European patients with breast cancer from 49 studies in the Breast Cancer Association Consortium. Cox proportional hazard models were used to estimate the hazard ratio for PR status. Data from a New Zealand study of 11,365 patients with early invasive breast cancer were used for external validation. Model calibration and discrimination were used to test the model performance. Results: Having a PR-positive tumour was associated with a 23% and 28% lower risk of dying from breast cancer for women with oestrogen receptor (ER)-negative and ER-positive breast cancer, respectively. The area under the ROC curve increased with the addition of PR status from 0.807 to 0.809 for patients with ER-negative tumours (p = 0.023) and from 0.898 to 0.902 for patients with ER-positive tumours (p = 2.3 × 10⁻⁶) in the New Zealand cohort. Model calibration was modest with 940 observed deaths compared to 1151 predicted. Conclusion: The inclusion of the prognostic effect of PR status to PREDICT Breast has led to an improvement of model performance and more accurate absolute treatment benefit predictions for individual patients. Further studies should determine whether the baseline hazard function requires recalibration. © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Peer reviewed

    A community-based geological reconstruction of Antarctic Ice Sheet deglaciation since the Last Glacial Maximum

    A robust understanding of Antarctic Ice Sheet deglacial history since the Last Glacial Maximum is important in order to constrain ice sheet and glacial-isostatic adjustment models, and to explore the forcing mechanisms responsible for ice sheet retreat. Such understanding can be derived from a broad range of geological and glaciological datasets, and recent decades have seen an upsurge in such data gathering around the continent and Sub-Antarctic islands. Here, we report a new synthesis of those datasets, based on an accompanying series of reviews of the geological data, organised by sector. We present a series of timeslice maps for 20 ka, 15 ka, 10 ka and 5 ka, including grounding line position and ice sheet thickness changes, along with a clear assessment of levels of confidence. The reconstruction shows that the Antarctic Ice Sheet did not everywhere reach the continental shelf edge at its maximum, that initial retreat was asynchronous, and that the spatial pattern of deglaciation was highly variable, particularly on the inner shelf. The deglacial reconstruction is consistent with a moderate overall excess ice volume and with a relatively small Antarctic contribution to meltwater pulse 1a. We discuss key areas of uncertainty both around the continent and by time interval, and we highlight potential priorit. © 2014 The Authors

    Aggregation tests identify new gene associations with breast cancer in populations with diverse ancestry

    Low-frequency variants play an important role in breast cancer (BC) susceptibility. Gene-based methods can increase power by combining multiple variants in the same gene and help identify target genes. We evaluated the potential of gene-based aggregation in the Breast Cancer Association Consortium cohorts including 83,471 cases and 59,199 controls. Low-frequency variants were aggregated for individual genes' coding and regulatory regions. Association results in European ancestry samples were compared to single-marker association results in the same cohort. Gene-based associations were also combined in meta-analysis across individuals with European, Asian, African, and Latin American and Hispanic ancestry. In European ancestry samples, 14 genes were significantly associated (q < 0.05) with BC. Of those, two genes, FMNL3 (P = 6.11 × 10 ) and AC058822.1 (P = 1.47 × 10 ), represent new associations. High FMNL3 expression has previously been linked to poor prognosis in several other cancers. Meta-analysis of samples with diverse ancestry discovered further associations including established candidate genes ESR1 and CBLB. Furthermore, literature review and database query found further support for a biologically plausible link with cancer for genes CBLB, FMNL3, FGFR2, LSP1, MAP3K1, and SRGAP2C. Using extended gene-based aggregation tests including coding and regulatory variation, we report identification of plausible target genes for previously identified single-marker associations with BC as well as the discovery of novel genes implicated in BC development. Including multi-ancestral cohorts in this study enabled the identification of otherwise missed disease associations such as ESR1 (P = 1.31 × 10 ), demonstrating the importance of diversifying study cohorts. [Abstract copyright: © 2023. The Author(s).]