
    Independent large scale duplications in multiple M. tuberculosis lineages overlapping the same genomic region

    Mycobacterium tuberculosis, the causative agent of most human tuberculosis, infects one third of the world's population and kills an estimated 1.7 million people a year. With the worldwide emergence of drug resistance, and the finding of more functional genetic diversity than previously expected, there is renewed interest in understanding the forces driving genome evolution in this important pathogen. Genetic diversity in M. tuberculosis is dominated by single nucleotide polymorphisms and small-scale gene deletions, with little or no evidence of the large-scale genome rearrangements seen in other bacteria. Recently, a single report described a large-scale genome duplication that was suggested to be specific to the Beijing lineage. We report here multiple independent large-scale duplications of the same genomic region of M. tuberculosis, detected through whole-genome sequencing. The duplications occur in strains belonging to both M. tuberculosis lineages 2 and 4, and are thus not limited to Beijing strains. The duplications occur in both drug-resistant and drug-susceptible strains. The duplicated regions also have substantially different boundaries in different strains, indicating separate originating duplication events. We further identify a smaller segmental duplication of a different genomic region in a laboratory strain of H37Rv. The presence of multiple independent duplications of the same genomic region suggests instability in this region, a selective advantage conferred by the duplication, or both. The identified duplications suggest that large-scale gene duplication may be more common in M. tuberculosis than previously considered.
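The abstract does not say how the duplications were called from the sequencing data. One common approach, assumed here and not necessarily the authors' method, is to look for stretches whose read depth is roughly twice the genome-wide median, since a duplicated segment in a haploid genome contributes two copies' worth of reads. The sketch below uses hypothetical names (`flag_duplicated_windows`, `merge_adjacent`), a fixed window size, and a 1.8x threshold chosen only for illustration; coordinates are 0-based and half-open.

```python
from statistics import median

def flag_duplicated_windows(depths, window=1000, ratio_threshold=1.8):
    """Return (start, end) windows whose mean read depth is at least
    ratio_threshold times the genome-wide median depth.
    A trailing partial window shorter than `window` is ignored."""
    genome_median = median(depths)
    flagged = []
    for start in range(0, len(depths) - window + 1, window):
        chunk = depths[start:start + window]
        if sum(chunk) / len(chunk) >= ratio_threshold * genome_median:
            flagged.append((start, start + window))
    return flagged

def merge_adjacent(windows):
    """Merge back-to-back flagged windows into candidate duplicated regions."""
    merged = []
    for start, end in windows:
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Synthetic per-base depths: 10 kb at 30x, with positions 4,000-6,999 at 60x (two copies).
depths = [30] * 4000 + [60] * 3000 + [30] * 3000
print(merge_adjacent(flag_duplicated_windows(depths)))  # → [(4000, 7000)]
```

Read depth alone only says that a segment is present in extra copies; it cannot tell whether the copy sits in tandem or elsewhere in the genome, which is why a junction-specific check is still needed.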

    WeatherBench 2: A benchmark for the next generation of data-driven global weather models

    WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework; publicly available training, ground-truth, and baseline data; and a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. We also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.
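As an illustration of the kind of metric such an evaluation framework computes, here is a minimal sketch of a latitude-weighted root-mean-square error for a single forecast field on a regular latitude-longitude grid. It assumes cos(latitude) weights normalized to mean one, as in the original WeatherBench of Rasp et al. (2020); WeatherBench 2's exact area weighting and time-averaging conventions may differ, and the function names are hypothetical.

```python
import math

def latitude_weights(lats_deg):
    """cos(latitude) weights normalized to mean 1 over the latitude rows."""
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_lat) / len(cos_lat)
    return [c / mean_cos for c in cos_lat]

def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE of one forecast field against ground truth.
    `forecast` and `truth` are lists of rows indexed [lat][lon]; `lats_deg`
    gives the latitude of each row. All rows must have the same length."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2  # squared error, scaled by the row's area weight
            count += 1
    return math.sqrt(total / count)

# Toy 3 x 4 grid: the same unit error counts more at the equator than at 60°N.
lats = [-60.0, 0.0, 60.0]
truth = [[1.0] * 4 for _ in lats]
error_at_equator = [[1.0] * 4, [2.0] * 4, [1.0] * 4]
error_at_60n = [[1.0] * 4, [1.0] * 4, [2.0] * 4]
print(lat_weighted_rmse(error_at_equator, truth, lats))
print(lat_weighted_rmse(error_at_60n, truth, lats))
```

Weighting by cos(latitude) keeps the densely packed grid points near the poles from dominating the score, since they cover much less of the Earth's surface than equatorial points.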

    Comparative analysis of Mycobacterium and related actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis

    Background: The genome sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum.
    Results: Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. They also suggest that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four.
    Conclusions: Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.

    WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models

    WeatherBench 2 is an update to the global, medium-range (1-14 days) weather forecasting benchmark proposed by Rasp et al. (2020, https://doi.org/10.1029/2020ms002203), designed to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework; publicly available training, ground-truth, and baseline data; and a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. We also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.

    Summary of Large Scale Duplication Boundaries.

    *Numbers in parentheses are nucleotide positions of the duplication boundaries derived from PCR amplification.
    §Genes in square brackets correspond to the partial duplication boundaries for T67 (L2 and R2 in Figure 3); see text for details.

    PCR verification of tandem duplications.

    (A) Schematic representation of the primer design strategy. The duplicated region is shown as a grey box, present as a single copy in the reference genome (top) and as a tandem duplication (below). Primers A and B amplify the left flank, and primers C and D amplify the right flank. A product can be generated with primers C and B only when a tandem duplication brings them into close proximity. (B) 0.9% agarose gel loaded with 5 µl of PCR reactions using primers C and B specific for isolates M141 (top), M41 (middle) and CDC606 (bottom). Lane 1 contains the 2-log ladder (New England Biolabs, N3200). (C) 0.9% agarose gel loaded with 5 µl of PCR reactions using primers A and B (lanes 2–5) or C and D (lanes 7–10) specific for each isolate, as indicated on the left. Lanes 1 and 6 contain the 2-log ladder (New England Biolabs, N3200).
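The logic of the C+B junction test can be checked with a toy in-silico PCR. This is a minimal sketch under stated assumptions: the sequences are random synthetic DNA, the primer positions are hypothetical choices that follow the A/B/C/D layout described in the caption (not the study's actual primers), binding is exact-match only, and `amplicon_length` reports the shortest product on a linear template.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_all(template, motif):
    """All start positions of an exact motif match (overlaps allowed)."""
    hits, i = [], template.find(motif)
    while i != -1:
        hits.append(i)
        i = template.find(motif, i + 1)
    return hits

def amplicon_length(template, fwd, rev):
    """Shortest product from forward primer `fwd` and reverse primer `rev`
    (both written 5'->3'), or None if no reverse site lies downstream of a
    forward site. Real primers tolerate mismatches; this model does not."""
    rev_sites = find_all(template, revcomp(rev))  # where `rev` anneals on the top strand
    lengths = [r + len(rev) - f
               for f in find_all(template, fwd)
               for r in rev_sites if r >= f + len(fwd)]
    return min(lengths) if lengths else None

def random_dna(n, rng):
    return "".join(rng.choice("ACGT") for _ in range(n))

rng = random.Random(42)
left, dup, right = random_dna(300, rng), random_dna(1000, rng), random_dna(300, rng)

# Hypothetical primers (20 nt each) following the caption's layout
A = left[50:70]              # forward, left flank
B = revcomp(dup[100:120])    # reverse, inside the duplicated region near its left end
C = dup[880:900]             # forward, inside the duplicated region near its right end
D = revcomp(right[200:220])  # reverse, right flank

reference = left + dup + right     # single copy, as in the reference genome
tandem = left + dup + dup + right  # tandem duplication

for name, template in [("reference", reference), ("tandem", tandem)]:
    pairs = {"A+B": (A, B), "C+D": (C, D), "C+B": (C, B)}
    print(name, {p: amplicon_length(template, f, r) for p, (f, r) in pairs.items()})
```

In the single-copy layout primer B anneals upstream of primer C, so the pair faces away from each other and yields no product; only the tandem layout places a copy of B's site downstream of C, giving a short product that spans the duplication junction. The flank pairs A+B and C+D give the same product in both layouts, which is why they serve as controls.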

    Schematic of duplication boundaries and junctions for CDC606.

    Color coding is identical to Figure 5, except that genes present in strain CDC606 but missing from the H37Rv reference genome sequence are colored red.