25 research outputs found

    Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach

    Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/

    Defining bacterial species in the genomic era: insights from the genus Acinetobacter

    Background: Microbial taxonomy remains a conservative discipline, relying on phenotypic information derived from growth in pure culture and techniques that are time-consuming and difficult to standardize, particularly when compared to the ease of modern high-throughput genome sequencing. Here, drawing on the genus Acinetobacter as a test case, we examine whether bacterial taxonomy could abandon phenotypic approaches and DNA-DNA hybridization and, instead, rely exclusively on analyses of genome sequence data. Results: In pursuit of this goal, we generated a set of thirteen new draft genome sequences, representing ten species, combined them with other publicly available genome sequences and analyzed these 38 strains belonging to the genus. We found that analyses based on 16S rRNA gene sequences were not capable of delineating accepted species. However, a core genome phylogenetic tree proved consistent with the currently accepted taxonomy of the genus, while also identifying three misclassifications of strains in collections or databases. Among rapid distance-based methods, we found average-nucleotide identity (ANI) analyses delivered results consistent with traditional and phylogenetic classifications, whereas gene content based approaches appear to be too strongly influenced by the effects of horizontal gene transfer to agree with previously accepted species. Conclusion: We believe a combination of core genome phylogenetic analysis and ANI provides an appropriate method for bacterial species delineation, whereby bacterial species are defined as monophyletic groups of isolates with genomes that exhibit at least 95% pair-wise ANI.
The proposed method is backwards compatible; it provides a scalable and uniform approach that works for both culturable and non-culturable species; is faster and cheaper than traditional taxonomic methods; is easily replicable and transferable among research institutions; and lastly, falls in line with Darwin’s vision of classification becoming, as far as is possible, genealogical
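
The pairwise ANI part of this species definition can be illustrated with a minimal sketch. It assumes ANI values have already been computed by some external tool and are supplied as a dictionary keyed by isolate pairs; the function name, the input format and the example values are all hypothetical and are not taken from the study. The sketch checks only the 95% threshold and does not test monophyly, which would require the core genome phylogeny.

```python
from itertools import combinations

ANI_THRESHOLD = 95.0  # percent; the cutoff stated in the abstract


def meets_ani_criterion(isolates, ani, threshold=ANI_THRESHOLD):
    """Return True if every pair of isolates has ANI >= threshold.

    `ani` maps frozenset({a, b}) -> percent identity. This input
    format is an assumption made for illustration only.
    """
    return all(
        ani[frozenset(pair)] >= threshold
        for pair in combinations(isolates, 2)
    )


# Made-up values for illustration; these are not data from the study.
example_ani = {
    frozenset({"A", "B"}): 97.2,
    frozenset({"A", "C"}): 88.5,
    frozenset({"B", "C"}): 89.1,
}

print(meets_ani_criterion(["A", "B"], example_ani))
print(meets_ani_criterion(["A", "B", "C"], example_ani))
```

Under these made-up values, isolates A and B satisfy the pairwise criterion, while adding C breaks it because both of its comparisons fall below 95%.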

    Regionally enriched rare deleterious exonic variants in the UK and Ireland

    It is unclear how patterns of regional genetic differentiation in the UK and Ireland might impact the protein-coding fraction of the genome. We exploit UK Biobank (UKB) and Viking Genes whole exome sequencing data to study regional genetic differentiation across the UK and Ireland in protein coding genes, encompassing 44,696 unrelated individuals from 20 regions of origin. We demonstrate substantial exonic differentiation among Shetlanders, Orcadians, individuals with full or partial Ashkenazi Jewish ancestry and in several mainland regions (particularly north and south Wales, southeast Scotland and Ireland). With stringent filtering criteria, we find 67 regionally enriched (≥5-fold) variants likely to have adverse biomedical consequences in homozygous individuals. Here, we show that regional genetic variation across the UK and Ireland should be considered in the design of genetic studies and may inform effective genetic screening and counselling

    Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP.

    We aimed to develop an efficient, flexible and scalable approach to diagnostic genome-wide sequence analysis of genetically heterogeneous clinical presentations. Here we present G2P ( www.ebi.ac.uk/gene2phenotype ) as an online system to establish, curate and distribute datasets for diagnostic variant filtering via association of allelic requirement and mutational consequence at a defined locus with phenotypic terms, confidence level and evidence links. An extension to Ensembl Variant Effect Predictor (VEP), VEP-G2P was used to filter both disease-associated and control whole exome sequence (WES) with Developmental Disorders G2P (G2PDD; 2044 entries). VEP-G2PDD shows a sensitivity/precision of 97.3%/33% for de novo and 81.6%/22.7% for inherited pathogenic genotypes respectively. Many of the missing genotypes are likely false-positive pathogenic assignments. The expected number and discriminative features of background genotypes are defined using control WES. Using only human genetic data VEP-G2P performs well compared to other freely-available diagnostic systems and future phenotypic matching capabilities should further enhance performance
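
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how sensitivity and precision are computed from confusion counts. The function names and the counts used in the example are hypothetical; they are not the study's data and do not reproduce the reported percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly pathogenic genotypes that the filter retains."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no pathogenic genotypes to evaluate")
    return true_pos / total


def precision(true_pos, false_pos):
    """Fraction of retained genotypes that are truly pathogenic."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no genotypes were retained by the filter")
    return true_pos / total


# Hypothetical counts chosen only to make the arithmetic easy to follow.
tp, fn, fp = 30, 10, 70
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"precision   = {precision(tp, fp):.1%}")
```

A filter can have high sensitivity and low precision at the same time, as in the figures reported in the abstract: most pathogenic genotypes are retained, but they make up a minority of everything that passes the filter.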

    Identification and functional modelling of plausibly causative cis-regulatory variants in a highly-selected cohort with X-linked intellectual disability.

    Identifying causative variants in cis-regulatory elements (CRE) in neurodevelopmental disorders has proven challenging. We have used in vivo functional analyses to categorize rigorously filtered CRE variants in a clinical cohort that is plausibly enriched for causative CRE mutations: 48 unrelated males with a family history consistent with X-linked intellectual disability (XLID) in whom no detectable cause could be identified in the coding regions of the X chromosome (chrX). Targeted sequencing of all chrX CRE identified six rare variants in five affected individuals that altered conserved bases in CRE targeting known XLID genes and segregated appropriately in families. Two of these variants, FMR1CRE and TENM1CRE, showed consistent site- and stage-specific differences of enhancer function in the developing zebrafish brain using a dual-color fluorescent reporter assay. Mouse models were created for both variants. In male mice Fmr1CRE induced alterations in neurodevelopmental Fmr1 expression, olfactory behavior and neurophysiological indicators of FMRP function. The absence of another likely causative variant on whole genome sequencing further supported FMR1CRE as the likely basis of the XLID in this family. Tenm1CRE mice showed no phenotypic anomalies. Following the release of gnomAD 2.1, reanalysis showed that TENM1CRE exceeded the maximum plausible population frequency of an XLID causative allele. Assigning causative status to any ultra-rare CRE variant remains problematic and requires disease-relevant in vivo functional data from multiple sources. The sequential and bespoke nature of such analyses renders them time-consuming and challenging to scale for routine clinical use

    A recurrent de novo mutation in ACTG1 causes isolated ocular coloboma

    Ocular coloboma (OC) is a defect in optic fissure closure and is a common cause of severe congenital visual impairment. Bilateral OC is primarily genetically determined and shows marked locus heterogeneity. Whole-exome sequencing (WES) was used to analyze 12 trios (child affected with OC and both unaffected parents). This identified de novo mutations in 10 different genes in eight probands. Three of these genes encoded proteins associated with actin cytoskeleton dynamics: ACTG1, TWF1, and LCP1. Proband-only WES identified a second unrelated individual with isolated OC carrying the same ACTG1 allele, encoding p.(Pro70Leu). Both individuals have normal neurodevelopment with no extra-ocular signs of Baraitser–Winter syndrome. We found this mutant protein to be incapable of incorporation into F-actin. The LCP1 and TWF1 variants each resulted in only minor disturbance of actin interactions, and no further plausibly causative variants were identified in these genes on resequencing 380 unrelated individuals with OC

    Genomic epidemiology of a protracted hospital outbreak caused by multidrug-resistant Acinetobacter baumannii in Birmingham, England

    BACKGROUND: Multidrug-resistant Acinetobacter baumannii commonly causes hospital outbreaks. However, within an outbreak, it can be difficult to identify the routes of cross-infection rapidly and accurately enough to inform infection control. Here, we describe a protracted hospital outbreak of multidrug-resistant A. baumannii, in which whole-genome sequencing (WGS) was used to obtain a high-resolution view of the relationships between isolates. METHODS: To delineate and investigate the outbreak, we attempted to genome-sequence 114 isolates that had been assigned to the A. baumannii complex by the Vitek2 system and obtained informative draft genome sequences from 102 of them. Genomes were mapped against an outbreak reference sequence to identify single nucleotide variants (SNVs). RESULTS: We found that the pulsotype 27 outbreak strain was distinct from all other genome-sequenced strains. Seventy-four isolates from 49 patients could be assigned to the pulsotype 27 outbreak on the basis of genomic similarity, while WGS allowed 18 isolates to be ruled out of the outbreak. Among the pulsotype 27 outbreak isolates, we identified 31 SNVs and seven major genotypic clusters. In two patients, we documented within-host diversity, including mixtures of unrelated strains and within-strain clouds of SNV diversity. By combining WGS and epidemiological data, we reconstructed potential transmission events that linked all but 10 of the patients and confirmed links between clinical and environmental isolates. Identification of a contaminated bed and a burns theatre as sources of transmission led to enhanced environmental decontamination procedures. CONCLUSIONS: WGS is now poised to make an impact on hospital infection prevention and control, delivering cost-effective identification of routes of infection within a clinically relevant timeframe and allowing infection control teams to track, and even prevent, the spread of drug-resistant hospital pathogens

    TRAIP promotes DNA damage response during genome replication and is mutated in primordial dwarfism.

    DNA lesions encountered by replicative polymerases threaten genome stability and cell cycle progression. Here we report the identification of mutations in TRAIP, encoding an E3 RING ubiquitin ligase, in patients with microcephalic primordial dwarfism. We establish that TRAIP relocalizes to sites of DNA damage, where it is required for optimal phosphorylation of H2AX and RPA2 during S-phase in response to ultraviolet (UV) irradiation, as well as fork progression through UV-induced DNA lesions. TRAIP is necessary for efficient cell cycle progression and mutations in TRAIP therefore limit cellular proliferation, providing a potential mechanism for microcephaly and dwarfism phenotypes. Human genetics thus identifies TRAIP as a component of the DNA damage response to replication-blocking DNA lesions. This work was supported by funding from the Medical Research Council and the European Research Council (ERC, 281847) (A.P.J.), the Lister Institute for Preventative Medicine (A.P.J. and G.S.S.), Medical Research Scotland (L.S.B.), German Federal Ministry of Education and Research (BMBF, 01GM1404) and E-RARE network EuroMicro (B.W), Wellcome Trust (M. Hurles), CMMC (P.N.), Cancer Research UK (C17183/A13030) (G.S.S. and M.R.H), Swiss National Science Foundation (P2ZHP3_158709) (O.M.), AIRC (12710) and ERC/EU FP7 (CIG_303806) (S.S.), Cancer Research UK (C6/A11224) and ERC/EU FP7 (HEALTH-F2-2010-259893) (A.N.B. and S.P.J.). This is the author accepted manuscript. The final version is available from NPG via http://dx.doi.org/10.1038/ng.345

    HEATR2 Plays a Conserved Role in Assembly of the Ciliary Motile Apparatus

    Cilia are highly conserved microtubule-based structures that perform a variety of sensory and motility functions during development and adult homeostasis. In humans, defects specifically affecting motile cilia lead to chronic airway infections, infertility and laterality defects in the genetically heterogeneous disorder Primary Ciliary Dyskinesia (PCD). Using the comparatively simple Drosophila system, in which mechanosensory neurons possess modified motile cilia, we employed a recently elucidated cilia transcriptional RFX-FOX code to identify novel PCD candidate genes. Here, we report characterization of CG31320/HEATR2, which plays a conserved critical role in forming the axonemal dynein arms required for ciliary motility in both flies and humans. Inner and outer arm dyneins are absent from axonemes of CG31320 mutant flies and from PCD individuals with a novel splice-acceptor HEATR2 mutation. Functional conservation of closely arranged RFX-FOX binding sites upstream of HEATR2 orthologues may drive higher cytoplasmic expression of HEATR2 during early motile ciliogenesis. Immunoprecipitation reveals HEATR2 interacts with DNAI2, but not HSP70 or HSP90, distinguishing it from the client/chaperone functions described for other cytoplasmic proteins required for dynein arm assembly such as DNAAF1-4. These data implicate CG31320/HEATR2 in a growing intracellular pre-assembly and transport network that is necessary to deliver functional dynein machinery to the ciliary compartment for integration into the motile axoneme