22 research outputs found

    Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits.

    Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, identifying the specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D-relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.

    The Organization of the Quorum Sensing luxI/R Family Genes in Burkholderia

    Members of the Burkholderia genus of Proteobacteria are capable of living freely in the environment and can also colonize human, animal and plant hosts. Certain members are considered to be clinically important from both medical and veterinary perspectives and furthermore may be important modulators of the rhizosphere. Quorum sensing via N-acyl homoserine lactone signals (AHL QS) is present in almost all Burkholderia species and is thought to play important roles in lifestyle changes such as colonization and niche invasion. Here we present a census of AHL QS genes retrieved from public databases and indicate that the local arrangement (topology) of QS genes, their location within chromosomes and their gene neighborhoods show characteristic patterns that differ between the known Burkholderia clades. In sequence phylogenies, AHL QS genes seem to cluster according to the local gene topology rather than according to the species, which suggests that the basic topology types were present prior to the appearance of current Burkholderia species. The data are available at http://net.icgeb.org/burkholderia/

    Phylogenomics of Cas4 family nucleases

    Background: The Cas4 family endonuclease is a component of the adaptation module in many variants of CRISPR-Cas adaptive immunity systems. Unlike most of the other Cas proteins, Cas4 is often encoded outside CRISPR-cas loci (solo-Cas4) and is also found in mobile genetic elements (MGE-Cas4). Results: As part of our ongoing investigation of CRISPR-Cas evolution, we explored the phylogenomics of the Cas4 family. About 90% of the archaeal genomes encode Cas4 compared to only about 20% of the bacterial genomes. Many archaea encode both the CRISPR-associated form (CAS-Cas4) and solo-Cas4, whereas in bacteria, this combination is extremely rare. The solo-cas4 genes are over-represented in environmental bacteria and archaea with small genomes that typically lack CRISPR-Cas, suggesting that Cas4 could perform uncharacterized defense or repair functions in these microbes. Phylogenomic analysis indicates that the CRISPR-associated cas4 genes are often transferred horizontally, but almost exclusively as part of the adaptation module. The evolutionary integrity of the adaptation module contrasts sharply with the rampant shuffling of CRISPR-cas modules, whereby a given variant of the adaptation module can combine with virtually any effector module. The solo-cas4 genes evolve primarily via vertical inheritance and are subject only to occasional horizontal transfer. The selection pressure on cas4 genes does not substantially differ between CAS-Cas4 and solo-cas4, and is close to the genomic median. Thus, cas4 genes, like cas1 and cas2, evolve similarly to ‘regular’ microbial genes involved in various cellular functions, showing no evidence of direct involvement in virus-host arms races. A notable feature of Cas4 family evolution is the frequent recruitment of cas4 genes by various mobile genetic elements (MGE), particularly archaeal viruses. The functions of Cas4 in these elements are unknown and might involve anti-defense roles. Conclusions: Unlike most of the other Cas proteins, Cas4 family members are as often encoded by stand-alone genes as they are incorporated in CRISPR-Cas systems. In addition, cas4 genes were repeatedly recruited by MGE, perhaps for anti-defense functions. Experimental characterization of the solo and MGE-encoded Cas4 nucleases is expected to reveal currently uncharacterized defense and anti-defense systems and their interactions with CRISPR-Cas systems.

    Additional file 7: Table S1 of Phylogenomics of Cas4 family nucleases

    Worksheet “loci” provides detailed information on all cas4 loci in completely sequenced and draft genomes of archaea and bacteria. Annotation for the proteins encoded in the loci is based on Cas protein and CDD assignments using the PSI-BLAST program (see Methods for details). Worksheet “tree order and assignments” provides information on the order of the Cas4 proteins in the tree (Supplementary file 1) and Cas4 assignments to distinct groups of CAS-Cas4, solo-Cas4 and others. (TXT 28 kb)

    Additional file 3: Information File S1. of Phylogenomics of Cas4 family nucleases

    Complete tree for the Cas4-like set in Newick format. Cas4 assignments are included in the leaf names. (TXT 463 kb)

    Additional file 2: Figure S2. of Phylogenomics of Cas4 family nucleases

    Nucleotide sequence comparisons of CRISPR-Cas loci encoded in closely related strains. On the axes, the labels contain the name of the source genome, the contig ID and the coordinates of the respective loci. The annotations for CRISPR-Cas loci were taken from Additional file 7: Table S1, “Loci” worksheet. The cartoons on the axes represent the genes and CRISPR repeats encoded in these loci. The sizes of the cartoons are proportional to the actual sizes of these genes. Colors: black, CRISPR arrays; blue, cas genes; green, cas4 gene; shaded areas, regions with >70% sequence identity. Left: Comparison of two I-C systems from Marinobacter strains. Right: Comparison of two I-B systems from Campylobacter strains. (DOCX 606 kb)