5 research outputs found

    Census of halide-binding sites in protein structures

    Get PDF
    Motivation: Halides are negatively charged ions of halogens, forming fluorides (F-), chlorides (Cl-), bromides (Br-) and iodides (I-). These anions are quite reactive and interact both specifically and non-specifically with proteins. Despite their ubiquitous presence and important roles in protein function, little is known about the preferences of halides binding to proteins. To address this problem, we performed the analysis of halide-protein interactions, based on the entries in the Protein Data Bank. Results: We have compiled a pipeline for the quick analysis of halide-binding sites in proteins using the available software. Our analysis revealed that all of halides are strongly attracted by the guanidinium moiety of arginine side chains, however, there are also certain preferences among halides for other partners. Furthermore, there is a certain preference for coordination numbers in the binding sites, with a correlation between coordination numbers and amino acid composition. This pipeline can be used as a tool for the analysis of specific halide-protein interactions and assist phasing experiments relying on halides as anomalous scatters

    No More Tears: Mining Sequencing Data for Novel Bt Cry Toxins with CryProcessor

    No full text
    Bacillus thuringiensis (Bt) is a natural pathogen of insects and some other groups of invertebrates that produces three-domain Cry (3d-Cry) toxins, which are highly host-specific pesticidal proteins. These proteins represent the most commonly used bioinsecticides in the world and are used for commercial purposes on the market of insecticides, being convergent with the paradigm of sustainable growth and ecological development. Emerging resistance to known toxins in pests stresses the need to expand the list of known toxins to broaden the horizons of insecticidal approaches. For this purpose, we have elaborated a fast and user-friendly tool called CryProcessor, which allows productive and precise mining of 3d-Cry toxins. The only existing tool for mining Cry toxins, called a BtToxin_scanner, has significant limitations such as limited query size, lack of accuracy and an outdated database. In order to find a proper solution to these problems, we have developed a robust pipeline, capable of precise 3d-Cry toxin mining. The unique feature of the pipeline is the ability to search for Cry toxins sequences directly on assembly graphs, providing an opportunity to analyze raw sequencing data and overcoming the problem of fragmented assemblies. Moreover, CryProcessor is able to predict precisely the domain layout in arbitrary sequences, allowing the retrieval of sequences of definite domains beyond the bounds of a limited number of toxins presented in CryGetter. Our algorithm has shown efficiency in all its work modes and outperformed its analogues on large amounts of data. Here, we describe its main features and provide information on its benchmarking against existing analogues. CryProcessor is a novel, fast, convenient, open source (https://github.com/lab7arriam/cry_processor), platform-independent, and precise instrument with a console version and elaborated web interface (https://lab7.arriam.ru/tools/cry_processor). Its major merits could make it possible to carry out massive screening for novel 3d-Cry toxins and obtain sequences of specific domains for further comprehensive in silico experiments in constructing artificial toxins

    Whole-exome sequencing provides insights into monogenic disease prevalence in Northwest Russia

    No full text
    Abstract Background Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of ultimate importance to the interpretation of medical resequencing data. However, allele frequencies might significantly differ in poorly studied populations that are underrepresented in large‐scale projects, such as the Russian population. Methods In this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in the Northwest Russia. We compared the spectrum of genetic variants to the dbSNP build 151, and made estimates of ClinVar‐based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r. 2.1. Results An estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers‐Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies. Conclusion Our observations demonstrate the utility of population‐specific allele frequency data to the diagnosis of monogenic disorders using high‐throughput technologies
    corecore