6 research outputs found

    DeephageTP: a convolutional neural network framework for identifying phage-specific proteins from metagenomic sequencing data

    Bacteriophages (phages) are the most abundant and diverse biological entities on Earth. Due to the lack of universal gene markers and database representatives, about 50–90% of phage genes cannot be assigned functions. This makes it a challenge to identify phage genomes and annotate the functions of phage genes efficiently by homology search on a large scale, especially for novel phages. Portal (portal protein), TerL (large terminase subunit protein), and TerS (small terminase subunit protein) are three specific proteins of Caudovirales phages. Here, we developed a CNN (convolutional neural network)-based framework, DeephageTP, to identify these three specific proteins from metagenomic data. The framework takes one-hot encodings of the original protein sequences as input and automatically extracts predictive features during modeling. To overcome the false positive problem, a cutoff-loss-value strategy is introduced based on the distributions of the loss values of protein sequences within the same category. The proposed model with a set of cutoff-loss-values demonstrates high Precision in identifying TerL and Portal sequences (94% and 90%, respectively) from the mimic metagenomic dataset. Finally, we tested the efficacy of the framework using three real metagenomic datasets, and the results showed that, compared to conventional alignment-based methods, our proposed framework had a particular advantage in identifying novel phage-specific Portal and TerL protein sequences with remote homology to their counterparts in the training datasets. In summary, our study develops, for the first time, a CNN-based framework for identifying phage-specific protein sequences with high complexity and low conservation, and this framework will help find novel phages in metagenomic sequencing data. DeephageTP is available at https://github.com/chuym726/DeephageTP
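
The two ideas named in the abstract above, one-hot encoding of protein sequences and a per-category loss cutoff, can be illustrated with a minimal sketch. This is not the DeephageTP code: the abstract does not say how unknown residues or padding are handled, how the loss is computed, or how the cutoffs are chosen. The function names, the zero-padding, the use of the cross-entropy of the predicted class as the loss, and the percentile rule for the cutoff are all assumptions made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq, max_len):
    """Return a (max_len, 20) matrix for a protein sequence.

    Assumption: unknown residues and padding positions stay all-zero,
    and sequences longer than max_len are truncated.
    """
    mat = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq[:max_len].upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            mat[pos, idx] = 1.0
    return mat


def prediction_loss(probs):
    """Cross-entropy of the predicted (argmax) class: -log(max probability)."""
    return float(-np.log(np.max(probs)))


def loss_cutoff(category_losses, percentile=95.0):
    """Hypothetical cutoff rule: a percentile of the loss values observed
    for sequences of one category."""
    return float(np.percentile(category_losses, percentile))


def accept(probs, cutoffs):
    """Return the predicted class index, or None when the loss of the
    prediction exceeds the cutoff for that class."""
    label = int(np.argmax(probs))
    return label if prediction_loss(probs) <= cutoffs[label] else None
```

In this sketch a confident prediction (low loss) is kept, while a prediction whose loss exceeds the cutoff for its category is discarded as a likely false positive. Stricter cutoffs would trade recall for precision.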

    Grafting of R4N+-Bearing Organosilane on Kaolinite, Montmorillonite, and Zeolite for Simultaneous Adsorption of Ammonium and Nitrate

    Modification of aluminosilicate minerals with an R4N+-bearing organic modifier, through the formation of covalent bonds, is a viable way to eliminate modifier release and to maintain the ability to remove cationic pollutants. In this study, trimethyl [3-(trimethoxysilyl) propyl] ammonium chloride (TM) and/or dimethyl octadecyl [3-(trimethoxysilyl) propyl] ammonium chloride (DMO) were grafted onto three aluminosilicate minerals, namely calcined kaolinite (Kaol), montmorillonite (Mt), and zeolite (Zeol), and the resulting composites were assessed for their ammonium (NH4+) and nitrate (NO3−) adsorption performance. Grafting of TM and/or DMO had little influence on the crystal structures of Kaol and Zeol, but it increased the interlayer distance of Mt due to intercalation. Compared to Kaol and Zeol, Mt had a substantially greater grafting concentration of organosilane. For Mt, the highest amount of loaded organosilane was observed when TM and DMO were used simultaneously, whereas for Kaol and Zeol, this occurred when only DMO was employed. 29Si-NMR spectra revealed that TM and/or DMO were covalently bonded on Mt. In contrast to that of NO3−, the amount of adsorbed NH4+ was reduced after TM and/or DMO grafting, while the adsorption rate was little affected. For the grafted Kaol and Zeol, the adsorption of NH4+ and NO3− was non-interfering. This differs from the grafted Mt, where NH4+ uptake was aided by the presence of NO3−. The higher concentration of DMO accounted for the larger NO3− uptake, which was accompanied by improved affinity. The results provide a reference for grafting aluminosilicate minerals and designing efficient adsorbents for the co-adsorption of NH4+ and NO3−

    Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021

    The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multiomics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn