996 research outputs found

    RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach

    BACKGROUND: RNAs play key roles in cells through the interactions with proteins known as the RNA-binding proteins (RBP) and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence, structure, their domain specific features and formats have posed significant computational challenges. One current difficulty is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation. RESULTS: In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol is characterized by transforming the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated.
To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture the interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach in real-world applications. CONCLUSION: The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users.

    FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding

    The advent of highly accurate protein structure prediction methods has fueled an exponential expansion of the protein structure database. Consequently, there is a rising demand for rapid and precise structural homolog search. Traditional alignment-based methods are dedicated to precise comparisons between pairs, exhibiting high accuracy. However, their sluggish processing speed is no longer adequate for managing the current massive volume of data. In response to this challenge, we propose a novel deep-learning approach, FoldExplorer. It harnesses the powerful capabilities of graph attention neural networks and protein large language models to process protein structure and sequence data and generate embeddings for protein structures. The structural embeddings can be used for fast and accurate protein search. The embeddings also provide insights into the protein space. FoldExplorer demonstrates a substantial performance improvement of 5% to 8% over the current state-of-the-art algorithm on the benchmark datasets. Meanwhile, FoldExplorer does not compromise on search speed and excels particularly in searching on a large-scale dataset. Comment: 14 pages, 8 figures
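The abstract above says that structural embeddings can be used for fast search. The following is a minimal sketch of that general idea only: it ranks database vectors by cosine similarity to a query vector. The random vectors stand in for real embeddings, and the function name `top_k_by_cosine`, the embedding dimension, and the choice of cosine similarity are all assumptions for illustration, not details taken from FoldExplorer.

```python
import numpy as np

def top_k_by_cosine(query, database, k=3):
    """Return indices and scores of the k database rows most similar to query."""
    # Normalise so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q
    # Sort by descending similarity and keep the first k hits.
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Hypothetical data: 1000 random 128-dimensional "embeddings".
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))
# The query is a slightly perturbed copy of entry 42, so entry 42 should rank first.
query = database[42] + 0.01 * rng.normal(size=128)
indices, scores = top_k_by_cosine(query, database, k=3)
```

Because each protein is reduced to a fixed-length vector, a search is a single matrix-vector product rather than one pairwise alignment per database entry, which is the usual reason embedding search scales to large databases.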

    Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization

    One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information on the subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most of the existing methods for predicting plant protein subcellular localization can only cover three or four location sites, and none of them can be used to deal with multiplex plant proteins that can simultaneously exist at, or move between, two or more different location sites. Actually, such multiplex proteins might have special biological functions worthy of particular notice. The present study was devoted to improving the existing plant protein subcellular location predictors from the aforementioned two aspects. A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify plant proteins among the following 12 location sites: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole. Compared with the existing methods for predicting plant protein subcellular localization, the new predictor is much more powerful and flexible. In particular, it also has the capacity to deal with multiple-location proteins, which is beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization. As a user-friendly web-server, Plant-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/.
Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. It is anticipated that the Plant-mPLoc predictor as presented in this paper will become a very useful tool in plant science as well as all the relevant areas.

    Many Local Pattern Texture Features: Which Is Better for Image-Based Multilabel Human Protein Subcellular Localization Classification?

    Human protein subcellular location prediction can provide critical knowledge for understanding a protein’s function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
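For readers unfamiliar with the baseline descriptor named in the abstract above, here is a minimal sketch of the standard 8-neighbour local binary pattern (LBP): each interior pixel is compared with its eight neighbours, and the comparison results are packed into one byte. The function names, the neighbour ordering, and the `>=` convention are assumptions of this sketch; the completed LBP and local tetra pattern variants studied in the paper are not implemented here.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D image."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets (row, column), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes, usable as a texture feature."""
    codes = lbp_codes(gray)
    return np.bincount(codes.ravel(), minlength=256) / codes.size

# A bright pixel surrounded by darker ones: no neighbour passes the test.
peak = np.array([[1, 1, 1],
                 [1, 5, 1],
                 [1, 1, 1]])
# A perfectly flat patch: every neighbour passes the test.
flat = np.ones((4, 4))
```

The histogram is what a classifier would typically consume; in a binary relevance setup, one independent binary classifier per location label would be trained on such feature vectors.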

    A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0

    Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is very useful for proteomics, system biology and drug development as well. However, most existing methods for predicting protein subcellular location can only cover 5 to 12 location sites. Also, they are limited to dealing with single-location proteins and hence fail to work for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Actually, multiplex proteins of this kind usually possess some important biological functions worthy of our special notice. A new predictor called “Euk-mPLoc 2.0” is developed by hybridizing the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with the existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly-constructed stringent benchmark dataset which contains both single- and multiple-location proteins and in which none of the proteins has pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than those by any of the existing predictors.
As a user-friendly web-server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, it will take about 15 seconds for the web-server to yield the predicted result; the longer the sequence is, the more time it may usually need. It is anticipated that the novel approach and the powerful predictor as presented in this paper will have a significant impact on Molecular Cell Biology, System Biology, Proteomics, Bioinformatics, and Drug Development.

    Efficacy of Ultrasound-guided Radiofrequency Ablation of Parathyroid Hyperplasia: Single Session vs. Two-Session for Effect on Hypocalcemia

    To evaluate safety and efficacy of one- vs. two-session radiofrequency ablation (RFA) of parathyroid hyperplasia for patients with secondary hyperparathyroidism (SHPT) and to compare the outcome of both methods on hypocalcemia. Patients with secondary hyperparathyroidism underwent ultrasound guided RFA of parathyroid hyperplasia. Patients were alternately assigned to either group 1 (n = 28) with RFA of all 4 glands in one session or group 2 (n = 28) with RFA of 2 glands in a first session and other 2 glands in a second session. Serum parathyroid hormone (PTH), calcium, phosphorus and alkaline phosphatase (ALP) values were measured at a series of time points after RFA. RFA parameters, including operation duration and ablation time and hospitalization length and cost, were compared between the two groups. Mean PTH decreased in group 1 from 1865.18 ± 828.93 pg/ml to 145.72 ± 119.27 pg/ml at 1 day after RFA and in group 2 from 2256.64 ± 1021.72 pg/ml to 1388.13 ± 890.15 pg/ml at 1 day after first RFA and to 137.26 ± 107.12 pg/ml at 1 day after second RFA. Group 1's calcium level decreased to 1.79 ± 0.31 mmol/L at day 1 after RFA and group 2 decreased to 1.89 ± 0.26 mmol/L at day 1 after second session RFA (P < 0.05). Multivariate analysis showed that hypocalcemia was related to serum ALP. Patients with ALP ≥ 566 U/L had lower calcium compared to patients with ALP < 566 U/L up to a month after RFA (P < 0.05). Group 1's RFA time and hospitalization were shorter and had lower cost compared with Group 2. US-guided RFA of parathyroid hyperplasia is a safe and effective method for treating secondary hyperparathyroidism. Single-session RFA was more cost-effective and resulted in a shorter hospital stay compared to two sessions. However, patients with two-session RFA had less hypocalcemia, especially those with high ALP.

    Experimental observation of non-Hermitian higher-order skin interface states in topological electric circuits

    The study of topological states has developed rapidly in electric circuits, which permit flexible fabrication of non-Hermitian systems by introducing non-Hermitian terms. Here, nonreciprocal coupling terms are realized by utilizing a voltage follower module in non-Hermitian topological electric circuits. We report the experimental realization of one- and two-dimensional non-Hermitian skin interface states in electric circuits, where interface states induced by non-Hermitian skin effects are localized at the interface of different domains carrying different winding numbers. Our electric circuit system provides a readily accessible platform to explore non-Hermitian-induced topological phases, and paves a new road for device applications.

    Improving accuracy of protein contact prediction using balanced network deconvolution

    The residue contact map is essential for protein three‐dimensional structure determination. But most of the current contact prediction methods based on residue co‐evolution suffer from high false‐positive rates introduced by indirect and transitive contacts (i.e., residues A–B and B–C are in contact, but A–C are not). Built on the work by Feizi et al. (Nat Biotechnol 2013; 31:726–733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm to identify an optimized dependency matrix without limit on the eigenvalue range in the applied network systems. The algorithm was used to filter contact predictions of five widely used co‐evolution methods. On the test of proteins from three benchmark datasets of the 9th critical assessment of protein structure prediction (CASP9), CASP10, and PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database experiments, the BND can improve the medium‐ and long‐range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P‐value < 5.93 × 10⁻³ in the Student's t‐test. A further comparison with the ab initio structure predictions in CASPs showed that the usefulness of the current co‐evolution‐based contact prediction to the three‐dimensional structure modeling relies on the number of homologous sequences existing in the sequence databases. BND can be used as a general contact refinement method, which is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/. Proteins 2015; 83:485–496. © 2014 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/110720/1/prot24744.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/110720/2/prot24744-sup-0001-suppinfo.pd
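The transitive-contact problem described in the abstract above (A–B and B–C in contact, A–C not) can be illustrated with the closed-form network deconvolution that BND builds on. The sketch below assumes the series model in which the observed matrix sums direct and all indirect paths, G_obs = G_dir + G_dir² + …, so that G_dir = G_obs (I + G_obs)⁻¹. It omits the eigenvalue scaling step of the original method and is not the BND algorithm itself, whose balancing scheme the abstract does not spell out. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def network_deconvolution(g_obs):
    """Recover direct dependencies from a symmetric observed matrix.

    Assumes g_obs = g_dir + g_dir^2 + ..., which gives
    g_dir = g_obs (I + g_obs)^-1, applied here eigenvalue by eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(g_obs)
    # Each observed eigenvalue lam maps to the direct eigenvalue lam / (1 + lam).
    direct_vals = eigvals / (1.0 + eigvals)
    return (eigvecs * direct_vals) @ eigvecs.T

# Toy chain A - B - C: A and C have no direct link.
g_dir = np.array([[0.0, 0.4, 0.0],
                  [0.4, 0.0, 0.4],
                  [0.0, 0.4, 0.0]])
# Observed matrix with all indirect paths added: g_dir (I - g_dir)^-1.
g_obs = g_dir @ np.linalg.inv(np.eye(3) - g_dir)
recovered = network_deconvolution(g_obs)
```

In this toy case the observed matrix carries a spurious A–C score produced purely by the path through B, and the deconvolution removes it. The series only converges when the direct matrix has eigenvalues of magnitude below one, which is the kind of eigenvalue-range restriction the abstract says BND is designed to lift.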