814 research outputs found

    Model-based classification for subcellular localization prediction of proteins

    Improved functional prediction of proteins by learning kernel combinations in multilabel settings

    Background: We develop a probabilistic model for combining kernel matrices to predict the function of proteins. It extends previous approaches in that it can handle multiple labels which naturally appear in the context of protein function. Results: Explicit modeling of multilabels significantly improves the capability of learning protein function from multiple kernels. The performance and the interpretability of the inference model are further improved by simultaneously predicting the subcellular localization of proteins and by combining pairwise classifiers to consistent class membership estimates. Conclusion: For the purpose of functional prediction of proteins, multilabels provide valuable information that should be included adequately in the training process of classifiers. Learning of functional categories gains from co-prediction of subcellular localization. Pairwise separation rules allow very detailed insights into the relevance of different measurements like sequence, structure, interaction data, or expression data. A preliminary version of the software can be downloaded from http://www.inf.ethz.ch/personal/vroth/KernelHMM/. ISSN:1471-210
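The abstract above describes combining kernel matrices from different data sources. As a rough illustration of the basic idea only, the sketch below forms a normalized weighted sum of precomputed kernel matrices. The function name `combine_kernels` and the fixed weights are assumptions made for this example; the paper learns the combination inside a probabilistic multilabel model, which this sketch does not attempt to reproduce.

```python
def combine_kernels(kernels, weights):
    """Return a convex combination of equally sized square kernel matrices.

    Hypothetical helper for illustration: the weights are supplied by the
    caller and normalized to sum to 1; they are not learned from data.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, w in zip(kernels, weights):
        scale = w / total  # normalize so the weights sum to 1
        for i in range(n):
            for j in range(n):
                combined[i][j] += scale * kernel[i][j]
    return combined


# Two toy 2x2 kernels, e.g. one from sequence data and one from expression data.
k_sequence = [[1.0, 0.0], [0.0, 1.0]]
k_expression = [[2.0, 2.0], [2.0, 2.0]]
k_combined = combine_kernels([k_sequence, k_expression], [1.0, 1.0])
```

A non-negative weighted sum of positive semidefinite kernels is itself positive semidefinite, so the result can be passed to any kernel method in place of a single kernel.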

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures are showing better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning and ensemble learning, such that the final model has better generalization performance. This paper reviews the state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorised into ensemble models like bagging, boosting and stacking, negative correlation based deep ensemble models, explicit/implicit ensembles, homogeneous/heterogeneous ensembles, decision fusion strategies, unsupervised, semi-supervised, reinforcement learning and online/incremental, multilabel based deep ensemble models. Applications of deep ensemble models in different domains are also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions.
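One of the simplest decision fusion strategies mentioned above is majority voting over the predictions of several models. The sketch below is a minimal illustration under that assumption; `majority_vote` is a hypothetical helper name, and the review itself covers many more elaborate schemes (bagging, boosting, stacking) that are not shown here.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label lists into one label list by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    Ties go to the tied label that appears first among the models' votes.
    """
    fused = []
    for votes in zip(*predictions):  # votes for one sample across all models
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Three hypothetical models voting on three samples.
model_outputs = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
fused_labels = majority_vote(model_outputs)
```

Voting only helps when the individual models make different errors, which is why diversity among ensemble members matters.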

    BOOST THE DISCOVERY OF MRP7/ABCC10 SUBSTRATES AND INHIBITORS: ESTABLISHMENT OF NEW IN VITRO AND IN SILICO MODELS

    ATP-binding cassette (ABC) transporters are responsible for the efflux of structurally distinct endo- and xenobiotics energized by ATP hydrolysis. MRP7/ABCC10 is the 10th member of subfamily C and is responsible for mediating multidrug resistance (MDR) against a series of chemotherapeutic drugs such as taxanes, epothilones, Vinca alkaloids, anthracyclines and epipodophyllotoxins. Establishment of new in silico and in vitro models for MRP7 substrate/inhibitor prediction: Considering the limited knowledge of MRP7, we established a homology model based on bovine MRP1 cryo-EM models. The final model was used for protein global motion analysis and docking analysis. Before docking, potential drug binding pockets were identified and evaluated. Next, MRP7 substrates and inhibitors were docked into drug binding pockets. We found that docked inhibitors and substrates formed separate clusters, from which a substrate binding region and an inhibitor binding region were proposed. This homology protein model enables the docking analysis of potential MRP7 ligands for future studies. Moreover, we established a new SKOV3/MRP7 cell line which exhibits a drug resistance profile similar to that of the previously established HEK/MRP7 cell line. This new cell line is valuable for the discovery of MRP7 substrates and inhibitors. Finally, we established a novel machine learning model named Mrp7Pred for large-scale MRP7 substrate/inhibitor prediction. The model was also deployed as a web server and is freely available to users at http://www.mrp7pred.com. We successfully identified 2 substrates and 4 inhibitors from 70 FDA-approved drugs using Mrp7Pred. New synthetic agents that target MRP7 and overcome MRP7-mediated MDR: Previously, we identified two synthetic compounds, CMP25 and CP55, as potent ABCB1 and ABCG2 inhibitors. Here we found that these two compounds also significantly reversed the MDR mediated by MRP7. Both compounds significantly sensitized MRP7-overexpressing HEK/MRP7 cells to paclitaxel and vincristine. Western blotting indicated that neither CMP25 nor CP55 alters the MRP7 expression level. Immunofluorescence showed that the subcellular localization of MRP7 was not altered by these two compounds. However, intracellular accumulation of [3H]-paclitaxel and [3H]-vincristine was significantly increased, while the efflux was significantly reduced, when co-administered with CMP25 or CP55. Docking analysis predicted hydrophobic interactions as the major contributors to stabilizing the drug-protein complex.

    Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning

    Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict such genes with confidence from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; are enriched in reproductive tissues or are targets for small RNAs bound to the argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including Drosophila melanogaster) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.

    Deep Learning for Genomics: A Concise Overview

    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications. Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

    Applying Machine Learning to Predict the Exportome of Bovine and Canine Babesia Species That Cause Babesiosis

    Babesia infection of red blood cells can cause a severe disease called babesiosis in susceptible hosts. Bovine babesiosis causes global economic loss to the beef and dairy cattle industries, and canine babesiosis is considered a clinically significant disease. Potential therapeutic targets against bovine and canine babesiosis include members of the exportome, i.e., those proteins exported from the parasite into the host red blood cell. We developed three machine learning-derived methods (two novel and one adapted) to predict for every known Babesia bovis, Babesia bigemina, and Babesia canis protein the probability of being an exportome member. Two well-studied apicomplexan-related species, Plasmodium falciparum and Toxoplasma gondii, with extensive experimental evidence on their exportome or excreted/secreted proteins were used as important benchmarks for the three methods. Based on 10-fold cross validation and multiple train–validation–test splits of training data, we expect that over 90% of the predicted probabilities accurately provide a secretory or non-secretory indicator. Only laboratory testing can verify that predicted high exportome membership probabilities are creditable exportome indicators. However, the presented methods at least provide those proteins most worthy of laboratory validation and will ultimately save time and money.
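The abstract above relies on 10-fold cross validation. As a minimal sketch of how such a split can be generated, the function below partitions sample indices into k contiguous folds and returns train/test index lists for each fold. The name `k_fold_indices` is an assumption for this example; the authors' actual splitting procedure (for instance any shuffling or stratification) is not described in the abstract and is not reproduced here.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering range(n_samples).

    Folds are contiguous and unshuffled; when n_samples is not divisible
    by k, the first (n_samples % k) folds get one extra sample.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")

    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


# Hypothetical data set of 25 proteins split into 10 folds.
folds = k_fold_indices(25, k=10)
```

Each sample appears in exactly one test fold, so averaging a metric over the k folds uses every sample for evaluation once while never testing on data the model was trained on in that fold.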