190 research outputs found

    Prediction of breast cancer proteins involved in immunotherapy, metastasis, and RNA-binding using molecular descriptors and artificial neural networks

    Breast cancer (BC) is a heterogeneous disease involving genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants. Due to the complexity of BC, the prediction of proteins involved in this disease is a trending topic in drug design. This work proposes an accurate prediction classifier for BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After univariate feature selection on the mix of five descriptor families, the best classifier was obtained with the multilayer perceptron method (an artificial neural network) and 300 features. The performance of the model is demonstrated by an area under the receiver operating characteristic curve (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). When this model was used to score 4,504 cancer-associated proteins, the best ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. The model predicts several BC-related proteins that merit in-depth study as potential new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins
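
    The abstract above reports AUROC as a mean ± spread over 3-fold cross-validation. As a rough illustration of how such a figure can be computed, here is a minimal sketch in plain Python. It assumes the ± denotes the sample standard deviation across folds (the abstract does not say), and the labels and scores in `folds` are made-up toy values, not data from the study. The function names `auroc` and `summarize_folds` are hypothetical.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """fold_results: one (labels, scores) pair per held-out fold.
    Returns (mean AUROC, sample standard deviation of AUROC)."""
    values = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(values), stdev(values)


# Toy held-out predictions for three folds (illustrative values only).
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
    ([0, 1, 0, 1], [0.5, 0.5, 0.1, 0.9]),
]

fold_mean, fold_std = summarize_folds(folds)
print(f"AUROC = {fold_mean:.3f} ± {fold_std:.3f}")
```

    The pairwise comparison is quadratic in the number of examples, which is fine for a sketch; a sorting-based rank computation would be the usual choice for larger test sets.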

    Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses

    [Abstract]: The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. 
Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins. This work was supported by Universidad de Las Américas, Ecuador; the grant ED431C 2022/46 (Competitive Reference Groups, GRC), funded by the EU and Xunta de Galicia, Spain; and the Latin American Society of Pharmacogenomics and Personalized Medicine (SOLFAGEM). Xunta de Galicia; ED431C 2022/4
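
    The classifier described above is built on tri-amino acid composition descriptors of protein sequences. A minimal sketch of how such a descriptor vector can be derived from a sequence follows. It assumes the common definition (relative frequency of each of the 20³ = 8,000 possible tripeptides over overlapping windows); the exact tool and normalization used by the authors are not stated in the abstract, and the function name `tripeptide_composition` is hypothetical. The selection of 200 descriptors mentioned in the abstract is a separate step not shown here.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence):
    """Return a dict mapping each of the 8,000 possible tripeptides to its
    relative frequency among the overlapping 3-residue windows of `sequence`.
    Windows containing non-standard residues are skipped."""
    seq = sequence.upper()
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


# Toy sequence (illustrative only): windows are ACD, CDA, DAC, ACD.
composition = tripeptide_composition("ACDACD")
print(composition["ACD"], composition["CDA"])
```

    Each protein then becomes a fixed-length numeric vector regardless of its sequence length, which is what allows standard classifiers to be trained on it.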

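    Priorización de genes y búsqueda de dianas terapéuticas por medio de herramientas informáticas y técnicas de aprendizaje automatizado en cáncer de mama [Gene prioritization and search for therapeutic targets using computational tools and machine learning techniques in breast cancer]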

    Official Doctoral Programme in Information and Communication Technologies. 5032V01. Thesis by compendium of publications. [Abstract] Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. BC is a heterogeneous disease involving genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants. Despite the technological and scientific advances of recent years, an understanding of molecular processes, the identification of new therapeutic targets and the prediction of proteins involved in immunotherapy, metastasis and RNA binding remain essential for drug development and for the application of precision medicine in clinical practice. This thesis proposes the development of a highly efficient consensus strategy for recognizing genes and proteins associated with BC; the oncological validation of these prioritized genes and proteins using the OncoOmics strategy, which consisted of the analysis of highly relevant experimental databases; the identification of oncogenic mutations and of drugs essential for the development and application of precision medicine; and the prediction of BC proteins associated with immunotherapy, metastasis and RNA binding using bioinformatics tools and artificial intelligence methods. All results were published in international journals with a significant impact factor.

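    Priorización de genes y búsqueda de fármacos por medio de herramientas informáticas y técnicas de aprendizaje de máquinas en osteosarcoma [Gene prioritization and search for drugs using computational tools and machine learning techniques in osteosarcoma]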

    Official Doctoral Programme in Information and Communication Technologies. 5032V01. Thesis by compendium of publications. [Abstract] Osteosarcoma is the most common subtype of primary bone cancer and mainly affects adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has not yet been accurately determined. Moreover, its clinical diagnosis is non-specific and its therapies have not changed in recent decades. Although 5-year survival rates can now reach 60-70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatment. The objective of this doctoral thesis is therefore to develop a prioritization strategy that allows the identification of genes associated with the pathogenicity of osteosarcoma and explains the etiology of this disease more fully. In addition, it seeks to develop drug prediction algorithms based on machine learning that allow new therapeutic agents to be proposed for the treatment of this disease. All the results obtained were published in international scientific journals with a significant JCR impact factor.

    Integrating proteomics and explainable artificial intelligence: a comprehensive analysis of protein biomarkers for endometrial cancer diagnosis and prognosis

    Endometrial cancer, which is the most common gynaecological cancer in women after breast, colorectal and lung cancer, can be diagnosed at an early stage. The first aim of this study is to classify age, tumor grade, myometrial invasion and tumor size, which play an important role in the diagnosis and prognosis of endometrial cancer, using machine learning methods combined with explainable artificial intelligence. Proteomic data from 20 endometrial cancer patients, obtained from tumor biopsies taken from different regions of EC tissue, were used. The data were then grouped according to age, tumor size, tumor grade and myometrial invasion. Three different machine learning methods were compared, explainable artificial intelligence was applied to the model that best classified each group, and possible protein biomarkers for endometrial cancer prognosis were evaluated. The optimal model for age classification was XGBoost (AUC 98.8%), for tumor grade classification XGBoost (AUC 98.6%), for myometrial invasion classification LightGBM (AUC 95.1%), and for tumor size classification XGBoost (AUC 94.8%). By combining the optimal models with the SHAP approach, possible protein biomarkers and their expression levels were obtained for each classification. Finally, EWRS1 protein was found to be common to three groups (age, myometrial invasion, tumor size). These findings indicate that the models developed can accurately classify factors including age, tumor grade and myometrial invasion, all of which are critical for determining the prognosis of endometrial cancer, and can identify potential protein biomarkers associated with these factors. Furthermore, combining the SHAP values with the optimal models made it possible to analyze how the quantities of the proposed biomarker proteins varied across the classes.
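
    The SHAP approach mentioned above attributes a model's prediction to its input features using Shapley values. The sketch below computes exact Shapley values by brute force for a tiny toy model, as an illustration of the underlying idea only. It is not the SHAP library, which uses faster model-specific algorithms for tree ensembles such as XGBoost and LightGBM, and it is not the study's pipeline. The names `shapley_values` and `toy_model`, the baseline of zeros, and the feature values are all assumptions made for this example.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample `x`, where an absent feature is
    represented by its value in `baseline`. Enumerates every feature ordering,
    so it is only practical for a handful of features."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # reveal feature i
            output = model(current)
            phi[i] += output - previous_output  # marginal contribution
            previous_output = output
    return [value / len(orderings) for value in phi]


def toy_model(features):
    """Hypothetical scoring function with one interaction term (a * c)."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


sample = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_model, sample, baseline)
print(contributions)
```

    The contributions always sum to the difference between the model output for the sample and for the baseline, which is the property that makes per-feature (here, per-protein) attributions comparable across classes.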

    OncoOmics approaches to reveal essential genes in breast cancer: a panoramic view from pathogenesis to precision medicine

    [Abstract] Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. Although in recent years large-scale efforts have focused on identifying new therapeutic targets, a better understanding of BC molecular processes is required. Here we focused on elucidating the molecular hallmarks of BC heterogeneity and the oncogenic mutations involved in precision medicine, which remain poorly defined. To fill this gap, we established an OncoOmics strategy that consists of analyzing genomic alterations, signaling pathways, the protein-protein interactome network, protein expression, and dependency maps in cell lines and patient-derived xenografts for 230 previously prioritized genes to reveal essential genes in breast cancer. As a result, the OncoOmics BC essential genes were rationally filtered to 140. mRNA up-regulation was the most prevalent genomic alteration. The most altered signaling pathways were associated with basal-like and Her2-enriched molecular subtypes. RAC1, AKT1, CCND1, PIK3CA, ERBB2, CDH1, MAPK14, TP53, MAPK1, SRC, RAC3, BCL2, CTNNB1, EGFR, CDK2, GRB2, MED1 and GATA3 were essential genes in at least three OncoOmics approaches. The drugs with the highest number of clinical trials in phases 3 and 4 were paclitaxel, docetaxel, trastuzumab, tamoxifen and doxorubicin. Lastly, we collected ~3,500 somatic and germline oncogenic variants associated with 50 essential genes, which in turn had therapeutic connectivity with 73 drugs. In conclusion, the OncoOmics strategy reveals essential genes capable of accelerating the development of targeted therapies for precision oncology. Instituto de Salud Carlos III; PI17/0182
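
    The abstract above reports genes that were essential "in at least three OncoOmics approaches". One simple way to derive such a consensus list is to count, for each gene, how many approaches flag it. The sketch below illustrates that counting step only. The approach names, the placeholder gene labels and the function name `genes_supported_by` are hypothetical, and the paper's actual criteria for calling a gene essential within each approach are not reproduced here.

```python
from collections import Counter


def genes_supported_by(approach_hits, min_approaches=3):
    """approach_hits maps an approach name to the set of genes it flags.
    Returns the sorted genes flagged by at least `min_approaches` approaches."""
    counts = Counter()
    for genes in approach_hits.values():
        counts.update(set(genes))  # each approach votes once per gene
    return sorted(gene for gene, n in counts.items() if n >= min_approaches)


# Placeholder data (not results from the paper).
hits = {
    "genomic_alterations": {"GENE_A", "GENE_B", "GENE_C"},
    "signaling_pathways": {"GENE_A", "GENE_B", "GENE_D"},
    "protein_expression": {"GENE_A", "GENE_C", "GENE_D"},
    "dependency_maps": {"GENE_B", "GENE_E"},
}

consensus = genes_supported_by(hits, min_approaches=3)
print(consensus)
```

    Raising or lowering `min_approaches` trades the size of the consensus list against how much independent evidence each gene must have.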

    Artificial intelligence in cancer target identification and drug discovery

    Artificial intelligence is an advanced method for identifying novel anticancer targets and discovering novel drugs from biological networks, because these networks effectively preserve and quantify the interactions between the components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence-based biological analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, artificial intelligence models provide a quantitative framework for studying the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.