190 research outputs found
Prediction of breast cancer proteins involved in immunotherapy, metastasis, and RNA-binding using molecular descriptors and artificial neural networks
Breast cancer (BC) is a heterogeneous disease involving genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants. Due to the complexity of BC, the prediction of proteins involved in this disease is a trending topic in drug design. This work proposes an accurate prediction classifier for BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After applying univariate feature selection to the mix of five descriptor families, the best classifier was obtained using the multilayer perceptron method (artificial neural network) and 300 features. The performance of the model is demonstrated by an area under the receiver operating characteristics (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). Regarding the prediction of 4,504 cancer-associated proteins using this model, the best ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. This model predicts several BC-related proteins that should be studied in depth to find new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins
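The abstract above reports AUROC as a mean ± deviation over three cross-validation folds. As a minimal sketch of how such a figure can be computed, the following pure-Python example calculates AUROC per fold as the probability that a positive sample outranks a negative one, then summarizes the folds. The function names, the toy fold data, and the use of the sample standard deviation are assumptions made for illustration; this is not the authors' script, which is available at the repository linked above.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Return (mean, sample standard deviation) of AUROC across folds."""
    values = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(values), stdev(values)


# Hypothetical held-out predictions for three folds (toy values, not real data).
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
    ([1, 1, 0, 0], [0.6, 0.5, 0.5, 0.1]),
]
fold_mean, fold_std = summarize_folds(folds)
print(f"AUROC = {fold_mean:.3f} ± {fold_std:.3f}")
```

Whether the reported deviation is a sample or population standard deviation is not stated in the abstract, so the choice of `stdev` here is only one plausible reading.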
Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses
[Abstract]: The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. 
Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins. This work was supported by Universidad de Las Américas, Ecuador; the grant ED431C 2022/46 (Competitive Reference Groups, GRC), funded by the EU and Xunta de Galicia, Spain; and the Latin American Society of Pharmacogenomics and Personalized Medicine (SOLFAGEM).
Xunta de Galicia; ED431C 2022/4
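The classifier described above relies on tri-amino acid composition descriptors of protein sequences. A minimal sketch of one common way to compute such descriptors follows: each of the 8,000 possible tripeptides receives its relative frequency among the overlapping three-residue windows of the sequence. The function name, the handling of non-standard residues, and the normalization by valid windows are assumptions for illustration; the abstract does not specify these details, and the 200 descriptors it mentions would be a selected subset of this full vector.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence):
    """Return a dict mapping each of the 8,000 tripeptides to its relative frequency.

    Windows containing non-standard residues (e.g. 'X') are skipped, and the
    frequencies are normalized by the number of valid windows (an assumption).
    """
    seq = sequence.upper()
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    valid_windows = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            valid_windows += 1
    if valid_windows == 0:
        raise ValueError("sequence has no valid three-residue window")
    return {tri: c / valid_windows for tri, c in counts.items()}


# Toy sequence (hypothetical, not a real protein).
descriptors = tripeptide_composition("ACDA")
print(len(descriptors), descriptors["ACD"], descriptors["CDA"])
```

A feature-selection step would then keep only the most informative tripeptides before training the classifier.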
Gene prioritization and search for therapeutic targets using computational tools and machine learning techniques in breast cancer
Official Doctoral Program in Information and Communication Technologies. 5032V01. Thesis by compendium of publications.
[Abstract]
Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. BC is a heterogeneous disease involving genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants. Despite the technological and scientific advances of recent years, understanding the molecular processes, identifying new therapeutic targets and predicting the proteins involved in immunotherapy, metastasis and RNA binding remain essential for drug development and for the application of precision medicine in clinical practice. This thesis proposes the development of a highly efficient consensus strategy for recognizing genes and proteins associated with BC; the oncological validation of these prioritized genes and proteins using the OncoOmics strategy, which consisted of the analysis of highly relevant experimental databases; the identification of oncogenic mutations and of drugs essential for the development and application of precision medicine; and the prediction of BC proteins associated with immunotherapy, metastasis and RNA binding using bioinformatics tools and artificial intelligence methods. All results were published in international journals with a significant impact factor.
Gene prioritization and drug search using computational tools and machine learning techniques in osteosarcoma
Official Doctoral Program in Information and Communication Technologies. 5032V01. Thesis by compendium of publications.
[Abstract]
Osteosarcoma is the most common subtype of primary bone cancer and mainly affects adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has not yet been accurately determined. In addition, its clinical diagnosis is non-specific and its therapies have not changed in recent decades. Although 5-year survival rates can now reach 60-70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors of treatment. The objective of this doctoral thesis is therefore to develop a prioritization strategy that allows the identification of genes associated with the pathogenicity of osteosarcoma and explains the etiology of this disease more fully. It also seeks to develop drug prediction algorithms based on machine learning that allow new therapeutic agents to be proposed for the treatment of this disease. All the results obtained in this research were published in international scientific journals with a significant JCR impact factor.
Integrating proteomics and explainable artificial intelligence: a comprehensive analysis of protein biomarkers for endometrial cancer diagnosis and prognosis
Endometrial cancer, which is the most common gynaecological cancer in women after breast, colorectal and lung cancer, can be diagnosed at an early stage. The first aim of this study is to classify age, tumor grade, myometrial invasion and tumor size, which play an important role in the diagnosis and prognosis of endometrial cancer, using machine learning methods combined with explainable artificial intelligence. Proteomic data from 20 endometrial cancer patients, obtained from tumor biopsies taken from different regions of EC tissue, were used. The data were then grouped according to age, tumor size, tumor grade and myometrial invasion. Three different machine learning methods were trained, explainable artificial intelligence was applied to the model that best classified each group, and possible protein biomarkers for endometrial cancer prognosis were evaluated. The optimal model for age classification was XGBoost (AUC 98.8%), for tumor grade classification XGBoost (AUC 98.6%), for myometrial invasion classification LightGBM (AUC 95.1%), and for tumor size classification XGBoost (AUC 94.8%). By combining the optimal models with the SHAP approach, possible protein biomarkers and their expression levels were obtained for each classification. Finally, the EWRS1 protein was found to be common to three groups (age, myometrial invasion, tumor size). These findings indicate that the models can accurately classify factors that are critical for determining the prognosis of endometrial cancer, including age, tumor grade and myometrial invasion, and can identify potential protein biomarkers associated with these factors. Furthermore, combining the SHAP values with the optimal models made it possible to analyze how the quantities of the proposed biomarker proteins vary across the classes.
OncoOmics approaches to reveal essential genes in breast cancer: a panoramic view from pathogenesis to precision medicine
[Abstract]
Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. Although in recent years large-scale efforts have focused on identifying new therapeutic targets, a better understanding of BC molecular processes is required. Here we focused on elucidating the molecular hallmarks of BC heterogeneity and the oncogenic mutations involved in precision medicine that remain poorly defined. To fill this gap, we established an OncoOmics strategy that consists of analyzing genomic alterations, signaling pathways, the protein-protein interactome network, protein expression, and dependency maps in cell lines and patient-derived xenografts in 230 previously prioritized genes to reveal essential genes in breast cancer. As a result, the OncoOmics BC essential genes were rationally filtered to 140. mRNA up-regulation was the most prevalent genomic alteration. The most altered signaling pathways were associated with basal-like and Her2-enriched molecular subtypes. RAC1, AKT1, CCND1, PIK3CA, ERBB2, CDH1, MAPK14, TP53, MAPK1, SRC, RAC3, BCL2, CTNNB1, EGFR, CDK2, GRB2, MED1 and GATA3 were essential genes in at least three OncoOmics approaches. The drugs with the highest number of clinical trials in phases 3 and 4 were paclitaxel, docetaxel, trastuzumab, tamoxifen and doxorubicin. Lastly, we collected ~3,500 somatic and germline oncogenic variants associated with 50 essential genes, which in turn had therapeutic connectivity with 73 drugs. In conclusion, the OncoOmics strategy reveals essential genes capable of accelerating the development of targeted therapies for precision oncology.
Instituto de Salud Carlos III; PI17/0182
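The abstract above singles out genes that were essential in at least three OncoOmics approaches. A minimal sketch of that kind of consensus filter follows: it counts, for each gene, how many approaches flagged it and keeps those meeting a threshold. The function name, the approach labels and the placeholder gene identifiers are hypothetical and chosen only for illustration; the actual per-approach criteria are defined in the paper, not here.

```python
from collections import Counter


def consensus_genes(approach_hits, min_approaches=3):
    """Return genes flagged by at least `min_approaches` approaches, sorted by name.

    `approach_hits` maps an approach label to the collection of genes it flagged.
    Each gene counts at most once per approach.
    """
    counts = Counter(
        gene
        for genes in approach_hits.values()
        for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_approaches)


# Placeholder inputs (hypothetical gene labels, not results from the study).
hits = {
    "genomic_alterations": ["GENE_A", "GENE_B", "GENE_C"],
    "signaling_pathways": ["GENE_A", "GENE_C"],
    "interactome_network": ["GENE_A", "GENE_B", "GENE_D"],
    "protein_expression": ["GENE_C", "GENE_D"],
    "dependency_maps": ["GENE_A", "GENE_A"],
}
print(consensus_genes(hits))
```

Raising `min_approaches` makes the filter stricter, trading coverage for stronger cross-approach support.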
Artificial intelligence in cancer target identification and drug discovery
Artificial intelligence is an advanced method for identifying novel anticancer targets and discovering novel drugs from biological networks, because these networks effectively preserve and quantify the interactions between the components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, artificial intelligence models provide a quantitative framework for studying the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
- …