9 research outputs found

    Development of machine learning models based on 3D-QSAR for predicting new drug candidates that inhibit the CCR-5 protein, for the treatment of HIV/AIDS

    Advisor: Prof. Anderson Ara. Monograph (specialization) - Universidade Federal do Paraná, Setor de Ciências Exatas, Curso de Especialização em Data Science e Big Data. Includes references. Abstract: Introduction. C-C chemokine receptor type 5 (CCR-5) is a protein found on the surface of defense cells (lymphocytes and macrophages). CCR-5 is the structure to which HIV (human immunodeficiency virus) binds to invade the host cell, leading to the development of AIDS (acquired immunodeficiency syndrome). In this study, machine learning (ML) models based on quantitative structure-activity relationships (QSAR) were developed to predict compounds with inhibitory bioactivity against the CCR-5 protein for the treatment of HIV. Materials and methods. A non-redundant experimental dataset of 2929 compounds with inhibitory bioactivity values (expressed as IC50) against the CCR-5 protein was collected from the ChEMBL database and used to develop QSAR-based ML models to predict their bioactivity. These 2929 compounds were described using PubChem fingerprints, and 32 different ML algorithms were trained and tested. Model performance was evaluated using R2, MSE, RMSE, MAE, and training time. The SHAP values method was applied to each of the five best ML models to identify the most important features (descriptors) in predicting the bioactivity of compounds against HIV. Results. The five ML models with the best performance in predicting inhibitory bioactivity against the CCR-5 protein for the treatment of HIV were Random Forest (RF), Histogram-based Gradient Boosting (HGBM), LGBM, Bagging, and KNN, whose predictive capacity values (R2) ranged from 82% to 87%. Conclusion. In this study, five ML models (RF, HGBM, LGBM, Bagging, and KNN) were developed to predict the inhibitory bioactivity of compounds against the CCR-5 protein for the discovery of new drugs against HIV. These ML models can be used as a selection filter for new molecules, which can then be tested in in vitro and in vivo experiments aimed at discovering new CCR-5 protein inhibitor drugs for the potential treatment of HIV.
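
    The evaluation metrics named in the abstract above (R2, MSE, RMSE, MAE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the toy observed/predicted values are assumptions made for illustration only; they are not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, RMSE and MAE for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical bioactivity values (not taken from the study).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    In practice a library such as scikit-learn would likely be used for these metrics; the plain-Python version is shown only to make each definition explicit.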

    Prognostic factors for local, regional, and metastatic recurrent events and overall survival in a population-based cohort of patients with cervical cancer

    Advisor: Prof. Dr. Roberto Pontarolo. Co-advisor: Profa. Dra. Fernanda Tonin Stumpf. Dissertation (master's) - Universidade Federal do Paraná, Setor de Ciências da Saúde, Programa de Pós-Graduação em Ciências Farmacêuticas. Defense: Curitiba, 26/02/2020. Includes references: p. 118-127. Abstract: Every year there are around 570,000 new cases of cervical cancer worldwide, which represents the leading cause of death in women with cancer. In this context, several epidemiological studies have been conducted to identify potential risk factors associated with cervical cancer. The aim of the present study was to evaluate Disease-Specific Survival (DSS) and cervical cancer prognostic factors, as well as Disease-Free Survival (DFS) and factors associated with local, regional, and distant recurrence. A retrospective observational cohort study was performed using data from the hospital cancer registry database of the Oncocentro Foundation (São Paulo, Brazil) for 2000-2018. For the DSS analysis, information was collected on sociodemographic characteristics, time from admission to diagnosis, lag time to start treatment after diagnosis, category of admission care, and types of treatment used. For the DFS analysis, all information used in the DSS analysis was collected except the time from admission to diagnosis and the lag time to start treatment. Survival analyses were performed using the Kaplan-Meier method, the log-rank, Breslow, and Tarone-Ware tests, and time-dependent Cox regression. Effect measures were reported as hazard ratios (HR) with their respective 95% confidence intervals (CI). The cohort included 38,038 patients (36,038 according to the Portuguese abstract). The DSS was 12.5 years (CI 12.42-12.67). The time-dependent Cox regression model indicated that the prognostic factors statistically associated with death were: being illiterate, having incomplete elementary education, or having complete elementary education; age over 20 years; time from admission to diagnosis greater than 60 days; lag time to start treatment greater than 60 days; and stage IV adenocarcinoma. Surgery alone was statistically associated with gains in DSS and DFS of 17.20 (CI 17.06-17.36) years and 18.56 (CI 18.45-18.67) years, respectively. Surgery was also associated with a reduction in local, regional, and distant recurrences (recurrence-free survival of 18.92 years, CI 18.83-19.00). Education level and clinical stage were two important prognostic factors associated with local, regional, and distant recurrences. Radiotherapy, alone or combined with surgery, was a protective factor for all types of recurrence. Access to primary health care, including the dissemination of information on cervical cancer prevention strategies, together with early diagnosis and treatment, may improve survival rates in women with this cancer. Keywords: Cervical Cancer. Disease-Specific Survival. Disease-Free Survival. Kaplan-Meier. Time-dependent covariate. Cox Regression.
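
    The Kaplan-Meier method mentioned in the abstract above can be sketched in a few lines. The function name `kaplan_meier` and the follow-up data are hypothetical and bear no relation to the Oncocentro Foundation cohort; the sketch shows only the product-limit estimate, without confidence intervals or the log-rank, Breslow, and Tarone-Ware comparisons.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of survival.

    durations: follow-up time for each patient.
    events: True if the event (death) was observed, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times in years; False marks a censored patient.
follow_up_years = [2, 3, 3, 5, 8, 8, 12]
died = [True, True, False, True, False, True, False]
curve = kaplan_meier(follow_up_years, died)
for time, prob in curve:
    print(f"t = {time:>2} years, S(t) = {prob:.3f}")
```

    Censored patients reduce the number at risk without lowering the curve, which is why the estimate changes only at observed event times.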

    Machine learning-based virtual screening, molecular docking, drug-likeness, pharmacokinetics and toxicity analyses to identify new natural inhibitors of the glycoprotein spike (S1) of SARS-CoV-2

    This study aimed to identify natural bioactive compounds (NBCs) as potential inhibitors of the spike (S1) by means of in silico assays. NBCs with previously proven biological in vitro activity were obtained from the ZINC database and analyzed through virtual screening and molecular docking to identify those with higher affinity to the spike protein. Eight machine learning models were used to validate the results: Principal Component Analysis (PCA), Artificial Neural Network (ANN), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Partial Least Squares-Discriminant Analysis (PLS-DA), Gradient Boosted Tree Discriminant Analysis (XGBoostDA), Soft Independent Modelling of Class Analogies (SIMCA), and Logistic Regression Discriminant Analysis (LREG). Selected NBCs were submitted to drug-likeness prediction using Lipinski's rule of five and Veber's rules. Pharmacokinetic parameters and toxicity were also predicted (ADMET). Antivirals currently used for COVID-19 (remdesivir and molnupiravir) were used as comparators. A total of 170,906 compounds were analyzed. Of these, 34 showed a greater affinity with S1 (affinity energy < -7 kcal/mol). Most of these compounds belonged to the class of coumarins (benzopyrones), presenting a benzene ring fused to a lactone (group of heterosides). The PLS-DA model was able to reproduce the results of the virtual screening and molecular docking (accuracy of 97.0%). Of the 34 compounds, only NBC5 (feselol), NBC14, NBC15, and NBC27 had better results in ADMET predictions. These had a binding affinity to S1 similar to that of remdesivir and molnupiravir. Feselol and three other NBCs were the most promising candidates for treating COVID-19. In vitro and in vivo studies are needed to confirm these findings.
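
    The screening logic described above (an affinity cutoff of -7 kcal/mol followed by drug-likeness rules) can be sketched as a simple filter. The thresholds are the commonly cited Lipinski and Veber criteria; the `Compound` class, the tolerance of one Lipinski violation, and all descriptor values are assumptions for illustration and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float      # g/mol
    log_p: float           # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, in square angstroms
    affinity: float        # docking affinity energy, kcal/mol


def lipinski_violations(c: Compound) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_veber(c: Compound) -> bool:
    """Veber's criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return c.rotatable_bonds <= 10 and c.tpsa <= 140


def shortlist(compounds, affinity_cutoff=-7.0, max_violations=1):
    """Keep compounds that bind below the cutoff and look drug-like."""
    return [
        c for c in compounds
        if c.affinity < affinity_cutoff
        and lipinski_violations(c) <= max_violations
        and passes_veber(c)
    ]


# Hypothetical compounds with made-up descriptor values.
library = [
    Compound("toy_A", 382.5, 4.1, 1, 5, 4, 76.0, -7.8),
    Compound("toy_B", 612.7, 5.6, 6, 12, 14, 190.0, -8.4),
    Compound("toy_C", 310.3, 2.2, 2, 6, 3, 95.0, -6.1),
]
selected = shortlist(library)
print([c.name for c in selected])
```

    Real descriptor values would normally come from a cheminformatics toolkit and the affinity from the docking program; here they are typed in by hand so the filter itself stays visible.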

    Diagnosis and prognosis of COVID-19 employing analysis of patients' plasma and serum via LC-MS and machine learning

    Objective: To implement and evaluate machine learning (ML) algorithms for the prediction of COVID-19 diagnosis, severity, and fatality and to assess biomarkers potentially associated with these outcomes. Material and methods: Serum (n = 96) and plasma (n = 96) samples from patients with COVID-19 (acute, severe, and fatal illness) from two independent hospitals in China were analyzed by LC-MS. Samples from healthy volunteers and from patients with pneumonia caused by other viruses (i.e., negative RT-PCR for COVID-19) were used as controls. Seven different ML-based models were built: PLS-DA, ANNDA, XGBoostDA, SIMCA, SVM, LREG, and KNN. Results: The PLS-DA model presented the best performance for both datasets, with accuracy rates to predict the diagnosis, severity, and fatality of COVID-19 of 93%, 94%, and 97%, respectively. Low levels of the metabolites ribothymidine, 4-hydroxyphenylacetoylcarnitine, and uridine were associated with COVID-19 positivity, whereas high levels of N-acetyl-glucosamine-1-phosphate, cysteinylglycine, methyl isobutyrate, l-ornithine, and 5,6-dihydro-5-methyluracil were significantly related to greater severity and fatality from COVID-19. Conclusion: The PLS-DA model can help to predict SARS-CoV-2 diagnosis, severity, and fatality in daily practice. Some biomarkers typically increased in COVID-19 patients' serum or plasma (i.e., ribothymidine, N-acetyl-glucosamine-1-phosphate, l-ornithine, 5,6-dihydro-5-methyluracil) should be further evaluated as prognostic indicators of the disease.

    Naringenin-4'-glucuronide as a new drug candidate against the COVID-19 Omicron variant: a study based on molecular docking, molecular dynamics, MM/PBSA and MM/GBSA

    This study aimed to identify natural bioactive compounds (NBCs) as potential inhibitors of the spike (S1) receptor binding domain (RBD) of the COVID-19 Omicron variant using computer simulations (in silico). NBCs with previously proven biological in vitro activity were obtained from the ZINC database and analyzed through virtual screening, molecular docking, molecular dynamics (MD), molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA), and molecular mechanics/generalized Born surface area (MM/GBSA). Remdesivir was used as a reference drug in docking and MD calculations. A total of 170,906 compounds were analyzed. Molecular docking screening revealed the top four NBCs with a high affinity with the spike (affinity energy < -7 kcal/mol) to be ZINC000045789238, ZINC000004098448, ZINC000008662732, and ZINC000003995616. In the MD analysis, the four ligands formed complexes with S1 that showed the highest dynamic equilibrium (mean RMSD < 0.3 nm), the lowest fluctuation of the complex amino acid residues (RMSF < 1.3), and stable solvent accessibility. However, the ZINC000045789238-spike complex (naringenin-4'-O-glucuronide) was the only one that had negative binding free energy values in both MM/PBSA and MM/GBSA (-3.74 kcal/mol and -15.65 kcal/mol, respectively), indicating favorable binding. This ligand (naringenin-4'-O-glucuronide) was also the one that produced the highest number of hydrogen bonds over the entire simulation period (average = 4601 bonds per nanosecond). Six mutated amino acid residues in the RBD region of S1 in the Omicron variant formed these hydrogen bonds: Asn417, Ser494, Ser496, Arg403, Arg408, and His505. Naringenin-4'-O-glucuronide showed promising results as a potential drug candidate against COVID-19. In vitro and preclinical studies are needed to confirm these findings.
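
    The RMSD stability criterion quoted above (mean RMSD < 0.3 nm) can be illustrated with a small sketch. It assumes the trajectory frames have already been superimposed on the reference structure, which MD analysis tools normally do first; the function name `rmsd` and the three-atom coordinates are hypothetical.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Each argument is a list of (x, y, z) tuples in nanometres.
    No superposition is performed; frames are assumed pre-aligned.
    """
    if not reference or len(reference) != len(frame):
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum(
        (a - b) ** 2
        for ref_atom, atom in zip(reference, frame)
        for a, b in zip(ref_atom, atom)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom reference structure and two trajectory frames.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
trajectory = [
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0)],  # shifted 0.1 nm in x
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0), (0.0, 1.2, 0.0)],  # shifted 0.2 nm in y
]

values = [rmsd(reference, frame) for frame in trajectory]
mean_rmsd = sum(values) / len(values)
is_stable = mean_rmsd < 0.3  # threshold taken from the abstract
print(f"mean RMSD = {mean_rmsd:.3f} nm, stable = {is_stable}")
```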

    Feature sensitivity criterion-based sampling strategy from the Optimization based on Phylogram Analysis (Fs-OPA) and Cox regression applied to mental disorder datasets.

    Digital datasets in several health care facilities, such as hospitals and prehospital services, have accumulated data from thousands of patients for more than a decade. In general, no local team has enough experts with the different skills required to analyze them in their entirety. Integrating those skills usually takes a relatively long time and is costly. Considering that scenario, this paper proposes a new Feature Sensitivity technique that can automatically deal with a large dataset. It uses a criterion-based sampling strategy from the Optimization based on Phylogram Analysis. Called FS-opa, the new approach seems suitable for dealing with any type of raw data from health centers and for handling their entire datasets. In addition, FS-opa can find the principal features for the construction of inference models without depending on expert knowledge of the problem domain. The selected features can be combined with the usual statistical or machine learning methods to perform predictions. The new method can mine entire datasets from scratch. FS-opa was evaluated using a relatively large dataset from electronic health records of mental disorder prehospital services in Brazil. Cox's approach was integrated with FS-opa to generate survival analysis models related to the length of stay (LOS) in hospitals, on the assumption that LOS is a relevant aspect that can benefit estimates of hospital efficiency and of the quality of patient treatment. Since FS-opa can work with raw datasets, no knowledge from the problem domain was used to obtain the preliminary prediction models. Results show that FS-opa succeeded in performing a feature sensitivity analysis using only the raw data available. In this way, FS-opa can find the principal features without the bias of an inference model, since the proposed method does not use one. Moreover, the experiments show that FS-opa can provide models with a useful trade-off between representativeness and parsimony. This can benefit further analyses by experts, since they can focus on the aspects that benefit problem modeling.