
    Machine learning and mapping algorithms applied to proteomics problems

    Proteins provide evidence that a given gene is expressed, and machine learning algorithms can be applied to various proteomics problems to gain information about the underlying biology. This dissertation applies machine learning algorithms to proteomics data to predict whether a given peptide is observable by mass spectrometry and whether a given peptide can serve as a cell-penetrating peptide; it then uses the peptides observed through mass spectrometry to aid in the structural annotation of the chicken genome. Peptides observed by mass spectrometry are used to identify proteins, so accurately predicting which peptides will be seen allows researchers to analyze to what extent a given protein is observable. Cell-penetrating peptides could be used to enable targeted small-molecule delivery across cellular membranes and may serve as drug-delivery peptides. Peptides and proteins identified through mass spectrometry can help refine computational gene models and improve structural genome annotations.
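    Predicting peptide observability is usually framed as binary classification, which first requires turning each peptide sequence into a fixed-length numeric vector. The sketch below assumes simple features (peptide length plus amino-acid composition); the dissertation's actual features and classifier are not described in this abstract, and `peptide_features` is a hypothetical helper.

```python
# Minimal sketch: encode a peptide as [length, fraction of each standard
# amino acid]. The feature choice is an illustrative assumption, not the
# dissertation's documented feature set.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def peptide_features(peptide: str) -> list[float]:
    """Return [length, frac_A, frac_C, ..., frac_Y] for a peptide sequence."""
    seq = peptide.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if not seq or unknown:
        # reject empty sequences and non-standard residue codes
        raise ValueError(f"invalid peptide: {peptide!r}")
    n = len(seq)
    composition = [seq.count(aa) / n for aa in AMINO_ACIDS]
    return [float(n)] + composition


if __name__ == "__main__":
    vec = peptide_features("PEPTIDEK")
    print(len(vec))  # 21
    print(vec[0])    # 8.0
```

    A vector like this could then be passed to any standard binary classifier trained on peptides labelled as observed or not observed. Composition features ignore residue order, so richer sequence encodings may be preferable in practice.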

    Prediction and characterization of therapeutic protein aggregation


    Application of machine learning and deep learning for proteomics data analysis


    Machine learning for the prediction of protein-protein interactions

    The prediction of protein-protein interactions (PPI) has recently emerged as an important problem in bioinformatics and systems biology, because most essential cellular processes are mediated by these interactions. In this thesis we focused on the prediction of co-complex interactions, where the objective is to identify and characterize protein pairs that are members of the same protein complex. Although high-throughput methods for the direct identification of PPI have been developed in recent years, it has been shown that the data they produce are often incomplete and suffer from high false-positive and false-negative rates. To deal with this technology-driven problem, several machine learning techniques have been employed to improve the accuracy and reliability of predicted interacting protein pairs, demonstrating that combining direct and indirect biological evidence can improve the quality of predictive PPI models. This task has commonly been treated as a binary classification problem. However, the nature of the data creates two major problems. First, the classes are imbalanced: the number of positive examples (pairs of proteins that really interact) is much smaller than the number of negative ones. Second, the selection of negative examples rests on unreliable assumptions that could introduce bias into the classification results. The first part of this dissertation addresses these drawbacks by exploring one-class classification (OCC) methods for PPI prediction. OCC methods use examples of just one class to build a predictive model, which is therefore independent of how negative examples are selected; these approaches are also known to cope well with imbalanced classes. We designed and carried out a performance evaluation study of several OCC methods for this task.
We also carried out a comparative performance evaluation against several conventional learning techniques. Furthermore, we examine a new potential drawback that appears to affect PPI prediction performance: the composition of the positive gold-standard set, which contains a high proportion of examples involving interactions of ribosomal proteins. We demonstrate that this indeed biases the classification task, producing over-optimistic performance results. The prediction of non-ribosomal PPI is a much more difficult task. We investigate several strategies to improve performance on this subtask, integrating new kinds of data and combining diverse classification models generated from different data sets. We also undertook a preliminary validation study of the new PPI predicted by OCC methods, focusing on three main aspects: searching the literature for biological evidence supporting the new predictions; analyzing the properties of the predicted PPI networks; and identifying highly interconnected groups of proteins that may correspond to new protein complexes. Finally, this thesis explores a slightly different area, the prediction of PPI types: classifying the PPI structures (complexes) contained in the Protein Data Bank (PDB) according to their function and binding affinity. Given the relatively small number of crystallized protein complexes available, it is not yet possible to link these results with those obtained earlier for the prediction of co-complex PPI. However, this may become possible in the near future as more PPI structures become available.
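    The central property of OCC described above, a model built from positive examples alone, can be illustrated with the simplest possible one-class rule: accept a new protein pair if its feature vector lies within a distance threshold of the centroid of known interacting pairs. This is a minimal sketch assuming numeric pair features (for example, hypothetical similarity scores); it is not one of the specific OCC methods evaluated in the thesis, and `CentroidOneClass` is a hypothetical name.

```python
# Minimal sketch of distance-to-centroid one-class classification.
# Only positive (interacting) pairs are used for fitting; no negatives.
import math
from typing import Sequence


class CentroidOneClass:
    def __init__(self, quantile: float = 0.95):
        # fraction of training positives that should fall inside the boundary
        self.quantile = quantile
        self.centroid: list[float] = []
        self.threshold = 0.0

    @staticmethod
    def _dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Euclidean distance between two feature vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def fit(self, positives: Sequence[Sequence[float]]) -> "CentroidOneClass":
        if not positives:
            raise ValueError("need at least one positive example")
        n, d = len(positives), len(positives[0])
        # centroid = per-feature mean of the positive examples
        self.centroid = [sum(row[j] for row in positives) / n for j in range(d)]
        dists = sorted(self._dist(row, self.centroid) for row in positives)
        # threshold = distance within which `quantile` of the positives lie
        k = min(n - 1, max(0, math.ceil(self.quantile * n) - 1))
        self.threshold = dists[k]
        return self

    def predict(self, x: Sequence[float]) -> bool:
        # True means the pair resembles the positive (interacting) class
        return self._dist(x, self.centroid) <= self.threshold


if __name__ == "__main__":
    model = CentroidOneClass().fit([[0, 0], [1, 0], [0, 1], [1, 1]])
    print(model.predict([0.5, 0.5]))  # True
    print(model.predict([5.0, 5.0]))  # False
```

    Because `fit` never sees negative examples, the choice of negatives cannot shape the decision boundary; the `quantile` parameter instead trades recall on known positives against how much of the feature space is accepted.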

    Risk assessment for progression of Diabetic Nephropathy based on patient history analysis

    Diabetic nephropathy (DN) is one of the most common complications in patients with diabetes. It is a chronic disease that progressively affects the kidneys and can result in kidney failure. Digitalization has allowed hospitals to store patient information in electronic health records (EHRs). Applying Machine Learning (ML) algorithms to these data may make it possible to predict the risk of disease progression in these patients, leading to better disease management. The main objective of this work is to create a predictive model that takes advantage of the patient history contained in the EHRs. This work used the largest dataset of Portuguese patients with DN, followed for 22 years by the Associação Protetora dos Diabéticos de Portugal (APDP). A longitudinal approach was developed in the data pre-processing phase, allowing the data to serve as input to sixteen different ML algorithms. After evaluating and analyzing the respective results, the Light Gradient Boosting Machine was identified as the best model, showing good predictive capability. This conclusion was supported not only by several classification metrics on training, test, and validation data, but also by an evaluation of its performance at each stage of the disease. In addition, the models were analyzed using feature-ranking plots and statistical analysis. As a complement, the interpretability of the results is presented using the SHAP method, together with the deployment of the model using Gradio and Hugging Face servers. By integrating ML techniques, an interpretation method, and a web application that provides access to the model, this study offers a potentially effective approach to anticipating the progression of DN, enabling healthcare professionals to make informed decisions for personalized care and disease management.
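    A longitudinal pre-processing step has to turn a variable-length visit history into one fixed-length row per patient before tabular learners such as gradient-boosting models can use it. The sketch below shows one way to do this under stated assumptions: each record is a hypothetical `(patient_id, visit_date, value)` tuple for a single lab measurement, summarised as visit count, last value, mean, and least-squares slope per year. The thesis's actual feature construction is not specified in this abstract, and `summarise_history` is a hypothetical helper.

```python
# Minimal sketch: collapse each patient's visit history for one measurement
# into summary features. Record layout and chosen summaries are assumptions.
from collections import defaultdict
from datetime import date
from statistics import mean


def slope_per_year(days: list[float], values: list[float]) -> float:
    """Least-squares slope of value vs. time, rescaled from per-day to per-year."""
    if len(days) < 2:
        return 0.0
    mx, my = mean(days), mean(values)
    sxx = sum((x - mx) ** 2 for x in days)
    if sxx == 0:
        return 0.0  # all visits on the same day: no trend can be estimated
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, values))
    return sxy / sxx * 365.25


def summarise_history(visits):
    """visits: iterable of (patient_id, visit_date, value) tuples."""
    by_patient = defaultdict(list)
    for pid, when, value in visits:
        by_patient[pid].append((when, value))
    rows = {}
    for pid, records in by_patient.items():
        records.sort()  # chronological order by visit date
        start = records[0][0]
        days = [(when - start).days for when, _ in records]
        values = [v for _, v in records]
        rows[pid] = {
            "n_visits": len(records),
            "last": values[-1],
            "mean": mean(values),
            "slope_per_year": slope_per_year(days, values),
        }
    return rows


if __name__ == "__main__":
    history = [
        ("p1", date(2022, 1, 1), 12.0),
        ("p1", date(2021, 1, 1), 10.0),
        ("p2", date(2021, 6, 1), 7.0),
    ]
    for pid, row in summarise_history(history).items():
        print(pid, row)
```

    Summaries like these discard visit-level detail, which is one reason per-stage evaluation and interpretability tools such as SHAP are useful for checking what the model actually relies on.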

    Embryo selection through image analysis: a Deep Learning approach

    Infertility affects about 186 million people worldwide and 9-10% of couples in Portugal, causing financial, social, and medical problems. Evaluation of embryo quality based on morphological features is the standard procedure in in vitro fertilization (IVF) clinics around the world. This process is subjective and time-consuming, and it results in discrepant classifications among embryologists and clinics, which limits the accurate prediction of embryo implantation and live-birth potential. Although assisted reproductive technologies (ART) such as IVF coupled with time-lapse imaging (TLI), which removes the need to periodically transfer embryos to a microscope for assessment and keeps culture conditions stable during development, have alleviated the infertility problem, significant limitations remain even with morphokinetic analysis. Moreover, many patients require multiple IVF cycles to achieve pregnancy, making the selection of a single embryo for transfer a critical challenge. Here, we present a proof of concept of the reliability of machine learning, in particular deep learning based on the open-source TensorFlow and Keras libraries, for extracting and classifying features from raw time-lapse embryo images in clinical practice. We also present a pipeline that allows clinicians and researchers with no machine learning expertise to use deep learning easily, rapidly, and accurately as a clinical decision-support tool in embryo viability studies, as well as in other medical fields where image analysis is prominent.
    Master's in Molecular and Cellular Biology

    Understanding the functional roles of intrinsic protein disorder in NF-κB transcription factors

    Master's, Master of Science