757 research outputs found

    Processing of Electronic Health Records using Deep Learning: A review

    Full text link
    Availability of large amount of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize healthcare. Except for availability of Big Data, both medical data from healthcare institutions (such as EMR data) and data generated from health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances on machine learning, specifically deep learning algorithms

    Data integration for the analysis of uncharacterized proteins in Mycobacterium tuberculosis

    Get PDF
    Includes abstract.Includes bibliographical references (leaves 126-150).Mycobacterium tuberculosis is a bacterial pathogen that causes tuberculosis, a leading cause of human death worldwide from infectious diseases, especially in Africa. Despite enormous advances achieved in recent years in controlling the disease, tuberculosis remains a public health challenge. The contribution of existing drugs is of immense value, but the deadly synergy of the disease with Human Immunodeficiency Virus (HIV) or Acquired Immunodeficiency Syndrome (AIDS) and the emergence of drug resistant strains are threatening to compromise gains in tuberculosis control. In fact, the development of active tuberculosis is the outcome of the delicate balance between bacterial virulence and host resistance, which constitute two distinct and independent components. Significant progress has been made in understanding the evolution of the bacterial pathogen and its interaction with the host. The end point of these efforts is the identification of virulence factors and drug targets within the bacterium in order to develop new drugs and vaccines for the eradication of the disease

    Integration of multi-scale protein interactions for biomedical data analysis

    Get PDF
    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cells state and motivated jointly mining multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This Thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scale that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge, or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data. We show that our approach outperforms the existing methods, provide biomedical insights and relevant hypotheses for specific cancer types

    ITRPCA: a new model for computational drug repositioning based on improved tensor robust principal component analysis

    Get PDF
    Background: Drug repositioning is considered a promising drug development strategy with the goal of discovering new uses for existing drugs. Compared with the experimental screening for drug discovery, computational drug repositioning offers lower cost and higher efficiency and, hence, has become a hot issue in bioinformatics. However, there are sparse samples, multi-source information, and even some noises, which makes it difficult to accurately identify potential drug-associated indications.Methods: In this article, we propose a new scheme with improved tensor robust principal component analysis (ITRPCA) in multi-source data to predict promising drugā€“disease associations. First, we use a weighted k-nearest neighbor (WKNN) approach to increase the overall density of the drugā€“disease association matrix that will assist in prediction. Second, a drug tensor with five frontal slices and a disease tensor with two frontal slices are constructed using multi-similarity matrices and an updated association matrix. The two target tensors naturally integrate multiple sources of data from the drug-side aspect and the disease-side aspect, respectively. Third, ITRPCA is employed to isolate the low-rank tensor and noise information in the tensor. In this step, an additional range constraint is incorporated to ensure that all the predicted entry values of a low-rank tensor are within the specific interval. Finally, we focus on identifying promising drug indications by analyzing drugā€“disease association pairs derived from the low-rank drug and low-rank disease tensors.Results: We evaluate the effectiveness of the ITRPCA method by comparing it with five prominent existing drug repositioning methods. This evaluation is carried out using 10-fold cross-validation and independent testing experiments. Our numerical results show that ITRPCA not only yields higher prediction accuracy but also exhibits remarkable computational efficiency. Furthermore, case studies demonstrate the practical effectiveness of our method

    Graphlet-adjacencies provide complementary views on the functional organisation of the cell and cancer mechanisms

    Get PDF
    Recent biotechnological advances have led to a wealth of biological network data. Topo- logical analysis of these networks (i.e., the analysis of their structure) has led to break- throughs in biology and medicine. The state-of-the-art topological node and network descriptors are based on graphlets, induced connected subgraphs of different shapes (e.g., paths, triangles). However, current graphlet-based methods ignore neighbourhood infor- mation (i.e., what nodes are connected). Therefore, to capture topology and connectivity information simultaneously, I introduce graphlet adjacency, which considers two nodes adjacent based on their frequency of co-occurrence on a given graphlet. I use graphlet adjacency to generalise spectral methods and apply these on molecular networks. I show that, depending on the chosen graphlet, graphlet spectral clustering uncovers clusters en- riched in different biological functions, and graphlet diffusion of gene mutation scores predicts different sets of cancer driver genes. This demonstrates that graphlet adjacency captures topology-function and topology-disease relationships in molecular networks. To further detail these relationships, I take a pathway-focused approach. To enable this investigation, I introduce graphlet eigencentrality to compute the importance of a gene in a pathway either from the local pathway perspective or from the global network perspective. I show that pathways are best described by the graphlet adjacencies that capture the importance of their functionally critical genes. I also show that cancer driver genes characteristically perform hub roles between pathways. Given the latter finding, I hypothesise that cancer pathways should be identified by changes in their pathway-pathway relationships. Within this context, I propose pathway- driven non-negative matrix tri-factorisation (PNMTF), which fuses molecular network data and pathway annotations to learn an embedding space that captures the organisation of a network as a composition of subnetworks. In this space, I measure the functional importance of a pathway or gene in the cell and its functional disruption in cancer. I apply this method to predict genes and the pathways involved in four major cancers. By using graphlet-adjacency, I can exploit the tendency of cancer-related genes to perform hub roles to improve the prediction accuracy

    Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning

    Get PDF
    Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. Our approaches can be broadly labeled as link prediction or knowledge base completion in computer science literature. Specifically, first we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base, SemMedDB (the entities and relations constitute a large knowledge graph as a whole). To that end, we build logistic regression models utilizing semantic graph pattern features extracted from the SemMedDB to predict treatment and causative relations in Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general purpose gold standard database of approved and failed drugā€“disease indications. The idea here is to predict repoDB pairs by approximating the given input matrix/tensor structure where the value of a cell represents the existence of a relation coming from SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells. Our final approach involves graph convolutional neural networks where entities and relation types are embedded in a vector space involving neighborhood information. Basically, we minimize an objective function to guide our model to concept/relation embeddings such that distance scores for positive relation pairs are lower than those for the negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless result in high accuracy drug candidate prediction with appropriate configuration of both the methods and datasets used
    • ā€¦
    corecore