
    Integrative methods for analysing big data in precision medicine

    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With advances in technologies for capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.

    Identifying drug-target and drug-disease associations using computational intelligence

    Background: Traditional drug development is an expensive process that typically requires a large investment of financial, equipment, and time resources. However, these efforts sometimes fail to deliver a pharmaceutical product to the market. To overcome the limitations of this process, complementary (or, in some cases, alternative) methods with high-throughput results are necessary. Computational drug discovery is a shortcut that can reduce the difficulties of traditional methods because of its flexible nature. Drug repositioning, which aims to find new applications for existing drugs, is one of the promising approaches in computational drug discovery. Considering the availability of different types of data in various public databases, drug-disease association identification and drug repositioning can be performed based on the interactions of drugs and biomolecules. Moreover, drug repositioning relies mainly on the similarity of drugs and the similarity of the agents interacting with them. It is assumed that if drug D is associated or interacts with target T, then drugs similar to drug D can be associated or interact with target T or with targets similar to target T. Therefore, similarity-based approaches are widely used for drug repositioning.

    Research Objectives: Develop novel computational methods for drug-target and drug-disease association prediction to be used for drug repositioning.

    Results: In this thesis, the problem of drug-disease association identification and drug repositioning is divided into sub-problems: drug-target interaction prediction and the use of targets as intermediaries for drug-disease association identification. Addressing these sub-problems results in three new computational models for drug-target interaction and drug-disease association prediction: MDIPA, NMTF-DTI, and NTD-DR. MDIPA is a nonnegative matrix factorization-based method that predicts interaction scores of drug-microRNA pairs, where the interaction scores can effectively be used for drug repositioning. This method uses the functional similarity of microRNAs and the structural similarity of drugs to make predictions. To include more biomolecules (e.g., proteins) in the study and to achieve a more flexible model, we develop NMTF-DTI, a nonnegative matrix tri-factorization method that uses multiple types of similarities for drugs and proteins to predict drug-target associations and their interaction scores. To take another step towards drug repositioning, we identify associations between drugs and diseases. In this step, we develop NTD-DR, a nonnegative tensor decomposition approach in which multiple similarities for drugs, targets, and diseases are used to identify drug-disease associations for drug repositioning. The details of each method are discussed in Chapters 3, 4, and 5, respectively. Future work will focus on considering additional biomolecules as drug targets to identify drug-disease associations for drug repositioning. In summary, using nonnegative matrix factorization, nonnegative matrix tri-factorization, and nonnegative tensor decomposition, together with different types of association information and multiple types of similarities, improves the performance of the proposed methods over methods that use a single type of association or similarity information.
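
    As a hedged illustration of the matrix factorization machinery these models share, the sketch below completes a toy drug-target interaction matrix with plain nonnegative matrix factorization. It is not the MDIPA/NMTF-DTI/NTD-DR code: the Lee-Seung multiplicative updates, the rank, and the toy matrix are all illustrative assumptions, and the similarity-regularization terms of the actual methods are omitted.

```python
# Minimal NMF sketch for interaction-score prediction (illustrative only,
# not the thesis's MDIPA implementation).
import numpy as np

def nmf_interaction_scores(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative drug-target matrix X (drugs x targets) as
    X ~ W @ H and return the dense score matrix W @ H."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the squared Frobenius loss.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W @ H

# Toy usage: 1 marks a known drug-target interaction, 0 is unknown.
X = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 1, 0, 0]], dtype=float)
scores = nmf_interaction_scores(X)
# High scores at zero entries are candidate repositioning pairs.
print(np.round(scores, 2))
```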

    The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond

    Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, and epigenetic mechanisms, and it has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer and developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground-truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top-performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, particularly as ever more large-scale biological experiments and datasets are collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists. It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic regulatory events will be a fundamental step in understanding the mechanisms of life and, eventually, developing efficient therapies to treat and cure diseases.
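
    The random-forest approach alluded to here (GENIE3, the top performer in the DREAM network inference challenges) regresses each gene's expression on all other genes and uses feature importances to score candidate regulatory edges. The sketch below shows that idea in miniature; the random expression matrix and hyperparameters are placeholders, not the published benchmark setup.

```python
# GENIE3-style sketch: one random forest per target gene, with feature
# importances ranking putative regulators (illustrative, not the original code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_grn_scores(expr, n_trees=100, seed=0):
    """expr: (samples x genes) expression matrix. Returns a (genes x genes)
    matrix S where S[i, j] scores gene i as a putative regulator of gene j."""
    n_genes = expr.shape[1]
    scores = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        regulators = [i for i in range(n_genes) if i != j]
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(expr[:, regulators], expr[:, j])
        scores[regulators, j] = rf.feature_importances_
    return scores

# Toy usage on random data; real inputs would be normalized transcriptomes.
rng = np.random.default_rng(0)
expr = rng.random((50, 5))
S = rf_grn_scores(expr)
print(np.round(S, 2))  # rank the entries to obtain a candidate edge list
```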

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
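
    To make two of these challenges concrete, here is a brief sketch, with entirely hypothetical data, of common mitigations: per-modality scaling before early integration (data heterogeneity) and inverse-frequency class weighting (class imbalance). A real pipeline would add imputation for missing data, feature selection against the curse of dimensionality, and cross-validation; none of that is shown here.

```python
# Two illustrative mitigations for multi-omics integration (not from the review).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
genomics = rng.normal(0.0, 1.0, size=(n, 100))    # hypothetical genomic features
proteomics = rng.normal(5.0, 20.0, size=(n, 30))  # very different scale and units
y = (rng.random(n) < 0.1).astype(int)             # ~10% positives: imbalanced labels

# Early integration: scale each modality separately, then concatenate, so no
# single omics block dominates purely through its measurement units.
X = np.hstack([StandardScaler().fit_transform(genomics),
               StandardScaler().fit_transform(proteomics)])

# class_weight='balanced' reweights the loss by inverse class frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(clf.score(X, y))
```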

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Ph.D. Thesis, University of Hawaiʻi at Mānoa, 2017.

    Biomedical knowledge discovery through continuous representations of multi-relational graphs

    Knowledge graphs are multi-relational graph structures that organize data in a way that is not only queryable but also allows the inference of implicit knowledge by both humans and, particularly, machines. In recent years, new methods have been developed to maximize the knowledge that can be extracted from these structures, especially in the machine learning field. Knowledge graph embedding (KGE) strategies map the components of these graphs to a lower-dimensional space to facilitate downstream tasks such as link prediction or node classification. In this work, the capabilities and limitations of using these techniques to derive new knowledge from pre-existing biomedical networks were explored, since this is a field that has not only seen efforts to convert its large knowledge bases into knowledge graphs but can also use the predictive capabilities of these models to accelerate research. To this end, several KGE models were studied and a pipeline was created to obtain and train these models on different biomedical datasets. The results show that these models can make accurate link predictions on some datasets, but that their performance can be hampered by inherent characteristics of the networks. Additionally, with the knowledge acquired during this research, a notebook was created that aims to be an entry point for other researchers interested in exploring this field. Master's thesis in Computer and Telematics Engineering.
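
    To make the KGE link-prediction idea concrete, here is a minimal TransE-style sketch, which models a triple (head, relation, tail) as head + relation ≈ tail in the embedding space. The tiny toy graph, hyperparameters, and plain SGD loop are illustrative assumptions, not the thesis's pipeline; production work would use an established KGE library.

```python
# Minimal TransE-style training sketch (illustrative only).
import numpy as np

triples = [(0, 0, 1), (1, 0, 2), (0, 1, 2)]  # (head, relation, tail) ids
n_ent, n_rel, dim = 3, 2, 8
rng = np.random.default_rng(0)
E = rng.normal(size=(n_ent, dim))  # entity embeddings
R = rng.normal(size=(n_rel, dim))  # relation embeddings

def score(h, r, t):
    # Higher is better: negative distance between h + r and t.
    return -np.linalg.norm(E[h] + R[r] - E[t])

lr, margin = 0.01, 1.0
for epoch in range(200):
    for h, r, t in triples:
        t_neg = rng.integers(n_ent)  # crude negative sampling
        if score(h, r, t) - score(h, r, t_neg) < margin:
            # SGD on a margin ranking loss, using the gradient of the
            # squared distance: pull the positive triple together...
            g = E[h] + R[r] - E[t]
            E[h] -= lr * g; R[r] -= lr * g; E[t] += lr * g
            # ...and push the sampled negative triple apart.
            g_neg = E[h] + R[r] - E[t_neg]
            E[h] += lr * g_neg; R[r] += lr * g_neg; E[t_neg] -= lr * g_neg

print(score(1, 0, 2), score(2, 0, 0))  # the trained triple should generally score higher
```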

    The consensus molecular subtypes of colorectal cancer

    Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable, with strong immune activation; CMS2 (canonical, 37%), epithelial, with marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial, with evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), with prominent transforming growth factor-beta activation, stromal invasion, and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions.

    Hyperbolic matrix factorization improves prediction of drug-target associations

    Past research in computational systems biology has focused more on the development and application of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low-dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume a flat geometry of the biological space. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space distorts the distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering the embedding dimension by an order of magnitude. We see this as additional evidence that hyperbolic geometry underpins large biological networks.
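
    A small sketch of the geometric idea follows: scoring drug-target pairs by their distance in the Poincaré ball model of hyperbolic space rather than in Euclidean space. The embeddings here are random placeholders; the paper learns them by matrix factorization, which this sketch does not reproduce.

```python
# Poincaré-ball distance as a DTI score (illustrative, not the paper's method).
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v strictly inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

rng = np.random.default_rng(0)
drugs = rng.uniform(-0.5, 0.5, size=(4, 2))    # placeholder drug embeddings
targets = rng.uniform(-0.5, 0.5, size=(3, 2))  # placeholder target embeddings

# Smaller hyperbolic distance -> higher predicted interaction score.
scores = np.array([[-poincare_distance(d, t) for t in targets] for d in drugs])
print(np.round(scores, 2))
```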