8,057 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
A guide to machine learning for biologists
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
Systems Analytics and Integration of Big Omics Data
A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.
Network-driven strategies to integrate and exploit biomedical data
In the quest to understand complex biological systems, the scientific community has been delving into protein, chemical and disease biology, populating biomedical databases with a wealth of data and knowledge. The field of biomedicine has now entered a Big Data era, in which computationally driven research can benefit greatly from existing knowledge to better understand and characterize biological and chemical entities. And yet, the heterogeneity and complexity of biomedical data call for proper integration and representation of this knowledge, so that it can be effectively and efficiently exploited.
In this thesis, we aim to develop new strategies to leverage current biomedical knowledge, so that meaningful information can be extracted and fused into downstream applications. To this end, we have capitalized on network analysis algorithms to integrate and exploit biomedical data in a wide variety of scenarios, providing a better understanding of pharmacoomics experiments while helping accelerate the drug discovery process. More specifically, we have (i) devised an approach to identify functional gene sets associated with drug response mechanisms of action, (ii) created a resource of biomedical descriptors able to anticipate cellular drug response and identify new drug repurposing opportunities, (iii) designed a tool to annotate biomedical support for a given set of experimental observations, and (iv) reviewed different chemical and biological descriptors relevant for drug discovery, illustrating how they can be used to provide solutions to current challenges in biomedicine.
Gene2DisCo: gene to disease using disease commonalities
OBJECTIVE:
Finding the human genes co-causing complex diseases, also known as "disease-genes", is one of the emerging and challenging tasks in biomedicine. This process, termed gene prioritization (GP), is characterized by a scarcity of known disease-genes for most diseases, and by a vast amount of heterogeneous data, usually encoded into networks describing different types of functional relationships between genes. In addition, different diseases may share common profiles (e.g. genetic or therapeutic profiles), and exploiting disease commonalities may significantly enhance the performance of GP methods. This work aims to provide a systematic comparison of several disease similarity measures, and to embed disease similarities and heterogeneous data into a flexible framework for gene prioritization which specifically handles the lack of known disease-genes.
METHODS:
We present a novel network-based method, Gene2DisCo, based on generalized linear models (GLMs) to effectively prioritize genes by exploiting data regarding disease-genes, gene interaction networks and disease similarities. The scarcity of disease-genes is addressed by applying an efficient negative selection procedure, together with imbalance-aware GLMs. Gene2DisCo is a flexible framework, in the sense that it does not depend on specific types of data or on specific disease ontologies.
RESULTS:
On a benchmark dataset composed of nine human networks and 708 medical subject headings (MeSH) diseases, Gene2DisCo largely outperformed the best benchmark algorithm, kernelized score functions, in terms of both area under the ROC curve (0.94 against 0.86) and precision at given recall levels (for recall levels from 0.1 to 1 in steps of 0.1). Furthermore, we enriched and extended the benchmark data to the whole human genome and provided the top-ranked unannotated candidate genes even for MeSH disease terms without known annotations.
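The abstract reports precision at recall levels from 0.1 to 1 but does not say how those values are computed. As a rough illustration only, the sketch below ranks genes by score and reports, for each recall level, the best precision among rank cut-offs that reach that recall (the common "interpolated precision" convention, which is an assumption here and not necessarily what the Gene2DisCo authors used). The function name `precision_at_recall_levels` and the toy scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def precision_at_recall_levels(
    scores: Sequence[float],
    labels: Sequence[int],
    levels: Optional[List[float]] = None,
) -> Dict[float, float]:
    """Interpolated precision at fixed recall levels.

    scores: predicted score per gene (higher = more likely a disease-gene).
    labels: 1 for known disease-genes, 0 otherwise.
    levels: recall levels; defaults to 0.1, 0.2, ..., 1.0.
    Tied scores are ranked in input order (a simplification).
    """
    if levels is None:
        levels = [round(0.1 * k, 1) for k in range(1, 11)]
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank genes from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Recall and precision after each rank cut-off.
    curve = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        curve.append((true_positives / n_pos, true_positives / rank))

    # For each level, keep the best precision among cut-offs reaching it.
    eps = 1e-12  # tolerance for floating-point comparison of recall values
    return {
        level: max(p for r, p in curve if r >= level - eps)
        for level in levels
    }


# Toy data (not from the paper): five genes, two of them disease-genes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
for level, precision in precision_at_recall_levels(toy_scores, toy_labels).items():
    print(f"recall >= {level:.1f}: precision = {precision:.3f}")
```

With so few known disease-genes per disease, a summary of this kind is often read alongside the AUC, because it shows how clean the top of the ranking is at each coverage level.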
Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability
The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders their practical application. As of late, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as the limited ability to handle supervision sparsity and provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. A thorough review of related KaGML works, collected following a carefully designed search methodology, is organised into four categories following a newly defined taxonomy. To facilitate research in this rapidly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of the potential avenues for future advancements.