Search CORE

244 research outputs found

Knowledge-based Biomedical Data Science 2019

Author: Callahan Tiffany J.
Hunter Lawrence E.
Pielke-Lombardo Harrison
Tripodi Ignacio J.
Publication venue
Publication date: 08/10/2019
Field of study

Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

arXiv.org e-Print Archive

Discovering Complex Relationships between Drugs and Diseases

Author: Sharma Ranjana
Publication venue: North Dakota State University
Publication date: 01/01/2011
Field of study

Finding the complex semantic relations between existing drugs and new diseases will help in the drug development in a new way. Most of the drugs which have found new uses have been discovered due to serendipity. Hence, the prediction of the uses of drugs for more than one disease should be done in a systematic way by studying the semantic relations between the drugs and diseases and also the other entities involved in the relations. Hence, in order to study the complex semantic relations between drugs and diseases an application was developed that integrates the heterogeneous data in different formats from different public databases which are available online. A high level ontology was also developed to integrate the data and only the fields required for the current study were used. The data was collected from different data sources such as DrugBank, UniProt/SwissProt, GeneCards and OMIM. Most of these data sources are the standard data sources and have been used by National Committee of Biotechnology Information of Nation Institute of Health. The data was parsed and integrated and complex semantic relations were discovered. This is a simple and novel effort which may find uses in development of new drug targets and polypharmacology

NDSU Libraries Institutional Repository

Recommended from our members

Computational Toxinology

Author: Romano Joseph Daniel
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

Venoms are complex mixtures of biological macromolecules and other compounds that are used for predatory and defensive purposes by hundreds of thousands of known species worldwide. Throughout human history, venoms and venom components have been used to treat a vast array of illnesses, causing them to be of great clinical, economic, and academic interest to the drug discovery and toxinology communities. In spite of major computational advances that facilitate data-driven drug discovery, most therapeutic venom effects are still discovered via tedious trial-and-error, or simply by accident. In this dissertation, I describe a body of work that aims to establish a new subdiscipline of translational bioinformatics, which I name “computational toxinology”. To accomplish this goal, I present three integrated components that span a wide range of informatics techniques: (1) VenomKB, (2) VenomSeq, and (3) VenomKB’s Semantic API. To provide a platform for structuring, representing, retrieving, and integrating venom data relevant to drug discovery, VenomKB provides a database-backed web application and knowledge base for computational toxinology. VenomKB is structured according to a fully-featured ontology of venoms, and provides data aggregated from many popular web re- sources. VenomSeq is a biotechnology workflow that is designed to generate new high-throughput sequencing data for incorporation into VenomKB. Specifically, we expose human cells to controlled doses of crude venoms, conduct RNA-Sequencing, and build profiles of differential gene expression, which we then compare to publicly-available differential expression data for known dis- eases and drugs with known effects, and use those comparisons to hypothesize ways that the venoms could act in a therapeutic manner, as well. These data are then integrated into VenomKB, where they can be effectively retrieved and evaluated using existing data and known therapeutic associations. VenomKB’s Semantic API further develops this functionality by providing an intelligent, powerful, and user-friendly interface for querying the complex underlying data in VenomKB in a way that reflects the intuitive, human-understandable mean- ing of those data. The Semantic API is designed to cater to the needs of advanced users as well as laypersons and bench scientists without previous expertise in computational biology and semantic data analysis. In each chapter of the dissertation, I describe how we evaluated these 3 components through various approaches. We demonstrate the utility of VenomKB and the Semantic API by testing a number of practical use-cases for each, designed to highlight their ability to rediscover existing knowledge as well as suggesting potential areas for future exploration. We use statistics and data science techniques to evaluate VenomSeq on 25 diverse species of venomous animals, and propose biologically feasible explanations for significant findings. In evaluating the Semantic API, I show how observations on VenomSeq data can be interpreted and placed into the context of past research by members of the larger toxinology community. Computational toxinology is a toolbox designed to be used by multiple stakeholders (toxinologists, computational biologists, and systems pharmacologists, among others) to improve the return rate of clinically-significant findings from manual experimentation. It aims to achieve this goal by enabling access to data, providing means for easy validation of results, and suggesting specific hypotheses that are preliminarily supported by rigorous inferential statistics. All components of the research I describe are open-access and publicly available, to improve reproducibility and encourage widespread adoptio

Columbia University Academic Commons

Drug prioritization using the semantic properties of a knowledge graph

Author: 't Hoen PAC
Charrout M.
Hettne K.M. (Kristina)
Kors J.A. (Jan)
Kudrin R.
Malas T.B.
Mulligen E.M. (Erik) van
Peters D. (Dorien)
Roos M. (Marco)
Starikov S.
Vlietstra W.J.
Vos R.O.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Abstract Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts, such as drugs, diseases, genes, etc. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB, a standard drug repurposing database which describes drug-disease combinations that were approved or that failed in clinical trials, is used to train a random forest classifier. The 10-times repeated 10-fold cross-validation performance of the classifier achieves a mean area under the receiver operating characteristic curve (AUC) of 92.2%. We apply the classifier to prioritize 21 preclinical drug repurposing candidates that have been suggested for Autosomal Dominant Polycystic Kidney Disease (ADPKD). Mozavaptan, a vasopressin V2 receptor antagonist is predicted to be the drug most likely to be approved after a clinical trial, and belongs to the same drug class as tolvaptan, the only treatment for ADPKD that is currently approved. We conclude that semantic properties of concepts in a knowledge graph can be exploited to prioritize drug repurposing candidates for testing in clinical trials

Maastricht University Research Portal

Directory of Open Access Journals

EUR Research Repository

Leiden University Scholary Publications

Erasmus University Digital Repository

Using the Literature to Identify Confounders

Author: Malec Scott
Publication venue: DigitalCommons@TMC
Publication date: 01/01/2018
Field of study

Prior work in causal modeling has focused primarily on learning graph structures and parameters to model data generating processes from observational or experimental data, while the focus of the literature-based discovery paradigm was to identify novel therapeutic hypotheses in publicly available knowledge. The critical contribution of this dissertation is to refashion the literature-based discovery paradigm as a means to populate causal models with relevant covariates to abet causal inference. In particular, this dissertation describes a generalizable framework for mapping from causal propositions in the literature to subgraphs populated by instantiated variables that reflect observational data. The observational data are those derived from electronic health records. The purpose of causal inference is to detect adverse drug event signals. The Principle of the Common Cause is exploited as a heuristic for a defeasible practical logic. The fundamental intuition is that improbable co-occurrences can be “explained away” with reference to a common cause, or confounder. Semantic constraints in literature-based discovery can be leveraged to identify such covariates. Further, the asymmetric semantic constraints of causal propositions map directly to the topology of causal graphs as directed edges. The hypothesis is that causal models conditioned on sets of such covariates will improve upon the performance of purely statistical techniques for detecting adverse drug event signals. By improving upon previous work in purely EHR-based pharmacovigilance, these results establish the utility of this scalable approach to automated causal inference

Crossref

DigitalCommons@The Texas Medical Center

Data-driven knowledge discovery in polycystic kidney disease

Author: Malas T.
Publication venue
Publication date: 30/03/2021
Field of study

The use of data derived from genomics and transcriptomic to further develop our understanding of Polycystic Kidney Diseases and identify novel drugs for its treatment.LUMC / Geneeskund

Leiden University Scholary Publications

Fast Machine Learning Algorithms for Massive Datasets with Applications in the Biomedical Domain

Author: Sadrfaridpour Ehsan
Publication venue: Clemson University Libraries
Publication date: 01/08/2020
Field of study

The continuous increase in the size of datasets introduces computational challenges for machine learning algorithms. In this dissertation, we cover the machine learning algorithms and applications in large-scale data analysis in manufacturing and healthcare. We begin with introducing a multilevel framework to scale the support vector machine (SVM), a popular supervised learning algorithm with a few tunable hyperparameters and highly accurate prediction. The computational complexity of nonlinear SVM is prohibitive on large-scale datasets compared to the linear SVM, which is more scalable for massive datasets. The nonlinear SVM has shown to produce significantly higher classification quality on complex and highly imbalanced datasets. However, a higher classification quality requires a computationally expensive quadratic programming solver and extra kernel parameters for model selection. We introduce a generalized fast multilevel framework for regular, weighted, and instance weighted SVM that achieves similar or better classification quality compared to the state-of-the-art SVM libraries such as LIBSVM. Our framework improves the runtime more than two orders of magnitude for some of the well-known benchmark datasets. We cover multiple versions of our proposed framework and its implementation in detail. The framework is implemented using PETSc library which allows easy integration with scientific computing tasks. Next, we propose an adaptive multilevel learning framework for SVM to reduce the variance between prediction qualities across the levels, improve the overall prediction accuracy, and boost the runtime. We implement multi-threaded support to speed up the parameter fitting runtime that results in more than an order of magnitude speed-up. We design an early stopping criteria to reduce the extra computational cost when we achieve expected prediction quality. This approach provides significant speed-up, especially for massive datasets. Finally, we propose an efficient low dimensional feature extraction over massive knowledge networks. Knowledge networks are becoming more popular in the biomedical domain for knowledge representation. Each layer in knowledge networks can store the information from one or multiple sources of data. The relationships between concepts or between layers represent valuable information. The proposed feature engineering approach provides an efficient and highly accurate prediction of the relationship between biomedical concepts on massive datasets. Our proposed approach utilizes semantics and probabilities to reduce the potential search space for the exploration and learning of machine learning algorithms. The calculation of probabilities is highly scalable with the size of the knowledge network. The number of features is fixed and equivalent to the number of relationships or classes in the data. A comprehensive comparison of well-known classifiers such as random forest, SVM, and deep learning over various features extracted from the same dataset, provides an overview for performance and computational trade-offs. Our source code, documentation and parameters will be available at https://github.com/esadr/

Clemson University: TigerPrints

Reproducibility and Replicability in Unmanned Aircraft Systems and Geographic Information Science

Author: Howe Cassandra
Publication venue: ScholarWorks@UARK
Publication date: 01/05/2023
Field of study

Multiple scientific disciplines face a so-called crisis of reproducibility and replicability (R&R) in which the validity of methodologies is questioned due to an inability to confirm experimental results. Trust in information technology (IT)-intensive workflows within geographic information science (GIScience), remote sensing, and photogrammetry depends on solutions to R&R challenges affecting multiple computationally driven disciplines. To date, there have only been very limited efforts to overcome R&R-related issues in remote sensing workflows in general, let alone those tied to disruptive technologies such as unmanned aircraft systems (UAS) and machine learning (ML). To accelerate an understanding of this crisis, a review was conducted to identify the issues preventing R&R in GIScience. Key barriers included: (1) awareness of time and resource requirements, (2) accessibility of provenance, metadata, and version control, (3) conceptualization of geographic problems, and (4) geographic variability between study areas. As a case study, a replication of a GIScience workflow utilizing Yolov3 algorithms to identify objects in UAS imagery was attempted. Despite the ability to access source data and workflow steps, it was discovered that the lack of accessibility to provenance and metadata of each small step of the work prohibited the ability to successfully replicate the work. Finally, a novel method for provenance generation was proposed to address these issues. It was found that artificial intelligence (AI) could be used to quickly create robust provenance records for workflows that do not exceed time and resource constraints and provide the information needed to replicate work. Such information can bolster trust in scientific results and provide access to cutting edge technology that can improve everyday life

UARK (University of Arkansas )

Reproducibility and Replicability in Unmanned Aircraft Systems and Geographic Information Science

Author: Howe Cassandra
Publication venue: ScholarWorks@UARK
Publication date: 01/05/2023
Field of study

ScholarWorks@UARK