
35 research outputs found

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Author: Chen Bin
Ding Ying
Dong Xiao
Jiao Dazhi
Wang Huijun
Wild David J
Zhu Qian
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Recently there has been an explosion of new data sources about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Integration of these data sources and the identification of patterns that go across them is of critical interest. Initiatives such as Bio2RDF and LODD have tackled the problem of linking biological data and drug data respectively using RDF. Thus far, the inclusion of chemogenomic and systems chemical biology information that crosses the domains of chemistry and biology has been very limited. Results: We have created a single repository called Chem2Bio2RDF by aggregating data from multiple chemogenomics repositories that is cross-linked into Bio2RDF and LODD. We have also created a linked-path generation tool to facilitate SPARQL query generation, and have created extended SPARQL functions to address specific chemical/biological search needs. We demonstrate the utility of Chem2Bio2RDF in investigating polypharmacology, identification of potential multiple pathway inhibitors, and the association of pathways with adverse drug reactions. Conclusions: We have created a new semantic systems chemical biology resource, and have demonstrated its potential usefulness in specific examples of polypharmacology, multiple pathway inhibition and adverse drug reaction - pathway mapping. We have also demonstrated the usefulness of extending SPARQL with cheminformatics and bioinformatics functionality.

Crossref

Springer - Publisher Connector

IUScholarWorks (University of Indiana)

Directory of Open Access Journals

PubMed Central

A Simple Standard for Sharing Ontological Mappings (SSSOM).

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec
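As a rough illustration of the table-based format this abstract describes, the following Python sketch reads a small mapping table and groups the mappings by their stated relationship. It assumes a tab-separated layout with `#`-prefixed metadata lines and the column names `subject_id`, `predicate_id`, `object_id` and `mapping_justification`. The sample rows, the identifiers in them, and the helper names `read_mappings` and `group_by_predicate` are hypothetical and are not taken from the paper.

```python
import csv
from collections import defaultdict

# Hypothetical sample table; all identifiers are made up for illustration.
SAMPLE_TSV = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "EX:0001\tskos:exactMatch\tOTHER:9001\tsemapv:ManualMappingCuration\n"
    "EX:0002\tskos:broadMatch\tOTHER:9002\tsemapv:LexicalMatching\n"
    "EX:0003\tskos:relatedMatch\tOTHER:9003\tsemapv:LexicalMatching\n"
)


def read_mappings(text):
    """Parse a mapping table, skipping '#'-prefixed metadata lines."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    return list(csv.DictReader(rows, delimiter="\t"))


def group_by_predicate(mappings):
    """Group (subject, object) pairs by the stated mapping predicate."""
    grouped = defaultdict(list)
    for m in mappings:
        grouped[m["predicate_id"]].append((m["subject_id"], m["object_id"]))
    return dict(grouped)


mappings = read_mappings(SAMPLE_TSV)
by_predicate = group_by_predicate(mappings)

# Keep only mappings asserted as exact, e.g. for a precision-sensitive use.
exact_only = by_predicate.get("skos:exactMatch", [])
print(exact_only)
```

Because the relationship is an explicit column, a pipeline that needs high precision can filter on it directly without parsing or querying an ontology.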

The Jackson Laboratory: The Mouseion at the JAXlibrary

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.

Author: Amor Benjamin
Austin Christopher P
Bennett Tellen D
Blacketer Clair
Bradford Robert L
Chute Christopher G
Cimino James J
Clark Marshall
Colmenares Evan W
Eichmann David A
Francis Patricia A
Gabriel Davera
Gersing Ken R
Girvin Andrew T
Graves Alexis
Guinney Justin
Haendel Melissa A
Hemadri Raju
Hong Stephanie S
Hripcsak George
Jiao Dazhi
Kibbe Warren A
Klann Jeffrey G
Kostka Kristin
Kurilla Michael G
Lee Adam M
Lehmann Harold P
Lingrey Lora
Manna Amin
Michael Sam G
Miller Robert T
Morris Michele
Murphy Shawn N
Natarajan Karthik
Palchuk Matvey B
Payne Philip R O
Pfaff Emily R
Portilla Lili M
Qureshi Nabeel
Robinson Peter N
Rutter Joni L
Saltz Joel H
Sheikh Usman
Solbrig Harold
Spratt Heidi
Suver Christine
Visweswaran Shyam
Walden Anita
Walters Kellie M
Weber Griffin M
Wilbanks John
Wilcox Adam B
Williams Andrew E
Wu Chunlei
Zhang Xiaohan Tanner
Zhu Richard L
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/03/2021

OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

The Jackson Laboratory: The Mouseion at the JAXlibrary

Userscripts for the Life Sciences

Author: A Herráez
AB Majumder
BM Good
C Knox
Christoph Steinbeck
David J Wild
Dazhi Jiao
DK Agrafiotis
E Willighagen
Egon L Willighagen
Harini Gopalakrishnan
HM Berman
JA Fox
JA Townsend
KV Mardia
M Ashburner
M Karthikeyan
MD Wilkinson
MY Galperin
Noel M O'Boyle
O Spjuth
P Corbett
P Ertl
R Guha
Rajarshi Guha
SJ Coles
T Etzold
T Lee
X Dong
Publication venue: Springer Science and Business Media LLC

Crossref

Scholarly Impact Services: Who we are, and what we do

Author: Jiao Dazhi
Ye Yunshan
Publication date: 01/06/2019

Slide deck from a presentation about the Sheridan Libraries' new metrics service. Part of the 2019 Hopkins Libraries Staff Assembly.

JScholarship

www.emeraldinsight.com/researchregister www.emeraldinsight.com/0737-8831.htm

Author: All Floyd
Dazhi Jiao
Jenn Riley
Michelle Dalmau

Integrating thesaurus relationships into search and browse in an online photograph collection

CiteSeerX

Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)
Publication date: 01/01/2013

Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at http://omics.informatics.indiana.edu/mg/MetaNetSam/.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

Network of reactions that are different in the two environments.

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)

Each vertex represents a reaction. An edge connects two vertices if the two reactions share one or more metabolites. Square-shaped vertices represent the reactions found to be different using a t-test on marginal probabilities, but not different using Fisher's test on the enzyme occurrences; circle-shaped vertices represent the reactions considered to be different in both statistical tests. (a) 327 reactions with higher marginal probabilities in Alaska permafrost samples; (b) 120 reactions with lower marginal probabilities in Alaska permafrost samples.
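The graph construction in this caption (reactions as vertices, an edge wherever two reactions share a metabolite) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the reaction names, the metabolite sets, and the function `build_reaction_graph` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical reactions mapped to the metabolites they involve.
REACTIONS = {
    "R1": {"glucose", "ATP", "G6P", "ADP"},
    "R2": {"G6P", "F6P"},
    "R3": {"pyruvate", "NADH", "lactate", "NAD+"},
}


def build_reaction_graph(reactions):
    """Return an adjacency map in which reactions are vertices and two
    vertices are joined when their metabolite sets intersect."""
    graph = {name: set() for name in reactions}
    for a, b in combinations(reactions, 2):
        if reactions[a] & reactions[b]:  # at least one shared metabolite
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = build_reaction_graph(REACTIONS)
print(graph)
```

In this toy input, R1 and R2 are linked through G6P, while R3 shares no metabolite with the others and stays isolated.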

FigShare

Properties of the Markov chain.

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)

(a) Correlations of the probability of reaction in consecutive subnetworks sampled from the Markov chain. As the batch size in subsampling increases, the correlation decreases and becomes insignificant (0.1) for most reactions when the batch size is set to 10,000. (b) Ergodic averages of the marginal probability for all reactions catalyzed by promiscuous enzymes in a metagenome (subsampling with batch size = 10,000). (c) Running time of the Markov chain of global metabolic networks of various sizes for 250 million iterations. Top: the total numbers of reactions in each sample. Bottom: the numbers of reactions that are catalyzed by catalytically promiscuous enzymes.
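Panels (a) and (b) refer to common Markov chain diagnostics. The sketch below shows one plausible reading, assuming each chain state records whether a given reaction is present (1) or absent (0): the ergodic average is the running mean of those indicators, and the batch-wise probabilities are the means over consecutive blocks, whose lag-1 correlation can then be inspected. The toy chain, the batch size, and all function names are hypothetical; this is not the authors' implementation.

```python
def ergodic_averages(indicators):
    """Running mean of 0/1 indicators along the chain."""
    averages, total = [], 0
    for i, x in enumerate(indicators, start=1):
        total += x
        averages.append(total / i)
    return averages


def batch_means(indicators, batch_size):
    """Mean of each full consecutive batch; a trailing partial batch is dropped."""
    n_batches = len(indicators) // batch_size
    return [
        sum(indicators[k * batch_size:(k + 1) * batch_size]) / batch_size
        for k in range(n_batches)
    ]


def lag1_correlation(values):
    """Pearson correlation between consecutive values (lag 1)."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; 0 is a convention chosen here
    return cov / (var_x * var_y) ** 0.5


# Toy chain of presence/absence indicators for a single reaction.
chain = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
running = ergodic_averages(chain)
batches = batch_means(chain, batch_size=3)
correlation = lag1_correlation(batches)
```

With a real chain the batch size would be far larger, as in the 10,000 used in the caption; larger batches generally make consecutive batch means less correlated.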

FigShare