
4,364 research outputs found


Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings.

Author: Baranzini Sergio E
Butte Atul J
Nelson Charlotte A
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

To advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis is usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). Then an unsupervised machine-learning algorithm creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.

eScholarship - University of California

Characterization of protein interactions by mass spectrometry and bioinformatics

Author: Solis Mezarino Victor
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 20/03/2019

Bacterial protein meta-interactomes predict cross-species interactions and protein function

Author: Caufield J. Harry
Shary Semarjit
Uetz Peter
Wimble Christopher
Wuchty Stefan
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

Background: Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactome studies while several more have had substantial segments of their proteomes screened for interactions. The protein interactomes of several bacterial species have been completed, including several from prominent human pathogens. The availability of interactome data has brought challenges, as these large data sets are difficult to compare across species, limiting their usefulness for broad studies of microbial genetics and evolution.
Results: In this study, we use more than 52,000 unique protein-protein interactions (PPIs) across 349 different bacterial species and strains to determine their conservation across data sets and taxonomic groups. When proteins are collapsed into orthologous groups (OGs) the resulting meta-interactome still includes more than 43,000 interactions, about 14,000 of which involve proteins of unknown function. While conserved interactions provide support for protein function in their respective species data, we found only 429 PPIs (~1% of the available data) conserved in two or more species, rendering any cross-species interactome comparison immediately useful. The meta-interactome serves as a model for predicting interactions, protein functions, and even full interactome sizes for species with limited to no experimentally observed PPIs, including Bacillus subtilis and Salmonella enterica, which are predicted to have up to 18,000 and 31,000 PPIs, respectively.
Conclusions: In the course of this work, we have assembled cross-species interactome comparisons that will allow interactomics researchers to anticipate the structures of yet-unexplored microbial interactomes and to focus on well-conserved yet uncharacterized interactors for further study. Such conserved interactions should provide evidence for important but yet-uncharacterized aspects of bacterial physiology and may provide targets for anti-microbial therapies.
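
The collapsing step described in this abstract (mapping proteins to orthologous groups, then counting how many species share each OG-level interaction) can be illustrated with a minimal sketch. The function names, the input layout, and the toy protein and OG identifiers below are all hypothetical; this is not the authors' pipeline, only one plausible reading of the idea.

```python
from collections import defaultdict

def build_meta_interactome(ppis, protein_to_og):
    """Collapse species-level PPIs into unordered orthologous-group (OG) pairs.

    ppis: iterable of (species, protein_a, protein_b) tuples (hypothetical layout).
    protein_to_og: dict mapping a protein ID to its OG ID.
    Returns a dict mapping each sorted OG pair to the set of species it was seen in.
    """
    meta = defaultdict(set)
    for species, a, b in ppis:
        og_a = protein_to_og.get(a)
        og_b = protein_to_og.get(b)
        if og_a is None or og_b is None:
            continue  # skip interactions with proteins lacking an OG assignment
        pair = tuple(sorted((og_a, og_b)))  # A-B and B-A count as the same interaction
        meta[pair].add(species)
    return dict(meta)

def conserved_interactions(meta, min_species=2):
    """Keep only OG-level interactions observed in at least min_species species."""
    return {pair: sp for pair, sp in meta.items() if len(sp) >= min_species}

# Toy data (invented for illustration only).
ppis = [
    ("E. coli", "ecA", "ecB"),
    ("E. coli", "ecB", "ecC"),
    ("H. pylori", "hpB", "hpA"),
    ("H. pylori", "hpA", "hpX"),  # hpX has no OG and is skipped
]
protein_to_og = {"ecA": "OG1", "ecB": "OG2", "ecC": "OG3", "hpA": "OG1", "hpB": "OG2"}

meta = build_meta_interactome(ppis, protein_to_og)
conserved = conserved_interactions(meta)
```

In this toy input only the OG1-OG2 pair appears in both species, so it is the only interaction that counts as conserved.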

VCU Scholars Compass

Validating module network learning algorithms using simulated data

Author: A Battle
A Butte
AA Petti
AJ Butte
Anagha Joshi
AP Gasch
CE Shannon
CT Harbison
D Pe'er
E Segal
Eric Bonnet
HW Ma
J Kasturi
J Sinkkonen
K Basso
K Lemmens
KA Heller
Kathleen Marchal
Koenraad Van Leemput
LH Hartwell
M Ashburner
MA Beer
Martin Kuiper
MJL de Hoon
N Friedman
NM Luscombe
Piet van Remortel
S Maere
Steven Maere
T Ideker
T Van den Bulcke
Tim Van den Bulcke
Tom Michoel
X Xu
Y Garten
Yvan Saeys
Yves Van de Peer
Z Bar-Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information.
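
As a rough illustration of how a conditional entropy measure can rank candidate regulators, the sketch below scores each regulator by H(split | regulator state), where the split is a binary partition of conditions at one node of a regulatory program. The discretization into "low"/"high", the function names, and the toy data are assumptions made for this example; this is not LeMoNe's actual implementation.

```python
import math
from collections import Counter

def conditional_entropy(y, x):
    """Return H(Y | X) in bits for two equal-length discrete sequences."""
    n = len(y)
    joint = Counter(zip(x, y))   # counts of (x, y) pairs
    marg_x = Counter(x)          # counts of x alone
    h = 0.0
    for (xv, _yv), c in joint.items():
        # p(x, y) * log2 p(y | x), with p(y | x) = count(x, y) / count(x)
        h -= (c / n) * math.log2(c / marg_x[xv])
    return h

def rank_regulators(split, regulator_states):
    """Sort candidate regulators by how well their state explains the split.

    Lower conditional entropy means the regulator is more informative.
    """
    scores = [(name, conditional_entropy(split, states))
              for name, states in regulator_states.items()]
    return sorted(scores, key=lambda item: item[1])

# Toy data (invented): six conditions partitioned into two groups at a tree node.
split = [0, 0, 0, 1, 1, 1]
regulator_states = {
    "regA": ["low", "low", "low", "high", "high", "high"],   # matches the split exactly
    "regB": ["low", "high", "low", "high", "low", "high"],   # only weakly related
}

ranked = rank_regulators(split, regulator_states)
```

Here the hypothetical regA scores zero bits because its state determines the split, so it is ranked ahead of regB.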

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Edinburgh Research Explorer

Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference

Author: Kishan K C
Publication venue: RIT Scholar Works
Publication date: 01/02/2022

A biological system is a complex network of heterogeneous molecular entities and their interactions, which contribute to various biological characteristics of the system. Although biological networks provide both an elegant theoretical framework and a mathematical foundation to analyze, understand, and learn from complex biological systems, the reconstruction of biological networks is an important and unsolved problem. Current biological networks are noisy, sparse, and incomplete, which limits the ability to create a holistic view of the biological reconstructions and thus to gain a system-level understanding of biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advancements in high-throughput data generation and significant improvement in computational power have led to novel computational methods to predict missing interactions. However, these methods still suffer from several unresolved challenges. It is challenging to extract information about interactions and incorporate that information into the computational model. Furthermore, biological data are not only heterogeneous but also high-dimensional and sparse, which makes modeling from indirect measurements difficult. The heterogeneous nature and sparsity of biological data pose significant challenges to the design of deep neural network structures, which essentially rely on either empirical or heuristic model selection methods. These unscalable methods rely heavily on expertise and experimentation, a time-consuming and error-prone process, and are prone to overfitting. Furthermore, complex deep networks tend to be poorly calibrated, with high confidence on incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges.
In Part I, we design novel neural network structures to learn representations for biological entities and further expand the model to integrate heterogeneous biological data for biological interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our novel probabilistic model selection approach enables the network structures to dynamically evolve to accommodate incrementally available data. In conclusion, we discuss the limitations and future directions for the proposed works.

RIT Scholar Works

Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

Author: Christopher W. V Hogue
Danielle Dewar-Darch
Doron Betel
Kevin E Breitkreuz
Luhua Lai
Mike Tyers
Ruth Isserlin
Publication venue: Public Library of Science
Publication date: 01/01/2007

The multitude of functions performed in the cell is largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Integrative Identification of Arabidopsis Mitochondrial Proteome and Its Function Exploitation through Protein Interaction Network

Author: A Drawid
A Höglund
A Kumar
A Reinhardt
A Vianello
AA Vashisht
AH Millar
BJ Haas
BW Rhee SY
C Bai
C Guda
C Jonak
CG Bartoli
CG Kurland
CM Lee
D Skowyra
DA Bota
David Moore
E Delannoy
E Jambrina
E Mazzucotelli
EE Patton
EH Kruft V
EM Marcotte
EO Karlberg
F Rébeillé
GW Tian
H Bannai
H Fölsch
H Prokisch
HN Chua
I Small
ID Small
IM Moller
J Balk
J Bardel
J Cui
J Huang
J Kilian
JA Kreps
Jian Cui
Jinghua Liu
JK Zhu
JL Heazlewood
K Ishizaki
K Meierhoff
KP O'Brien
L Li
LJ Lu
M Teige
M Unseld
MG Claros
O Emanuelsson
O Van Aken
OA Koroleva
P Horton
P Pavlidis
R Nair
RA Irizarry
S Hua
S Killcoyne
S Li
S Ma
S Maere
S Mahajan
S Mili
SG Andersson
T Sing
Tieliu Shi
V Gueguen
VK Mootha
Vladimir Uversky
W Werhahn
WK Huh
X Gong
Y Gavel
YD Cai
Yuhua Li
Z Liu
Z Yuan
Publication venue: Public Library of Science
Publication date: 31/01/2011

Mitochondria are major players in the production of energy and host several key reactions involved in basic metabolism and the biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown, even for the model plant Arabidopsis. We report a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and a 2.53% false positive rate (FPR). Together with experimentally confirmed proteins, 2,585 mitochondrial proteins (named CoreMitoP) were identified. We explored those proteins with unknown functions based on a protein-protein interaction network (PIN) and annotated novel functions for 26.65% of CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses such as salt, drought, cold, and wounding. Candidate mitochondrial proteins involved in those physiological activities provide useful targets for further investigation. The assigned functions also provide comprehensive information on the Arabidopsis mitochondrial proteome.
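
The general idea of naive Bayesian evidence integration mentioned in this abstract can be sketched as follows: each data source contributes a likelihood ratio, and under the naive independence assumption these are combined in log-odds space with a prior. The function name, the prior, and the likelihood ratios below are invented for illustration and are not taken from the paper.

```python
import math

def naive_bayes_posterior(prior, likelihood_ratios):
    """Combine independent evidence sources into a posterior probability.

    prior: prior probability that a protein is mitochondrial (assumed value).
    likelihood_ratios: one ratio P(evidence | mito) / P(evidence | non-mito)
        per data source, treated as conditionally independent.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)  # independence lets the log-ratios simply add
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative numbers only: three hypothetical sources (a targeting predictor,
# an ortholog mapping, a co-expression score) each moderately favor "mitochondrial".
posterior = naive_bayes_posterior(prior=0.1, likelihood_ratios=[3.0, 2.0, 1.5])
```

With these made-up inputs the prior odds of 1:9 are multiplied by a combined ratio of 9, giving even odds.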

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale

Author: Asa Ben-hur
Daphne Koller
Eran Segal
Haidong Wang
Marc Vidal
Qianru Li
Publication venue: BioMed Central
Publication date: 01/01/2007

InSite is a computational method that integrates high-throughput protein and sequence data to infer the specific binding regions of interacting protein pairs.

CiteSeerX

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central