
144 research outputs found

A disulfide bridge allows for site-selective binding in liver bile acid binding protein thereby stabilising the orientation of key amino acid side chains

Author: Assfalg M.
Cogliati C.
Molinari H.
Pagano K.
Ragona L.
Tomaselli S.
Zanzoni S.
Publication date: 01/01/2012

Preface: BITS2014, the annual meeting of the Italian Society of Bioinformatics

Author: Angelini C
Bosotti R
Facchiano A
Guffanti A
Helmer-Citterich M
Marabotti A
Marangoni R
Pascarella S
Romano P
Zanzoni A
Publication date: 01/01/2015

This Preface introduces the content of the BioMed Central journal Supplements related to the BITS2014 meeting, held in Rome, Italy, from the 26th to the 28th of February, 2014.

Crossref

HAL AMU

Springer - Publisher Connector

PubMed Central

Archivio della Ricerca - Università di Salerno

ART

Archivio della ricerca- Università di Roma La Sapienza

Computation of significance scores of unweighted Gene Set Enrichment Analyses

Author: A Subramanian
A Zanzoni
Andreas Keller
C Backes
Christina Backes
E Rubin
H Hermjakob
H Lee
Hans-Peter Lenhof
J Küntzer
J Lamb
L Salwinski
M Kanehisa
M Krull
S Kim
S Peri
S Wachi
T Barrett
TGO Consortium
V Matys
V Mootha
Y Benjamini
Y Hochberg
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract
Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulate at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values.
Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as findings that vary between runs because of the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
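
The idea of an exact p-value for an unweighted running-sum statistic can be illustrated with a small sketch. It is not taken from the paper: the scoring convention (+(m - l) for a category gene, -l otherwise), the two-sided maximum deviation, and the function names `running_sum_max` and `exact_p_value` are all assumptions made for illustration. Because the running sum after i positions depends only on how many category genes have been seen, a dynamic program over (position, hits) can count every arrangement whose running sum stays below the observed deviation.

```python
from math import comb


def running_sum_max(ranked_in_set):
    """Maximum absolute deviation of an unweighted running sum.

    ranked_in_set: list of 0/1 flags, one per ranked gene,
    1 if the gene belongs to the category (assumed convention).
    """
    m = len(ranked_in_set)
    l = sum(ranked_in_set)
    rs, best = 0, 0
    for hit in ranked_in_set:
        rs += (m - l) if hit else -l  # step up for a hit, down for a miss
        best = max(best, abs(rs))
    return best


def exact_p_value(m, l, observed):
    """P(max |running sum| >= observed) when the l category genes are
    placed uniformly at random among m ranks (hypothetical helper)."""
    if observed <= 0:
        return 1.0
    # ways[j]: arrangements of the positions seen so far with j hits
    # whose running sum never reached the observed deviation.
    ways = [0] * (l + 1)
    ways[0] = 1
    for i in range(1, m + 1):
        new = [0] * (l + 1)
        for j in range(max(0, i - (m - l)), min(i, l) + 1):
            rs = j * (m - l) - (i - j) * l  # sum is fixed by (i, j)
            if abs(rs) >= observed:
                continue  # this state already reaches the deviation
            total = ways[j]  # position i is a miss
            if j > 0:
                total += ways[j - 1]  # position i is a hit
            new[j] = total
        ways = new
    return 1 - ways[l] / comb(m, l)
```

For a list of 8 genes with all 3 category genes ranked first, `running_sum_max` gives 15, and only 2 of the 56 possible arrangements reach that deviation, so `exact_p_value(8, 3, 15)` is 2/56. The table has at most l + 1 entries per position, which is one plausible reading of the runtime and memory efficiency mentioned in the abstract.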

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles

Author: A Yakushiji
A Zanzoni
AR Mendelsohn
B Liu
BJ Breitkreutz
EM Marcotte
GD Bader
Hong-Jie Dai
Hsi-Chuan Hung
I Xenarios
J Thomas
JA Hanley
JM Temkin
LM Manevitz
M Krallinger
M Lan
N Cristianini
Richard Tzong-Han Tsai
S Fields
S Fujita
S Peri
S Robertson
T Joachims
T Ono
U Güldener
Wen-Lian Hsu
Y Hao
Yi-Wen Lin
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract
Background: Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles by their relevance to PPI, a task which we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually reduce classification performance, since the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely positive data from such supplementary databases. Only extracting likely positive data, however, will bias the classification model unless sufficient negative data is also added. Unfortunately, negative data is very hard to obtain because there are no resources that compile such information. Therefore, our second aim is to select such negative data from unlabeled PubMed data. Thirdly, we explore how to exploit these likely positive and negative data. Lastly, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles.
Results: To evaluate the performance of our PPI text classifier, we conducted experiments based on the BioCreAtIvE-II IAS dataset. Our results show that adding likely-labeled data generally increases AUC by 3 to 6%, indicating better ranking ability. Our experiments also show that our newly proposed term-weighting scheme has the highest AUC among all common weighting schemes. Our final model achieves an F-measure and an AUC that are 2.9% and 5.0% higher, respectively, than those of the top-ranking system in the IAS challenge.
Conclusion: Our experiments demonstrate the effectiveness of integrating unlabeled and likely labeled data to augment a PPI text classification system. Our mixed model is suitable for ranking purposes, whereas our hierarchical model is better for filtering. In addition, our results indicate that supervised weighting schemes outperform unsupervised ones. Our newly proposed weighting scheme, TFBRF, which considers documents that do not contain the target word, avoids some of the biases found in traditional weighting schemes. Our experimental results show TFBRF to be the most effective among several other top weighting schemes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SIDEKICK: Genomic data driven analysis and decision-making framework

Author: A Rowe
A Zanzoni
AP Dempster
C Alfarano
C Stark
G Bindea
G Joshi-Tope
H Hermjakob
H Parkinson
H Ramos
HB Fraser
J Goodman
JC Bare
JD Han
Kay A Robbins
Kihoon Yoon
L Salwinski
M Castellano
M Doderer
M Jayapandian
M Reich
Mark S Doderer
P Pagel
PT Shannon
S Grossmann
S Mathivanan
S Matos
S Peri
S Pounds
SN Goodman
T Beuming
T Cover
U Stelzl
Z Du
Publication venue: BioMed Central
Publication date: 01/12/2010

Abstract
Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation.
Results: Sidekick, a genomic data driven analysis and decision-making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X?" or "Does the set of genes associated with disease X also influence other diseases?" Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick.
Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that traditional single gene lists do not, particularly in areas such as interaction discovery.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ChemProt: a disease chemical biology database

Author: A. Bora
Bader
Camon
Chamba
Chen
D. Edsgard
Durant
F. S. Roque
Guldener
Halden
Hamosh
Hermjakob
Hewett
I. Kouskoumvekaki
Joshi-Tope
K. Audouze
Kanehisa
Keiser
Knight
Kuhn
Lage
Mestres
Mishra
N. Weinhold
O'Brien
O. Taboureau
Oprea
Pafilis
Ponten
R. Curpan
Roth
Rual
S. Brunak
S. K. Nielsen
Safran
Salwinski
Stark
T. I. Oprea
T. S. Jensen
Weill
Willett
Wishart
Yıldırım
Zanzoni
Publication venue: Oxford University Press
Publication date: 01/01/2011

Systems pharmacology is an emergent area that studies drug action across multiple scales of complexity, from molecular and cellular to tissue and organism levels. There is a critical need to develop network-based approaches to integrate the growing body of chemical biology knowledge with network biology. Here, we report ChemProt, a disease chemical biology database, which is based on a compilation of multiple chemical–protein annotation resources, as well as disease-associated protein–protein interactions (PPIs). We assembled more than 700 000 unique chemicals with biological annotation for 30 578 proteins. We gathered over 2 million chemical–protein interactions, which were integrated in a quality-scored human PPI network of 428 429 interactions. The PPI network layer allows for studying disease and tissue specificity through each protein complex. ChemProt can assist in the in silico evaluation of environmental chemicals, natural products and approved drugs, as well as the selection of new compounds based on their activity profile against most known biological targets, including those related to adverse drug events. Results from the disease chemical biology database associate citalopram, an antidepressant, with osteogenesis imperfecta and leukemia, and bisphenol A, an endocrine disruptor, with certain types of cancer. The server can be accessed at http://www.cbs.dtu.dk/services/ChemProt/

CiteSeerX

Crossref

PubMed Central

Copenhagen University Research Information System

Online Research Database In Technology

Walk-weighted subsequence kernels for protein-protein interaction extraction

Author: A Airola
A Bairoch
A Culotta
A Moschitti
A Zanzoni
B Boeckmann
C Giuliano
C Hsu
D Sleator
G Zhou
GD Bader
H Lodhi
J Hakenberg
J Kim
J Shawe-Taylor
Jihoon Yang
Juntae Yoon
K Fundel
M Huang
M Lease
M Miwa
R Bunescu
R Sætre
S Aubin
S Pyysalo
S Riedel
Seog Park
Seonho Kim
SH Kim
SM Harabagiu
T Ono
TH Cormen
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Candidate gene prioritization by network analysis of differential expression using machine learning approaches

Author: A Subramanian
A Zanzoni
AJ Smola
AP Francisco
B Aranda
B Harr
Bart de Moor
C Saunders
C Stark
C von Mering
D Nitsch
D Zieker
Daniela Nitsch
F Chung
F Fouss
Fabian Ojeda
GC Cawley
GD Bader
H Yang
HY Chuang
J Chen
JA Hanley
Joana P Gonçalves
JW Park
K Lage
KR Brown
L Franke
L Gautier
L Salwinski
LC Tranchevent
M Liu
P Baldi
P Pagel
R Gupta
RA Irizarry
RI Kondor
RK Nibbe
S Aerts
S Köhler
S Mirkin
S Razick
S Vardhanabhuti
SE Choe
T Fawcett
WK Lim
Y Saad
Yves Moreau
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.
Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
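
A minimal sketch of how a heat kernel can spread differential-expression signal over a network is given below. It is a generic illustration, not the paper's Heat Kernel Diffusion Ranking: the abstract does not state the normalization or diffusion parameter, so the unnormalized Laplacian L = D - A, the value of `beta`, the truncated Taylor series, and the function name `heat_kernel_scores` are all assumptions. The score of each gene is the corresponding entry of exp(-beta L) x, where x holds the absolute differential expression of every gene.

```python
def heat_kernel_scores(adjacency, expression, beta=0.5, terms=40):
    """Approximate exp(-beta * L) @ expression with a Taylor series.

    adjacency: symmetric 0/1 matrix as a list of lists (assumed input).
    expression: absolute differential expression per gene.
    beta, terms: hypothetical diffusion strength and series length.
    """
    n = len(expression)
    degree = [sum(row) for row in adjacency]

    def apply_laplacian(v):
        # (D - A) v, computed row by row
        return [
            degree[i] * v[i] - sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    result = list(expression)  # k = 0 term of the series
    term = list(expression)
    for k in range(1, terms + 1):
        # term_k = (-beta / k) * L * term_{k-1}
        term = [-beta * x / k for x in apply_laplacian(term)]
        result = [r + t for r, t in zip(result, term)]
    return result
```

In a toy network where gene 0 shows no expression change itself but all three of its neighbors do, gene 0 receives a higher score than a gene several steps away from any changed gene, which matches the intuition in the abstract that a promising candidate is surrounded by highly differentially expressed genes.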

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MassNet: a functional annotation service for protein mass spectrometry data.

Author: Aebersold
Ashburner
B.-C. Kim
Bader
Camon
D. Park
Dennis
Guldener
Guruprasad
Hermjakob
J. Bhak
J.-S. Choi
Kanehisa
Kemmeren
Kim
Kyte
Käll
Marcotte
Mering
Pandey
Park
Peri
Perkins
S. I. Kim
S. Lee
S.-J. Park
S.-W. Cho
Stark
Walhout
Xenarios
Zanzoni
Publication venue: Oxford University Press (OUP)
Publication date: 04/11/2015

Although mass spectrometry has been frequently used to identify proteins, there are no web servers that provide comprehensive functional annotation of those identified proteins. Such a web service is needed because of the rapid increase in the data. We therefore introduce MassNet, which provides (i) physico-chemical analysis information, (ii) KEGG pathway assignment, (iii) Gene Ontology mapping and (iv) protein–protein interaction (PPI) prediction for the data from MASCOT, Prospector and Profound. MassNet provides the prediction information for PPIs using both 3D structural interaction and experimental interaction deposited in PSIMAP, BIND, DIP, HPRD, IntAct, MINT, CYGD and BioGrid. The web service is freely available at http://massnet.kr or http://sequenceome.kobic.re.kr/MassNet/

Crossref

ScholarWorks@UNIST

Network Neighbors of Drug Targets Contribute to Drug Side-Effect Similarity

Author: A Bender
A Zanzoni
AF Fliri
AL Hopkins
AP Chiang
C von Mering
DC Liebler
DS Wishart
ED Kharasch
G Zurcher
Georg Zeller
GJ Sanger
GV Paolini
H Pettersson
HG Whittington
I Kola
J Loughlin
JC Nacher
Joaquín Dopazo
KH Deane
L Lemberger
L Xie
L Zhang
Lucas Brouwers
M Berthouze
M Campillos
M Iskar
M Kuhn
MJ Keiser
Murat Iskar
P Willett
Peer Bork
RA Pache
RC Hatton
RR Reeves
RS Foti
S Appel
S Gunther
S Suthram
S Zhao
SF Altschul
SF Lin
T Hase
Vera van Noort
Y Yamanishi
YC Martin
Z Desta
Publication venue: Public Library of Science
Publication date: 01/01/2011

In pharmacology, it is essential to identify the molecular mechanisms of drug action in order to understand adverse side effects. These adverse side effects have been used to infer whether two drugs share a target protein. However, side-effect similarity of drugs could also be caused by their target proteins being close in a molecular network, which as such could cause similar downstream effects. In this study, we investigated the proportion of side-effect similarities that is due to targets that are close in the network compared to shared drug targets. We found that only a minor fraction of side-effect similarities (5.8%) are caused by drugs targeting proteins close in the network, compared to side-effect similarities caused by overlapping drug targets (64%). Moreover, these targets that cause similar side effects are more often in a linear part of the network, having two or fewer interactions, than drug targets in general. Based on the examples, we gained novel insight into the molecular mechanisms of side effects associated with several drug targets. Looking forward, such analyses will be extremely useful in the process of drug development to better understand adverse side effects.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository