Hanze UAS repository

Pure OAI Repository

Benchmarking ortholog identification methods using functional genomics data

Author: de Vlieg Jacob
Groenen Peter MA
Hulsen Tim
Huynen Martijn A
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods. RESULTS: To measure the similarity in function of proteins from different species, we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases. CONCLUSION: By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.

Drug Design Approaches to Manipulate the Agonist-Antagonist Equilibrium in Steroid Receptors

Author: Jacob de Vlieg
Paolo Conti
Pedro H. Hermkens
Scott J. Lusher
Wim Dokter
Publication venue: IntechOpen
Publication date: 11/01/2012

Integrating gene expression and GO classification for PCA by preclustering

Author: Bauerschmidt Susanne
Buydens Lutgarde MC
De Haan Jorn R
de Vlieg Jacob
Piek Ester
van Schaik Rene C
Wehrens Ron
Publication venue: BioMed Central
Publication date: 01/01/2010

Contains fulltext: 83407.pdf (publisher's version) (Open Access). 10 p.

Prednisolone-induced differential gene expression in mouse liver carrying wild type or a dimerization-defective glucocorticoid receptor

Author: Alkema Wynand
de Vlieg Jacob
Dokter Wim
Fleuren Wilco
Frijters Raoul
Reichardt Holger M
Toonen Erik JM
Tuckermann Jan P
van der Maaden Hans
van Elsas Andrea
van Lierop Marie-Jose
Publication venue: BioMed Central
Publication date: 01/01/2010

Contains fulltext: 89658.pdf (publisher's version) (Open Access).

BACKGROUND: Glucocorticoids (GCs) control expression of a large number of genes via binding to the GC receptor (GR). Transcription may be regulated either by binding of the GR dimer to DNA regulatory elements or by protein-protein interactions of GR monomers with other transcription factors. Although the type of regulation for a number of individual target genes is known, the relative contribution of both mechanisms to the regulation of the entire transcriptional program remains elusive. To study the importance of GR dimerization in the regulation of gene expression, we performed gene expression profiling of livers of prednisolone-treated wild type (WT) mice and mice that have lost the ability to form GR dimers (GRdim). RESULTS: The GR target genes identified in WT mice were predominantly related to glucose metabolism, the cell cycle, apoptosis and inflammation. In GRdim mice, the level of prednisolone-induced gene expression was significantly reduced compared to WT, but not completely absent. Interestingly, for a set of genes involved in cell cycle and apoptosis processes and strongly related to Foxo3a and p53, induction by prednisolone was completely abolished in GRdim mice. In contrast, glucose metabolism-related genes were still modestly upregulated in GRdim mice upon prednisolone treatment. Finally, we identified several novel GC-inducible genes, of which Fam107a, a putative histone acetyltransferase complex interacting protein, was most strongly dependent on GR dimerization. CONCLUSIONS: This study on prednisolone-induced effects in livers of WT and GRdim mice identified a number of interesting candidate genes and pathways regulated by GR dimers and sheds new light on the complex transcriptional regulation of liver function by GCs.

Hanze UAS repository

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Author: A Kasprzyk
C Minguillon
DA Natale
DL Wheeler
E Birney
F Al-Shahrour
F Chen
GP Wagner
H Li
Jacob de Vlieg
JF Dufayard
JO Korbel
K Reichard
M Ashburner
Peter MA Groenen
PS Dehal
R Fredriksson
RC Edgar
S Guindon
T Hulsen
TA Eyre
Tim Hulsen
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION: PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
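
The two steps named in the PhyloPat abstract, single linkage clustering of orthologous pairs into lineages and querying the resulting presence/absence patterns with a regular expression, can be illustrated with a minimal Python sketch. This is not the PhyloPat implementation: the four-species list, the gene identifiers and the function names are all hypothetical, and the clustering is done with a plain union-find structure chosen here for brevity.

```python
import re
from collections import defaultdict

# Hypothetical species order; the abstract reports 21 species, not these four.
SPECIES = ["human", "mouse", "chicken", "zebrafish"]

# Hypothetical orthologous pairs; each gene is written as (species, gene_id).
ORTHOLOGIES = [
    (("human", "H1"), ("mouse", "M1")),
    (("mouse", "M1"), ("chicken", "C1")),
    (("human", "H2"), ("zebrafish", "Z2")),
]


def single_linkage(pairs):
    """Group genes so that any chain of orthologous pairs ends up in one lineage."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two lineages

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())


def pattern(lineage):
    """Binary phylogenetic pattern: '1' if the species has a gene in the lineage."""
    present = {species for species, _ in lineage}
    return "".join("1" if s in present else "0" for s in SPECIES)


def query(lineages, regex):
    """Return the sorted patterns that match the regular expression completely."""
    compiled = re.compile(regex)
    return sorted(p for p in map(pattern, lineages) if compiled.fullmatch(p))


lineages = single_linkage(ORTHOLOGIES)
print(query(lineages, "11.0"))  # present in human and mouse, absent in zebrafish
```

In this toy setting the regular expression `11.0` selects lineages present in the first two species, with any state in the third, and absent from the fourth; a checkbox-style binary query corresponds to a pattern with no wildcards.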

In Silico Veritas: The Pitfalls and Challenges of Predicting

Recently, the first community-wide assessments of the prediction of the structures of complexes between proteins and small molecule ligands have been reported in the so-called GPCR Dock 2008 and 2010 assessments. In the current review, we discuss the different steps along the protein-ligand modeling workflow by critically analyzing the modeling strategies we used to predict the structures of protein-ligand complexes we submitted to the recent GPCR Dock 2010 challenge. These representative test cases, focusing on the pharmaceutically relevant G Protein-Coupled Receptors, are used to demonstrate the strengths and challenges of the different modeling methods. Our analysis indicates that the proper performance of the sequence alignment, introduction of structural adjustments guided by experimental data, and the usage of experimental data to identify protein-ligand interactions are critical steps in the protein-ligand modeling protocol. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

VU Research Portal

Testing statistical significance scores of sequence comparison methods with structure similarity

Author: AA Schaffer
AD Kester
EV Kriventseva
G Salton
GA Price
HS Booth
J Park
Jack AM Leunissen
Jacob de Vlieg
JJ Codani
JP Comet
JT Reese
M Gribskov
O Bastien
P Agarwal
Peter MA Groenen
R Apweiler
RF Doolittle
S Henikoff
SE Brenner
SF Altschul
T Hulsen
T Rognes
TF Smith
Tim Hulsen
WR Pearson
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
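
To make the idea of a Monte-Carlo Z-score concrete, the following Python sketch scores a pair of sequences with a small Smith-Waterman routine, re-scores the pair after repeatedly shuffling one sequence, and expresses the observed score in standard deviations above the shuffled mean. It is an illustration under stated assumptions rather than the procedure used by CluSTr, Protein World or the paper: the match/mismatch/linear-gap scoring, the number of shuffles and the function names are all chosen here for simplicity, whereas real tools use substitution matrices and affine gap penalties.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman) with an assumed toy scoring scheme."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_score(a, b, shuffles=200, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sw_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.fmean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0  # degenerate case: every shuffle scored the same
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(z_score(seq, seq))
```

The cost noted in the abstract is visible here: each Z-score requires one alignment per shuffle (200 in this sketch), while an e-value is derived from a single alignment score.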

Wageningen University & Research Publications

Public Library of Science (PLOS)

Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

Author: AA Morgan
AC Nicholson
AJ Perez
Andrey Rzhetsky
AP Weetman
B Dell'Osso
B Rapoport
B Vaidya
BA Imhof
BT Alako
C Blaschke
C Nielsen
C Puozzo
CJ McDougle
CR Faltynek
D Chaussabel
D Denys
D Hristovski
D Olive
D Shao
DB Kell
DR Swanson
E Yung
EC Butcher
GR Hajer
H Kakeya
H Shatkay
HP Fischer
I Kola
J Han
J Kuhlmann
JA Wagner
Jacob de Vlieg
JD Wren
K Kajinami
K Miguita
K Njung'e
K Tomiyama
K Vandenborre
L Prokunina
LJ Jensen
M Briley
M Campillos
M Hayashi
M Imoto
M Inazu
M Kamata
M Sugiyama
M Yetisgen-Yildiz
MA Andrade
Marianne van Vugt
N Daraselia
NR Smalheiser
PD Pelton
PR Newby
R Frijters
R Homayouni
R Jelier
RA DiGiacomo
Raoul Frijters
René van Schaik
Ruben Smeets
RY Mukhtar
S Gordon
S Morikawa
S Raychaudhuri
SN Vaishnavi
SS Fuller
T Fawcett
T Hiramatsu
T Ito
T Shokawa
T Tabata
TK Jenssen
TT Ashburn
U Kaneyuki
WA Colburn
WK Goodman
Wynand Alkema
Y Ichimaru
Y Sugimoto
Y Tamori
Publication venue: Public Library of Science
Publication date: 01/01/2010

The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well suited to discovering new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.
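
The ABC-principle described in this abstract can be sketched in a few lines of Python: build a co-occurrence map from documents, then report every concept C that never co-occurs with A but shares at least one intermediate B with it. This is a toy illustration, not CoPub Discovery itself. The document sets, the concept names and the ranking by number of shared intermediates are all assumptions made here; the tool's own thesauri and statistical scoring are not reproduced.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical documents, each reduced to the set of concepts it mentions.
DOCUMENTS = [
    {"drug_A", "gene_B1"},
    {"drug_A", "gene_B2"},
    {"gene_B1", "disease_C"},
    {"gene_B2", "disease_C"},
    {"gene_B2", "disease_D"},
    {"drug_A", "disease_E"},
]


def cooccurrence(documents):
    """Map each concept to the set of concepts it shares a document with."""
    neighbours = defaultdict(set)
    for concepts in documents:
        for x, y in combinations(sorted(concepts), 2):
            neighbours[x].add(y)
            neighbours[y].add(x)
    return neighbours


def hidden_links(documents, a):
    """Return (C, intermediates) pairs where C is linked to `a` only through B concepts."""
    neighbours = cooccurrence(documents)
    direct = neighbours.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours[b]:
            # Keep C only if it is not A itself and has no direct link to A.
            if c != a and c not in direct:
                candidates[c].add(b)
    # Assumed ranking: more shared intermediates first, then alphabetical.
    return sorted(candidates.items(), key=lambda item: (-len(item[1]), item[0]))


for concept, via in hidden_links(DOCUMENTS, "drug_A"):
    print(concept, sorted(via))
```

Here `disease_E` is excluded because it already co-occurs with `drug_A`, while `disease_C` ranks above `disease_D` because two intermediates support it instead of one.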