
Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices

Author: A Ramani
AJ Enright
AM Gustafson
B Scholkopf
C Goh
CS Goh
D Jones
EM Marcotte
F Pazos
I Xenarios
J Felsenstein
J Gertz
JR Bock
K Katoh
Li Liao
M Gribskov
M Kanehisa
M Pellegrini
N Cristianini
P Uetz
R Durbin
R Jothi
Roger A Craig
SM Gomez
T Ito
T Joachims
T Sato
V Vapnik
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions use co-evolutionary information about the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes. RESULTS: We propose a novel, simple method that accounts for some of the intra-matrix correlations in order to improve prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. Performance, measured as the ROC score in cross-validation experiments, shows a significant improvement for our method (ROC score 0.8446) over the use of Pearson correlations (0.6587). CONCLUSION: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, particularly the supervised one, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions.
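The baseline this abstract compares against, a Pearson correlation between the ortholog distance matrices of two proteins, can be sketched as below. This is a minimal sketch, not the authors' code. It assumes that both matrices are symmetric, that their rows and columns index the same reference genomes in the same order, and that only the strict upper triangle (diagonal excluded) is compared. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the strict upper triangle (i < j) of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def distance_matrix_correlation(dist_a: Matrix, dist_b: Matrix) -> float:
    """Pearson correlation between two distance matrices (hypothetical helper).

    Assumes rows/columns of both matrices refer to the same reference
    genomes in the same order.
    """
    x = upper_triangle(dist_a)
    y = upper_triangle(dist_b)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("matrices must be the same size with at least 3 genomes")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # every distance in `a` doubled
    print(distance_matrix_correlation(a, b))  # 1.0
```

This covers only the unsupervised baseline. In the supervised setting the abstract describes, the matrices are instead turned into feature vectors for an SVM.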

Springer - Publisher Connector

arXiv.org e-Print Archive

Effect of promoter architecture on the cell-to-cell variability in gene expression

Author: A Bar-Even
A Gansen
A Raj
A Sanchez
A Singh
A Warmflash
AC Babic
AD Cameron
Alvaro Sanchez
B Muller-Hill
BS Burz
C Zurla
CD Cox
D Kennell
D Müller
D Nevozhay
D Zenklusen
Daniel Jones
DF Browning
DR Rigney
DT Gillespie
DW Austin
E Segal
EM Ozbudak
F Vanzi
FMV Rossi
G Li
G Tkacik
H Boeger
H Maamar
HD Kim
Hernan G. Garcia
I Golding
IB Dodd
J Elf
J Gertz
J Müller
J Ou
J Paulsson
J Peccoud
J Yu
Jané Kondev
JK Joung
JM Pedraza
JM Raser
JM Vilar
JR Chubb
JS van Zon
K Gaston
KF Murphy
KS Koblan
L Bintu
L Cai
L Saiz
LS Weinberger
M Ackerman
M Dobrzynski
M Kaern
M Ptashne
M Shin
M Thattai
M Voliotis
MA Shea
MB Elowitz
MF Wernet
MJ Dunlop
N Maheshri
N Rosenfeld
NE Buchler
O Berg
OK Wong
PJ Choi
PJ Ingram
PJ Schlax
PS Gutierrez
PS Swain
R Losik
Rob Phillips
S Klumpp
S Semsey
SB Straney
SE Halford
T Höfer
T Kuhlman
T Raveh-Sadka
TB Kepler
TL To
TP Malan
TS Karpova
U Moran
V Shahrezaei
WJ Blake
Wyeth W. Wasserman
Y Taniguchi
Y Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 11/08/2010

According to recent experimental evidence, the architecture of a promoter, defined as the number, strength and regulatory role of the operators that control the promoter, plays a major role in determining the level of cell-to-cell variability in gene expression. These quantitative experiments call for a corresponding modeling effort that addresses how changes in promoter architecture affect noise in gene expression in a systematic rather than case-by-case fashion. In this article, we carry out such a systematic investigation, based on a simple microscopic model of gene regulation that incorporates stochastic effects. In particular, we show how operator strength and operator multiplicity affect this variability. We examine different modes of transcription factor binding to complex promoters (cooperative, independent, simultaneous) and how each of these affects the cell-to-cell variability in transcription product. We propose that direct comparison between in vivo single-cell experiments and theoretical predictions for the moments of the probability distribution of mRNA number per cell can discriminate between different kinetic models of gene regulation.
Comment: 35 pages, 6 figures, Submitte
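To make "moments of the probability distribution of mRNA number per cell" concrete, the sketch below uses the standard two-state (telegraph) promoter model. That model is an assumption here, not one of the specific promoter architectures analyzed in the article. The promoter switches on at rate k_on and off at rate k_off, produces mRNA at rate r while on, and each mRNA decays at rate gamma. The steady-state mean and Fano factor (variance divided by mean) have closed forms. The parameter and function names are illustrative.

```python
from typing import Tuple


def telegraph_moments(k_on: float, k_off: float, r: float, gamma: float) -> Tuple[float, float]:
    """Steady-state mean and Fano factor of mRNA copy number (assumed two-state model).

    OFF -> ON at rate k_on, ON -> OFF at rate k_off, transcription at rate r
    only in the ON state, first-order degradation at rate gamma per mRNA.
    """
    if k_on <= 0 or gamma <= 0 or k_off < 0 or r < 0:
        raise ValueError("need k_on > 0, gamma > 0, k_off >= 0, r >= 0")
    k = k_on + k_off
    mean = r * k_on / (gamma * k)
    fano = 1.0 + r * k_off / (k * (k + gamma))
    return mean, fano


if __name__ == "__main__":
    # Constitutive limit (promoter never switches off): Poisson, so Fano = 1.
    print(telegraph_moments(k_on=1.0, k_off=0.0, r=10.0, gamma=1.0))  # (10.0, 1.0)
    # Switching on the timescale of mRNA decay pushes the Fano factor above 1.
    print(telegraph_moments(k_on=1.0, k_off=1.0, r=10.0, gamma=1.0))
```

A Fano factor above 1 is one signature that could let single-cell measurements distinguish a switching promoter from a constitutive one, which is the kind of comparison between experiment and theory that the abstract proposes.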

Caltech Authors

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we take a test set previously used to evaluate the retrieval accuracy of protein-protein searches, then modernize it and adapt it to translated searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
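The starting point of a TBLASTN search, translating a nucleotide sequence in all six reading frames, can be illustrated with the minimal sketch below. It assumes the standard genetic code (NCBI translation table 1), maps any codon containing a non-ACGT character to 'X', and drops incomplete trailing codons. Real TBLASTN supports other genetic codes and does considerably more, including the composition-based statistics described in the abstract, none of which is shown here. The function names are hypothetical.

```python
from typing import List

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))

# Characters other than ACGTN pass through unchanged and later translate to 'X'.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate one reading frame; stop codons become '*', unknown codons 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> List[str]:
    """Translations of frames +1, +2, +3 followed by -1, -2, -3."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    return [translate(forward[f:]) for f in range(3)] + [translate(reverse[f:]) for f in range(3)]


if __name__ == "__main__":
    print(six_frame_translations("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Each of the six protein strings could then be compared with the query protein. Mapping a hit back to nucleotide coordinates requires remembering which strand and frame it came from.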

Springer - Publisher Connector

Accelerated Profile HMM Searches

Author: A Jacob
A Krogh
A Milosavljević
A Wozniak
AA Schäffer
B Rekapalli
C Camacho
DR Horn
EK Freyhult
EM Gertz
G Chukkapalli
GA Price
J Landman
JP Walters
K Karplus
LR Rabiner
LS Johnson
M Farrar
M Madera
R Durbin
RD Finn
RP Maddimsetty
S Derrien
S Hunter
S Johnson
Sean R. Eddy
SF Altschul
SJ Melnikoff
SR Eddy
T Oliver
T Rognes
TF Smith
V Chaudhary
V Sachdeva
William R. Pearson
WN Grundy
WR Pearson
Y Sun
YK Yu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
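The idea behind the MSV score, an optimal sum of several ungapped local segments, can be illustrated with a toy scalar dynamic program. This is a simplification under stated assumptions, not HMMER3's implementation. A +2/-1 match/mismatch score against a consensus string stands in for profile emission log-odds scores, and a fixed jump_cost stands in for the cost of starting another segment. There is no striped SIMD vectorization and no statistical calibration. All names are hypothetical.

```python
from typing import List


def msv_like_score(seq: str, consensus: str, match: int = 2,
                   mismatch: int = -1, jump_cost: int = 3) -> int:
    """Best total score of one or more ungapped local segments (toy MSV-style DP).

    prev_seg[k] : best total whose current segment aligned the previous residue
                  to consensus position k (segments extend only diagonally).
    best_total  : best total over completed segments so far (0 = no segment yet).
    The first segment starts for free; each additional one costs jump_cost.
    """
    if not seq or not consensus:
        return 0
    prev_seg: List[float] = [float("-inf")] * len(consensus)
    best_total = 0
    for residue in seq:
        # Entry score for a segment that begins at this residue: a fresh first
        # segment (0) or a new segment appended after the best earlier total.
        start = max(0, best_total - jump_cost)
        cur_seg: List[float] = []
        for k, symbol in enumerate(consensus):
            score = match if residue == symbol else mismatch
            enter = start
            if k > 0 and prev_seg[k - 1] > enter:
                enter = prev_seg[k - 1]  # extend the current segment diagonally
            cur_seg.append(enter + score)
        best_total = max(best_total, max(cur_seg))
        prev_seg = cur_seg
    return best_total


if __name__ == "__main__":
    print(msv_like_score("ACDE", "ACDE"))                           # 8
    print(msv_like_score("ACDEWWWWACDE", "ACDE"))                   # 13 = 8 + 8 - 3
    print(msv_like_score("ACDEWWWWACDE", "ACDE", jump_cost=100))    # 8
```

In the HMMER3 pipeline described above, only sequences whose MSV score is significant go on to the slower full Forward/Backward computation. The toy here omits that significance step.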

Detection of Heteroplasmic Mitochondrial DNA in Single Mitochondria

Author: A Agarwal
A Ashkin
A Resnick
Ashley Knipe
B Snijder
B Westermann
Barbara C. Levin
BC Levin
BG Poe
DC Chan
E Ruiz-Pesini
EM Gertz
ER Dufresne
F Legros
GD Jeffries
H Niu
H Tang
H Wang
HH Dahl
IA Vorobjev
JM Butler
Joseph E. Reiner
JP Shelby
K Nakada
KM Fuller
Koren Holland Deckman
Kristian Helmerson
L Cavelier
LC Greaves
LM Cree
M van Oven
MD Coble
MW Berns
Nicholas Boire
PJ Pauzauskie
R Agarwal
R Nambiar
Rani B. Kishore
RM Andrews
RW Taylor
S Anderson
S DiMauro
T Kuroiwa
Tadafumi Kato
Thomas Albanetti
Y Chen
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Mitochondrial DNA (mtDNA) genome mutations can lead to energy- and respiration-related disorders such as myoclonic epilepsy with ragged red fiber disease (MERRF), mitochondrial myopathy, encephalopathy, lactic acidosis and stroke (MELAS) syndrome, and Leber's hereditary optic neuropathy (LHON). The effect of the distribution of mutated mtDNA throughout the mitochondrial matrix on the development of mitochondrial-based disorders is not well understood. Insight into this complex sub-cellular heterogeneity may further our understanding of the development of mitochondria-related diseases. METHODOLOGY: This work describes a method for isolating individual mitochondria from single cells and performing molecular analysis on the DNA of a single mitochondrion. An optical tweezer extracts a single mitochondrion from a lysed human HL-60 cell. A micron-sized femtopipette tip then captures the mitochondrion for subsequent analysis. Multiple rounds of conventional DNA amplification and standard sequencing methods enable the detection of a heteroplasmic mixture in the mtDNA from a single mitochondrion. SIGNIFICANCE: Molecular analysis of mtDNA from the individually extracted mitochondrion demonstrates that heteroplasmy is present in single mitochondria at various ratios, consistent with the 50/50 heteroplasmy ratio found in single cells that contain multiple mitochondria.

Identifying Cognate Binding Pairs among a Large Set of Paralogs: The Case of PE/PPE Proteins of Mycobacterium tuberculosis

Author: A Grigoriev
A Zarrinpar
AK Ramani
AM Abdallah
AS Pym
B Rivas
C-S Goh
CM Deane
CS Goh
David Eisenberg
E Gasteiger
EM Marcotte
F Jacob
F Tekaia
G Delogu
GM Suel
H Ge
H Markel
I Halperin
I Kass
I Tirosh
J Gertz
JD Thompson
KL Lightbody
L Stols
LM Okkels
M Pellegrini
M Strong
Matteo Pellegrini
MD Ermolaeva
MJ Brennan
NCGv Pittius
NS Berrow
P Aloy
P Brodin
PD Karp
PR Marri
PS Renshaw
R Jansen
R Pajon
RA Sayle
Robert Riley
S Banu
SL Sampson
ST Cole
SW Lockless
T Barrett
T Dandekar
T Garnier
Thomas Lengauer
U Gobel
W Kabsch
Y Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

We consider the problem of how to detect cognate pairs of proteins that bind when each belongs to a large family of paralogs. To illustrate the problem, we have undertaken a genomewide analysis of interactions of members of the PE and PPE protein families of Mycobacterium tuberculosis. Our computational method uses structural information, operon organization, and protein coevolution to infer the interaction of PE and PPE proteins. Some 289 PE/PPE complexes were predicted out of a possible 5,590 PE/PPE pairs genomewide. Thirty-five of these predicted complexes were also found to have correlated mRNA expression, providing additional evidence for these interactions. We show that our method is applicable to other protein families by analyzing interactions of the Esx family of proteins. Our resulting set of predictions is a starting point for genomewide experimental interaction screens of the PE and PPE families, and our method may be generally useful for detecting interactions of proteins within families having many paralogs.

Enchytraeus albidus Microarray: Enrichment, Design, Annotation and Database (EnchyBASE)

Author: A Conesa
Amadeu M. V. M. Soares
B Nota
Cynthia Gibas
D Roelofs
Dick Roelofs
DV Rebrikov
E Werle
EM Gertz
H Løkke
H Ogata
HC Poynton
ISO
J Owen
J Parkinson
Joel Arrais
K Lock
K Maraldo
L Diatchenko
L Posthuma
LH Heckmann
M Amorim
M Holmstrup
M Pirooznia
MG Vijver
MJ Timmermans
MJB Amorim
MJTN Timmermans
MS Clark
MS Lee
Mónica J. B. Amorim
OECD
Pedro Lopes
RO Schill
S Altschul
S Jeffrey
S Loureiro
S Slotsbo
Sara C. Novais
SC Novais
SG Dodard
SIL Gomes
SR Sturzenbaum
TE de Boer
Tine Vandenbrouck
W Deng
Wim De Coen
YY Zhu
Publication venue: Public Library of Science
Publication date: 01/01/2012

Enchytraeus albidus (Oligochaeta) is an ecologically relevant species used as a standard test organism for risk assessment. Effects of stressors in this species are commonly determined at the population level using reproduction and survival as endpoints. The assessment of transcriptomic responses can be very useful, e.g., to understand the underlying mechanisms of toxicity through gene expression fingerprinting. This paper addresses: 1) development of suppression subtractive hybridization (SSH) libraries enriched for differentially expressed genes after metal and pesticide exposures; 2) sequencing and characterization of all generated cDNA inserts; 3) development of a publicly available genomic database on E. albidus. A total of 2100 Expressed Sequence Tags (ESTs) were isolated, sequenced and assembled into 1124 clusters (947 singletons and 177 contigs). Of these sequences, 41% matched known proteins in GenBank (BLASTX, e-value ≤ 10⁻⁵) and 37% had at least one Gene Ontology (GO) term assigned. In total, 5.5% of the sequences were assigned to a metabolic pathway based on KEGG. With this new sequencing information, an Agilent custom oligonucleotide microarray was designed, representing a potential tool for transcriptomic studies. EnchyBASE (http://bioinformatics.ua.pt/enchybase/) was developed as a freely available web database containing genomic information on E. albidus and will be extended in the near future to other enchytraeid species. The database currently includes all ESTs generated for E. albidus from three cDNA libraries. This information can be downloaded and applied in functional genomics and transcription studies.
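A summary like "41% matched known proteins (BLASTX, e-value ≤ 10⁻⁵)" can be computed from BLAST tabular output. The sketch below assumes that hits were written with `-outfmt 6` using the default 12 columns, where the first column is the query ID and the eleventh is the E-value. Because queries without any hit do not appear in that output, the total number of clusters is passed in separately. The function name and the toy hit lines are hypothetical and are not data from the paper.

```python
from typing import Iterable


def annotated_fraction(blast_tab_lines: Iterable[str], total_queries: int,
                       max_evalue: float = 1e-5) -> float:
    """Fraction of queries with at least one hit at or below max_evalue.

    Assumes BLAST -outfmt 6 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    annotated = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        if float(fields[10]) <= max_evalue:
            annotated.add(fields[0])  # count each query once, however many hits it has
    return len(annotated) / total_queries


if __name__ == "__main__":
    toy_hits = [  # hypothetical hits for clusters c1 and c2; c3 had no hit at all
        "c1\tsp|P1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-20\t150",
        "c2\tsp|P2\t35.0\t60\t39\t1\t1\t180\t5\t64\t0.01\t30.2",
        "c2\tsp|P3\t50.0\t90\t45\t0\t1\t270\t1\t90\t2e-6\t55.0",
    ]
    print(annotated_fraction(toy_hits, total_queries=3))  # 0.6666666666666666
```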

Repositório Institucional da Universidade de Aveiro

VU Research Portal

Institutional Repository Universiteit Antwerpen

Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment

Author: AC Gavin
AJ Enright
AK Ramani
B Snel
BG Mirkin
C von Mering
CS Goh
D Barker
D Vallenet
E Kolker
EM Marcotte
ES Snitkin
F Pazos
G Butland
GV Glazko
H Li
H Rachman
H Tettelin
H Wu
HB Fraser
I Lee
I Tirosh
I Uchiyama
J De Las Rivas
J Gertz
J Sun
J Tamames
J Wu
JB Pereira-Leal
JC Mellor
JC Rain
JF Rual
JM Peregrin-Alvarez
K Jim
K Tan
L Aravind
L Giot
M Campillos
M Levesque
M Pellegrini
M Strong
M Wu
MA Huynen
MG Kann
MJ Martin
ML Green
MY Galperin
N Lopez-Bigas
NJ Krogan
NS Baliga
P Pagel
P Shannon
P Ternes
P Uetz
PM Bowers
R Bonneau
R Jothi
R Overbeek
RA Gutierrez
Raja Jothi
RL Tatusov
SB Hedges
SF Altschul
SV Date
T Dandekar
T Gaasterland
T Ito
T Sato
T Wang
T Yamada
Teresa M Przytycka
TF Deluca
TS Mikkelsen
U Stelzl
V Kunin
Y Kim
Y Ye
Y Zheng
Y Zhou
Z Su
ZI Johnson
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS: Our experimentation with E. coli and yeast proteins, using 16 different carefully composed reference sets of genomes, revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed as the number of eukaryotes increases. Inclusion of most parasitic, pathogenic or vertebrate genomes, and of multiple strains of the same species, in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set composed solely of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model for quantifying the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
CONCLUSION: Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. the translation apparatus), while reference sets composed solely of prokaryotic genomes fare better for more variable pathways such as carbohydrate metabolism. The differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised when interpreting functional linkages inferred from genome-wide, large-scale profile comparisons using a single reference set.
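One common way to score the similarity of two binary phylogenetic profiles is a hypergeometric p-value for their co-occurrence across the N reference genomes. The sketch below is an illustration under that assumption. The abstract does not say which similarity measure the authors used, and it argues that the usual random null model behind such p-values is incomplete, so a small p-value from this sketch should be read with that caveat. The function name is hypothetical.

```python
from math import comb
from typing import Sequence


def cooccurrence_pvalue(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """P(X >= k) under a hypergeometric null for two presence/absence profiles.

    profile_a, profile_b: 1 if the protein has an ortholog in that reference
    genome, 0 otherwise; both must list the genomes in the same order.
    k is the observed number of genomes in which both proteins are present.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    n = len(profile_a)
    n_a = sum(1 for a in profile_a if a)
    n_b = sum(1 for b in profile_b if b)
    k = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = comb(n, n_b)
    # Probability mass of sharing x >= k genomes if B's genomes were drawn at random.
    tail = sum(comb(n_a, x) * comb(n - n_a, n_b - x) for x in range(k, min(n_a, n_b) + 1))
    return tail / total


if __name__ == "__main__":
    a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(cooccurrence_pvalue(a, a))  # equals 1/252 (about 0.00397)
```

Note that the p-value depends only on the counts of genomes, not on which genomes they are. This is one reason why, as the abstract concludes, the composition of the reference set needs separate attention.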

Springer - Publisher Connector

Identification of a new European rabbit IgA with a serine-rich hinge region

Author: A Pinheiro
A Pinheiro
A Pinheiro
A Pinheiro
A Pinheiro
A Shimizu
AK Surridge
Ana Pinheiro
B Wagner
BD Cooke
C Auffray
CA Matthee
E Slack
E Tarelli
EM Gertz
F Ros
G Queney
H Spieker-Polet
HW Schroeder Jr.
J Abrantes
JE Butler
Jenny M. Woof
JG Flanagan
JM Woof
JN Arnold
Joana Abrantes
K Tamura
Katherine L. Knight
KL Knight
L Abi-Rached
L Royle
M Bruggemann
M Carneiro
M Kingzette
MJ Betts
MV Katti
Patricia de Sousa-Pereira
Pedro J. Esteves
PJ Esteves
R Pettinello
RC Burnett
RC Edgar
RG Mage
S Kabir
S Kawamura
Sebastian D. Fugmann
T Strive
TA Hall
Tanja Strive
TS Mattu
V Franc
V Volgina
VV Volgina
W van der Loo
X Zhang
Y Narimatsu
Z Xu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

In mammals, the most striking IgA system belongs to Lagomorpha. Indeed, 14 IgA subclasses have been identified in European rabbits, 11 of which are expressed. In contrast, most other mammals have only one IgA or, in the case of hominoids, two IgA subclasses. Characteristic features of the mammalian IgA subclasses are the length and amino acid sequence of their hinge regions, which are often rich in Pro, Ser and Thr residues and may also carry Cys residues. Here, we describe a new IgA that was expressed in New Zealand White domestic rabbits of the IGHVa1 allotype. This IgA has an extended hinge region containing an intriguing stretch of nine consecutive Ser residues and no Pro or Thr residues, a motif exclusive to this new rabbit IgA. Given the properties of these amino acids, this hinge motif may offer some advantage over the common IgA hinge by affording novel functional capabilities. We also sequenced the IgA14 CH2 and CH3 domains for the first time and showed that IgA14 and IgA3 are expressed.