
Public Library of Science (PLOS)

MDC Repository

A General Model of Codon Bias Due to GC Mutational Bias

Author: Palidwor Gareth A.
Perkins Theodore J.
Xia Xuhua
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background - In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC-content-changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third-position GC content. For individual amino acids as well, usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in both the first and third codon positions, have codons that may be expected to show different usage patterns.
Principal Findings - In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and humans, respectively. When codons are grouped by common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and humans, respectively.
Conclusions - The model clarifies the sometimes counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
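The following is a minimal Python sketch of how a GC-biased continuous-time Markov chain can produce the nonmonotone usage reported in this abstract. The parametrization is an assumption for illustration and is not necessarily the one used in the paper. Under it, a single-nucleotide synonymous substitution that raises GC content has rate g, one that lowers GC content has rate 1 - g, and a GC-neutral one has a fixed rate. The names rate_matrix, stationary_distribution and codon_usage are hypothetical.

These rates satisfy detailed balance. As a result, the equilibrium frequency of a codon with k G/C nucleotides is proportional to r^k, where r = g/(1 - g), and codons with the same GC content get equal frequencies. For leucine, TTG usage then equals r/(1 + 3r + 2r^2). This rises and then falls, peaking at g = sqrt(2) - 1, about 0.41. Arginine's AGG follows the same curve, because the GC contents of the arginine codons are those of the leucine codons shifted up by one.

```python
from itertools import combinations

# Synonymous codon sets from the standard genetic code, used for illustration.
LEUCINE = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]
ARGININE = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]


def gc_count(codon):
    """Number of G or C nucleotides in a codon."""
    return sum(base in "GC" for base in codon)


def rate_matrix(codons, g, neutral_rate=1.0):
    """Generator Q for single-nucleotide synonymous substitutions.

    Assumed (hypothetical) parametrization: GC-increasing change -> rate g,
    GC-decreasing change -> rate 1 - g, GC-neutral change -> neutral_rate.
    Codons differing at more than one position are not directly connected.
    """
    n = len(codons)
    q = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if sum(a != b for a, b in zip(codons[i], codons[j])) != 1:
            continue
        delta = gc_count(codons[j]) - gc_count(codons[i])
        if delta > 0:
            q[i][j], q[j][i] = g, 1.0 - g
        elif delta < 0:
            q[i][j], q[j][i] = 1.0 - g, g
        else:
            q[i][j] = q[j][i] = neutral_rate
    for i in range(n):
        q[i][i] = -sum(q[i])  # each row of a generator sums to zero
    return q


def stationary_distribution(q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    # Augmented rows of Q^T; the last (redundant) equation becomes sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    a[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def codon_usage(codons, g):
    """Equilibrium relative usage of each synonymous codon at GC bias g (0 < g < 1)."""
    pi = stationary_distribution(rate_matrix(codons, g))
    return dict(zip(codons, pi))


if __name__ == "__main__":
    for g in (0.2, 0.4, 0.6, 0.8):
        leu, arg = codon_usage(LEUCINE, g), codon_usage(ARGININE, g)
        print(f"g={g:.1f}  TTG={leu['TTG']:.3f}  AGG={arg['AGG']:.3f}  CTG={leu['CTG']:.3f}")
```

Under this toy model, CTG usage rises steadily with g. TTG and AGG, by contrast, decline once g exceeds roughly 0.41, which is qualitatively the pattern the abstract describes.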

Taxonomic colouring of phylogenetic trees of protein sequences

Author: Andrade-Navarro Miguel A
Palidwor Gareth
Reynaud Emmanuel G
Publication venue: BioMed Central
Publication date: 01/02/2006

BACKGROUND: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, a situation that is becoming common given the accelerated increase in the size of the protein sequence databases. RESULTS: We have developed PhyloView, a web-based tool for colouring phylogenetic trees according to arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours with the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree. CONCLUSION: PhyloView can be used at . A tutorial, the software with documentation, and GPL-licensed source code can be accessed at the same web address.
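The abstract does not say how leaf colours are propagated to internal branches. One simple rule is a post-order pass in which a branch inherits a colour only when every leaf beneath it has that colour, and is otherwise drawn in a neutral "mixed" colour. This rule is assumed here for illustration and is not taken from PhyloView.

In this Python sketch, the lineages, colour partition and tree are hypothetical example data. They stand in for taxonomy that PhyloView would retrieve from SwissProt, SpTrembl or GenBank. The colour names UNASSIGNED and MIXED are also assumptions.

```python
# Hypothetical example data: leaf identifiers mapped to root-to-species lineages.
EXAMPLE_LINEAGES = {
    "P1": ["Eukaryota", "Metazoa", "Mammalia"],
    "P2": ["Eukaryota", "Metazoa", "Insecta"],
    "P3": ["Eukaryota", "Fungi"],
}
EXAMPLE_PARTITION = {"Metazoa": "red", "Fungi": "blue"}
EXAMPLE_TREE = (("P1", "P2"), "P3")  # a leaf is a string; an internal node is a tuple of children

UNASSIGNED = "black"  # assumed colour for leaves outside every selected taxon
MIXED = "grey"        # assumed colour for branches whose subtree spans several colours


def leaf_colours(lineages, partition, default=UNASSIGNED):
    """Colour each leaf by the most specific taxon in its lineage that has a colour."""
    return {
        leaf: next((partition[t] for t in reversed(lineage) if t in partition), default)
        for leaf, lineage in lineages.items()
    }


def propagate_colours(tree, colours, branches):
    """Post-order pass: a branch keeps a colour only if all leaves below it share it.

    Appends (subtree, colour) for every branch to `branches` and returns the
    colour of the branch leading to `tree`.
    """
    if isinstance(tree, str):
        colour = colours.get(tree, UNASSIGNED)
    else:
        below = {propagate_colours(child, colours, branches) for child in tree}
        colour = below.pop() if len(below) == 1 else MIXED
    branches.append((tree, colour))
    return colour


if __name__ == "__main__":
    branches = []
    propagate_colours(EXAMPLE_TREE, leaf_colours(EXAMPLE_LINEAGES, EXAMPLE_PARTITION), branches)
    for subtree, colour in branches:
        print(subtree, colour)
```

In this example, P1 and P2 both fall under Metazoa, so their shared branch stays red. The root spans Metazoa and Fungi and is drawn grey.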

Research Repository UCD

ChIP on SNP-chip for genome-wide analysis of human histone H4 hyperacetylation

Author: Andrade-Navarro Miguel A
McCann Jennifer A
Muro Enrique M
Palidwor Gareth
Palmer Claire
Porter Christopher J
Rudnicki Michael A
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: SNP microarrays are designed to genotype Single Nucleotide Polymorphisms (SNPs). These microarrays report hybridization of DNA fragments and therefore can be used for the purpose of detecting genomic fragments. Results: Here, we demonstrate that a SNP microarray can be effectively used in this way to perform chromatin immunoprecipitation (ChIP) on chip as an alternative to tiling microarrays. We illustrate this novel application by mapping whole-genome histone H4 hyperacetylation in human myoblasts and myotubes. We detect clusters of hyperacetylated histone H4, often spanning up to 300 kilobases of genomic sequence. Using complementary genome-wide analyses of gene expression by DNA microarray, we demonstrate that these clusters of hyperacetylated histone H4 tend to be associated with expressed genes. Conclusion: The use of a SNP array for a ChIP-on-chip application (ChIP on SNP-chip) will be of great value to laboratories interested in determining general rules for the relationship of specific chromatin modifications to transcriptional status throughout the genome, and in examining the asymmetric modification of chromatin at heterozygous loci.
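The abstract does not describe how clusters of hyperacetylation were called from probe-level data. As a generic illustration only, the Python sketch below applies a threshold to a per-probe ChIP/control log2 ratio. It then merges enriched probes on the same chromosome that lie within a maximum gap of each other. The function name and the default values for threshold, max_gap and min_probes are hypothetical, not values from the study.

```python
def enriched_clusters(probes, threshold=1.0, max_gap=50_000, min_probes=3):
    """Merge enriched probes into clusters along each chromosome (illustrative only).

    probes: iterable of (chromosome, position, log2_ratio) tuples, where
    log2_ratio compares the immunoprecipitated sample with a control.
    A probe is enriched if log2_ratio >= threshold; consecutive enriched
    probes on the same chromosome no more than max_gap bp apart join one
    cluster. Returns (chromosome, start, end, n_probes) for clusters with
    at least min_probes enriched probes.
    """
    clusters = []
    current = None  # [chromosome, start, end, n_probes] of the open cluster

    def flush():
        if current is not None and current[3] >= min_probes:
            clusters.append(tuple(current))

    for chrom, pos, ratio in sorted(probes):
        if ratio < threshold:
            continue  # non-enriched probes do not break a cluster; only the gap does
        if current is not None and current[0] == chrom and pos - current[2] <= max_gap:
            current[2] = pos
            current[3] += 1
        else:
            flush()
            current = [chrom, pos, pos, 1]
    flush()
    return clusters
```

Sorting the probe tuples groups each chromosome's probes together in positional order, so clusters never span chromosomes.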

Gene function in early mouse embryonic stem cell differentiation

Author: Andrade-Navarro Miguel A
Campbell Pearl A
Muro Enrique M
Palidwor Gareth
Perez-Iratxeta Carolina
Porter Christopher J
Rudnicki Michael A
Sene Kagnew Hailesellasse
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. RESULTS: We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. 
CONCLUSION: Our analysis profiles, for the first time, gene expression at a very early stage of mESC differentiation, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database.
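One way to make "probe sets showing consistent changes" in the initial 12 hour period concrete is as follows. Require the 0 h to 12 h log2 fold change to have the same sign, and to exceed a cutoff, in R1, J1 and V6.5 alike. Then rank probe sets by the weakest of the three changes. This Python sketch is an assumption about the selection criterion, not the study's published procedure. The function name, the cutoff and the example probe sets are hypothetical.

```python
import math


def consistent_changes(expression, min_abs_log2fc=1.0):
    """Split probe sets into consistently up- and down-regulated groups.

    expression: {probe_set: {cell_line: (level_at_0h, level_at_12h)}} on a
    linear (non-log) scale. A probe set is kept only if every cell line
    changes in the same direction by at least min_abs_log2fc (log2 units).
    Each group is ranked by its weakest change across lines, so the most
    consistent probe sets come first.
    """
    up, down = [], []
    for probe, per_line in expression.items():
        fold_changes = [math.log2(late / early) for early, late in per_line.values()]
        if not fold_changes:
            continue
        if all(fc >= min_abs_log2fc for fc in fold_changes):
            up.append((probe, min(fold_changes)))
        elif all(fc <= -min_abs_log2fc for fc in fold_changes):
            down.append((probe, max(fold_changes)))
    up.sort(key=lambda item: item[1], reverse=True)
    down.sort(key=lambda item: item[1])
    return up, down
```

Ranking by the weakest line means a probe set cannot rank highly on the strength of a single cell line.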

Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

Author: Andrade-Navarro Miguel A
Huska Matthew R
Krzyzanowski Paul M
Muro Enrique M
Palidwor Gareth A
Perez-Iratxeta Carolina
Porter Christopher J
Sandie Reatha
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once or slices of data based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. Conclusion: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.

MDC Repository

BIDCHIPS: bias decomposition and removal from ChIP-seq data clarifies true binding signal and its functional correlates

Author: A Barski
A Diaz
A Mathelier
A Natarajan
A Sala
AR Quinlan
B Lenhard
C Cheng
C Zang
CA Meyer
CE Grant
CF Connelly
D Karolchik
D Park
DS Johnson
DS Latchman
EM Blackwood
F Jacob
F Spitz
G Tuteja
Gareth A. Palidwor
H Ji
I Sur
J Harrow
J Rozowsky
JC Dohm
JF Degner
KR Rosenbloom
KS Zaret
L Teytelman
LA Cirillo
M-S Cheung
MT Maurano
NJ Sakabe
Parameswaran Ramachandran
R Jothi
RE Thurman
RJ Britten
RK Auerbach
S Meader
S Naumann
S Ogbourne
S Pepke
S Schwartz
SG Landt
SV Ramagopalan
T Shiraki
The ENCODE Project Consortium
Theodore J. Perkins
TI Lee
TL Bailey
W Krebs
X Fan
X Feng
Y Benjamini
Y Zhang
Z Ouyang
ZS Qin
Publication venue: Springer Science and Business Media LLC

Public Library of Science (PLOS)

Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin

Author: A Adami
A Akhmanova
A Cena
A Krogh
A Losada
A Lupas
AF Neuwald
Anup Arumughan
AV Kajava
B Falkowska-Hansen
B Rost
BE McGuinness
BM Collins
C Cole
CL Wellington
D Baillat
E Sontag
E Staub
EE Heldwein
EF Smith
Erich E. Wanker
F Rosenblatt
G Hoffner
Gareth A. Palidwor
HC Gregson
I Letunic
I Melvin
J Al-Bassam
J Nasir
JA Cuff
L Cassimeris
L Chelysheva
L Smith
L Spagnolo
LJ McGuffin
LM Mende-Mueller
Luis Sanchez-Pulido
M Barrios-Rodiles
M Gruber
M Nakayama
M Oeffinger
M Peifer
M Sagermann
M Yao
MA Andrade
Matthew R. Huska
MD Hatfield
Miguel A. Andrade-Navarro
MM Golas
MR Huska
MS Boguski
NC Turner
P Harjes
P Legrand
Pablo Porras
Philip E. Bourne
PJ Preker
R Sapiro
Raphaele Foulle
RD Finn
S Hauf
Sergey Shcherbinin
SF Altschul
SY Lee
T Tukamoto
Tamas Rasko
U Stelzl
Ulrich Stelzl
US Tulu
W Li
Y Bai
Y Mao
Y Matsuura
Y Mimori-Kiyosue
Y Shomura
Y Wang
Publication venue: Public Library of Science
Publication date: 01/01/2009

A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although most likely originating by tandem duplication of a two-helix unit, their detection using sequence similarity between repeats is poor. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets up directions toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments.

CiteSeerX

Oxford University Research Archive

MDC Repository

MPG.PuRe

BIDCHIPS: bias decomposition and removal from ChIP-seq data clarifies true binding signal and its functional correlates

Author: Palidwor Gareth A
Perkins Theodore J
Ramachandran Parameswaran
Publication date: 19/11/2015

Background: Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome-wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking. Results: Here, we present a novel framework where the genome-wide ChIP-seq signal is viewed as being quantifiably influenced by different, measurable sources of bias, which can then be computationally subtracted away. We use a compendium of 123 human ENCODE ChIP-seq datasets to build regression models that tell us how much of a ChIP-seq signal can be attributed to mappability, GC content, chromatin accessibility, and factors represented in input DNA and IgG controls. When we use the model to separate out these non-binding influences from the ChIP-seq signal, we obtain a purified signal that associates better with TF-DNA-binding motifs than do other measures of peak significance. We also carry out a multiscale analysis that reveals how ChIP-seq signal biases differ across scales. Finally, we investigate previously reported associations between gene expression and ChIP-seq signals at transcription start sites. We show that our model can be used to discriminate ChIP-seq signals that are truly related to gene expression from those that are merely correlated by virtue of bias, in particular chromatin accessibility bias, which shows up in ChIP-seq signals and also relates to gene expression. Conclusions: Our study provides new insights into the behavior of ChIP-seq signal biases and proposes a novel mitigation framework that improves results compared to existing techniques.
With ChIP-seq now being the central technology for studying transcriptional regulation, it is crucial to accurately characterize, quantify, and adjust for the genome-wide effects of biases affecting ChIP-seq. Our study also emphasizes that properly accounting for confounders in ChIP-seq data is of paramount importance for obtaining biologically accurate insights into the workings of the complex regulatory mechanisms in living organisms. R and MATLAB packages implementing the framework can be obtained from http://www.perkinslab.ca/Software.html
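One plausible reading of "computationally subtracted away" is to regress per-bin ChIP-seq signal on the measured bias covariates and keep the residual as the purified signal. Example covariates are mappability, GC content, chromatin accessibility, and input and IgG coverage. The plain-Python sketch below does this with ordinary least squares.

The function names, the linear form, and the absence of any transformation or multiscale binning are all assumptions. The R and MATLAB packages from the Perkins lab remain the reference for the actual BIDCHIPS model.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_bias_model(covariates, signal):
    """Ordinary least squares of per-bin signal on bias covariates (intercept first)."""
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, signal)) for i in range(p)]
    return solve_linear(xtx, xty)


def purified_signal(covariates, signal):
    """Subtract the part of the signal explained by the bias covariates (the residuals)."""
    coef = fit_bias_model(covariates, signal)
    fitted = [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in covariates]
    return coef, [y - f for y, f in zip(signal, fitted)]
```

Each row of covariates holds one genomic bin's bias measurements. A bin whose signal remains clearly positive after the subtraction is a candidate for genuine binding rather than bias.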