
edoc

Bern Open Repository and Information System (BORIS)

FigShare

Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage

Author: Claudiani Pamela
D'Amato Maria
Kalmar Eva
Muller Ferenc
Sanges Remo
Stupka Elia
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: All vertebrates share a remarkable degree of similarity in their development as well as in the basic functions of their cells. Despite this, attempts at unearthing genome-wide regulatory elements conserved throughout the vertebrate lineage using BLAST-like approaches have thus far detected noncoding conservation in only a few hundred genes, mostly associated with regulation of transcription and development. RESULTS: We used a unique combination of tools to obtain regional global-local alignments of orthologous loci. This approach takes into account shuffling of regulatory regions, which is likely to occur over evolutionary distances greater than those separating mammalian genomes. This approach revealed one order of magnitude more vertebrate conserved elements than was previously reported in over 2,000 genes, including a high number of genes found in the membrane and extracellular regions. Our analysis revealed that 72% of the elements identified have undergone shuffling. We tested the ability of the elements identified to enhance transcription in zebrafish embryos and compared their activity with a set of control fragments. We found that more than 80% of the elements tested were able to enhance transcription significantly, prevalently in a tissue-restricted manner corresponding to the expression domain of the neighboring gene. CONCLUSION: Our work elucidates the importance of shuffling in the detection of cis-regulatory elements. It also elucidates how similarities across the vertebrate lineage, which go well beyond development, can be explained not only within the realm of coding genes but also in that of the sequences that ultimately govern their expression.

Springer - Publisher Connector

University of Birmingham Research Portal

Institutional Repository of the Freie Universität Berlin

Sissa Digital Library

The impact of different negative training data on regulatory sequence predictions

Author: Kircher Martin
Krützfeldt Louisa-Marie
Schubach Max
Publication date: 01/01/2020

Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell type, predicting cell-type-specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need for hyperparameter optimization.
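
The "sequence shuffle" style of negative set mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the simplest variant, a mononucleotide shuffle that preserves only base composition; the abstract does not state which shuffling schemes the authors used, and the function names and toy sequences below are hypothetical. Genomic background sampling is not shown because it needs a reference genome.

```python
import random
from collections import Counter


def shuffled_negative(seq: str, rng: random.Random) -> str:
    """Return a mononucleotide shuffle of seq (same bases, random order)."""
    bases = list(seq)
    rng.shuffle(bases)  # in-place Fisher-Yates shuffle
    return "".join(bases)


def make_negative_set(positives, seed=0):
    """Build one shuffled negative per positive sequence (assumed 1:1 ratio)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [shuffled_negative(seq, rng) for seq in positives]


# Hypothetical toy "open chromatin" sequences, for illustration only.
positives = ["ACGTACGTGGCC", "TTGACAATATAAT"]
negatives = make_negative_set(positives)

for pos, neg in zip(positives, negatives):
    # Each negative keeps the length and base composition of its positive.
    print(pos, "->", neg, Counter(pos) == Counter(neg))
```

A shuffle like this destroys all higher-order structure (dinucleotide content, motifs), which is one plausible reading of the abstract's remark that models trained on highly shuffled sequences may learn features of artificial sequences.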

The Alternative Choice of Constitutive Exons throughout Evolution

Author: Adi Doron-Faigenboim
Amir Goren
Eddo Kim
Galit Lev-Maor
Gil Ast
Hadas Keren
Lisa Stubbs
Noa Sela
Shelly Leibman-Barak
Tal Pupko
Publication date: 01/11/2007

Alternative cassette exons are known to originate from two processes: exonization of intronic sequences and exon shuffling. Herein, we suggest an additional mechanism by which constitutively spliced exons become alternative cassette exons during evolution. We compiled a dataset of orthologous exons from human and mouse that are constitutively spliced in one species but alternatively spliced in the other. Examination of these exons suggests that the common ancestors were constitutively spliced. We show that relaxation of the 5′ splice site during evolution is one of the molecular mechanisms by which exons shift from constitutive to alternative splicing. This shift is associated with the fixation of exonic splicing regulatory sequences (ESRs) that are essential for exon definition and control the inclusion level only after the transition to alternative splicing. The effect of each ESR on splicing and the combinatorial effects between two ESRs are conserved from fish to human. Our results uncover an evolutionary pathway that increases transcriptome diversity by shifting exons from constitutive to alternative splicing.

arXiv.org e-Print Archive

CiteSeerX

Phylogenetic differences in content and intensity of periodic proteins

Author: Gatherer D.
McEwan N.R.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2005

Many proteins exhibit sequence periodicity, often correlated with a visible structural periodicity. The statistical significance of such periodicity can be assessed by means of a chi-square-based test, with significance thresholds being calculated from shuffled sequences. Comparison of the complete proteomes of 45 species reveals striking differences in the proportion of periodic proteins and the intensity of the most significant periodicities. Eukaryotes tend to have a higher proportion of periodic proteins than eubacteria, which in turn tend to have more than archaea. The intensity of periodicity in the most periodic proteins is also greatest in eukaryotes. By contrast, the relatively small group of periodic proteins in archaea also tends to be weakly periodic compared to those of eukaryotes and eubacteria. Exceptions to this general rule are found in those prokaryotes with multicellular life-cycle phases, e.g. Methanosarcina sps. or Anabaena sps., which have more periodicities than prokaryotes in general, and in unicellular eukaryotes, which have fewer than multicellular eukaryotes. The distribution of significantly periodic proteins in eukaryotes is over a wide range of period lengths, whereas prokaryotic proteins typically have a more limited set of period lengths. This is further investigated by repeating the analysis on the NRL-3D database of proteins of solved structure. Some short-range periodicities are explicable in terms of basic secondary structure, e.g. alpha helices, while middle-range periodicities are frequently found to consist of known short Pfam domains, e.g. leucine-rich repeats, tetratricopeptides or armadillo domains. However, not all can be explained in this way.
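
As a rough illustration of the chi-square-plus-shuffling idea described in this abstract, the Python sketch below scores one candidate period with a residue-by-phase contingency table and estimates significance empirically from shuffled copies of the same sequence. This is an assumed, simplified formulation, not the authors' exact statistic or thresholds; the function names, the number of shuffles, and the toy heptad sequence are hypothetical.

```python
import random
from collections import Counter


def periodicity_chi2(seq, period):
    """Chi-square statistic of a residue x (position mod period) table."""
    n = len(seq)
    observed = Counter((res, i % period) for i, res in enumerate(seq))
    row_totals = Counter(seq)                             # count per residue
    col_totals = Counter(i % period for i in range(n))    # count per phase
    chi2 = 0.0
    for res, r in row_totals.items():
        for phase, c in col_totals.items():
            expected = r * c / n  # expectation under independence
            chi2 += (observed[(res, phase)] - expected) ** 2 / expected
    return chi2


def shuffle_p_value(seq, period, n_shuffles=200, seed=0):
    """Empirical p-value: fraction of shuffles scoring at least as high."""
    rng = random.Random(seed)
    real = periodicity_chi2(seq, period)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, periodicity destroyed
        if periodicity_chi2(residues, period) >= real:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one to avoid reporting p = 0


# Hypothetical toy sequence with a perfect period of 7 (a leucine every 7th residue).
toy = "LAAAAAA" * 10
print(periodicity_chi2(toy, 7), shuffle_p_value(toy, 7))
```

Shuffling keeps the amino acid composition fixed, so the null distribution reflects what a protein of the same composition would score by chance at that period.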

Enlighten

Lancaster E-Prints

Statistical Significance of Precisely Repeated Intracellular Synaptic Patterns

Author: A Luczak
A Mokeichev
A Roxin
BQ Mao
D Contreras
E Pastalkova
EM Izhikevich
G Molnar
Gloster Aaron
Hiromu Tanimoto
Huei-Yu Chiou
I Lampl
JD Rolston
JM Beggs
JN MacLean
M Abeles
MW Oram
N Raichman
R Lestienne
R Madhavan
Rafael Yuste
RJ Douglas
RR Llinas
S Chung
S Kang
SN Baker
T Kenet
T Shmiel
Wataru Matsumoto
Y Ikegaya
Y Prut
Yuji Ikegaya
Z Nadasdy
Publication venue: Public Library of Science

Can neuronal networks produce patterns of activity with millisecond accuracy? It may seem unlikely, considering the probabilistic nature of synaptic transmission. However, some theories of brain function predict that such precision is feasible and can emerge from the non-linearity of action potential generation in circuits of connected neurons. Several studies have presented evidence for and against this hypothesis. Our earlier work supported the precision hypothesis, based on results demonstrating that precise patterns of synaptic inputs could be found in intracellular recordings from neurons in brain slices and in vivo. To test this hypothesis, we devised a method for finding precise repeats of activity and compared repeats found in the data to those found in surrogate datasets made by shuffling the original data. Because more repeats were found in the original data than in the surrogate datasets, we argued that repeats were not due to chance occurrence. Mokeichev et al. (2007) challenged these conclusions, arguing that the generation of surrogate data was insufficiently rigorous. We have now reanalyzed our previous data with the methods introduced by Mokeichev et al. (2007). Our reanalysis reveals that repeats are statistically significant, thus supporting our earlier conclusions, while also supporting many conclusions that Mokeichev et al. (2007) drew from their recent in vivo recordings. Moreover, we also show that the conditions under which the membrane potential is recorded contribute significantly to the ability to detect repeats and may explain conflicting results. In conclusion, our reevaluation resolves the methodological contradictions between Ikegaya et al. (2004) and Mokeichev et al. (2007), but demonstrates the validity of our previous conclusion that spontaneous network activity is non-randomly organized.

Public Library of Science (PLOS)

Coevolved mutations reveal distinct architectures for two core proteins in the bacterial flagellar motor

Author: A Pandini
AC Lowenthal
Alessandro Pandini
AM Waterhouse
Anna Roujeinikova
AS Vartanian
B Ruhnau
BJ Grant
BJ Lowder
CJ Tsai
CM Dyer
D de Juan
D Stock
DL Guzman
DR Livesay
DR Thomas
DS Bischoff
DT Jones
F Pazos
H Ashkenazy
H Shimodaira
H Sockett
H Szurmant
HC Berg
J Friedman
J Yuan
Jens Kleinjung
JP Armitage
JS Parkinson
K Paul
KA Reynolds
KH Lam
L Cavallo
LK Lee
M Punta
MK Sarkar
MN Price
NA Rosenberg
NJ Delalez
P Cluzel
PN Brown
Q Ma
R Saito
RC Edgar
RD Finn
RW Branch
S Chen
S Pronk
SA Lloyd
SD Dunn
Shafqat Rasool
Shahid Khan
SM Van Way
SY Park
T Minamino
T Pilizota
TA Duke
VM Irikura
WR Taylor
X Zhao
Y Tu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) "torque" helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra- and inter-subunit contacts. In contrast, FliG has a patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMM-FliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGC α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data.
ARM-C could be the convertor element that provides mechanistic and species diversity. JK was supported by Medical Research Council grant U117581331. SK was supported by seed funds from Lahore University of Management Sciences (LUMS) and the Molecular Biology Consortium.

Brunel University Research Archive

FigShare

An evolutionary perspective on the kinome of malaria parasites

Author: Andrew B. Tobin
Carvalho T. G.
Christian Doerig
Eric Talevich
Natarajan Kannan
Wells T. N.
Wilson R. J.
Publication venue: The Royal Society
Publication date: 01/01/2012

Malaria parasites belong to an ancient lineage that diverged very early from the main branch of eukaryotes. The approximately 90-member plasmodial kinome includes a majority of eukaryotic protein kinases that clearly cluster within the AGC, CMGC, TKL, CaMK and CK1 groups found in yeast, plants and mammals, testifying to the ancient ancestry of these families. However, several hundred million years of independent evolution, and the specific pressures brought about by first a photosynthetic and then a parasitic lifestyle, led to the emergence of unique features in the plasmodial kinome. These include taxon-restricted kinase families, and unique peculiarities of individual enzymes even when they have homologues in other eukaryotes. Here, we merge essential aspects of all three malaria-related communications that were presented at the Evolution of Protein Phosphorylation meeting, and propose an integrated discussion of the specific features of the parasite's kinome and phosphoproteome.