2,408 research outputs found

    Eukaryotic translation elongation factor 1A (eEF1A) domain I from S. cerevisiae is required but not sufficient for inter-species complementation

    Ethanolamine phosphoglycerol (EPG) is a protein modification attached exclusively to eukaryotic elongation factor 1A (eEF1A). In mammals and plants, EPG is linked to conserved glutamate residues located in eEF1A domains II and III, whereas in the unicellular eukaryote Trypanosoma brucei, only domain III is modified by a single EPG. A biosynthetic precursor of EPG and structural requirements for EPG attachment to T. brucei eEF1A have been reported, but nothing is known about the EPG modifying enzyme(s). By expressing human eEF1A in T. brucei, we now show that EPG attachment to eEF1A is evolutionarily conserved between T. brucei and Homo sapiens. In contrast, S. cerevisiae eEF1A, which has been shown to lack EPG, is not modified in T. brucei. Furthermore, we show that eEF1A cannot functionally complement across species when using T. brucei and S. cerevisiae as model organisms. However, functional complementation in yeast can be obtained using eEF1A chimeras containing domains II or III from other species. In contrast, yeast domain I is strictly required for functional complementation in S. cerevisiae.

    Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage

    BACKGROUND: All vertebrates share a remarkable degree of similarity in their development as well as in the basic functions of their cells. Despite this, attempts at unearthing genome-wide regulatory elements conserved throughout the vertebrate lineage using BLAST-like approaches have thus far detected noncoding conservation in only a few hundred genes, mostly associated with regulation of transcription and development. RESULTS: We used a unique combination of tools to obtain regional global-local alignments of orthologous loci. This approach takes into account shuffling of regulatory regions that are likely to occur over evolutionary distances greater than those separating mammalian genomes. This approach revealed one order of magnitude more vertebrate conserved elements than was previously reported in over 2,000 genes, including a high number of genes found in the membrane and extracellular regions. Our analysis revealed that 72% of the elements identified have undergone shuffling. We tested the ability of the elements identified to enhance transcription in zebrafish embryos and compared their activity with a set of control fragments. We found that more than 80% of the elements tested were able to enhance transcription significantly, prevalently in a tissue-restricted manner corresponding to the expression domain of the neighboring gene. CONCLUSION: Our work elucidates the importance of shuffling in the detection of cis-regulatory elements. It also elucidates how similarities across the vertebrate lineage, which go well beyond development, can be explained not only within the realm of coding genes but also in that of the sequences that ultimately govern their expression

    The impact of different negative training data on regulatory sequence predictions

    Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need for hyperparameter optimization.
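    The idea of building a negative set from "sequence shuffles of the positive sequences" can be illustrated with a minimal sketch. The function names `block_shuffle` and `make_negatives` and the `block_size` parameter are hypothetical and are not taken from the paper; the abstract does not say which shuffling scheme the authors used. Here `block_size=1` gives a fully shuffled sequence, and larger blocks preserve more local sequence context.

```python
import random


def block_shuffle(seq, block_size=1, seed=None):
    """Return `seq` with its consecutive blocks of `block_size` bases permuted.

    Assumption for illustration only: block_size=1 is a plain base-level
    shuffle; larger values keep short k-mers intact.
    """
    rng = random.Random(seed)
    blocks = [seq[i:i + block_size] for i in range(0, len(seq), block_size)]
    rng.shuffle(blocks)
    return "".join(blocks)


def make_negatives(positives, block_size=1, seed=0):
    """Build one shuffled negative per positive sequence (same length and base composition)."""
    rng = random.Random(seed)
    return [block_shuffle(s, block_size, rng.randrange(2**32)) for s in positives]


if __name__ == "__main__":
    positives = ["ACGTACGTAA", "GGGCCCATAT"]  # toy sequences, not real data
    for pos, neg in zip(positives, make_negatives(positives, block_size=2)):
        print(pos, "->", neg)
```

    Each negative keeps the length and base composition of its positive, so a classifier cannot separate the two classes on composition alone. The genomic-background alternative described in the abstract would instead sample real sequences from elsewhere in the genome and is not shown here.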

    The Alternative Choice of Constitutive Exons throughout Evolution

    Alternative cassette exons are known to originate from two processes: exonization of intronic sequences and exon shuffling. Herein, we suggest an additional mechanism by which constitutively spliced exons become alternative cassette exons during evolution. We compiled a dataset of orthologous exons from human and mouse that are constitutively spliced in one species but alternatively spliced in the other. Examination of these exons suggests that the common ancestors were constitutively spliced. We show that relaxation of the 5′ splice site during evolution is one of the molecular mechanisms by which exons shift from constitutive to alternative splicing. This shift is associated with the fixation of exonic splicing regulatory sequences (ESRs) that are essential for exon definition and control the inclusion level only after the transition to alternative splicing. The effect of each ESR on splicing and the combinatorial effects between two ESRs are conserved from fish to human. Our results uncover an evolutionary pathway that increases transcriptome diversity by shifting exons from constitutive to alternative splicing.

    Phylogenetic differences in content and intensity of periodic proteins

    Many proteins exhibit sequence periodicity, often correlated with a visible structural periodicity. The statistical significance of such periodicity can be assessed by means of a chi-square-based test, with significance thresholds being calculated from shuffled sequences. Comparison of the complete proteomes of 45 species reveals striking differences in the proportion of periodic proteins and the intensity of the most significant periodicities. Eukaryotes tend to have a higher proportion of periodic proteins than eubacteria, which in turn tend to have more than archaea. The intensity of periodicity in the most periodic proteins is also greatest in eukaryotes. By contrast, the relatively small group of periodic proteins in archaea also tend to be weakly periodic compared to those of eukaryotes and eubacteria. Exceptions to this general rule are found in those prokaryotes with multicellular life-cycle phases, e.g. Methanosarcina sps. or Anabaena sps., which have more periodicities than prokaryotes in general, and in unicellular eukaryotes, which have fewer than multicellular eukaryotes. The distribution of significantly periodic proteins in eukaryotes is over a wide range of period lengths, whereas prokaryotic proteins typically have a more limited set of period lengths. This is further investigated by repeating the analysis on the NRL-3D database of proteins of solved structure. Some short range periodicities are explicable in terms of basic secondary structure, e.g. alpha helices, while middle range periodicities are frequently found to consist of known short Pfam domains, e.g. leucine-rich repeats, tetratricopeptides or armadillo domains. However, not all can be explained in this way
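    The chi-square-based test with thresholds from shuffled sequences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the residue-by-phase contingency table, and the empirical p-value are assumptions chosen to show the general approach.

```python
import random
from collections import Counter


def chi_square_periodicity(seq, period):
    """Chi-square statistic for residue type versus position modulo `period`."""
    n = len(seq)
    observed = Counter((res, i % period) for i, res in enumerate(seq))
    residue_totals = Counter(seq)
    phase_totals = Counter(i % period for i in range(n))
    stat = 0.0
    for res, r_total in residue_totals.items():
        for phase, p_total in phase_totals.items():
            # Expected count if residue type and phase were independent.
            expected = r_total * p_total / n
            stat += (observed[(res, phase)] - expected) ** 2 / expected
    return stat


def shuffled_p_value(seq, period, n_shuffles=200, seed=0):
    """Empirical p-value: share of shuffled sequences scoring at least as high."""
    rng = random.Random(seed)
    actual = chi_square_periodicity(seq, period)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        if chi_square_periodicity(residues, period) >= actual:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return actual, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    toy = "LAGKTEV" * 12  # artificial, perfectly 7-periodic sequence
    print(shuffled_p_value(toy, period=7))
```

    Shuffling preserves amino acid composition, so the resulting threshold reflects what the statistic looks like for a sequence of the same composition without ordered repeats.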

    Statistical Significance of Precisely Repeated Intracellular Synaptic Patterns

    Can neuronal networks produce patterns of activity with millisecond accuracy? It may seem unlikely, considering the probabilistic nature of synaptic transmission. However, some theories of brain function predict that such precision is feasible and can emerge from the non-linearity of the action potential generation in circuits of connected neurons. Several studies have presented evidence for and against this hypothesis. Our earlier work supported the precision hypothesis, based on results demonstrating that precise patterns of synaptic inputs could be found in intracellular recordings from neurons in brain slices and in vivo. To test this hypothesis, we devised a method for finding precise repeats of activity and compared repeats found in the data to those found in surrogate datasets made by shuffling the original data. Because more repeats were found in the original data than in the surrogate datasets, we argued that repeats were not due to chance occurrence. Mokeichev et al. (2007) challenged these conclusions, arguing that the generation of surrogate data was insufficiently rigorous. We have now reanalyzed our previous data with the methods introduced by Mokeichev et al. (2007). Our reanalysis reveals that repeats are statistically significant, thus supporting our earlier conclusions, while also supporting many conclusions that Mokeichev et al. (2007) drew from their recent in vivo recordings. Moreover, we also show that the conditions under which the membrane potential is recorded contribute significantly to the ability to detect repeats and may explain conflicting results. In conclusion, our reevaluation resolves the methodological contradictions between Ikegaya et al. (2004) and Mokeichev et al. (2007), but demonstrates the validity of our previous conclusion that spontaneous network activity is non-randomly organized.

    Coevolved mutations reveal distinct architectures for two core proteins in the bacterial flagellar motor

    Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) "torque" helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra- and inter-subunit contacts. In contrast, FliG has patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMM-FliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGC α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data. ARM-C could be the convertor element that provides mechanistic and species diversity. JK was supported by Medical Research Council grant U117581331. SK was supported by seed funds from Lahore University of Management Sciences (LUMS) and the Molecular Biology Consortium.

    An evolutionary perspective on the kinome of malaria parasites

    Malaria parasites belong to an ancient lineage that diverged very early from the main branch of eukaryotes. The approximately 90-member plasmodial kinome includes a majority of eukaryotic protein kinases that clearly cluster within the AGC, CMGC, TKL, CaMK and CK1 groups found in yeast, plants and mammals, testifying to the ancient ancestry of these families. However, several hundred million years of independent evolution, and the specific pressures brought about by first a photosynthetic and then a parasitic lifestyle, led to the emergence of unique features in the plasmodial kinome. These include taxon-restricted kinase families, and unique peculiarities of individual enzymes even when they have homologues in other eukaryotes. Here, we merge essential aspects of all three malaria-related communications that were presented at the Evolution of Protein Phosphorylation meeting, and propose an integrated discussion of the specific features of the parasite's kinome and phosphoproteome.