PHYLOGENOMICS-GUIDED VALIDATION OF FUNCTION FOR CONSERVED UNKNOWN GENES
Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown function, or wrongly or vaguely annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We accordingly set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction is integrative, coupling the extensive post-genomic resources available for plants with comparative genomics based on hundreds of microbial genomes, and functional genomic datasets from model microorganisms. The early phase is computer-assisted; later phases incorporate intellectual input from expert plant and microbial biochemists. The approach thus bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is much more powerful than purely computational approaches to identifying gene-function associations. Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) are conserved between plants and prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family, specifically gene clustering and phyletic spread, as well as availability of functional genomics data, and publications that could link candidate families to general metabolic areas, or to specific functions. In-depth comparative genomic analysis was then performed for about 500 top candidate families, which connected ~55 of them to general areas of metabolism and led to specific functional predictions for a subset of ~25 more.
Twenty predicted functions were experimentally tested in at least one prokaryotic organism via reverse genetics, metabolic profiling, functional complementation, and recombinant protein biochemistry. Our approach predicted and validated functions for 10 formerly uncharacterized protein families common to plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The functions of five more are currently being validated. Experimental testing of diverse representatives of these families combined with in silico analysis allowed accurate projection of the annotations to hundreds more sequenced genomes.
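The three selection criteria stated in the abstract above (family size of at most three, conservation between plants and prokaryotes, unknown or poorly known function) can be sketched as a simple filter. This is only an illustration: the `GeneFamily` record, its field names, and the example families are hypothetical and do not come from the SEED database or from the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class GeneFamily:
    """Hypothetical record for one Arabidopsis gene family (illustrative only)."""
    name: str
    arabidopsis_members: int  # number of Arabidopsis genes in the family
    in_prokaryotes: bool      # True if conserved between plants and prokaryotes
    function_known: bool      # False for unknown or poorly known function


def is_candidate(family: GeneFamily, max_members: int = 3) -> bool:
    """Apply criteria (i)-(iii) from the abstract to a single family."""
    return (
        1 <= family.arabidopsis_members <= max_members  # (i) unique or small family
        and family.in_prokaryotes                       # (ii) plant-prokaryote conservation
        and not family.function_known                   # (iii) unknown or poorly known function
    )


# Made-up example families, not real data.
families = [
    GeneFamily("famA", 1, True, False),
    GeneFamily("famB", 5, True, False),   # too many members
    GeneFamily("famC", 2, False, False),  # not conserved in prokaryotes
    GeneFamily("famD", 3, True, True),    # function already known
    GeneFamily("famE", 3, True, False),
]

candidates = [f.name for f in families if is_candidate(f)]
print(candidates)
```

The later ranking steps (gene clustering, phyletic spread, literature support) relied on expert judgment and are not captured by a filter of this kind.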
Identification of a conserved N-terminal domain in the first module of ACV synthetases
The l-δ-(α-aminoadipoyl)-l-cysteinyl-d-valine synthetase (ACVS) is a trimodular nonribosomal peptide synthetase (NRPS) that provides the peptide precursor for the synthesis of β-lactams. The enzyme has been extensively characterized in terms of tripeptide formation and substrate specificity. The first module is highly specific and is the only NRPS unit known to recruit and activate the substrate l-α-aminoadipic acid, which is coupled to the α-amino group of l-cysteine through an unusual peptide bond, involving its δ-carboxyl group. Here we carried out an in-depth investigation on the architecture of the first module of the ACVS enzymes from the fungus Penicillium rubens and the bacterium Nocardia lactamdurans. Bioinformatic analyses revealed the presence of a previously unidentified domain at the N-terminus which is structurally related to condensation domains, but smaller in size. Deletion variants of both enzymes were generated to investigate the potential impact on penicillin biosynthesis in vivo and in vitro. The data indicate that the N-terminal domain is important for catalysis.
The YqfN protein of Bacillus subtilis is the tRNA: m1A22 methyltransferase (TrmK)
N1-methylation of adenosine to m1A occurs in several different positions in tRNAs from various organisms. A methyl group at position N1 prevents Watson-Crick-type base pairing by adenosine and is therefore important for regulation of structure and stability of tRNA molecules. Thus far, only one family of genes encoding enzymes responsible for m1A methylation at position 58 has been identified, while other m1A methyltransferases (MTases) remain elusive. Here, we show that Bacillus subtilis open reading frame yqfN is necessary and sufficient for N1-adenosine methylation at position 22 of bacterial tRNA. Thus, we propose to rename YqfN as TrmK, according to the traditional nomenclature for bacterial tRNA MTases, or TrMet(m1A22) according to the nomenclature from the MODOMICS database of RNA modification enzymes. tRNAs purified from a ΔtrmK strain are a good substrate in vitro for the recombinant TrmK protein, which is sufficient for m1A methylation at position 22, as are tRNAs from Escherichia coli, which natively lacks m1A22. TrmK is conserved in Gram-positive bacteria and present in some Gram-negative bacteria, but its orthologs are apparently absent from archaea and eukaryota. Protein structure prediction indicates that the active site of TrmK does not resemble the active site of the m1A58 MTase TrmI, suggesting that these two enzymatic activities evolved independently.
High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed
The structure of the mouse ADAT2/ADAT3 complex reveals the molecular basis for mammalian tRNA wobble adenosine-to-inosine deamination
Post-transcriptional modification of tRNA wobble adenosine into inosine is crucial for decoding multiple mRNA codons by a single tRNA. The eukaryotic wobble adenosine-to-inosine modification is catalysed by the ADAT (ADAT2/ADAT3) complex, which modifies up to eight tRNAs and requires a full tRNA for activity. Yet the catalytic mechanism of ADAT and its implication in neurodevelopmental disorders remain poorly understood. Here, we have characterized mouse ADAT and provide the molecular basis for tRNA deamination by ADAT2 as well as for ADAT3 inactivation by loss of catalytic and tRNA-binding determinants. We show that tRNA binding and deamination can vary depending on the cognate tRNA but absolutely rely on the eukaryote-specific ADAT3 N-terminal domain. This domain can rotate with respect to the ADAT catalytic domain to present and position the tRNA anticodon-stem-loop correctly in the ADAT2 active site. A founder mutation in the ADAT3 N-terminal domain, which causes intellectual disability, does not affect tRNA binding despite the structural changes it induces but most likely hinders optimal presentation of the tRNA anticodon-stem-loop to ADAT2.
Non-canonical CRP sites control competence regulons in Escherichia coli and many other γ-proteobacteria
Escherichia coli's cAMP receptor protein (CRP), the archetypal bacterial transcription factor, regulates over a hundred promoters by binding 22 bp symmetrical sites with the consensus core half-site TGTGA. However, Haemophilus influenzae has two types of CRP sites, one like E. coli's and one with the core sequence TGCGA that regulates genes required for DNA uptake (natural competence). Only the latter 'CRP-S' sites require both CRP and the coregulator Sxy for activation. To our knowledge, the TGTGA and TGCGA motifs are the first example of one transcription factor having two distinct binding-site motifs. Here we show that CRP-S promoters are widespread in the γ-proteobacteria and demonstrate their Sxy-dependence in E. coli. Orthologs of most H. influenzae CRP-S-regulated genes are ubiquitous in the five best-studied γ-proteobacteria families, Enterobacteriaceae, Pasteurellaceae, Pseudomonadaceae, Vibrionaceae and Xanthomonadaceae. Phylogenetic footprinting identified CRP-S sites in the promoter regions of the Enterobacteriaceae, Pasteurellaceae and Vibrionaceae orthologs, and canonical CRP sites in orthologs of genes known to be Sxy-independent in H. influenzae. Bandshift experiments confirmed that E. coli CRP-S sequences are low affinity binding sites for CRP, and mRNA analysis showed that they require CRP, cAMP (CRP's allosteric effector) and Sxy for gene induction. This work suggests not only that the γ-proteobacteria share a common DNA uptake mechanism, but also that, in the three best studied families, their competence regulons share both CRP-S specificity and Sxy dependence.
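The distinction between the two core half-sites named in the abstract above (TGTGA for canonical sites, TGCGA for CRP-S sites) can be illustrated with a naive exact-match scan over both strands. This sketch is not the phylogenetic footprinting used in the study: it ignores flanking positions, spacing, and degenerate matches. The function names and both test sequences are illustrative assumptions.

```python
# Core half-site motifs taken from the abstract; labels are for illustration.
CORE_MOTIFS = {"TGTGA": "canonical CRP", "TGCGA": "CRP-S"}

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unknown bases are mapped to 'N'."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq))


def scan_core_half_sites(seq: str):
    """List (position, strand, label) for every exact 5-bp core match.

    Positions are 0-based on the given strand; '-' marks a match to the
    reverse complement of a core motif.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 4):
        kmer = seq[i:i + 5]
        if kmer in CORE_MOTIFS:
            hits.append((i, "+", CORE_MOTIFS[kmer]))
        else:
            rc = reverse_complement(kmer)
            if rc in CORE_MOTIFS:
                hits.append((i, "-", CORE_MOTIFS[rc]))
    return hits


# Illustrative 22 bp symmetrical sequences, made up for this example.
canonical_like = "AAATGTGATCTAGATCACATTT"
crp_s_like = "AAATGCGATCTAGATCGCATTT"

print(scan_core_half_sites(canonical_like))
print(scan_core_half_sites(crp_s_like))
```

In a symmetrical site, the second half-site appears as the reverse complement of the first, which is why the scan checks both strands.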
Uncovering Genes with Divergent mRNA-Protein Dynamics in Streptomyces coelicolor
Many biological processes are intrinsically dynamic, incurring profound changes at both molecular and physiological levels. Systems analyses of such processes incorporating large-scale transcriptome or proteome profiling can be quite revealing. Although consistency between mRNA and proteins is often implicitly assumed in many studies, examples of divergent trends are frequently observed. Here, we present a comparative transcriptome and proteome analysis of growth and stationary phase adaptation in Streptomyces coelicolor, taking the time-dynamics of the process into consideration. These processes are of immense interest in microbiology as they pertain to the physiological transformations eliciting biosynthesis of many naturally occurring therapeutic agents. A shotgun proteomics approach based on mass spectrometric analysis of isobaric stable isotope labeled peptides (iTRAQ™) enabled identification and rapid quantification of approximately 14% of the theoretical proteome of S. coelicolor. Independent principal component analyses of this and DNA microarray-derived transcriptome data revealed that the prominent patterns in both protein and mRNA domains are surprisingly well correlated. Despite this overall correlation, by employing a systematic concordance analysis, we estimated that over 30% of the analyzed genes likely exhibited significantly divergent patterns, of which nearly one-third displayed even opposing trends. Integrating these data with biological information, we discovered that certain groups of functionally related genes exhibit mRNA-protein discordance in a similar fashion. Our observations suggest that differences between mRNA and protein synthesis/degradation mechanisms are prominent in microbes while reaffirming the plausibility of such mechanisms acting in a concerted fashion at a protein complex or sub-pathway level.
GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy; paradoxically, many are actually closing 'niche' bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all.