Phylogenomics-Guided Validation of Function for Conserved Unknown Genes
Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown function, or wrongly or vaguely annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We accordingly set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction is integrative, coupling the extensive post-genomic resources available for plants with comparative genomics based on hundreds of microbial genomes and with functional genomic datasets from model microorganisms. The early phase is computer-assisted; later phases incorporate intellectual input from expert plant and microbial biochemists. The approach thus bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is much more powerful than purely computational approaches to identifying gene-function associations. Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) are conserved between plants and prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family, specifically gene clustering and phyletic spread, as well as on the availability of functional genomics data and of publications that could link candidate families to general metabolic areas or to specific functions. In-depth comparative genomic analysis was then performed for about 500 top candidate families, which connected ~55 of them to general areas of metabolism and led to specific functional predictions for a subset of ~25 more.
Twenty predicted functions were experimentally tested in at least one prokaryotic organism via reverse genetics, metabolic profiling, functional complementation, and recombinant protein biochemistry. Our approach predicted and validated functions for 10 formerly uncharacterized protein families common to plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The functions of five more are currently being validated. Experimental testing of diverse representatives of these families, combined with in silico analysis, allowed accurate projection of the annotations to hundreds more sequenced genomes.
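The three selection criteria and the homology-independent ranking described in this abstract can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `GeneFamily` record, its fields, the keyword list used to detect vague annotations, and the ordering (clustering first, then phyletic spread) are all hypothetical stand-ins, and the SEED database is not queried.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record type: field names are illustrative, not the SEED schema.
@dataclass
class GeneFamily:
    family_id: str
    arabidopsis_members: int  # number of Arabidopsis genes in the family
    in_plants: bool
    in_prokaryotes: bool
    annotation: str           # current free-text functional annotation
    clustered_genomes: int    # prokaryotic genomes where the gene sits in a conserved cluster
    phyla: int                # prokaryotic phyla containing the family (phyletic spread)


# Assumed keyword list for "unknown or poorly known" annotations.
VAGUE_TERMS = ("unknown", "hypothetical", "uncharacterized", "duf")


def is_poorly_annotated(annotation: str) -> bool:
    """True for empty annotations or ones containing a vague keyword."""
    text = annotation.strip().lower()
    return not text or any(term in text for term in VAGUE_TERMS)


def select_candidates(families: List[GeneFamily], max_members: int = 3) -> List[GeneFamily]:
    """Criteria (i)-(iii): small family, conserved in plants and prokaryotes, vague annotation."""
    return [
        f for f in families
        if 1 <= f.arabidopsis_members <= max_members
        and f.in_plants
        and f.in_prokaryotes
        and is_poorly_annotated(f.annotation)
    ]


def rank(candidates: List[GeneFamily]) -> List[GeneFamily]:
    """Order by homology-independent evidence: gene clustering, then phyletic spread."""
    return sorted(candidates, key=lambda f: (f.clustered_genomes, f.phyla), reverse=True)


# Toy input; all values are invented for illustration.
families = [
    GeneFamily("F1", 1, True, True, "Protein of unknown function DUF123", 40, 12),
    GeneFamily("F2", 5, True, True, "hypothetical protein", 80, 20),         # family too large
    GeneFamily("F3", 2, True, False, "uncharacterized protein", 30, 0),      # no prokaryotic members
    GeneFamily("F4", 2, True, True, "Thiamine biosynthesis protein ThiC", 100, 25),  # already annotated
    GeneFamily("F5", 3, True, True, "uncharacterized protein", 60, 8),
]

print([f.family_id for f in rank(select_candidates(families))])  # ['F5', 'F1']
```

Keyword matching is a deliberately crude proxy for "unknown or poorly known"; in practice that judgment relied on curated annotations and expert review.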
Identification of a conserved N-terminal domain in the first module of ACV synthetases
The L-δ-(α-aminoadipoyl)-L-cysteinyl-D-valine synthetase (ACVS) is a trimodular nonribosomal peptide synthetase (NRPS) that provides the peptide precursor for the synthesis of β-lactams. The enzyme has been extensively characterized in terms of tripeptide formation and substrate specificity. The first module is highly specific and is the only NRPS unit known to recruit and activate the substrate L-α-aminoadipic acid, which is coupled to the α-amino group of L-cysteine through an unusual peptide bond involving its δ-carboxyl group. Here we carried out an in-depth investigation of the architecture of the first module of the ACVS enzymes from the fungus Penicillium rubens and the bacterium Nocardia lactamdurans. Bioinformatic analyses revealed the presence of a previously unidentified domain at the N-terminus which is structurally related to condensation domains, but smaller in size. Deletion variants of both enzymes were generated to investigate the potential impact on penicillin biosynthesis in vivo and in vitro. The data indicate that the N-terminal domain is important for catalysis.
High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, using gap-filling to identify missing annotations and proposing candidate genes for them. Models are built around an extended biomass composition, the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
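The idea that a metabolic model can expose missing annotations can be illustrated with a purely topological check. The sketch below is an assumption-laden stand-in, not PlantSEED's gap-filling: it uses network expansion (repeatedly firing any reaction whose substrates are all available) to find biomass components the model cannot produce, then tests which reactions from a candidate database restore them when added one at a time. Stoichiometry, reversibility, and flux balance are ignored, and the reaction identifiers and lumped toy reactions are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A reaction is modeled topologically as (substrates, products); stoichiometry is ignored.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def network_scope(seeds: Set[str], reactions: Dict[str, Reaction]) -> Set[str]:
    """Network expansion: fire every reaction whose substrates are all available until nothing new appears."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def find_gaps(seeds: Set[str], model: Dict[str, Reaction], biomass: Set[str]) -> Set[str]:
    """Biomass components the model cannot produce from the seed compounds."""
    return set(biomass) - network_scope(seeds, model)


def single_reaction_fills(seeds, model, database, biomass) -> Dict[str, Set[str]]:
    """Database reactions that, added one at a time, make at least one missing component producible."""
    baseline = find_gaps(seeds, model, biomass)
    fills: Dict[str, Set[str]] = {}
    for rid, rxn in database.items():
        if rid in model:
            continue
        trial = {**model, rid: rxn}
        remaining = find_gaps(seeds, trial, biomass)
        if remaining < baseline:  # strict subset: at least one gap was closed
            fills[rid] = baseline - remaining
    return fills


# Toy, lumped reactions for illustration only; R3 is the "missing annotation".
seeds = {"glucose", "NH3", "ATP"}
model = {
    "R1": (frozenset({"glucose", "ATP"}), frozenset({"G6P"})),
    "R2": (frozenset({"G6P"}), frozenset({"pyruvate"})),
    "R4": (frozenset({"alanine"}), frozenset({"protein"})),
}
database = {
    "R3": (frozenset({"pyruvate", "NH3"}), frozenset({"alanine"})),
    "R9": (frozenset({"lactose"}), frozenset({"galactose"})),
}
biomass = {"pyruvate", "protein"}

print(find_gaps(seeds, model, biomass))                        # {'protein'}
print(single_reaction_fills(seeds, model, database, biomass))  # {'R3': {'protein'}}
```

In a real pipeline, each reaction flagged this way would then be traced back to candidate genes; the single-reaction loop is kept deliberately simple and would miss gaps that need several reactions at once.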
Uncovering Genes with Divergent mRNA-Protein Dynamics in Streptomyces coelicolor
Many biological processes are intrinsically dynamic, incurring profound changes at both molecular and physiological levels. Systems analyses of such processes incorporating large-scale transcriptome or proteome profiling can be quite revealing. Although consistency between mRNA and proteins is often implicitly assumed in many studies, examples of divergent trends are frequently observed. Here, we present a comparative transcriptome and proteome analysis of growth and stationary phase adaptation in Streptomyces coelicolor, taking the time dynamics of the process into consideration. These processes are of immense interest in microbiology as they pertain to the physiological transformations eliciting biosynthesis of many naturally occurring therapeutic agents. A shotgun proteomics approach based on mass spectrometric analysis of isobaric stable isotope labeled peptides (iTRAQ™) enabled identification and rapid quantification of approximately 14% of the theoretical proteome of S. coelicolor. Independent principal component analyses of these data and of DNA microarray-derived transcriptome data revealed that the prominent patterns in the protein and mRNA domains are surprisingly well correlated. Despite this overall correlation, a systematic concordance analysis indicated that over 30% of the analyzed genes likely exhibit significantly divergent patterns, of which nearly one-third display opposing trends. Integrating these data with biological information, we discovered that certain groups of functionally related genes exhibit mRNA-protein discordance in a similar fashion. Our observations suggest that differences between mRNA and protein synthesis/degradation mechanisms are prominent in microbes, while reaffirming the plausibility of such mechanisms acting in a concerted fashion at a protein complex or sub-pathway level.
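The distinction between concordant, divergent, and opposing mRNA-protein trends can be made concrete with a per-gene comparison of time profiles. This is a hypothetical sketch, not the study's concordance analysis: it substitutes a plain Pearson correlation between the two profiles, and the cutoffs `concordant_min=0.7` and `opposing_max=-0.5` are arbitrary illustrative values. Following the abstract, "opposing" is treated as a subset of "divergent".

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is flat (no trend to compare)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify(mrna: Sequence[float], protein: Sequence[float],
             concordant_min: float = 0.7, opposing_max: float = -0.5) -> str:
    """Label a gene's mRNA/protein time profiles (thresholds are illustrative assumptions)."""
    r = pearson(mrna, protein)
    if r >= concordant_min:
        return "concordant"
    if r <= opposing_max:
        return "opposing"   # a divergent gene whose trends run in opposite directions
    return "divergent"


# Toy log-ratio profiles over four time points; values are invented.
profiles = {
    "geneA": ([0.0, 0.5, 1.0, 1.5], [0.1, 0.6, 0.9, 1.6]),  # both rise together
    "geneB": ([0.0, 0.5, 1.0, 1.5], [1.5, 1.0, 0.5, 0.0]),  # mRNA rises, protein falls
    "geneC": ([0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]),  # unrelated patterns
}

labels = {gene: classify(m, p) for gene, (m, p) in profiles.items()}
divergent = [g for g, lab in labels.items() if lab in ("divergent", "opposing")]
print(labels)     # {'geneA': 'concordant', 'geneB': 'opposing', 'geneC': 'divergent'}
print(divergent)  # ['geneB', 'geneC']
```

A correlation cutoff ignores measurement noise and the number of time points; a statistically grounded analysis would attach a significance estimate to each gene before calling it divergent.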
GOBLET: the Global Organisation for Bioinformatics Learning, Education and Training
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy; paradoxically, many are actually closing "niche" bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all.
Computing with bacterial constituents, cells and populations: from bioputing to bactoputing
The relevance of biological materials and processes to computing, alias bioputing, has been explored for decades. These materials include DNA, RNA and proteins, while the processes include transcription, translation, signal transduction and regulation. Recently, the use of bacteria themselves as living computers has been explored, but this use generally falls within the classical paradigm of computing. Computer scientists, however, have a variety of problems to which they seek solutions, while microbiologists are gaining new insights into the problems bacteria are solving and how they are solving them. Here, we envisage that bacteria might be used for new sorts of computing. These could be based on the capacity of bacteria to grow, move and adapt to a myriad of different, fickle environments, both as individuals and as populations of bacteria plus bacteriophage. New principles might be based on the way that bacteria explore phenotype space via hyperstructure dynamics and on the fundamental nature of the cell cycle. This computing might even extend to developing a high-level language appropriate to using populations of bacteria and bacteriophage. Here, we offer a speculative tour of what we term bactoputing, namely the use of the natural behaviour of bacteria for calculating.
The Complete Genome Sequence of ‘Candidatus Liberibacter solanacearum’, the Bacterium Associated with Potato Zebra Chip Disease
Zebra Chip (ZC) is an emerging plant disease that causes aboveground decline of potato shoots and generally results in unusable tubers. This disease has led to multimillion-dollar losses for growers in the central and western United States over the past decade and impacts the livelihood of potato farmers in Mexico and New Zealand. ZC is associated with ‘Candidatus Liberibacter solanacearum’, a fastidious alpha-proteobacterium that is transmitted by a phloem-feeding psyllid vector, Bactericera cockerelli Sulc. Research on this disease has been hampered by a lack of robust culture methods and a paucity of genome sequence information for ‘Ca. L. solanacearum’. Here we present the sequence of the 1.26 Mbp metagenome of ‘Ca. L. solanacearum’, based on DNA isolated from potato psyllids. The coding inventory of the ‘Ca. L. solanacearum’ genome was analyzed and compared to related Rhizobiaceae to better understand ‘Ca. L. solanacearum’ physiology and identify potential targets to develop improved treatment strategies. This analysis revealed a number of unique transporters and pathways, all potentially contributing to ZC pathogenesis. Some of these factors may have been acquired through horizontal gene transfer. Taxonomically, ‘Ca. L. solanacearum’ is related to ‘Ca. L. asiaticus’, a suspected causative agent of citrus huanglongbing, yet many genome rearrangements and several gene gains/losses are evident when comparing these two Liberibacter species. Relative to ‘Ca. L. asiaticus’, ‘Ca. L. solanacearum’ probably has a reduced capacity for nucleic acid modification, increased amino acid and vitamin biosynthesis functionalities, and has gained a high-affinity iron transport system characteristic of several pathogenic microbes.