    BESC knowledgebase public portal

    The BioEnergy Science Center (BESC) is undertaking large experimental campaigns to understand the biosynthesis and biodegradation of biomass and to develop biofuel solutions. BESC is generating large volumes of diverse data, including genome sequences, omics data and assay results. The purpose of the BESC Knowledgebase is to serve as a centralized repository for experimentally generated data and to provide an integrated, interactive and user-friendly analysis framework. The portal provides tools for the visualization, integration and analysis of data either produced by BESC or obtained from external resources.

    Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop

    BER Science Network Requirements

    The Switchgrass Genome: Tools and Strategies

    Switchgrass (Panicum virgatum L.) is a perennial grass species receiving significant attention as a potential bioenergy crop. In the last 5 yr the switchgrass research community has produced a genetic linkage map, an expressed sequence tag (EST) database, a set of single nucleotide polymorphism (SNP) markers distributed across the 18 linkage groups, 4x sampling of the AP13 genome in 400-bp reads, and bacterial artificial chromosome (BAC) libraries containing over 200,000 clones. These studies have revealed close collinearity of the switchgrass genome with those of sorghum [Sorghum bicolor (L.) Moench], rice (Oryza sativa L.), and Brachypodium distachyon (L.) P. Beauv. Switchgrass researchers have also developed several microarray technologies for gene expression studies. Switchgrass genomic resources will accelerate the ability of plant breeders to enhance productivity, pest resistance, and nutritional quality. Because switchgrass is a relative newcomer to the genomics world, many secrets of its genome have yet to be revealed. To continue exploring basic and applied topics in switchgrass efficiently, it will be critical to capture and exploit the knowledge of plant geneticists and breeders on the next logical steps in the development and utilization of genomic resources for this species. To this end, the community has established a switchgrass genomics executive committee and work group ( [verified 28 Oct. 2011]).
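The "4x sampling of the AP13 genome in 400-bp reads" figure can be unpacked with the usual definition of sequencing depth. The symbols (N reads, L read length, G genome size) are introduced here only for illustration. The abstract does not state G, so it is left symbolic. The last term is the idealized Lander-Waterman estimate, which assumes reads land uniformly at random. Real genomes, especially repetitive or polyploid ones, follow that assumption only roughly.

```latex
c = \frac{N L}{G}
\quad\Longrightarrow\quad
N = \frac{c\,G}{L} = \frac{4\,G}{400\ \text{bp}} = \frac{G}{100\ \text{bp}},
\qquad
P(\text{base never sampled}) \approx e^{-c} = e^{-4} \approx 0.018
```

Under these idealized assumptions, 4x depth means about one read per 100 bp of genome. Roughly 98% of bases would be expected to appear in at least one read.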

    Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

    Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is a growing movement toward integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and to assist plant breeding programs. This dissertation uses network and signal processing techniques for the combined analysis of these data types, with the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration method, “Lines of Evidence”, is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is also presented. It extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing the identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing is applied to characterize the genome-wide distributions of various features (including variant density, gene density and methylation profiles) in order to identify chromosome structures such as the centromere. This yielded approximate centromere locations for all P. trichocarpa chromosomes, which had not previously been adequately reported in the scientific literature. Discrete wavelet transform signal processing was applied to genomic features from various data types, including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes relevant to biofuels. We anticipate that these approaches will ultimately be used to design better plants for more efficient and sustainable production of bioenergy.
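The continuous-wavelet step for locating centromeres can be illustrated with a small sketch. The abstract does not give the wavelet, scales or window size, so everything here is an assumption:

- The feature track is a list of per-window values (here gene density).
- The centromere shows up as a broad trough in that track.
- A Ricker ("Mexican hat") wavelet at one hand-picked scale `sigma` stands in for a full multi-scale CWT.

All function names are hypothetical.

```python
import math


def ricker(t, sigma):
    """Unnormalised Ricker ("Mexican hat") wavelet: positive centre, negative side lobes."""
    x = t / sigma
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt_at_scale(track, sigma):
    """Ricker CWT coefficients of `track` at a single scale.

    Only positions where the whole kernel fits inside the track are returned
    (a 'valid' convolution), so chromosome ends cannot create spurious troughs.
    """
    half = int(4 * sigma)  # the wavelet is negligible beyond ~4 sigma
    kernel = [ricker(k, sigma) for k in range(-half, half + 1)]
    coeffs = {}
    for centre in range(half, len(track) - half):
        window = track[centre - half:centre + half + 1]
        coeffs[centre] = sum(w * v for w, v in zip(kernel, window))
    return coeffs


def locate_dip(track, sigma):
    """Window index with the most negative coefficient, i.e. the strongest
    trough-like feature at this scale (ASSUMED here to mark the centromere)."""
    coeffs = cwt_at_scale(track, sigma)
    if not coeffs:
        raise ValueError("track is too short for this scale")
    return min(coeffs, key=coeffs.get)


# Synthetic gene-density track (invented values): flat, with a gene-poor block.
gene_density = [10.0] * 200
for i in range(90, 111):
    gene_density[i] = 2.0
print(locate_dip(gene_density, sigma=8))
```

A fuller version would scan several values of `sigma`, which is what makes the transform continuous. It would also compare positions picked out by different features, such as variant density, gene density and methylation. For a feature that peaks rather than dips at the centromere, `max` would replace `min`. A centromere very close to a chromosome end would be missed by the valid-only convolution used here.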

    Insights from 20 years of bacterial genome sequencing


    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and also to identify some types of methylation sites along the genome. Sequencing bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, genome sequences are available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is heavily skewed towards a few phyla that contain model organisms, although the breadth continues to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria roughly tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; a comparison of more than 2000 Escherichia coli genomes found an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
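Once every gene has been assigned to a gene family, the pan- and core-genome calculation reduces to set operations. The sketch assumes that the clustering step has already been done; the review's own pipeline is not described here. The genome and family IDs are made up, and "accessory" is simply pan minus core.

```python
def pan_core(genomes):
    """Return (core, accessory, pan) gene-family sets.

    genomes: dict mapping a genome ID to the set of gene-family IDs it carries.
    core      = families present in every genome (intersection)
    pan       = families present in at least one genome (union)
    accessory = pan minus core
    """
    family_sets = [set(f) for f in genomes.values()]
    if not family_sets:
        raise ValueError("need at least one genome")
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan - core, pan


def pan_growth(genomes):
    """Pan-genome size after adding genomes one at a time (in dict order)."""
    seen, sizes = set(), []
    for families in genomes.values():
        seen |= set(families)
        sizes.append(len(seen))
    return sizes


# Hypothetical toy input: three genomes sharing families f1 and f2.
toy = {"A": {"f1", "f2", "f3"}, "B": {"f1", "f2", "f4"}, "C": {"f1", "f2", "f5"}}
core, accessory, pan = pan_core(toy)
print(sorted(core), len(pan), pan_growth(toy))
```

With thousands of genomes, the core shrinks and the pan keeps growing as genomes are added. `pan_growth` traces that growth as a simple accumulation curve.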
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of the speed and accuracy of finding pathogens and knowing how to treat them.

    Study of the SOS response in the ethanol-producing bacterium Zymomonas mobilis

Also, in A. tumefaciens and following SOS induction, there was marginal derepression of the promoters, showing once again that the desired study can be carried out once this heterologous system is improved.
The subject of this dissertation has been the study of the SOS response in Zymomonas mobilis. The SOS response is manifested in bacteria under conditions of extensive mutagenesis and DNA damage. It entails the coordinated induction of genes mainly involved in DNA replication and repair and in cell-cycle control. The genes expressed upon SOS induction (the so-called SOS genes) are regulated by the proteins RecA and LexA. The former is the main homologous recombination protein and the latter the transcriptional repressor of all SOS genes. In the presence of damaged DNA, and specifically of ssDNA resulting from breaks, RecA is activated and assists LexA in its autocleavage and release from its binding sites upstream of SOS genes. This results in derepression of the SOS genes, including those coding for the SOS regulators themselves. The gradual repair of lesions and the build-up of intact LexA then dampen the response and restore the organism’s constitutive state. The SOS system has been studied predominantly in the γ-proteobacterium Escherichia coli; however, its components are continually being discovered in other proteobacterial classes, such as the α-proteobacteria, and even in other bacterial phyla. Zymomonas mobilis is a fermentative α-proteobacterium of considerable biotechnological importance owing to its ability to produce ethanol at near-perfect rates and yields, as well as other high-added-value products. Its ability to ferment starchy or lignocellulosic biomass hydrolysates has recently made the organism important for the production of first- and second-generation bioethanol. Our laboratory has been working on Z. mobilis as a model organism for several decades, studying its genetics, genomics and amenability to strain engineering.
The study of the SOS response in Z. mobilis was of interest in this doctoral thesis because it would address the organism's ability to withstand mutational stress. It would also lead to the development and characterization of the first inducible genetic system for this bacterium. Two preliminary results, obtained by the candidate during previous research, enabled the study of SOS in Z. mobilis. These were (a) the generation of a recA- knock-out strain of the industrial strain CP4, and (b) bioinformatic analyses of the recA and lexA genes and their predicted protein products, together with whole-genome searches for LexA binding motifs (SOS boxes) in the available Z. mobilis genome sequences. In this thesis, the CP4 recA- knock-out strain constructed in the laboratory, namely UA1, was further studied at the molecular and phenotypic level. The insertion of the chloramphenicol acetyltransferase gene (catE) into the native recA gene by allele exchange was confirmed by PCR and Southern hybridization. Additionally, the resistance of UA1 to chloramphenicol, the insertional inactivation marker, was maintained over many generations of growth without selection pressure, demonstrating the stability of the new strain. UA1 was also studied for growth in complete and minimal medium, under aerobic and anaerobic conditions, and in the presence of mutagens such as ultraviolet light (UV) and the alkylating agent methyl methanesulfonate (MMS). It generally grew more slowly than the parental strain, was less viable at stationary phase, and was many orders of magnitude more sensitive to the mutagens. Finally, in accordance with its recA- nature, UA1 was unable to perform homologous recombination. To examine whether the slow growth of UA1 also led to reduced ethanol production, ethanol was monitored for UA1 and the parental CP4 at different stages of growth by both liquid and gas chromatography.
Both analytical methods demonstrated that UA1 produces 3.5 times more ethanol than CP4, and this difference is notably greater (5.6 times) at mid-log phase and when cell numbers are taken into account. Finally, to assess the usefulness of UA1 in genetic engineering applications, its ability to receive DNA via transformation and conjugation was tested. UA1 was transformed 4- to 10-fold less efficiently than the parental CP4. In matings, it received DNA anywhere from equally well to 10-fold less efficiently, depending on the plasmid transferred. The latter result indicates that UA1 can serve adequately as a host for introducing foreign genes. recA-deficient strains of most microorganisms are extremely useful in genetic engineering because they maintain the fidelity of the foreign genetic material introduced into them and eliminate homologous recombination-dependent rearrangements. As the first recA-deficient Z. mobilis strain ever created, UA1 was submitted to the American Type Culture Collection (ATCC) and received the temporary code number AcqID-01168. To verify the existence of an SOS gene network in CP4, the parental strain of UA1, a computational search was performed for genes carrying conserved α-proteobacterial SOS boxes upstream of their transcriptional starts. Perfect boxes were identified upstream of 10 chromosomal genes and 1 plasmid gene, while boxes with one nucleotide mismatch were found upstream of 58 and 9 genes, respectively. These included DNA repair, metabolic, structural, respiratory and regulatory genes, as well as genes of other functions. Similar conserved operator sequences were found upstream of the same alleles in all Z. mobilis subsp. mobilis strains tested, namely those whose genomes were known to date (ATCC 31821, ATCC 10988, ATCC 29191 and NCIMB 11163). Differences were observed in the only sequenced representative of the Z. mobilis subsp. pomaceae taxon (strain ATCC 29192), which was found to carry several SOS genes in common with the subsp. mobilis strains as well as new ones.
SOS genes preceded by boxes with 0-1 nucleotide mismatches were also detected in the plasmids of all strains, and most such genes were notably strain-specific. SOS induction was studied extensively in strains UA1 and CP4 via transcriptome sequencing. The strains were challenged with the mutagen MMS at concentrations ranging from 0 to 0.2 mM for UA1 and from 0 to 15 mM for CP4. These concentrations were chosen because they were inductive yet did not reduce survival by more than one order of magnitude in either case. For transcriptomic analysis, an optimized RNA isolation protocol for Z. mobilis was developed, and the samples were sent for library construction and sequencing at the US DOE Joint Genome Institute. Transcript analysis demonstrated that under normal conditions 97% of the predicted chromosomal coding genes and 89% of plasmid genes are transcribed in both strains. At the low MMS concentrations employed in both strains, a total of 9 UA1 genes and 67 CP4 genes, different from those of UA1, appeared to be induced or suppressed. The genes induced in CP4 included three well-characterized SOS genes. In UA1 no DNA repair gene was induced, confirming its permanently suppressed phenotype. The group of CP4 genes that exhibited differential expression under SOS induction (the so-called SOS stimulon) numbered from 60 genes at the lowest mutagen concentration to 1,215 at the highest (73% of transcribed genes). Typical over-expressed genes, often co-transcribed, were genes for DNA repair, phage assembly, stress response, antimicrobial resistance and regulation, while characteristic under-expressed genes were those of the translation machinery. The next step was to determine the SOS system in the strict sense in Z. mobilis, i.e. the SOS (LexA) regulon within the SOS stimulon. For this, the DNA repair genes of CP4 were sought computationally, since repair genes are archetypal SOS members in all organisms studied.
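The search for SOS boxes with 0 or 1 mismatches can be sketched as a two-strand sliding-window scan of an upstream region. Several parts of the sketch are assumptions:

- The consensus is a direct repeat of the form GAAC-N7-GAAC, as commonly cited for α-proteobacterial LexA boxes. It is a placeholder, not the Z. mobilis motif derived in the thesis.
- The input is an upstream region already excised from the genome.
- The function names are hypothetical.

```python
# ASSUMPTION: placeholder consensus (GAAC-N7-GAAC), not the thesis's own motif.
SOS_BOX = "GAACNNNNNNNGAAC"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(site, motif):
    """Mismatches at fixed motif positions; 'N' in the motif matches any base."""
    return sum(1 for s, m in zip(site, motif) if m != "N" and s != m)


def find_sos_boxes(seq, motif=SOS_BOX, max_mismatches=1):
    """Scan both strands of an upstream region for motif matches.

    Returns (start, strand, mismatches, site) tuples. `start` is always a
    0-based forward-strand coordinate, so hits on either strand are comparable.
    """
    seq = seq.upper()
    width = len(motif)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - width + 1):
            site = s[i:i + width]
            mm = count_mismatches(site, motif)
            if mm <= max_mismatches:
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, mm, site))
    return hits


# Invented test region with one perfect box on the forward strand.
print(find_sos_boxes("TTTTGAACAAAAAAAGAACTTTT"))
```

Raising `max_mismatches` to 3 mirrors the looser 0-3 mismatch criterion the thesis applies to the induced DNA repair genes.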
It appeared that CP4 carries 34 DNA repair genes orthologous to well-characterized ones of the γ-proteobacteria, 17 of which were induced by MMS at one or more of the concentrations employed. Upstream of all 17 genes, SOS boxes were detected with 0 to 3 nucleotide mismatches compared to the perfect consensus motif. This criterion was then used to search all differentially expressed genes for similarly degenerate SOS boxes. In total, 340 over-expressed and 310 under-expressed genes were detected, which apparently comprise the Z. mobilis SOS regulon. Apart from DNA repair genes, these included genes involved in transport, protein folding, carbohydrate and amino acid metabolism, energy production and the translation machinery. SOS-regulated genes were also identified in the plasmids, where they are responsible for functions such as DNA transposition, DNA restriction and modification, and phage assembly. Most genes in this category are characterized as SOS-dependent for the first time. The determination of SOS genes in Z. mobilis led to the discovery of the particular motif characterizing the SOS box in this organism. This motif can be used in bioinformatic searches for SOS-regulated genes in other Z. mobilis strains, or even serve as a paradigm for SOS regulation in the α-proteobacteria in general. In the Z. mobilis SOS-box motif, the internal dinucleotides of the motif tetramers are over 95% conserved, in contrast to the outer nucleotides of the tetramers (30-40% conserved). In addition, a preference for A/T was observed both in spacer regions and in regions flanking the motif. Finally, 87% of the SOS motifs are located within 200 bp of the translational starts of the regulated genes.
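The motif statistics quoted above are simple counts once the sites are aligned. They are per-position conservation, A/T preference in spacers and flanks, and the share of motifs within 200 bp of the translational start. The sketch assumes equal-length aligned sites and distances already measured in bp upstream of the start codon. The example inputs and function names are invented for illustration and are not data from the thesis.

```python
from collections import Counter


def column_conservation(sites):
    """For aligned, equal-length sites, return (most common base, fraction) per position."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same length")
    profile = []
    for column in zip(*(s.upper() for s in sites)):
        base, count = Counter(column).most_common(1)[0]
        profile.append((base, count / len(sites)))
    return profile


def at_fraction(seq):
    """Share of A/T bases in a sequence (e.g. a spacer or flanking region)."""
    seq = seq.upper()
    return sum(seq.count(b) for b in "AT") / len(seq) if seq else 0.0


def fraction_within(distances, limit=200):
    """Share of motif-to-start distances (in bp, ASSUMED non-negative) that are <= limit."""
    if not distances:
        return 0.0
    return sum(1 for d in distances if d <= limit) / len(distances)


# Invented tetramer half-sites: the two internal A's are fully conserved here.
print(column_conservation(["GAAC", "TAAC", "GAAG", "CAAC"]))
```

Applied to real aligned boxes, a profile in which the inner positions of each tetramer sit near 1.0 and the outer ones much lower would reproduce the conservation pattern the thesis reports.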
Comparing the in silico predictions with the experimental results of the transcriptomic analysis in the present study showed that the bioinformatic predictions for genes with SOS boxes bearing 0 or 1 nucleotide mismatch were confirmed for 91.7% and 53.5% of genes, respectively. However, these predictions accounted for only 9.4% of the genes recognized in this work as SOS genes by both the computational and the expression criteria. Given that the search for boxes bearing 0 or 1 mismatch has been the norm in characterizing SOS genes