Towards Interoperability in Genome Databases: The MAtDB (MIPS Arabidopsis Thaliana Database) Experience
Increasing numbers of whole-genome sequences are available, but interpreting them fully requires more than listing all genes. Genome databases are faced
with the challenges of integrating heterogeneous data and enabling data mining.
In comparison to a data warehousing approach, where integration is achieved
through replication of all relevant data in a unified schema, distributed approaches
provide greater flexibility and maintainability. These are important in a field
where new data is generated rapidly and our understanding of the data changes.
Interoperability between distributed data sources allows data maintenance to be
separated from integration and analysis. Simple ways to access the data can facilitate
the development of new data mining tools and the transition from model genome
analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome
database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data
repository towards creating an integrated knowledge resource. To this end, the
Arabidopsis genome has been a backbone against which to structure and integrate
heterogeneous data. The challenges to be met are continuous updating of data, the
design of flexible data models that can evolve with new data, the integration of
heterogeneous data, e.g. through the use of ontologies, comprehensive views and
visualization of complex information, simple interfaces for application access locally
or via the Internet, and knowledge transfer across species.
Computational Tools for Brassica–Arabidopsis Comparative Genomics
Recent advances, such as the availability of extensive genome survey sequence (GSS)
data and draft physical maps, are radically transforming the means by which we
can dissect Brassica genome structure and systematically relate it to the Arabidopsis
model. Hitherto, our view of the co-linearities between these closely related genomes
had been largely inferred from comparative RFLP data, necessitating substantial
interpolation and expert interpretation. Sequencing of the Brassica rapa genome
by the Multinational Brassica Genome Project will, however, enable an entirely
computational approach to this problem. Meanwhile we have been developing
databases and bioinformatics tools to support our work in Brassica comparative
genomics, including a recently completed draft physical map of B. rapa integrated
with anchor probes derived from the Arabidopsis genome sequence. We are also
exploring new ways to display the emerging Brassica–Arabidopsis sequence homology
data. We have mapped all publicly available Brassica sequences in silico to the
Arabidopsis TIGR v5 genome sequence and published the results in the ATIDB database,
which uses the Generic Genome Browser (GBrowse). This in silico approach potentially
identifies all paralogous sequences and so we colour-code the significance of the
mappings and offer an integrated, real-time multiple alignment tool to partition them
into paralogous groups. The MySQL database driving GBrowse can also be directly
interrogated, using the powerful API offered by the Perl Bio::DB::GFF methods,
facilitating a wide range of data-mining possibilities.
Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations
RenSeq is an NB-LRR (nucleotide binding-site leucine-rich repeat) gene-targeted, Resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM, and increased the number of identified NB-LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly annotated or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB-LRRs and can be accessed through a genome browser that we provide. We compared these NB-LRR genes and the corresponding oligonucleotide baits with the highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum ‘Heinz 1706’ extended the NB-LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co-segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi-ber2) and S. ruiz-ceballosii (Rpi-rzc1), we were able to apply RenSeq successfully to identify markers that co-segregate with resistance towards the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy-to-adapt Galaxy pipelines.
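As a rough illustration of the ~80% identity threshold mentioned above, the following sketch computes percent identity between a bait and a target over an ungapped, equal-length alignment. The sequences, the function name and the 80.0 cut-off variable are hypothetical; the abstract does not describe how identity was actually calculated, and a real comparison would normally rely on an alignment tool.

```python
def percent_identity(bait: str, target: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(bait) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not bait:
        raise ValueError("sequences must not be empty")
    matches = sum(b == t for b, t in zip(bait.upper(), target.upper()))
    return 100.0 * matches / len(bait)


# Hypothetical 20-nt bait and a target that differs at 4 positions.
BAIT = "ATGGCTGAACTTGCAGTCGA"
TARGET = "ATGGCAGAACTAGCAGTGGT"

# Assumed cut-off taken from the "~80%" figure in the abstract.
MIN_IDENTITY = 80.0

identity = percent_identity(BAIT, TARGET)
likely_enriched = identity >= MIN_IDENTITY
```

Here 16 of the 20 positions match, so the target sits exactly at the assumed threshold.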
The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species
Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomics-based investigations in these species, we developed the Biofuel Feedstock Genomics Resource (BFGR), a database and web portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources.
REGIA, An EU Project on Functional Genomics of Transcription Factors From Arabidopsis Thaliana
Transcription factors (TFs) are regulatory proteins that have played a pivotal role in the
evolution of eukaryotes and that also have great biotechnological potential. REGIA
(REgulatory Gene Initiative in Arabidopsis) is an EU-funded project involving 29
European laboratories with the objective of determining the function of virtually all
transcription factors from the model plant, Arabidopsis thaliana. REGIA involves: 1. the
definition of TF gene expression patterns in Arabidopsis; 2. the identification of mutations
at TF loci; 3. the ectopic expression of TFs (or derivatives) in Arabidopsis and in crop
plants; 4. phenotypic analysis of the mutants and mis-expression lines, including both RNA
and metabolic profiling; 5. the systematic analysis of interactions between TFs; and 6. the
generation of a bioinformatics infrastructure to access and integrate all this information.
We expect that this programme will establish the full biotechnological potential of plant
TFs, and provide insights into hierarchies, redundancies, and interdependencies, and their
evolution. The project involves the preparation of both a TF gene array for expression
analysis and a normalised full-length open reading frame (ORF) library of TFs in a yeast
two-hybrid vector; the applications of these resources should extend beyond the scope of
this programme.
Plant Ontology (PO): a Controlled Vocabulary of Plant Structures and Growth Stages
The Plant Ontology Consortium (POC) (www.plantontology.org) is a collaborative
effort among several plant databases and experts in plant systematics, botany
and genomics. A primary goal of the POC is to develop simple yet robust
and extensible controlled vocabularies that accurately reflect the biology of plant
structures and developmental stages. These provide a network of vocabularies linked
by relationships (ontology) to facilitate queries that cut across datasets within
a database or between multiple databases. The current version of the ontology
integrates diverse vocabularies used to describe Arabidopsis, maize and rice (Oryza
sp.) anatomy, morphology and growth stages. Using the ontology browser, over 3500
gene annotations from three species-specific databases, The Arabidopsis Information
Resource (TAIR) for Arabidopsis, Gramene for rice and MaizeGDB for maize, can
now be queried and retrieved.
Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis
Transposable elements (TEs) and their relics play major roles in genome evolution. However, mobilization of TEs is usually deleterious and strongly repressed. In plants and mammals, this repression is typically associated with DNA methylation, but the relationship between this epigenetic mark and TE sequences has not been investigated systematically. Here, we present an improved annotation of TE sequences and use it to analyze genome-wide DNA methylation maps obtained at single-nucleotide resolution in Arabidopsis. We show that although the majority of TE sequences are methylated, ∼26% are not. Moreover, a significant fraction of TE sequences densely methylated at CG, CHG and CHH sites (where H = A, T or C) have no or few matching small interfering RNAs (siRNAs) and are therefore unlikely to be targeted by the RNA-directed DNA methylation (RdDM) machinery. We provide evidence that these TE sequences acquire DNA methylation through spreading from adjacent siRNA-targeted regions. Further, we show that although both methylated and unmethylated TE sequences located in euchromatin tend to be more abundant closer to genes, this trend is least pronounced for methylated, siRNA-targeted TE sequences located 5′ to genes. Based on these and other findings, we propose that spreading of DNA methylation through promoter regions explains at least in part the negative impact of siRNA-targeted TE sequences on neighboring gene expression.
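The CG, CHG and CHH sequence contexts named above (H = A, T or C) can be assigned from the two bases downstream of a cytosine. The sketch below is only an illustration of that definition, not the authors' pipeline; the function name and the example sequence are hypothetical, and it considers a single strand read 5′ to 3′.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at 0-based index `pos` as CG, CHG or CHH.

    Returns "unknown" when the downstream bases are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1 : pos + 3]  # up to two bases 3' of the C
    h_bases = {"A", "T", "C"}            # IUPAC code H
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in h_bases:
        return "unknown"
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in h_bases:
        return "CHH"
    return "unknown"


# Hypothetical sequence with one cytosine in each context.
EXAMPLE = "ACGTCAGCTTC"
contexts = {i: cytosine_context(EXAMPLE, i)
            for i, base in enumerate(EXAMPLE) if base == "C"}
```

For the example sequence, the cytosines at indices 1, 4 and 7 fall in CG, CHG and CHH contexts respectively, while the final cytosine has no downstream bases and is left unclassified.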
A genomic analysis of disease-resistance genes encoding nucleotide binding sites in Sorghum bicolor
A large set of candidate nucleotide-binding site (NBS)-encoding genes related to disease resistance was identified in the sorghum (Sorghum bicolor) genome. These resistance (R) genes were characterized based on their structural diversity, physical chromosomal location and phylogenetic relationships. Based on their N-terminal motifs and leucine-rich repeats (LRR), 50 non-regular NBS genes and 224 regular NBS genes were identified in 274 candidate NBS genes. The regular NBS genes were classified into ten types: CNL, CN, CNLX, CNX, CNXL, CXN, NX, N, NL and NLX. The vast majority (97%) of NBS genes occurred in gene clusters, indicating extensive gene duplication in the evolution of S. bicolor NBS genes. Analysis of the S. bicolor NBS phylogenetic tree revealed two major clades. Most NBS genes were located at the distal tip of the long arms of the ten sorghum chromosomes, a pattern significantly different from that of rice and Arabidopsis, the NBS genes of which have a random chromosomal distribution.
Cytogenetic characterization and genome size of the medicinal plant Catharanthus roseus (L.) G. Don
The genome size and organization of the important medicinal plant Catharanthus roseus are shown to be 1C = 0.76 pg (~738 Mbp) and 2n = 16 chromosomes. The data provide a sound basis for future studies, including cytogenetic mapping, genomics and breeding.
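The conversion from DNA mass to base pairs behind a figure like this can be sketched as follows. The factor of 978 Mbp per picogram is the commonly used conversion and is an assumption here, since the abstract does not state which factor was applied. With the rounded value of 0.76 pg it gives about 743 megabase pairs, close to the roughly 738 reported; the small gap may reflect rounding of the 1C value or a slightly different factor.

```python
# Assumed conversion factor: 1 pg of DNA is about 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(c_value_pg: float) -> float:
    """Convert a 1C DNA content in picograms to megabase pairs."""
    if c_value_pg < 0:
        raise ValueError("C-value must be non-negative")
    return c_value_pg * MBP_PER_PG


# 1C value quoted in the abstract (rounded to two decimals).
genome_size_mbp = pg_to_mbp(0.76)
```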