GMOD for Evolutionary Biology
The Generic Model Organism Database (GMOD, "http://gmod.org":http://gmod.org) project provides interoperable, open-source software tools for managing, visualizing and annotating biological data. GMOD is also a community of people addressing common challenges with biological data. Well-known software in GMOD includes GBrowse and JBrowse for genome browsing, Apollo for genome annotation, Chado for managing data, CMap for comparative map viewing, Galaxy for workflow creation and persistence, and BioMart for warehousing biological data.

This talk will focus on three areas of particular interest to iEvoBio participants. 
1) GBrowse_syn comparative genomics viewer
2) Natural Diversity Module of the Chado database schema
3) GMOD evolutionary biology hackathon 

The GBrowse_syn comparative genomics viewer displays synteny between a reference and any number of related species. It shows inversions, duplications, and indels, and can show synteny across non-contiguous regions. It is built on the widely used GBrowse genome viewer. The Natural Diversity Module is an extension to GMOD’s Chado database schema to enable Chado to support natural diversity, population genomics, individuals, breeding, phenotypes and geolocation information. This module is the first extension to Chado to be designed by the community, rather than at one institution. We will close by soliciting nominations and ideas for a GMOD Evolutionary Biology Hackathon. This hackathon will be held November 8-12, at NESCent, which is sponsoring the event. There will be an open call for participation in August.

SGN Database: From QTLs to Genomes
Quantitative trait loci (QTL) analysis is used to dissect the genetic basis underlying polygenic traits. Several public databases store QTL data and make it available to research communities. To our knowledge, current QTL databases rely on manual curation, where curators read the literature and extract relevant QTL information to store in databases. This approach is expensive in terms of expert manpower and time, and it limits the type of data that can be curated. At the Solanaceae Genomics Network (SGN) ("http://sgn.cornell.edu":http://sgn.cornell.edu), we have developed a database to store raw phenotype and genotype data from QTL studies, perform on-the-fly QTL analysis using the R/QTL statistical software ("http://www.rqtl.org":http://www.rqtl.org), and visualize QTLs on a genetic map. Users can identify peak and flanking markers for QTLs of traits of interest. The QTL database is integrated with other SGN databases (e.g. Marker, BACs, and Unigenes) and with analysis tools such as the Comparative Map Viewer. Using the Comparative Map Viewer, users can compare chromosomes with QTL regions to genetic maps of interest from the same or different Solanaceae species. As the tomato genome sequencing advances, users can also identify corresponding BAC sequences or locations on the tomato physical map, which can suggest candidate genes for a trait of interest.

Furthermore, images, quantitative phenotype and genotype data, publications, and genetic maps generated by QTL studies are displayed at SGN and available for download. Currently, data from three F2 and two backcross population QTL studies on fruit morphology traits (18 to 46 traits per population) are available at the SGN website for viewing at population, accession, and trait levels. Traits are described using ontology terms. Phenotype data are presented in tabular and graphical formats such as frequency distributions with basic descriptive statistics. Mapping data showing the location of parental alleles on individual accession genetic maps are also available.

SGN is a public database hosted at Boyce Thompson Institute, Cornell University, and funded by USDA CSREES and NSF.
The SOL Genomics Network Model: Making Community Annotation Work
Community annotation is a growing approach for engaging the research community in depositing up-to-date knowledge in biological databases.
The Solanaceae Genomics Network ("SGN":http://sgn.cornell.edu/) is a clade‐oriented database (COD) focusing on plants of the nightshade family, including tomato, potato, pepper, eggplant, and tobacco, and is one of the bioinformatics nodes of the international tomato genome sequencing project. One of our major efforts is linking Solanaceae phenotype information with the underlying genes, and subsequently the genome. As part of this goal, SGN has introduced a database for locus names and descriptors, and a database for phenotypes of natural and induced variation. These two databases have web interfaces that allow cross references, associations with tomato gene models, and in‐house curated information of sequences, literature, ontologies, gene networks, and the Solanaceae biochemical pathways database ("SolCyc":http://solcyc.sgn.cornell.edu). All of our curator tools are open for online community annotation, through specially assigned “submitter” accounts. 

Currently the community database consists of 5,548 phenotyped accessions and 5,739 curated loci, of which more than 300 loci were contributed or annotated by 66 active submitters, creating a database that is truly community driven.
This framework is easily adaptable for projects working on other taxa (for example see "http://chlamybase.org":http://chlamybase.org), greatly expanding the application of this user-friendly online annotation system. Community participation is fostered by an active outreach program that includes contacting potential submitters via email and at meetings and conferences, and promoting featured user-submitted annotations on the SGN homepage. The source code and database schema for all SGN functionalities are freely available. Please contact SGN at sgn-feedback[at]sgn.cornell.edu for more information.
solQTL: a tool for QTL analysis, visualization and linking to genomes at SGN database
BACKGROUND: A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort. Furthermore, as entire genomes are being sequenced and an increasing amount of genetic and expression data are being generated, a challenge remains: linking phenotypic variation to the underlying genomic variation. To identify candidate genes and understand the molecular basis underlying the phenotypic variation of traits, bioinformatic approaches are needed to exploit information such as genetic map, expression and whole genome sequence data of organisms in biological databases. DESCRIPTION: The Sol Genomics Network (SGN, http://solgenomics.net) is a primary repository for phenotypic, genetic, genomic, expression and metabolic data for the Solanaceae family and other related Asterids species and houses a variety of bioinformatics tools. SGN has implemented a new approach to QTL data organization, storage, analysis, and cross-links with other relevant data in internal and external databases. The new QTL module, solQTL, http://solgenomics.net/qtl/, employs a user-friendly web interface for uploading raw phenotype and genotype data to the database, R/QTL mapping software for on-the-fly QTL analysis and algorithms for online visualization and cross-referencing of QTLs to relevant datasets and tools such as the SGN Comparative Map Viewer and Genome Browser. Here, we describe the development of the solQTL module and demonstrate its application. CONCLUSIONS: solQTL allows Solanaceae researchers to upload raw genotype and phenotype data to SGN, perform QTL analysis and dynamically cross-link to relevant genetic, expression and genome annotations.
Exploration and synthesis of the relevant data is expected to help facilitate identification of candidate genes underlying phenotypic variation and markers more closely linked to QTLs. solQTL is freely available on SGN and can be used in private or public mode.
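
The idea of picking out peak and flanking markers for a QTL, mentioned in the SGN abstracts above, can be illustrated with a minimal Python sketch. It is not solQTL's implementation, which the abstracts say relies on R/QTL. The function name `peak_and_flanking`, the tuple layout, the example markers, and the 1.5-LOD drop used to delimit the interval are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (marker name, position in cM, LOD score).
Marker = Tuple[str, float, float]


def peak_and_flanking(markers: List[Marker], drop: float = 1.5) -> Tuple[str, str, str]:
    """Return (left flank, peak, right flank) marker names for one linkage group.

    The peak is the marker with the highest LOD score. Each flank is the
    nearest marker on that side whose LOD falls below (peak LOD - drop),
    or the terminal marker if the scores never fall that far.
    The 1.5-LOD drop is an assumed convention, not taken from the source.
    """
    if not markers:
        raise ValueError("no markers given")

    # Order markers along the linkage group by map position.
    ordered = sorted(markers, key=lambda m: m[1])
    peak_idx = max(range(len(ordered)), key=lambda i: ordered[i][2])
    threshold = ordered[peak_idx][2] - drop

    # Walk outward from the peak until the LOD score drops below the threshold.
    left = peak_idx
    while left > 0 and ordered[left][2] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(ordered) - 1 and ordered[right][2] >= threshold:
        right += 1

    return ordered[left][0], ordered[peak_idx][0], ordered[right][0]


# Made-up example data for a single linkage group.
example = [
    ("m1", 0.0, 0.5),
    ("m2", 10.0, 2.0),
    ("m3", 20.0, 4.2),
    ("m4", 30.0, 3.1),
    ("m5", 40.0, 1.0),
]
print(peak_and_flanking(example))
```

In this sketch the flanking markers bound the region that would then be cross-referenced against a physical map or genome browser to look for candidate genes.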
Comparative BAC end sequence analysis of tomato and potato reveals overrepresentation of specific gene families in potato
BACKGROUND: Tomato (Solanum lycopersicon) and potato (S. tuberosum) are two economically important crop species, the genomes of which are currently being sequenced. This study presents a first genome-wide analysis of these two species, based on two large collections of BAC end sequences representing approximately 19% of the tomato genome and 10% of the potato genome. RESULTS: The tomato genome has a higher repeat content than the potato genome, primarily due to a higher number of retrotransposon insertions in the tomato genome. On the other hand, simple sequence repeats are more abundant in potato than in tomato. The two genomes also differ in the frequency distribution of SSR motifs. Based on EST and protein alignments, potato appears to contain up to 6,400 more putative coding regions than tomato. Major gene families such as cytochrome P450 mono-oxygenases and serine-threonine protein kinases are significantly overrepresented in potato, compared to tomato. Moreover, the P450 superfamily appears to have expanded spectacularly in both species compared to Arabidopsis thaliana, suggesting an expanded network of secondary metabolic pathways in the Solanaceae. Both tomato and potato appear to have a low level of microsynteny with A. thaliana. A higher degree of synteny was observed with Populus trichocarpa, specifically in the region between 15.2 and 19.4 Mb on P. trichocarpa chromosome 10. CONCLUSION: The findings in this paper present a first glimpse into the evolution of Solanaceous genomes, both within the family and relative to other plant species. When the complete genome sequences of these species become available, whole-genome comparisons and protein- or repeat-family specific studies may shed more light on the observations made here.
JBrowse: a dynamic web platform for genome visualization and analysis
BACKGROUND: JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page. RESULTS: Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication. CONCLUSIONS: JBrowse is a mature web application suitable for genome visualization and analysis.
JBrowse Connect: A server API to connect JBrowse instances and users
We describe JBrowse Connect, an optional expansion to the JBrowse genome browser, targeted at developers. JBrowse Connect allows live messaging, notifications for new annotation tracks, heavy-duty analyses initiated by the user from within the browser, and other dynamic features. We present example applications of JBrowse Connect that allow users 1) to specify and execute BLAST searches, either on the same host as the webserver, with a self-contained BLAST module leveraging NCBI Blast+ commands, or via a managed Galaxy instance that can optionally run on a different host, and 2) to run the primer design service Primer3. JBrowse Connect allows users to track job progress and view results in the context of the browser. The software is available under a choice of open source licenses including LGPL and the Artistic License.