
    Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser

    BACKGROUND: To facilitate efficient selection and prioritization of candidate complex disease susceptibility genes for association analysis, increasingly comprehensive annotation tools are essential to integrate, visualize and analyze the vast quantities of disparate data generated by genomic screens, public human genome sequence annotation and ancillary biological databases. We have developed a plug-in package for Ensembl called "Statistical Viewer" that facilitates the analysis of genomic features and annotation in the regions of interest defined by linkage analysis. RESULTS: Statistical Viewer is an add-on package to the open-source Ensembl Genome Browser and Annotation System that displays disease study-specific linkage and/or association data as two-dimensional plots in new panels in the context of Ensembl's Contig View and Cyto View pages. An enhanced upload server facilitates the upload of statistical data, as well as additional feature annotation to be displayed in DAS tracks, in the form of Excel files. The Statistical View panel, drawn directly under the ideogram, plots lod score values for markers from a study of interest against their position in base pairs. A module called "Get Map" converts the genetic locations of markers to genomic coordinates. The graph, placed under the corresponding ideogram, features a synchronized vertical sliding selection box that is integrated into Ensembl's Contig View and Cyto View pages and is used to choose the region displayed in Ensembl's "Overview" and "Detailed View" panels. To resolve association and fine-mapping data plots, a "Detailed Statistic View" plot corresponding to the "Detailed View" may be displayed underneath. CONCLUSION: Features mapping to regions of linkage are accentuated when Statistic View is used in conjunction with the Distributed Annotation System (DAS) to display supplemental laboratory information, such as differentially expressed disease genes, in private data tracks. Statistic View is a novel and powerful visual feature that enhances Ensembl's utility as a valuable resource for integrative genomic-based approaches to the identification of candidate disease susceptibility genes. At present there are no other tools that provide for the visualization of two-dimensional plots of quantitative data scores against genomic coordinates in the context of a primary public genome annotation browser.
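The abstract does not say how "Get Map" converts genetic locations to genomic coordinates. As a rough illustration of one common approach only, the sketch below linearly interpolates a marker's base-pair position from flanking anchor markers whose genetic (cM) and physical (bp) positions are both known. The function name `genetic_to_physical` and the anchor values are hypothetical and are not taken from Statistical Viewer.

```python
from bisect import bisect_right


def genetic_to_physical(cm, anchors):
    """Estimate a base-pair position for a genetic position given in cM.

    `anchors` is a list of (cM, bp) pairs sorted by cM. These are
    hypothetical reference markers with known positions on both maps.
    Linear interpolation is an assumption made for this sketch; it is not
    the documented behaviour of the "Get Map" module.
    """
    cms = [a[0] for a in anchors]
    if not cms[0] <= cm <= cms[-1]:
        raise ValueError("position lies outside the anchored interval")
    i = bisect_right(cms, cm)
    if i == len(anchors):
        # cm coincides with the last anchor.
        return anchors[-1][1]
    (cm0, bp0), (cm1, bp1) = anchors[i - 1], anchors[i]
    frac = (cm - cm0) / (cm1 - cm0)
    return round(bp0 + frac * (bp1 - bp0))


# Invented anchor markers and lod scores, for illustration only.
anchors = [(0.0, 1_000_000), (10.0, 5_000_000), (20.0, 7_000_000)]
lod_by_cm = [(5.0, 1.2), (15.0, 3.4)]

# Points ready to plot as lod score against base-pair position.
plot_points = [(genetic_to_physical(cm, anchors), lod) for cm, lod in lod_by_cm]
print(plot_points)
```

Interpolating between flanking markers keeps marker order intact but ignores local variation in recombination rate, so the resulting coordinates are approximate.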

    MaizeGDB's new data types, resources and activities

    MaizeGDB is the Maize Genetics and Genomics Database. Available at MaizeGDB are diverse data that support maize research including maps, gene product information, loci and their various alleles, phenotypes (both naturally occurring and as a result of directed mutagenesis), stocks, sequences, molecular markers, references and contact information for maize researchers worldwide. Also available through MaizeGDB are various community support service bulletin boards including the Editorial Board's list of high-impact papers, information about the Annual Maize Genetics Conference and the Jobs board where employment opportunities are posted. Reported here are data updates, improvements to interfaces and changes to standard operating procedures that have been made during the past 2 years. MaizeGDB is freely available and can be accessed online at

    TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    BACKGROUND: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria, and to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. RESULTS: TRAM (Transcriptome Mapper) is a new general tool, implemented as a relational database, that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays). It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies whether segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genes in the genome. Transcriptome maps showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. CONCLUSIONS: TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes a FileMaker Pro database management runtime application and is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes.
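For readers unfamiliar with the baseline technique, the sketch below shows textbook quantile normalization, which forces every sample to share the same value distribution. It is not TRAM's "scaled quantile" variant, whose details the abstract does not give. The textbook form requires every sample to list the same number of genes, which is the limitation the abstract says the scaled variant is meant to relax. The function name and the data are illustrative assumptions.

```python
def quantile_normalize(samples):
    """Textbook quantile normalization of equal-length expression vectors.

    `samples` is a list of lists, one list of expression values per sample,
    with genes in the same order in every sample. This is the standard
    method, NOT the scaled quantile variant mentioned in the abstract.
    Tied values are ranked in their order of appearance.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must have the same number of genes")

    # Mean of the r-th smallest value across all samples, for each rank r.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Indices of the genes ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two invented samples measuring the same three genes.
result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(result)
```

After normalization each sample contains the same set of values, and only the assignment of those values to genes differs between samples.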

    The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization.

    Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high-quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34,211, average gene length and N50 increased, and error frequency was reduced 10-fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large-scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA-seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.

    Mirage: A Novel Multiple Protein Sequence Alignment Tool

    A fundamental problem in computational biology is the organization of many related sequences into a multiple sequence alignment (MSA) [2]. MSAs have a range of research applications, such as inferring phylogeny [22] and identifying regions of conserved sequence that indicate functional similarity [18]. In the case of protein isoforms, MSAs are valuable tools for transitively annotating post-translational modifications (PTMs) by enabling information transfer between known PTM sites and the sites that they align to [11]. For protein MSA tools, one challenging biological phenomenon is alternative splicing, wherein identical genomic sequence will differentially select from a subset of available coding regions (exons), depending on the biochemical environment [21]. Traditional methods struggle to align the islands of non-homologous sequence produced by alternative splicing. They frequently compensate for the penalties incurred from aligning non-identical characters by aligning small pieces of relatively similar sequence from alternative exons, which avoids extreme gap penalties but falsely indicates sequence homology. Presented here is Mirage, a novel protein MSA tool capable of accurately aligning alternatively spliced proteins by first mapping proteins to the genomic sequence that encoded them and then aligning proteins to one another based on the relative positions of their coding DNA. This method of transitive alignment demonstrates an awareness of intron splice site locations and resolves the problems associated with alternative splicing in traditional MSA tools.
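The idea of aligning proteins through their coding DNA can be illustrated with a toy example. The sketch below assumes each residue has already been mapped to the genomic coordinate of its codon (the hard step, which a real tool performs with spliced alignment) and then builds alignment columns from those coordinates alone. The function name, the coordinates and the sequences are hypothetical; this is not Mirage's implementation.

```python
def align_by_genome(mapped):
    """Align isoforms using the genomic position of each residue's codon.

    `mapped` maps an isoform name to a list of (genomic_position, residue)
    pairs. The mapping is assumed to be given, to lie on one strand, and to
    use the same position for the same codon in every isoform. All of these
    are simplifying assumptions made for this sketch.
    """
    # One alignment column per distinct codon position, in genomic order.
    columns = sorted({pos for residues in mapped.values() for pos, _ in residues})
    rows = {}
    for name, residues in mapped.items():
        by_position = dict(residues)
        # A residue where the isoform uses the codon, a gap where it does not.
        rows[name] = "".join(by_position.get(pos, "-") for pos in columns)
    return rows


# Isoform B skips the middle exon (positions 200 and 203) used by isoform A.
mapped = {
    "isoform_A": [(100, "M"), (103, "K"), (200, "L"), (203, "V"), (300, "G"), (303, "S")],
    "isoform_B": [(100, "M"), (103, "K"), (300, "G"), (303, "S")],
}
alignment = align_by_genome(mapped)
for name, row in alignment.items():
    print(name, row)
```

Because columns are defined by genomic position rather than residue similarity, the skipped exon appears as a clean gap in isoform B instead of being forced against unrelated sequence.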

    DArT markers for the rye genome - genetic diversity and mapping

    BACKGROUND: Implementation of molecular breeding in rye (Secale cereale L.) improvement programs depends on the availability of high-density molecular linkage maps. However, the number of sequence-specific PCR-based markers available for the species is limited. Diversity Arrays Technology (DArT) is a microarray-based method allowing for detection of DNA polymorphism at several thousand loci in a single assay without relying on DNA sequence information. The objective of this study was the development and application of Diversity Arrays Technology for rye. RESULTS: Using the PstI/TaqI method of complexity reduction we created a rye diversity panel from DNA of 16 rye varieties and 15 rye inbred lines, including parents of a mapping population consisting of 82 recombinant inbred lines. The usefulness of a wheat diversity panel for identification of DArT markers for rye was also demonstrated. We identified 1022 clones that were polymorphic in the genotyped ILs and varieties and 1965 clones that differentiated the parental lines L318 and L9 and segregated in the mapping population. Hierarchical clustering and ordination analysis were performed based on the 1022 DArT markers to reveal genetic relationships between the rye varieties and inbred lines included in the study. Chromosomal location of 1872 DArT markers was determined using wheat-rye addition lines and 1818 DArT markers (among them 1181 unique, non-cosegregating) were placed on a genetic linkage map of the cross L318 × L9, providing an average density of one unique marker every 2.68 cM. This is the most saturated rye linkage map based solely on transferable markers available at the moment, providing rye breeders and researchers with a better choice of markers and a higher probability of finding polymorphic markers in the region of interest. CONCLUSION: The Diversity Arrays Technology can be efficiently and effectively used for rye genome analyses, namely assessment of genetic similarity and linkage mapping. The 11520-clone rye genotyping panel, with several thousand markers of determined chromosomal location and accessible through an inexpensive genotyping service, is a valuable resource for studies on rye genome organization and in molecular breeding of the species.

    USDA Plant Genome Research Program

    The U.S. Congress appropriated funds in 1991 for the USDA Plant Genome Research Program, four years after its initial conception in 1987. The goal of the USDA Plant Genome Research Program is to improve plants (agronomic, horticultural, and forest tree species) by locating marker DNA or genes on chromosomes, determining gene structure, and transferring genes to improve plant performance with accompanying reduced environmental impact to meet marketplace needs and niches. The Plant Genome Research Program is one program with two parts: National Research Initiative and Plant Genome Database (PGD). The PGD is now a real and functioning information and data resource for agricultural and other plant science genome researchers, and it is in the public domain. Additional progress is given according to major plant groups. The PGD is a suite of several information products produced at the National Agricultural Library (NAL) in collaboration with the Agricultural Research Service and Forest Service species coordinators.