13,627 research outputs found

    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    Get PDF
    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental dataComment: 44 pages, 9 figure

    Using surveys of Affymetrix GeneChips to study antisense expression.

    Get PDF
    We have used large surveys of Affymetrix GeneChip data in the public domain to conduct a study of antisense expression across diverse conditions. We derive correlations between groups of probes which map uniquely to the same exon in the antisense direction. When there are no probes assigned to an exon in the sense direction we find that many of the antisense groups fail to detect a coherent block of transcription. We find that only a minority of these groups contain coherent blocks of antisense expression suggesting transcription. We also derive correlations between groups of probes which map uniquely to the same exon in both sense and antisense direction. In some of these cases the locations of sense probes overlap with the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA. In order to determine the exact antisense regulatory mechanism indicated by the correlation of sense probes with antisense probes, a further investigation is necessary for every particular case of interest. However, the analysis of microarray data has proved to be a good method to reconfirm known NATs, discover new ones, as well as to notice possible problems in the annotation of antisense transcripts

    RNA-seq transcriptional profiling of peripheral blood leukocytes from cattle infected with Mycobacterium bovis

    Get PDF
    Bovine tuberculosis, caused by infection with Mycobacterium bovis, is a major endemic disease affecting cattle populations worldwide, despite the implementation of stringent surveillance and control programs in many countries. The development of high-throughput functional genomics technologies, including gene expression microarrays and RNA-sequencing (RNA-seq), has enabled detailed analysis of the host transcriptome to M. bovis infection, particularly at the macrophage and peripheral blood level. In the present study, we have analyzed the peripheral blood leukocyte (PBL) transcriptome of eight natural M. bovis-infected and eight age- and sex-matched non-infected control Holstein-Friesian animals using RNA-seq. In addition, we compared gene expression profiles generated using RNA-seq with those previously generated using the high-density Affymetrix(®) GeneChip(®) Bovine Genome Array platform from the same PBL-extracted RNA. A total of 3,250 differentially expressed (DE) annotated genes were detected in the M. bovis-infected samples relative to the controls (adjusted P-value ≤0.05), with the number of genes displaying decreased relative expression (1,671) exceeding those with increased relative expression (1,579). Ingenuity(®) Systems Pathway Analysis (IPA) of all DE genes revealed enrichment for genes with immune function. Notably, transcriptional suppression was observed among several of the top-ranking canonical pathways including Leukocyte Extravasation Signaling. Comparative platform analysis demonstrated that RNA-seq detected a larger number of annotated DE genes (3,250) relative to the microarray (1,398), of which 917 genes were common to both technologies and displayed the same direction of expression. Finally, we show that RNA-seq had an increased dynamic range compared to the microarray for estimating differential gene expression

    MODBASE, a database of annotated comparative protein structure models and associated resources.

    Get PDF
    MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http:/salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/)

    Enhanced gene trapping in mouse embryonic stem cells

    Get PDF
    Gene trapping is used to introduce insertional mutations into genes of mouse embryonic stem cells (ESCs). It is performed with gene trap vectors that simultaneously mutate and report the expression of the endogenous gene at the site of insertion and provide a DNA tag for rapid identification of the disrupted gene. Gene traps have been employed worldwide to assemble libraries of mouse ESC lines harboring mutations in single genes, which can be used to make mutant mice. However, most of the employed gene trap vectors require gene expression for reporting a gene trap event and therefore genes that are poorly expressed may be under-represented in the existing libraries. To address this problem, we have developed a novel class of gene trap vectors that can induce gene expression at insertion sites, thereby bypassing the problem of intrinsic poor expression. We show here that the insertion of the osteopontin enhancer into several conventional gene trap vectors significantly increases the gene trapping efficiency in high-throughput screens and facilitates the recovery of poorly expressed genes

    WormBase 2007

    Get PDF
    WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes the genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences
    corecore