30 research outputs found

    Fine Expression Profiling of Full-length Transcripts using a Size-unbiased cDNA Library Prepared with the Vector-capping Method

    Recently, we have developed a vector-capping method for constructing a full-length cDNA library. In the present study, we performed in-depth analysis of the vector-capped cDNA library prepared from a single type of cell. As a result of single-pass sequencing analysis of 24 000 clones randomly isolated from the unamplified library, we identified 19 951 full-length cDNA clones whose intactness was confirmed by the presence of an additional G at their 5' end. The full-length cDNA content was >95%. Mapping these sequences to the human genome, we identified 4513 transcriptional units that include 36 antisense transcripts against known genes. Comparison of the frequencies of abundant clones showed that the expression profiles of different libraries, including the distribution of transcriptional start sites (TSSs), were reproducible. The analysis of long cDNAs showed that this library contained many cDNAs with long inserts, up to 11 199 bp for golgin B, including multiple splicing variants for filamin A and filamin B. These results suggest that the size-unbiased full-length cDNA library constructed using the vector-capping method will be an ideal resource for fine expression profiling of transcriptional variants with alternative TSSs and alternative splicing.

    The UCSC Genome Browser Database: update 2009

    The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.

    The UCSC Genome Browser database: update 2010

    The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.

    Using ESTs to improve the accuracy of de novo gene prediction

    BACKGROUND: ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction. RESULTS: TWINSCAN_EST is a new system that successfully combines EST alignments with TWINSCAN. On the whole C. elegans genome TWINSCAN_EST shows 14% improvement in sensitivity and 13% in specificity in predicting exact gene structures compared to TWINSCAN without EST alignments. Not only are the structures revealed by EST alignments predicted correctly, but these also constrain the predictions without alignments, improving their accuracy. For the human genome, we used the same approach with N-SCAN, creating N-SCAN_EST. On the whole genome, N-SCAN_EST produced a 6% improvement in sensitivity and 1% in specificity of exact gene structure predictions compared to N-SCAN. CONCLUSION: TWINSCAN_EST and N-SCAN_EST are more accurate than TWINSCAN and N-SCAN, while retaining their ability to discover novel genes to which no ESTs align. Thus, we recommend using the EST versions of these programs to annotate any genome for which EST information is available. TWINSCAN_EST and N-SCAN_EST are part of the TWINSCAN open source software package.
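
    The sensitivity and specificity figures quoted for exact gene structures can be illustrated with a minimal sketch. The abstract does not define the two measures, so this assumes the usual gene-finding convention: sensitivity is the fraction of annotated structures predicted exactly, and specificity is the fraction of predicted structures that are exactly correct. The function name and the toy coordinates are hypothetical and are not taken from TWINSCAN or N-SCAN.

```python
def exact_match_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for exact gene-structure matches.

    Each gene structure is a hashable value; here, a tuple of
    (start, end) exon coordinate pairs. A prediction counts only
    if every exon boundary matches the annotation exactly.
    """
    predicted = set(predicted)
    annotated = set(annotated)
    exact = predicted & annotated
    # Assumed convention: specificity = correct predictions / all predictions.
    sensitivity = len(exact) / len(annotated) if annotated else 0.0
    specificity = len(exact) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data (hypothetical coordinates): three annotated genes, four predictions.
annotated = [
    ((100, 200), (300, 400)),
    ((1000, 1200),),
    ((5000, 5100), (5200, 5300)),
]
predicted = [
    ((100, 200), (300, 400)),      # exact match
    ((1000, 1250),),               # wrong exon end, not an exact match
    ((5000, 5100), (5200, 5300)),  # exact match
    ((9000, 9100),),               # no corresponding annotation
]
sn, sp = exact_match_accuracy(predicted, annotated)
```

    In this toy case two of the three annotated genes are recovered exactly and two of the four predictions are correct, which shows why a single misplaced exon boundary lowers both measures at once.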

    Estimation of alternative splicing isoform frequencies from RNA-Seq data

    BACKGROUND: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. RESULTS: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. CONCLUSIONS: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
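
    The general idea of expectation-maximization for isoform frequencies can be sketched as follows. This is a simplified illustration and not the IsoEM implementation: it assumes each read comes with a precomputed weight for every isoform it is compatible with (a stand-in for the insert-size, base-quality and strand information the abstract mentions), and it estimates the fraction of reads originating from each isoform without length normalization. The function name and input layout are hypothetical.

```python
def em_isoform_frequencies(read_weights, n_isoforms, n_iter=200, tol=1e-9):
    """Estimate the fraction of reads originating from each isoform.

    read_weights: one dict per read mapping isoform index -> weight,
    where the weight is an assumed, precomputed compatibility score.
    """
    freqs = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to (current frequency * compatibility weight).
        counts = [0.0] * n_isoforms
        for weights in read_weights:
            total = sum(freqs[i] * w for i, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform
            for i, w in weights.items():
                counts[i] += freqs[i] * w / total
        assigned = sum(counts)
        if assigned == 0.0:
            return freqs
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [c / assigned for c in counts]
        change = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1,
# and 4 reads equally compatible with both.
reads = [{0: 1.0}] * 6 + [{1: 1.0}] * 2 + [{0: 1.0, 1: 1.0}] * 4
freqs = em_isoform_frequencies(reads, n_isoforms=2)
```

    For this toy input the ambiguous reads end up split three to one in favour of isoform 0, so the estimates settle near 0.75 and 0.25. Turning read fractions into expression levels would additionally require dividing by isoform length, which this sketch leaves out.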

    The UCSC Genome Browser Database: 2008 update

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/

    The unfolded protein response governs integrity of the haematopoietic stem-cell pool during stress

    The blood system is sustained by a pool of haematopoietic stem cells (HSCs) that are long-lived due to their capacity for self-renewal. A consequence of longevity is exposure to stress stimuli including reactive oxygen species (ROS), nutrient fluctuation and DNA damage. Damage that occurs within stressed HSCs must be tightly controlled to prevent either loss of function or the clonal persistence of oncogenic mutations that increase the risk of leukaemogenesis. Despite the importance of maintaining cell integrity throughout life, how the HSC pool achieves this and how individual HSCs respond to stress remain poorly understood. Many sources of stress cause misfolded protein accumulation in the endoplasmic reticulum (ER), and subsequent activation of the unfolded protein response (UPR) enables the cell to either resolve stress or initiate apoptosis. Here we show that human HSCs are predisposed to apoptosis through strong activation of the PERK branch of the UPR after ER stress, whereas closely related progenitors exhibit an adaptive response leading to their survival. Enhanced ER protein folding by overexpression of the co-chaperone ERDJ4 (also called DNAJB9) increases HSC repopulation capacity in xenograft assays, linking the UPR to HSC function. Because the UPR is a focal point where different sources of stress converge, our study provides a framework for understanding how stress signalling is coordinated within tissue hierarchies and integrated with stemness. Broadly, these findings reveal that the HSC pool maintains clonal integrity by clearance of individual HSCs after stress to prevent propagation of damaged stem cells.

    Using multiple alignments to improve gene prediction

    The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN has the ability to model dependencies between the aligned sequences, context-dependent substitution rates, and insertions and deletions in the sequences. An implementation of N-SCAN was created and used to generate predictions for the entire human genome. An analysis of the predictions reveals that N-SCAN’s predictive accuracy in human exceeds that of all previously published whole-genome de novo gene predictors. In addition, predictions were generated for the genome of the fruit fly Drosophila melanogaster to demonstrate the applicability of N-SCAN to invertebrate gene prediction.

    A large family of ancient repeat elements in the human genome is under strong selection

    Although conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, they remain poorly understood. CNEs seem to be largely unique, with no large families of similar elements reported to date. Here, we search for CNEs among the ancestral repeat classes in the human genome and report the discovery of a large CNE family containing >900 members. This family belongs to the MER121 class of repeats. Although the MER121 family members show considerable sequence variation among one another, the individual copies show striking conservation in orthologous locations across the human, dog, mouse, and rat genomes. The element is also present and conserved in orthologous locations in the marsupial, but its genome-wide dispersal postdates the divergence from birds. The comparative genomic data indicate that MER121 does not encode a family of either protein-coding or RNA genes. Although the precise function of these elements remains unknown, the evidence suggests that this unusual family may play a cis-regulatory or structural role in mammalian genomes.