
    SeqOthello: Query over RNA-seq experiments at scale

    SeqOthello is an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. Taking a sequence as query input, SeqOthello returns either the total k-mer hits of the query sequence or the detailed presence/absence information of individual k-mers across all the indexed experiments.
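The two query modes described above (total k-mer hits versus per-k-mer presence/absence) can be illustrated with a toy example. This is only a minimal sketch that uses a plain Python dictionary; it is not the Othello-based structure SeqOthello itself uses, and every name here (`build_index`, `query_presence`, `query_hits`, the sample experiments) is hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments, k):
    """Map each k-mer to the set of experiment names that contain it.

    A dictionary of sets stands in for the real index (an assumption made
    purely for illustration; it is neither fast nor memory-efficient).
    """
    index = defaultdict(set)
    for name, reads in experiments.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(name)
    return index


def query_presence(index, names, query, k):
    """Detailed mode: one presence/absence row per k-mer of the query."""
    return [(kmer, [name in index.get(kmer, ()) for name in names])
            for kmer in kmers(query, k)]


def query_hits(index, names, query, k):
    """Summary mode: number of query k-mers found in each experiment."""
    hits = dict.fromkeys(names, 0)
    for _, row in query_presence(index, names, query, k):
        for name, present in zip(names, row):
            hits[name] += present
    return hits


# Hypothetical toy data: two "experiments" with one read each.
experiments = {"exp1": ["ACGTACGT"], "exp2": ["TTTTACGA"]}
names = sorted(experiments)
index = build_index(experiments, k=4)
presence = query_presence(index, names, "ACGTAC", k=4)
hits = query_hits(index, names, "ACGTAC", k=4)
```

Here `hits` summarizes how many of the query's k-mers occur in each experiment, while `presence` keeps the per-k-mer detail.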

    The number of splice junctions and genes specific to each tissue.

    Maximum values of the y-axis are set to the number of junctions/genes found in testes (9,234 splice junctions and 1,028 genes).

    Dot plots depicting sequence comparisons between the genomic regions of the four unannotated equine transcripts and the corresponding regions in the human, canine, and bovine genomes (A).

    The genomic intervals of the unannotated transcripts are highlighted in yellow, and the nearest conserved Ensembl protein-coding gene predictions in the flanking regions are highlighted in blue. Dot plots depicting sequence comparisons between the specific interval of the unannotated equine transcripts (yellow segment in panel A) and the corresponding regions in the human, canine, and bovine genomes (B).

    Analysis of Unannotated Equine Transcripts Identified by mRNA Sequencing

    Sequencing of equine mRNA (RNA-seq) identified 428 putative transcripts that do not map to any previously annotated or predicted horse genes. Most of these encode the equine homologs of known protein-coding genes described in other species, yet the potential exists to identify novel and perhaps equine-specific gene structures. A set of 36 transcripts was prioritized for further study by filtering for levels of expression (depth of RNA-seq read coverage), distance from annotated features in the equine genome, the number of putative exons, and patterns of gene expression between tissues. From these, four were selected for further investigation based on predicted open reading frames of greater than or equal to 50 amino acids and lack of detectable homology to known genes across species. Sanger sequencing of RT-PCR amplicons from additional equine samples confirmed expression and structural annotation of each transcript. Functional predictions were made by conserved domain searches. A single transcript, expressed in the cerebellum, contains a putative Krüppel-associated box (KRAB) domain, suggesting a potential function associated with zinc finger proteins and transcriptional regulation. Overall levels of conserved synteny and sequence conservation across a 1 Mb region surrounding each transcript were approximately 73% compared to the human, canine, and bovine genomes; however, the four loci display some areas of low conservation and sequence inversion in regions that immediately flank these previously unannotated equine transcripts. Taken together, the evidence suggests that these four transcripts are likely to be equine-specific.

    Text mining validation.

    Text mining results indicating the relationship between genes containing the tissue-specific RNA-seq splice junctions in each tissue (x-axis) and top tissue concepts (y-axis).

    Plots for filtering splice junctions.

    (A) An example, from heart, plotting entropy versus average mismatches for annotated and novel splice junctions. Selected thresholds (0.75 entropy and 1.5 or 1 average mismatches for paired-end and single-end data, respectively) are indicated by the dark lines, and the lower right quadrants were retained for further splice junction analyses. (B) Plots for annotated-only and novel splice junctions across all tissues for both paired-end and single-end data. A vertical dark line indicates the applied threshold of 50 nucleotides. The main graphs are zoomed in to intron sizes below 1000 bp, while the sub-graphs show all intron sizes on a natural log scale.
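The thresholds in this caption can be read as a simple per-junction filter. The sketch below is a hedged illustration only: the caption does not state the direction of each cut-off, so it assumes that junctions are kept when entropy is at least 0.75, average mismatches are at most 1.5 (paired-end) or 1 (single-end), and intron size is at least 50 nucleotides. The `Junction` record, its field names, and `passes_filters` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

# Threshold values are from the caption; the inequality directions and
# inclusive bounds are assumptions made for this sketch.
MIN_ENTROPY = 0.75
MAX_AVG_MISMATCHES = {"paired": 1.5, "single": 1.0}
MIN_INTRON_NT = 50


@dataclass
class Junction:
    """Hypothetical summary record for one splice junction."""
    entropy: float
    avg_mismatches: float
    intron_size: int  # nucleotides


def passes_filters(junction, library):
    """Return True if the junction clears all three assumed cut-offs.

    library is "paired" or "single" and selects the mismatch threshold.
    """
    return (junction.entropy >= MIN_ENTROPY
            and junction.avg_mismatches <= MAX_AVG_MISMATCHES[library]
            and junction.intron_size >= MIN_INTRON_NT)


# Made-up example values, not data from the study.
candidate = Junction(entropy=1.2, avg_mismatches=1.2, intron_size=300)
kept_paired = passes_filters(candidate, "paired")
kept_single = passes_filters(candidate, "single")
```

With these made-up values the same junction clears the paired-end mismatch cut-off but not the stricter single-end one.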

    Example of an unannotated equine transcript.

    The upper panel shows approximately 3 kb of ECA14 containing a single unannotated transcript (A). The black peaks represent depth of coverage by the RNA-seq reads and the red lines represent putative splice junctions identified by MapSplice [20]. The gene model immediately below is the annotation for this transcript derived from the RNA-seq data. The lower panel shows a 700 kb region of ECA14 surrounding the transcript (dotted box outline), illustrating that there is no annotated gene or in silico gene prediction overlapping this genomic interval (B). The nearest RNA-seq data not included in the transcript model is approximately 60 kb away and the nearest gene prediction is nearly 120 kb away.

    Analysis of homology by discontiguous megaBLAST.

    One hundred and ninety-seven (46%) of the unannotated equine transcripts aligned to sequences annotated as genes in other species, 55 (13%) aligned to unannotated sequences or fell below the significance threshold, and the remaining 176 (41%) generated no alignments at all.
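As a quick arithmetic check, the three counts sum to the 428 putative transcripts reported for this study elsewhere in the listing, and the quoted percentages follow from rounding each share to the nearest whole percent. The category labels in this short sketch are informal shorthand, not terms from the source.

```python
# Counts are taken from the caption; the dictionary keys are shorthand.
counts = {
    "annotated in other species": 197,
    "unannotated or below threshold": 55,
    "no alignment": 176,
}

total = sum(counts.values())
percentages = {label: round(100 * n / total) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
```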

    Gene and splice junction distribution.

    The percentage of genes or splice junctions found in the indicated number of tissues over a set threshold of reads from paired-end (2×50 bp) and single-end (75 bp) reads. The total number of genes or splice junctions found over a set threshold of reads is given in parentheses.