386 research outputs found

    An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets

    Background. An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great importance. One can very easily be flooded with such a great volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, provides INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating the reads, is therefore of significant value. Findings. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The output contains the homology-based functional annotation and the corresponding gene expression measure, which signifies how many times sequenced reads were found within the genomic ranges of functional annotations. Conclusions. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved very efficiently, enabling researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset. TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Such analysis is executed in an ultrafast and highly efficient manner, whether for a single-read or a paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.
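The tag-counting idea described in this abstract (counting how many reads fall within the genomic ranges of functional annotations) can be illustrated with a minimal Python sketch. This is not the TASE implementation, which the abstract describes as a Java application with a SQL Server backend. The function name `count_tags`, the tuple layout, and the simplifying assumptions (one chromosome, non-overlapping inclusive ranges, each read reduced to a single start position) are all hypothetical.

```python
from bisect import bisect_right


def count_tags(annotations, read_positions):
    """Count reads per annotation (hypothetical helper, not part of TASE).

    annotations: list of (start, end, label) tuples with inclusive,
                 non-overlapping ranges on a single chromosome (assumption).
    read_positions: iterable of integer read start positions (assumption).
    """
    ordered = sorted(annotations)
    starts = [start for start, _, _ in ordered]
    counts = {label: 0 for _, _, label in ordered}
    for pos in read_positions:
        # Find the last annotation whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= ordered[i][1]:
            counts[ordered[i][2]] += 1
    return counts


example_annotations = [(100, 200, "geneA"), (300, 400, "geneB")]
example_reads = [50, 100, 150, 200, 250, 300, 401]
tag_counts = count_tags(example_annotations, example_reads)
```

Sorting the annotation starts once and using binary search keeps each lookup logarithmic in the number of annotations; a database-backed tool would more likely delegate this range lookup to an indexed query.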

    Re-annotation of the woodland strawberry (Fragaria vesca) genome

    Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America, and while it produces edible fruits, it is most useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223, 2011]. The first-generation annotation (version 1.1) was developed using GeneMark-ES+ [Nuc Acids Res 33:6494, 2005], a self-training gene prediction tool that relies primarily on the combination of ab initio predictions with mapping high-confidence ESTs, in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1. The new annotation, which was achieved using Maker, describes many more predicted protein-coding genes than the GeneMark-generated annotation currently hosted at the Genome Database for Rosaceae (http://www.rosaceae.org/). Our new annotation also increases the overall total coding length and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results. Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved the accuracy of existing annotations. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae. https://doi.org/10.1186/s12864-015-1221-

    MAPT and PAICE: Tools for time series and single time point transcriptionist visualization and knowledge discovery

    With the advent of next-generation sequencing, -omics fields such as transcriptomics have experienced increases in data throughput of orders of magnitude. For analyzing and visually representing these huge datasets, an intuitive and computationally tractable approach is to map quantified transcript expression onto biochemical pathways while employing data-mining and visualization principles to accelerate knowledge discovery. We present two cross-platform tools, MAPT (Mapping and Analysis of Pathways through Time) and PAICE (Pathway Analysis and Integrated Coloring of Experiments), which together form an easy-to-use analysis suite for time series and single time point transcriptomics analysis. In unison, MAPT and PAICE serve as a visual workbench for transcriptomics knowledge discovery, data-mining and functional annotation. PAICE and MAPT are distinct yet inextricably linked tools. PAICE is specifically designed to map EC accessions onto KEGG pathways while handling multiple gene copies, detection-call analysis, and annotated or unannotated EC accessions lacking quantifiable expression. MAPT integrates PAICE datasets to drive visualization, annotation, and data-mining.

    Population-specific gene expression in the plant pathogenic nematode Heterodera glycines exists prior to infection and during the onset of a resistant or susceptible reaction in the roots of the Glycine max genotype Peking

    Background. A single Glycine max (soybean) genotype (Peking) reacts differently to two different populations of Heterodera glycines (soybean cyst nematode) within the first twelve hours of infection during resistant (R) and susceptible (S) reactions. This suggested that H. glycines has population-specific gene expression signatures. A microarray analysis of 7539 probe sets representing 7431 transcripts on the Affymetrix® soybean GeneChip® was used to identify population-specific gene expression signatures in pre-infective second stage larvae (pi-L2) prior to their infection of Peking. Other analyses focused on the infective L2 at 12 hours post infection (i-L2_12h), and the infective sedentary stages at 3 days post infection (i-L2_3d) and 8 days post infection (i-L2/L3_8d). Results. Differential expression and false discovery rate (FDR) analyses comparing populations of pi-L2 (i.e., the incompatible population, NL1-RHg, to the compatible population, TN8) identified 71 genes that were induced in NL1-RHg as compared to TN8. These genes included putative gland protein G23G12, putative esophageal gland protein Hgg-20 and arginine kinase. The comparative analysis of pi-L2 identified 44 genes that were suppressed in NL1-RHg as compared to TN8. These genes included a different Hgg-20 gene, an EXPB1 protein and a cuticular collagen. By 12 h, there were 7 induced genes and 0 suppressed genes in NL1-RHg. By 3 d, there were 9 induced and 10 suppressed genes in NL1-RHg. Substantial changes in gene expression became evident subsequently. At 8 d there were 13 induced genes in NL1-RHg. These included putative gland protein G20E03, ubiquitin extension protein, putative gland protein G30C02 and β-1,4 endoglucanase. However, 1668 genes were found to be suppressed in NL1-RHg. These genes included steroid alpha reductase, serine proteinase and a collagen protein. Conclusion. These analyses identify a genetic expression signature for these two populations both prior to and subsequently as they undergo an R or S reaction. The identification of genes like steroid alpha reductase and serine proteinase, which are involved in feeding and nutritional uptake, as being highly suppressed during the R response at 8 d may indicate genes that the plant is targeting. The analyses also identified numerous putative parasitism genes that are differentially expressed. The 1668 genes that are suppressed in NL1-RHg, and hence induced in TN8, may represent genes that are important during the parasitic stages of H. glycines development. The potential for different arrays of putative parasitism genes to be expressed in different nematode populations may indicate how H. glycines evolves mechanisms to overcome resistance.

    Microarray Detection Call Methodology as a Means to Identify and Compare Transcripts Expressed within Syncytial Cells from Soybean (Glycine max) Roots Undergoing Resistant and Susceptible Reactions to the Soybean Cyst Nematode (Heterodera glycines)

    Background. A comparative microarray investigation was done using detection call methodology (DCM) and differential expression analyses. The goal was to identify genes found in specific cell populations that were eliminated by differential expression analysis due to the nature of differential expression methods. Laser capture microdissection (LCM) was used to isolate nearly homogeneous populations of plant root cells. Results. The analyses identified the presence of 13,291 transcripts among the 4 different sample types. The transcripts filtered down to a total of 6,267 that were detected as being present in one or more sample types. A comparative analysis of DCM and differential expression methods showed a group of genes that were not differentially expressed, but were expressed at detectable amounts within specific cell types. Conclusion. The DCM has identified patterns of gene expression not shown by differential expression analyses. DCM has identified genes that are possibly cell-type specific and/or involved in important aspects of plant-nematode interactions during the resistance response, revealing the uniqueness of a particular cell population at a particular point during its differentiation process.

    BBGD: an online database for blueberry genomic data

    BACKGROUND: Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economic value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBGD) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making ESTs and gene expression data available in searchable formats and by helping to elucidate the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. DESCRIPTION: BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. CONCLUSION: By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures.

    Geological and industrial types of germanium deposits, and methods of prospecting and exploration

    Recent genetic studies found the A allele of the variant rs1006737 in the alpha 1C subunit of the L-type voltage-gated calcium channel (CACNA1C) gene to be overrepresented in patients suffering from bipolar disorder, schizophrenia or major depression. While the functions underlying the pathophysiology of these psychiatric disorders are yet unknown, impaired performance in verbal fluency tasks is an often replicated finding. We investigated the influence of the rs1006737 single nucleotide polymorphism (SNP) on verbal fluency and its neural correlates. Brain activation was measured with functional magnetic resonance imaging (fMRI) during a semantic verbal fluency task in 63 healthy male individuals. They additionally performed more demanding verbal fluency tasks outside the scanner. All subjects were genotyped for CACNA1C rs1006737. For the behavioral measures outside the scanner, rs1006737 genotype had an effect on semantic but not on lexical verbal fluency, with decreased performance in risk-allele carriers. In the fMRI experiment, while there were no differences in behavioral performance, increased activation in the left inferior frontal gyrus as well as the left precuneus was found in risk-allele carriers in the semantic verbal fluency task. The rs1006737 variant does influence language production on a semantic level in conjunction with the underlying neural systems. These findings are in line with results of studies in bipolar disorder, schizophrenia and major depression and may explain some of the cognitive and brain activation variation found in these disorders.

    SMRT Sequencing of Paramecium Bursaria Chlorella Virus-1 Reveals Diverse Methylation Stability in Adenines Targeted by Restriction Modification Systems

    Chloroviruses (family Phycodnaviridae) infect eukaryotic, freshwater, unicellular green algae. A unique feature of these viruses is an abundance of DNA methyltransferases, with isolates dedicating up to 4.5% of their protein coding potential to these genes. This diversity highlights just one of the long-standing values of the chlorovirus model system, where group-wide epigenomic characterization might begin to elucidate the function(s) of DNA methylation in large dsDNA viruses. We characterized DNA modifications in the prototype chlorovirus, PBCV-1, using single-molecule real time (SMRT) sequencing (also known as PacBio). Results were compared to total available sites predicted in silico based on DNA sequence alone. SMRT software detected N6-methyl-adenine (m6A) at GATC and CATG recognition sites, motifs previously shown to be targeted by PBCV-1 DNA methyltransferases M.CviAI and M.CviAII, respectively. At the same time, PacBio analyses indicated that 10.9% of the PBCV-1 genome had large interpulse duration ratio (ipdRatio) values, the primary metric for DNA modification identification. These events represent 20.6x more sites than can be accounted for by all available adenines in GATC and CATG motifs, suggesting base or backbone modifications other than methylation might be present. To define methylation stability, we cross-compared the methylation status of each GATC and CATG sequence in three biological replicates and found that ∼81% of sites were stably methylated, while ∼2% consistently lacked methylation. The remaining 17% of sites were stochastically methylated. When methylation status was analyzed for both strands of each target, we found that palindromes existed in completely non-methylated, fully-methylated, or hemi-methylated states, though GATC sites more often lacked methylation than CATG sequences. Given that both sequences are targeted not just by methyltransferases but also by restriction endonucleases, which together are encoded by PBCV-1 as virus-originating restriction modification (RM) systems, there is strong selective pressure to modify all target sites. The finding that most instances of non-methylation are associated with hemi-methylation is congruent with observations that hemi-methylated palindromes are resistant to cleavage by restriction endonucleases. However, sites where hemi-methylation is conserved might represent a unique regulatory function for PBCV-1. This study serves as a baseline for future investigation into the epigenomics of chloroviruses and their giant virus relatives.
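The cross-replicate comparison described in this abstract can be sketched in a few lines of Python. This is only an illustration of the classification logic as the abstract states it (methylated in every replicate, in none, or in some), not the authors' pipeline. The function names `classify_site` and `strand_state`, and the representation of each call as a boolean, are assumptions.

```python
def classify_site(calls):
    """Classify one GATC/CATG adenine from per-replicate m6A calls.

    calls: list of booleans, one per biological replicate
           (True = methylation detected). Hypothetical representation.
    """
    if all(calls):
        return "stable"        # methylated in every replicate
    if not any(calls):
        return "unmethylated"  # consistently lacks methylation
    return "stochastic"        # methylated in some replicates only


def strand_state(forward, reverse):
    """Describe a palindromic site from its two strand calls (booleans)."""
    if forward and reverse:
        return "fully-methylated"
    if forward or reverse:
        return "hemi-methylated"
    return "non-methylated"


site_classes = [
    classify_site([True, True, True]),
    classify_site([False, False, False]),
    classify_site([True, False, True]),
]
```

In a real analysis the boolean calls would themselves come from thresholding a modification metric such as ipdRatio, and that threshold is a separate choice not shown here.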

    SGR: an online genomic resource for the woodland strawberry

    Fragaria vesca, a diploid strawberry species commonly known as the alpine or woodland strawberry, is a versatile experimental plant system and an emerging model for the Rosaceae family. An ancestral F. vesca genome contributed to the genome of the octoploid dessert strawberry (F. ×ananassa), and the extant genome exhibits synteny with other commercially important members of the Rosaceae family such as apple and peach. To provide a molecular description of floral organ and fruit development at the resolution of specific tissues and cell types, RNAs from flowers and early developmental stage fruit tissues of the inbred F. vesca line YW5AF7 were extracted and the resulting cDNA libraries sequenced using an Illumina HiSeq2000. To enable easy access to and mining of this two-dimensional (stage and tissue) transcriptome dataset, a web-based database, the Strawberry Genomic Resource (SGR), was developed. SGR is a web-accessible database that contains sample descriptions, sample statistics, gene annotation, and gene expression analysis. This information can be accessed publicly from a web-based interface at http://bioinformatics.towson.edu/strawberry/Default.aspx . The SGR website provides user-friendly search and browse capabilities for all the data stored in the database. Users are able to search for genes using a gene ID or description, or obtain differentially expressed genes by entering different comparison parameters. Search results can be downloaded in a tabular format compatible with Microsoft Excel. Reads aligned to individual genes and exon/intron structures are displayed using the genome browser, facilitating gene re-annotation by individual users. The SGR database was developed to facilitate dissemination and data mining of extensive floral and fruit transcriptome data in the woodland strawberry. It enables users to mine the data in different ways to study different pathways or biological processes during reproductive development. https://doi.org/10.1186/1471-2229-13-22