
    Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing

    Integration of viral vectors into a host genome is associated with insertional mutagenesis, and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR-based methods, such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR and non-restrictive (nr) LAM PCR, were developed to identify sites of vector integration. Coupling next-generation sequencing technologies with these PCR approaches can provide comprehensive, genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop quality metrics for viral insertion data obtained using next-generation sequencing and to apply them. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies and understand the quality of sequencing data, and they provide a starting point for standardizing vector insertion site data analysis.

    Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome

    This paper was presented at the XSEDE 15 conference. Today's genomics technologies generate more sequence data than was ever possible before, and at substantially lower cost, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological discovery in both model and non-model organisms, with and without whole-genome sequence references. The major limitation in building transcriptome references is no longer generating the sequencing data itself, but the computing infrastructure and expertise needed to assemble, analyze and manage the data. Here we describe a currently available resource dedicated to these goals and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software installations. The Mason cluster, an XSEDE second-tier resource at Indiana University, provides the fast CPU cycles, large memory and high I/O throughput needed for large-scale genomics research. The National Center for Genome Analysis Support (NCGAS) provides technical support in using HPC systems, bioinformatic support for determining the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are essential to large eukaryotic genomics and transcriptomics projects such as the complex transcriptome of loblolly pine, whose gene expression data inform annotation and functional interpretation of the largest genome sequence reference to date. This work was supported in part by USDA NIFA grant 2011-67009-30030, PineRefSeq, led by the University of California, Davis, and by NCGAS, funded by NSF under award No. 1062432.

    Toward production of jet fuel functionality in oilseeds: identification of FatB acyl-acyl carrier protein thioesterases and evaluation of combinatorial expression strategies in Camelina seeds

    Seeds of members of the genus Cuphea accumulate medium-chain fatty acids (MCFAs; 8:0–14:0). Vegetable oils rich in MCFAs and palmitic acid (16:0) have received attention for jet fuel production, given their similarity in chain length to Jet A fuel hydrocarbons. Studies were conducted to test genes, including those from Cuphea, for their ability to confer jet fuel-type fatty acid accumulation in seed oil of the emerging biofuel crop Camelina sativa. Transcriptomes from developing seeds of Cuphea viscosissima and Cuphea pulcherrima, which accumulate >90% C8 and C10 fatty acids, revealed three FatB cDNAs (CpuFatB3, CvFatB1 and CpuFatB4) that are expressed predominantly in seeds and are structurally divergent from typical FatB thioesterases that release 16:0 from acyl carrier protein (ACP). Expression of CpuFatB3 and CvFatB1 resulted in Camelina oil containing capric acid (10:0), and CpuFatB4 expression conferred myristic acid (14:0) production and increased 16:0. Co-expression of combinations of previously characterized Cuphea and California bay FatBs produced Camelina oils with mixtures of C8–C16 fatty acids, but the amount of each fatty acid was lower than that obtained by expressing the individual FatB cDNAs. Inclusion of a coconut lysophosphatidic acid acyltransferase specialized for MCFAs increased lauric acid (12:0) and 14:0, but not 10:0, in Camelina oil and at the sn-2 position of triacylglycerols. RNA interference (RNAi) suppression of Camelina β-ketoacyl-ACP synthase II, however, reduced 12:0 in seeds expressing a 12:0-ACP-specific FatB. The Camelina lines presented here provide platforms for additional metabolic engineering targeting fatty acid synthase and specialized acyltransferases to achieve oils with high levels of jet fuel-type fatty acids.
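
    The abstract uses the standard lipid shorthand C:D, where C is the number of carbons and D the number of double bonds (so 10:0 is capric acid and 16:0 is palmitic acid), and it defines MCFAs as 8:0–14:0. As an illustration only, the Python sketch below parses that notation and totals the MCFA share of a fatty acid profile. The function names `parse_shorthand`, `is_mcfa` and `mcfa_percent` are hypothetical, and treating MCFAs as saturated chains of 8 to 14 carbons is an assumption taken from the 8:0–14:0 range above; this is not the authors' analysis.

```python
import re
from typing import Mapping, Tuple

# Lipid shorthand "C:D": C = number of carbons, D = number of double bonds.
_SHORTHAND = re.compile(r"^\s*(\d+):(\d+)\s*$")


def parse_shorthand(code: str) -> Tuple[int, int]:
    """Return (carbons, double_bonds) for a shorthand string such as '12:0'."""
    match = _SHORTHAND.match(code)
    if match is None:
        raise ValueError(f"not C:D shorthand: {code!r}")
    return int(match.group(1)), int(match.group(2))


def is_mcfa(code: str) -> bool:
    """Assumed rule: saturated fatty acids from 8:0 through 14:0 count as MCFAs."""
    carbons, double_bonds = parse_shorthand(code)
    return 8 <= carbons <= 14 and double_bonds == 0


def mcfa_percent(profile: Mapping[str, float]) -> float:
    """Sum the percentages of all MCFA entries in a fatty acid profile."""
    return sum(pct for code, pct in profile.items() if is_mcfa(code))
```

    A profile is passed as a mapping from shorthand to percent of total fatty acids; entries outside 8:0–14:0 (for example 16:0 or 18:1) are simply excluded from the sum.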

    A garter snake transcriptome: pyrosequencing, de novo assembly, and sex-specific differences

    Get PDF
    Background: The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest. Results: Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest. Conclusions: This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.

    The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most widely cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, significantly larger than that of a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10 times the scaffold N50 and 4 times the contig N50 of Criollo, and includes 111 Mbp more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within TcMYB113, located in the target site of a trans-acting siRNA that is highly conserved in dicots, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
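
    The assembly statistics quoted above rely on two common definitions: N50 is the largest length L such that sequences of length ≥ L together contain at least half of all assembled bases, and gap sequence is conventionally the runs of N characters placed between contigs within scaffolds. The Python sketch below computes both under those assumptions; the helper names `n50` and `gap_percent` are hypothetical, and the sketch is not the pipeline the authors used to obtain their figures.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def gap_percent(scaffolds: Iterable[str]) -> float:
    """Percentage of scaffold bases that are gap characters ('N' or 'n')."""
    total = 0
    gaps = 0
    for seq in scaffolds:
        total += len(seq)
        gaps += seq.upper().count("N")
    return 100.0 * gaps / total if total else 0.0
```

    Passing contig lengths to `n50` gives the contig N50, and passing scaffold lengths gives the scaffold N50; `gap_percent` expects the scaffold sequences themselves, for example as read from a FASTA file.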

    Deep Sequencing of the Mexican Avocado Transcriptome, an Ancient Angiosperm with a High Content of Fatty Acids

    Background: Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited by the lack of transcriptome and genomic information. Results: The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. The transcriptomes of three stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. Analysis of the RNA-seq atlas presented here reveals strong differences in gene expression patterns between organs, especially between root and flower, but also similarities among the expression patterns of other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. In addition, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. Conclusions: A description of the transcriptomic changes occurring during fruit ripening was obtained for Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

    Oil Biosynthesis in a Basal Angiosperm: Transcriptome Analysis of Persea americana Mesocarp

    The mechanism by which plants synthesize and store large amounts of triacylglycerols (TAG) in tissues other than seeds is not well understood. Understanding the controls on carbon partitioning and oil accumulation in nonseed tissues is essential for generating oil-rich biomass in perennial bioenergy crops. Persea americana (avocado), a basal angiosperm with unique features that are ancestral to most flowering plants, stores ~70% TAG per dry weight in its mesocarp, a nonseed tissue. Transcriptome analyses of selected pathways, from the generation of pyruvate up to TAG accumulation, were conducted in avocado mesocarp tissues and compared with those of oil-rich monocot (oil palm) and dicot (rapeseed and castor) tissues to identify tissue- and species-specific regulation and biosynthesis of TAG in plants.