773 research outputs found

    Manipulation of FASTQ data with Galaxy

    Summary: Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps.
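To illustrate what "FASTQ format variants" means in practice, here is a minimal Python sketch, not the tool suite's own code. It assumes only the two Phred-scaled encodings that differ by ASCII offset (Sanger at 33, Illumina 1.3+ at 64). The Solexa variant uses a different score scale and is deliberately left out. The function names `decode_quality` and `convert_quality` are hypothetical.

```python
# ASCII offsets for two Phred-scaled FASTQ quality encodings.
# Assumption: only these two variants are handled in this sketch.
OFFSETS = {"sanger": 33, "illumina": 64}


def decode_quality(quality, variant="sanger"):
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in quality]


def convert_quality(quality, source, target):
    """Re-encode a quality string from one offset convention to another."""
    scores = decode_quality(quality, source)
    return "".join(chr(score + OFFSETS[target]) for score in scores)


if __name__ == "__main__":
    # The same scores written in Illumina 1.3+ encoding, converted to Sanger.
    print(convert_quality("hhT@", "illumina", "sanger"))
```

A quality filter would typically decode the scores first and then discard or trim reads whose scores fall below a chosen threshold.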

    Identification and validation of microsatellite markers in strawberry tree (Arbutus unedo L.)

    Strawberry tree (Arbutus unedo L.), an evergreen shrub/small tree of the family Ericaceae, is a main constituent of the Mediterranean basin flora; it is also found in southwestern France, Macaronesia, and Ireland. The small fruits are edible but mostly used for preparation of preserves and jams, and for liquors such as the Portuguese traditional "aguardente de medronho". Traditionally cultivated by small farmers, often in consociation with Quercus sp., strawberry tree is presently emerging as a new important fruit crop cultivated in large orchards by modern export-oriented enterprises. This change of paradigm requires a growing role of plant breeding, upstream of the production process. Genomic tools for this species are mostly limited to the chloroplast genome sequence and to genomic data described in this work. In order to identify strawberry tree microsatellite (SSR) loci, we performed partial genome next-generation sequencing using the Ion Torrent technology. The approximately 24.6M sequenced nucleotides resulted in the identification of 1185 microsatellite markers, mostly constituted by dinucleotide motifs. The relative amount of microsatellite dinucleotide motifs (AG/CT - 71.7%, AC/GT - 20.5%, AT/AT - 2.9%, and CG/CG - 0.3%) is similar to the one observed in other Ericaceae species. Among a tested sample of 40 SSR primer pairs, 20 amplified well-defined PCR products and 12 (30%) were validated as polymorphic. Used in our collaborative project for molecular identification of selected and improved clones, the identified SSR loci constitute a strong tool for a large panoply of applied and fundamental studies of this emerging fruit crop. Funding: Pluriannual Funding Program of the Portuguese National Foundation for Science and Technology.
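The abstract does not say how the SSR loci were detected. As a rough illustration of the idea only, the Python sketch below scans a sequence for perfect dinucleotide repeats with a regular expression. The function name `find_dinucleotide_ssrs` and the default threshold of six repeats are assumptions made for this example, not parameters reported by the authors.

```python
import re


def find_dinucleotide_ssrs(sequence, min_repeats=6):
    """Return (start, motif, repeat_count) for perfect dinucleotide repeats.

    min_repeats is an illustrative threshold and must be at least 2.
    Homopolymer runs such as AAAA are excluded by the negative lookahead,
    which forbids the second base of the motif from equalling the first.
    """
    pattern = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // 2
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    read = "TTGAC" + "AG" * 7 + "CCT"
    for start, motif, count in find_dinucleotide_ssrs(read):
        print(start, motif, count)
```

Dedicated SSR finders also handle longer motifs, imperfect repeats, and reverse-complement grouping (for example, counting AG and CT together), none of which this sketch attempts.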

    A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

    GTO: A toolkit to unify pipelines in genomic and proteomic research

    Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analyses of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. This toolkit combines novel tools with a modular architecture, being an excellent platform for experimental scientists, as well as a useful resource for teaching bioinformatics enquiry to students in life sciences. GTO is implemented in C and is available, under the MIT license, at https://bioinformatics.ua.pt/gto. (C) 2020 The Authors. Published by Elsevier B.V. Peer reviewed.

    SMART-RDA: A Galaxy Workflow for RNA-Seq Data Analysis

    RNA-seq using the Next Generation Sequencing (NGS) approach is a common technology to analyze large-scale RNA transcript data for gene expression studies. However, an appropriate bioinformatics tool is needed to analyze the large amounts of transcriptome data produced by RNA-seq experiments. The aim of this study was to construct a system that can be easily applied to analyze RNA-seq data. An RNA-seq analysis tool, SMART-RDA, was constructed in this study. It is a computational workflow based on the Galaxy framework that converts raw RNA-seq data into gene expression information. This workflow was adapted, with some modifications, from the well-known Tuxedo Protocol for RNA-seq analysis. The expression value of each transcript was stated quantitatively as Fragments Per Kilobase of exon per Million fragments (FPKM). RNA-seq data of sterile and fertile oil palm (Pisifera) pollen obtained from the NCBI Sequence Read Archive (SRA) were used to test this workflow on a local Galaxy server. The results showed that differential gene expression in pollen might be responsible for the sterile and fertile characteristics of oil palm Pisifera. Keywords: FPKM; Galaxy workflow; Gene expression; RNA sequencing
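For readers unfamiliar with the FPKM unit named above, the following Python sketch shows the standard arithmetic: fragments mapped to a transcript, divided by the transcript length in kilobases and by the total mapped fragments in millions. It is not taken from SMART-RDA or the Tuxedo tools, and the function name `fpkm` and the example numbers are made up for illustration.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon per Million mapped fragments.

    fragments: fragments assigned to the transcript
    transcript_length_bp: exonic length of the transcript in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical numbers: 500 fragments on a 2 kb transcript in a
    # sample with 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Because the value is normalised by both transcript length and sequencing depth, it allows rough comparison of expression between transcripts and between samples such as the sterile and fertile pollen libraries.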

    Introduction to Galaxy Platform for NGS Variant Calling Pipeline

    Background: The Galaxy web-based platform for Next Generation Sequencing (NGS) data analysis provides unprecedented opportunities to characterize, analyze and computationally visualize genomic landscapes with limited resources. An initiative was taken to explore an NGS data-analysis pipeline on the Galaxy platform, chosen for its relative accessibility, reproducibility, transparency and scalability. Methods: Variant calling and associated workflows were executed on NGS pooled-seq data of 12 Pakistani Teddy goats. The tools used in this pipeline are FastQC for quality checks, Trimmomatic for trimming data, SAM/BAM tools for conversion of file formats, Picard tools for marking duplicates, VCFtools/FreeBayes for genomic variant detection and SnpSift to annotate the variants. Results: A total of 43,712 highly associated, functionally non-trivial loci carrying 87,510 alleles were identified. In addition, 1,548 variants comprising 1,134 SNPs, 23 mixed variants, 76 MNPs, 183 insertions and 132 deletions were observed in the Teddy breed using the San Clemente ARS1 reference genome. Furthermore, 1,283 homozygous and 265 heterozygous variants were found among 43,447 loci. These variants are likely to be responsible for general phenotypic traits of Teddy goats, such as smaller body size, tender meat quality and agility, along with other breed-specific traits. Conclusion: Galaxy fulfills the core function of reproducibility and easy accessibility by removing the gaps between large data analysis and its interpretation. This variant calling pipeline reveals the genomic differences underlying Teddy-specific characteristics compared with the ARS1 reference genome. Keywords: Galaxy platform; NGS data; Teddy goat; Variant calling; Bioinformatics

    Targeted parallel sequencing of large genetically-defined genomic regions for identifying mutations in Arabidopsis

    Large-scale genetic screens in Arabidopsis are a powerful approach for molecular dissection of complex signaling networks. However, map-based cloning can be time-consuming or even hampered by low chromosomal recombination. Current strategies using next generation sequencing for molecular identification of mutations require whole genome sequencing and advanced computational devices and skills, which are not readily accessible or affordable to every laboratory. We have developed a streamlined method using parallel massive sequencing for mutant identification in which only targeted regions are sequenced. This targeted parallel sequencing (TPSeq) method is more cost-effective, straightforward enough to be easily done without specialized bioinformatics expertise, and reliable for identifying multiple mutations simultaneously. Here, we demonstrate its use by identifying three novel nitrate-signaling mutants in Arabidopsis.

    TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

    Background: Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Results: Comparative transcriptomics analyses with TRAPLINE result in a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions and files for single nucleotide polymorphism (SNP) calling. The obtained results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell derived antibiotic selected cardiac bodies ('aCaBs'). Conclusion: TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline-manual).