Manipulation of FASTQ data with Galaxy

Author: A. Gordon
A. Nekrutenko
D. Blankenberg
G. Von Kuster
J. Taylor
N. Coraor
Publication venue: Oxford University Press
Publication date: 01/07/2010

Summary: Here, we describe a tool suite that works on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps.
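
To illustrate what handling several FASTQ variants involves, here is a minimal Python sketch. It is not the Galaxy tool suite's code, and every helper name in it is hypothetical. It parses four-line FASTQ records and decodes quality characters with an assumed ASCII offset (33 for Sanger-style files, 64 for Illumina 1.3+ and Solexa files). It also converts odds-based Solexa scores to Phred scores and applies a simple mean-quality filter. It assumes unwrapped records; real FASTQ files may split sequences over several lines, which this sketch does not handle.

```python
from typing import Iterable, Iterator, List, Tuple
import math

# Hypothetical helpers for illustration; not the Galaxy FASTQ tool suite API.
Record = Tuple[str, str, str]  # (read identifier, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from FASTQ text laid out as exactly four lines per record."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record starting at {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def decode_qualities(qual: str, offset: int) -> List[int]:
    """Map ASCII quality characters to integer scores using the given offset."""
    return [ord(ch) - offset for ch in qual]


def solexa_to_phred(q_solexa: float) -> float:
    """Convert an odds-based Solexa score to a probability-based Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def filter_by_mean_quality(
    records: Iterable[Record], offset: int, min_mean: float
) -> Iterator[Record]:
    """Keep reads whose mean decoded quality is at least min_mean."""
    for rid, seq, qual in records:
        scores = decode_qualities(qual, offset)
        if scores and sum(scores) / len(scores) >= min_mean:
            yield rid, seq, qual
```

For example, `filter_by_mean_quality(parse_fastq(open("reads.fastq")), offset=33, min_mean=20)` would keep Sanger-encoded reads averaging Q20 or better. The file name and threshold are placeholders. For Solexa input, each score decoded with offset 64 would also need to pass through `solexa_to_phred` before a Phred-based threshold is meaningful.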

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Identification and validation of microsatellite markers in strawberry tree (Arbutus unedo L.)

Author: Carlier Jorge
Fazenda Pedro
Fonseca Maria
Leitão José
Pereira Ricardo
Publication venue: The Scientific and Technological Research Council of Turkey
Publication date: 01/01/2019

Strawberry tree (Arbutus unedo L.), an evergreen shrub or small tree of the family Ericaceae, is a main constituent of the Mediterranean basin flora, although it is also found in southwestern France, Macaronesia, and Ireland. The small fruits are edible but are mostly used to prepare preserves, jams, and liquors such as the traditional Portuguese "aguardente de medronho". Traditionally cultivated by small farmers, often in consociation with Quercus sp., strawberry tree is now emerging as an important new fruit crop grown in large orchards by modern export-oriented enterprises. This change of paradigm calls for a growing role of plant breeding upstream of the production process. Genomic tools for this species are mostly limited to the chloroplast genome sequence and to the genomic data described in this work. To identify strawberry tree microsatellite (SSR) loci, we performed partial genome next-generation sequencing using Ion Torrent technology. Sequencing approximately 24.6M nucleotides led to the identification of 1185 microsatellite markers, mostly composed of dinucleotide motifs. The relative abundance of dinucleotide motifs (AG/CT, 71.7%; AC/GT, 20.5%; AT/AT, 2.9%; CG/CG, 0.3%) is similar to that observed in other Ericaceae species. Of a test sample of 40 SSR primer pairs, 20 amplified well-defined PCR products, and 12 (30%) were validated as polymorphic. Used in our collaborative project for the molecular identification of selected and improved clones, the identified SSR loci constitute a strong tool for a wide range of applied and fundamental studies of this emerging fruit crop. Funded by the Pluriannual Funding Program of the Portuguese National Foundation for Science and Technology.
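
The motif classes quoted above follow the usual SSR convention of grouping a dinucleotide with its rotation and reverse complement, so AG, GA, CT, and TC all count as AG/CT. The following Python sketch is not the authors' pipeline. It scans a sequence for perfect dinucleotide repeats and reports class frequencies in that convention. The minimum of six repeat copies (`MIN_REPEATS`) is an assumed threshold, not a value taken from the study.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: at least MIN_REPEATS perfect copies of a two-base motif make an SSR.
MIN_REPEATS = 6
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Group 1 is the motif; the lookahead rejects homopolymer "motifs" such as AA.
_DINUC = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (MIN_REPEATS - 1))


def motif_class(motif: str) -> str:
    """Collapse a motif with its rotation and reverse complement, e.g. TC -> AG/CT."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    forward = min(motif, motif[::-1])  # a dinucleotide's rotation is its reversal
    reverse = min(rc, rc[::-1])
    return "/".join(sorted([forward, reverse]))


def find_dinucleotide_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copy number) for each perfect dinucleotide repeat."""
    return [
        (m.start(), m.group(1), len(m.group(0)) // 2)
        for m in _DINUC.finditer(seq.upper())
    ]


def class_frequencies(hits: List[Tuple[int, str, int]]) -> Dict[str, float]:
    """Percentage of SSR hits falling into each motif class."""
    counts = Counter(motif_class(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()} if total else {}
```

A regular expression with a backreference keeps the scan to a single pass. A production SSR finder would typically also report compound and imperfect repeats and longer motifs, which this sketch ignores.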

Crossref

Sapientia

A case study for cloud based high throughput analysis of NGS data using the globus genomics system

Author: Bhuvaneshwar Krithika
Dave Utpal
Foster Ian
Gauba Robinder
Gusev Yuriy
Lacinski Lukasz
Madduri Ravi
Madhavan Subha
Rodriguez Alex
Sulakhe Dinanath
Publication venue: Elsevier B.V.
Publication date: 07/11/2014

Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte-scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, an enhanced Galaxy workflow system offered as a service that lets users process and transfer data easily, reliably, and quickly to meet end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. It takes advantage of elastic scaling of compute resources to run multiple workflows in parallel, and it helps meet the scale-out analysis needs of modern translational genomics research.

Elsevier - Publisher Connector

Directory of Open Access Journals

PubMed Central

GTO: A toolkit to unify pipelines in genomic and proteomic research

Author: Almeida Joao R.
Fajarda Olga
Oliveira Jose L.
Pinho Armando J.
Pratas Diogo
Publication venue: Elsevier B.V.
Publication date: 01/01/2020

Next-generation sequencing has triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed across different frameworks, making the management and analysis of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. The toolkit combines novel tools with a modular architecture, making it an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatics enquiry to students in the life sciences. GTO is implemented in the C language and is available under the MIT license at https://bioinformatics.ua.pt/gto. (C) 2020 The Authors. Published by Elsevier B.V.

Helsingin yliopiston digitaalinen arkisto

SMART-RDA: A Galaxy Workflow for RNA-Seq Data Analysis

Author: Aditama Redi
Liwang Toni
Sudania Widyartini Made
Tanjung Zulfikar Achmad
Publication venue: Knowledge E
Publication date: 11/07/2017

RNA-seq using the Next Generation Sequencing (NGS) approach is a common technology for analyzing large-scale RNA transcript data in gene expression studies. However, an appropriate bioinformatics tool is needed to analyze the large amount of transcriptome data from an RNA-seq experiment. The aim of this study was to construct a system that can be easily applied to analyze RNA-seq data. The resulting RNA-seq analysis tool, SMART-RDA, is a computational workflow based on the Galaxy framework for turning raw RNA-seq data into gene expression information. The workflow was adapted, with some modifications, from the well-known Tuxedo protocol for RNA-seq analysis. The expression value of each transcript was quantified as Fragments Per Kilobase of exon per Million fragments (FPKM). RNA-seq data of sterile and fertile oil palm (Pisifera) pollen obtained from the NCBI Sequence Read Archive (SRA) were used to test this workflow on a local Galaxy server. The results showed that differential gene expression in pollen might be responsible for the sterile and fertile characteristics of oil palm Pisifera.

Keywords: FPKM; Galaxy workflow; Gene expression; RNA sequencing
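
FPKM, as commonly defined for the Tuxedo protocol, divides a transcript's fragment count by its length in kilobases and by the total number of mapped fragments in millions. A minimal Python sketch of that arithmetic follows. It uses the plain transcript length, whereas Cufflinks applies an effective-length correction, so the numbers are illustrative only. The example counts are hypothetical.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of exon per Million mapped fragments.

    FPKM = fragments / (length_bp / 1e3) / (total_mapped_fragments / 1e6),
    which simplifies to fragments * 1e9 / (length_bp * total_mapped_fragments).
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical counts: 500 fragments on a 2 kb transcript in a 25M-fragment library.
example = fpkm(500, 2_000, 25_000_000)  # 500 / 2 / 25 = 10.0
```

Because library size sits in the denominator, doubling sequencing depth while the transcript's fragment count also doubles leaves its FPKM unchanged. This normalization is what makes values comparable across samples sequenced to different depths.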

KnE Publishing Platform

Introduction to Galaxy Platform for NGS Variant Calling Pipeline

Author: Ahmad Talha Saleem
Asif Fatima
Ejaz Aniqa
Mehmood Tania
Mohammad Alghanem Suliman
Saif Rashid
Publication venue: National Center of Excellence in Molecular Biology (CEMB)
Publication date: 01/06/2020

Background: The Galaxy web-based platform for Next Generation Sequencing (NGS) data analysis provides unprecedented opportunities to characterize, analyze, and computationally visualize genomic landscapes with limited resources. We explored this pipeline for NGS data analysis on the Galaxy platform because of its relative accessibility, reproducibility, transparency, and scalability. Methods: Variant calling and associated workflows were executed on NGS pooled-seq data from 12 Pakistani Teddy goats. The tools used in this pipeline were FastQC for quality checks, Trimmomatic for trimming, SAM/BAM tools for file format conversion, Picard tools for marking duplicates, VCFtools/FreeBayes for genomic variant detection, and SnpSift for variant annotation. Results: A total of 43,712 highly associated, functionally nontrivial loci carrying 87,510 alleles were filtered. In addition, 1,548 variants (1,134 SNPs, 23 mixed variants, 76 MNPs, 183 insertions, and 132 deletions) were observed in the Teddy breed against the San Clemente ARS1 reference genome. Furthermore, 1,283 homozygous and 265 heterozygous variants were identified out of 43,447 loci. These variants are likely responsible for general phenotypic traits of the Teddy goat, such as smaller body size, tender meat, and agility, along with other breed-specific traits. Conclusion: Galaxy fulfills the core functions of reproducibility and easy accessibility by bridging the gap between large-scale data analysis and its interpretation. This variant calling pipeline reveals the genomic differences underlying Teddy-specific characteristics compared to the ARS1 reference genome.

Keywords: Galaxy platform; NGS data; Teddy goat; Variant calling; Bioinformatics
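
The variant categories reported above partition the 1,548 variants (1,134 + 23 + 76 + 183 + 132 = 1,548), and the zygosity counts sum to the same total (1,283 + 265 = 1,548). The Python sketch below shows one simplified way to derive such tallies from VCF REF, ALT, and GT fields. It is not the logic of SnpSift or FreeBayes, and the function names are hypothetical. It classifies each ALT allele by allele length alone, so length-changing complex substitutions are counted as insertions or deletions. A multi-allelic site whose alleles fall into different classes is labeled MIXED.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length only (a simplification)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INS" if len(alt) > len(ref) else "DEL"


def classify_site(ref: str, alts: List[str]) -> str:
    """Return the shared class of all ALT alleles, or MIXED if they disagree."""
    kinds = {classify_allele(ref, alt) for alt in alts}
    return kinds.pop() if len(kinds) == 1 else "MIXED"


def zygosity(gt: str) -> str:
    """Interpret a diploid GT string such as 0/1, 1|1, or 1/2."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "MISSING"
    if set(alleles) == {"0"}:
        return "HOM_REF"
    return "HOM_ALT" if len(set(alleles)) == 1 else "HET"


def tally(rows: Iterable[Tuple[str, str, str]]) -> Tuple[Counter, Counter]:
    """Count variant classes and zygosity over (REF, comma-separated ALT, GT) rows."""
    types: Counter = Counter()
    zyg: Counter = Counter()
    for ref, alt_field, gt in rows:
        types[classify_site(ref, alt_field.split(","))] += 1
        zyg[zygosity(gt)] += 1
    return types, zyg
```

Pooled-seq calls may not carry simple diploid genotypes, so the `zygosity` helper is only a stand-in for however the pipeline actually summarized homozygous and heterozygous sites.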

Advancements in Life Sciences (E-Journal, University of the Punjab)

Targeted parallel sequencing of large genetically-defined genomic regions for identifying mutations in Arabidopsis

Author: Liu Kun-hsiang
McCormack Matthew
Sheen Jen
Publication venue: BioMed Central
Publication date: 01/01/2012

Large-scale genetic screens in Arabidopsis are a powerful approach for the molecular dissection of complex signaling networks. However, map-based cloning can be time-consuming or even hampered by low chromosomal recombination. Current strategies that use next generation sequencing for molecular identification of mutations require whole genome sequencing and advanced computational resources and skills, which are not readily accessible or affordable to every laboratory. We have developed a streamlined method using massively parallel sequencing for mutant identification in which only targeted regions are sequenced. This targeted parallel sequencing (TPSeq) method is more cost-effective, straightforward enough to be done without specialized bioinformatics expertise, and reliable for identifying multiple mutations simultaneously. Here, we demonstrate its use by identifying three novel nitrate-signaling mutants in Arabidopsis.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

Author: David Robert
Jung Julia Jeannine
Krebs Stefan
Rimmbach Christian
Schmitz Ulf
Steinhoff Gustav
Wolfien Markus
Wolkenhauer Olaf
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best-performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Results: Comparative transcriptomics analyses with TRAPLINE result in a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions and files for single nucleotide polymorphism (SNP) calling. The obtained results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell-derived, antibiotic-selected cardiac bodies ('aCaBs'). Conclusion: TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline-manual).

Springer - Publisher Connector

ResearchOnline at James Cook University

Open Access LMU

PubMed Central

Stellenbosch University SUNScholar Repository

FigShare