Rare Germline Variants in DNA Repair Genes Detected in BRCA-Negative Finnish Patients with Early-Onset Breast Cancer
Peer reviewed
ShAn: An easy-to-use tool for interactive and integrated variant annotation.
MOTIVATION: Annotating the large amounts of data generated by sequencing is a demanding task. Most of the robust annotation tools currently available, such as ANNOVAR, are command-line tools that require a certain degree of programming skill. User-friendly tools with a graphical interface for annotating variants in sequencing data are under-represented. RESULTS: We have developed an interactive application that combines the ease of use of R Shiny with the versatile annotation features of ANNOVAR. The application is easy to use and gives comprehensive annotations for user-supplied VCF files using multiple databases. The output table, presented within the graphical interface, lists the variants and their corresponding annotations. In addition, the annotation results are downloadable as a text file.
Kuura-An automated workflow for analyzing WES and WGS data.
The advent of high-throughput sequencing technologies has revolutionized the genomic sciences by cutting the cost and time associated with standard sequencing methods. This advance has given the research community an abundance of data, but it has also created the challenge of analyzing it. The main challenge in analyzing such large volumes of data is making the best use of the available tools. To address this gap, we propose "Kuura-An automated workflow for analyzing WES and WGS data", which is optimized for both whole exome and whole genome sequencing data. The workflow is written in the Nextflow pipeline scripting language and uses Docker to manage and deploy the workflow. It consists of four analysis stages: quality control, mapping to the reference genome and quality score recalibration, variant calling and variant recalibration, and variant consensus and annotation. An important feature of the DNA-seq workflow is that it combines multiple variant callers (GATK HaplotypeCaller, DeepVariant, VarScan2, FreeBayes and Strelka2) to generate a list of high-confidence variants in a consensus call file. The workflow is flexible: it integrates otherwise fragmented tools and can be easily extended by adding or updating tools or by amending the parameters list. Using a single parameters file enhances the reproducibility of the results. The ease of deployment and use further increases computational reproducibility, providing researchers with a standardized tool for the variant calling step in different projects. The source code and the instructions for installing and using the tool are publicly available at our GitHub repository https://github.com/dhanaprakashj/kuura_pipeline
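The abstract does not say how the consensus call file is built, so the following is only a minimal sketch of one plausible approach: keep a variant when at least a chosen number of callers report it. The function name `consensus_variants`, the `min_callers` threshold, the variant key and the toy calls are all illustrative assumptions, not part of the Kuura source code.

```python
from collections import Counter

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_variants(
    calls_by_caller: dict[str, set[Variant]], min_callers: int = 2
) -> list[Variant]:
    """Return the variants reported by at least `min_callers` callers, sorted."""
    support = Counter()
    for variants in calls_by_caller.values():
        # Each caller contributes a set, so it supports a given variant at most once.
        support.update(variants)
    return sorted(v for v, n in support.items() if n >= min_callers)


# Toy call sets (made-up data, not pipeline output).
calls = {
    "haplotypecaller": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "deepvariant": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
    "freebayes": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
}

for variant in consensus_variants(calls, min_callers=2):
    print(variant)
```

Raising `min_callers` trades recall for precision: requiring all three toy callers to agree would keep only the `chr1:100` variant.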
Screenshot showing a successfully executed pipeline and the information presented while the pipeline is running.
Detailed installation and usage instructions.
Complete validation results.
During the revision process, the pipeline was validated on the gold-standard data sets HG003, HG004, HG006 and HG007, which were generated with the same sequencing protocol in the same study as data sets HG001, HG002 and HG005. The table shows the number of variants identified by each variant caller, together with the corresponding precision and recall values. *The table contains only SNP information. (XLSX)
Validation results using each variant caller.
The table shows the number of variants identified by each variant caller, together with the corresponding precision and recall values. *The table contains only SNP information.
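For readers unfamiliar with the two metrics, the sketch below shows the standard definitions of precision and recall applied to a set of called variants and a truth set. The function name `precision_recall` and the toy variants are assumptions made for illustration; the listing does not describe the benchmarking procedure behind the reported values.

```python
def precision_recall(called: set, truth: set) -> tuple[float, float]:
    """Compute (precision, recall) for called variants against a truth set."""
    true_positives = len(called & truth)   # called and present in the truth set
    false_positives = len(called - truth)  # called but absent from the truth set
    false_negatives = len(truth - called)  # in the truth set but not called
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / (true_positives + false_positives) if called else 0.0
    recall = true_positives / (true_positives + false_negatives) if truth else 0.0
    return precision, recall


# Toy SNP sets keyed by (chromosome, position, ref, alt); made-up data.
truth_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 75, "T", "C"),
}
called_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr4", 10, "G", "C"),
}

precision, recall = precision_recall(called_set, truth_set)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here two of the three called variants are in the truth set, and two of the four truth variants were found.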
Summary of the tools and their respective Docker containers used in each stage.
Summary of the steps executed by the Kuura pipeline.
