4 research outputs found
A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES
Introduction: High-throughput sequencing platforms generate massive amounts of high-dimensional
genomic data that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis
and interpretation of genomics data have become essential for working with sequencing data. Variant
Call Format (VCF) is a standard format containing genomic information and variants of sequenced
samples. Existing tools for processing VCF files don’t usually have an intuitive graphical interface, but
instead have just a command-line interface that may be challenging to use for the broader biomedical
community interested in genomics data analysis. We present re-Searcher, a new bioinformatics application
with a user-friendly GUI developed to simplify genomic data mining from VCF files.
Methods: The re-Searcher application was written in Python 3. The Pandas library is used to handle
large VCF files: rather than loading the whole file into RAM at once, the application pre-processes it in
chunks. A simple and intuitive GUI was built using the Tkinter library.
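The chunked pre-processing described in Methods can be sketched as follows. This is a minimal illustration of the general technique using the pandas `read_csv` `chunksize` option, not the re-Searcher source code; the function names `read_vcf_chunks` and `keyword_search` and the default chunk size are hypothetical.

```python
import io
import pandas as pd


def read_vcf_chunks(handle, chunksize=1000):
    """Consume the VCF header, then return (meta_lines, columns, chunk_reader)."""
    meta = []
    while True:
        line = handle.readline()
        if not line:
            raise ValueError("no #CHROM header line found")
        if line.startswith("##"):
            meta.append(line.rstrip("\n"))  # meta-information lines
        elif line.startswith("#"):
            # Column header line, e.g. "#CHROM\tPOS\t..."; drop the leading "#".
            columns = line[1:].rstrip("\n").split("\t")
            break
        else:
            raise ValueError("data line found before the #CHROM header")
    # The handle now points at the first data row; pandas reads it lazily,
    # at most `chunksize` rows at a time, so the whole file is never in RAM.
    reader = pd.read_csv(handle, sep="\t", names=columns, header=None,
                         dtype=str, chunksize=chunksize)
    return meta, columns, reader


def keyword_search(handle, keyword, chunksize=1000):
    """Return all rows in which any field contains `keyword` (plain substring)."""
    _, columns, reader = read_vcf_chunks(handle, chunksize)
    hits = []
    for chunk in reader:
        mask = chunk.apply(
            lambda col: col.str.contains(keyword, regex=False, na=False)
        ).any(axis=1)
        if mask.any():
            hits.append(chunk[mask])
    if not hits:
        return pd.DataFrame(columns=columns)
    return pd.concat(hits, ignore_index=True)
```

For a file on disk, the same functions could be called with `open(path, "r")` in place of an in-memory `io.StringIO` object. Only the matching rows are kept in memory between chunks.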
Results: The generalized workflow of the re-Searcher consists of several steps: selecting an input file,
setting up necessary filtering parameters, data processing, and exporting a filtered output VCF file.
re-Searcher browses and opens VCF files with extensions .txt or .vcf, before performing the following
filtering and extraction options: header extraction, keyword search, sample extraction, and genotype
format conversion.
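Two of the options listed above, sample extraction and genotype format conversion, can be illustrated with a short standard-library sketch. The abstract does not say what the conversion produces, so translating numeric GT codes into allele letters is an assumption here; the function names are hypothetical and this is not the re-Searcher implementation.

```python
FIXED_COLUMNS = 9  # CHROM through FORMAT precede the per-sample columns in a VCF


def extract_samples(header, row, wanted):
    """Keep the nine fixed columns plus the requested sample columns."""
    positions = [header.index(name) for name in wanted]
    return row[:FIXED_COLUMNS] + [row[i] for i in positions]


def gt_to_alleles(sample_field, ref, alt):
    """Translate a numeric genotype such as '0/1' into allele letters such as 'A/G'.

    Assumes GT is the first colon-separated subfield of the sample column.
    """
    alleles = [ref] + alt.split(",")       # index 0 is REF, 1..n are the ALT alleles
    gt = sample_field.split(":")[0]
    sep = "|" if "|" in gt else "/"        # phased vs. unphased separator
    return sep.join("." if code == "." else alleles[int(code)]
                    for code in gt.split(sep))


# Illustrative record (whitespace-separated here only for readability).
header = "CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 S3".split()
row = "1 100 rs123 A G,T 50 PASS . GT:DP 0/1:12 1|2:30 ./.:0".split()

subset = extract_samples(header, row, ["S1", "S3"])
converted = [gt_to_alleles(field, row[3], row[4]) for field in subset[FIXED_COLUMNS:]]
print(converted)
```

Missing calls (`.`) are passed through unchanged, and the phasing separator of the original genotype is preserved.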
Conclusion: Exploring and analyzing VCF files generated after the bioinformatics processing of
sequencing data is one of the important steps performed by researchers during analysis and meta-analysis
of genotype/phenotype associations. We have developed and introduced an easy-to-use
bioinformatics tool, re-Searcher, with several unique features for mining large VCF files. Its
simple graphical user interface makes it accessible to clinicians and researchers without
any computational skills. The software is publicly available in the GitHub repository
(https://github.com/LabBandSB/re-Searcher).
Universal whole-genome Oxford nanopore sequencing of SARS-CoV-2 using tiled amplicons
There is a need to develop universally applicable end-to-end viral outbreak sample-handling platforms to generate real-time epidemiological information that can be interpreted and applied by public health authorities. Highly sensitive and efficient whole-genome sequencing of the SARS-CoV-2 virus is critical for understanding viral transmission dynamics. Here, we developed a comprehensive multiplexed set of primers adapted for the Oxford Nanopore Rapid Barcoding library kit that allows universal SARS-CoV-2 genome sequencing. The primer set is designed so that any variant of the primer pool can be assembled for whole-genome sequencing of SARS-CoV-2 using single- or double-tiled amplicons from 1.2 kb to 4.8 kb on the Oxford Nanopore platform. The same multiplexed primer set is also applicable to tasks such as targeted SARS-CoV-2 genome sequencing. We propose an optimized protocol for synthesizing cDNA using Maxima H Minus Reverse Transcriptase with a set of SARS-CoV-2-specific primers, which gives high yields of cDNA template from RNA and is capable of long cDNA synthesis across a wide range of RNA amounts and quality. The proposed protocol allows whole-genome sequencing of the SARS-CoV-2 virus with tiled amplicons of up to 4.8 kb on low-titer virus samples, even where RNA degradation has occurred. It reduces the time and cost from RNA to genome sequence compared with the Midnight multiplex PCR method for SARS-CoV-2 genome sequencing on the Oxford Nanopore platform.
RE-SEARCHER: GUI-BASED BIOINFORMATICS TOOL FOR SIMPLIFIED GENOMICS DATA MINING OF VCF FILES
Background. High-throughput sequencing platforms generate massive amounts of high-dimensional genomic data that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis and interpretation of genomics data have become essential for working with sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and a standard format containing genomic information and variants of sequenced samples. Results. Existing tools for processing VCF files don’t usually have an intuitive graphical interface, but instead have just a command-line interface that may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher addresses this problem and pre-processes VCF files in chunks so as not to overload the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files, and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
WHOLE-GENOME SEQUENCING DATA OF KAZAKH INDIVIDUALS
Kazakhstan is a Central Asian crossroads of European and Asian populations situated along the Great Silk Road. The territory of Kazakhstan has historically been inhabited by nomadic tribes, and today it is a multi-ethnic country with a dominant Kazakh ethnic group. We sequenced and analyzed the whole genomes of five healthy ethnic Kazakh individuals at high coverage using a next-generation sequencing platform. These whole-genome sequence data of healthy Kazakh individuals can be a valuable reference for biomedical studies investigating disease associations and for population-wide genomic studies of the ethnically diverse Central Asian region...