International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana
Abstract
Introduction: High-throughput sequencing platforms generate massive amounts of high-dimensional
genomic data that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis
and interpretation of genomic data have therefore become essential when working with sequencing data. Variant
Call Format (VCF) is a standard format containing genomic information and variants of sequenced
samples. Existing tools for processing VCF files usually lack an intuitive graphical interface and
offer only a command-line interface, which may be challenging to use for the broader biomedical
community interested in genomic data analysis. We present re-Searcher, a new bioinformatics application
with a user-friendly GUI developed to simplify genomic data mining from VCF files.
Methods: The re-Searcher application was written in Python 3. To handle large VCF files, it uses the
Pandas library to process the file in chunks rather than loading the whole file into RAM at once.
A simple and intuitive GUI was built using the Tkinter library.
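The chunked reading described in the Methods can be sketched as follows. This is a minimal illustration and not the re-Searcher source code: the helper names read_vcf_header and iter_vcf_chunks, the chunk size, and the small in-memory sample file are assumptions made for the example.

```python
import csv
import io

import pandas as pd

# Tiny in-memory stand-in for a VCF file (illustrative data only).
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t100\trs1\tA\tG\t50\tPASS\tDP=10\tGT\t0/1\t1/1\n"
    "1\t200\trs2\tC\tT\t60\tPASS\tDP=12\tGT\t0/0\t0/1\n"
    "2\t300\t.\tG\tA\t20\tq10\tDP=3\tGT\t0/1\t./.\n"
)


def read_vcf_header(handle):
    """Hypothetical helper: consume the header and return (meta_lines, column_names).

    The handle is left positioned at the first data record.
    """
    meta_lines = []
    for line in handle:
        if line.startswith("##"):
            meta_lines.append(line.rstrip("\n"))
        elif line.startswith("#CHROM"):
            return meta_lines, line.lstrip("#").rstrip("\n").split("\t")
        else:
            break
    raise ValueError("no #CHROM column header found")


def iter_vcf_chunks(handle, columns, chunksize=100_000):
    """Hypothetical helper: yield the records as DataFrames of at most `chunksize` rows."""
    return pd.read_csv(
        handle,
        sep="\t",
        header=None,
        names=columns,
        dtype=str,               # keep every field as text
        na_filter=False,         # do not turn any value into NaN
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        chunksize=chunksize,     # only one chunk is held in memory at a time
    )


handle = io.StringIO(SAMPLE_VCF)
meta_lines, columns = read_vcf_header(handle)

chunk_sizes = []
positions = []
for chunk in iter_vcf_chunks(handle, columns, chunksize=2):
    chunk_sizes.append(len(chunk))
    positions.extend(chunk["POS"])
```

Because each chunk is an ordinary DataFrame, filters can be applied chunk by chunk and the results appended to the output file, so peak memory use depends on the chunk size rather than on the size of the input.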
Results: The generalized workflow of re-Searcher consists of four steps: selecting an input file,
setting the necessary filtering parameters, processing the data, and exporting a filtered output VCF file.
re-Searcher opens VCF files with the extension .txt or .vcf and offers the following
filtering and extraction options: header extraction, keyword search, sample extraction, and genotype
format conversion.
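To make three of these options concrete, the sketch below shows one way keyword search, sample extraction, and genotype format conversion could operate on a Pandas DataFrame of VCF records. It is an illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, keyword search is taken to be a plain substring match over all fields, and genotype conversion is taken to mean translating allele indices such as 0/1 into nucleotide letters, which the abstract does not specify.

```python
import pandas as pd

# The nine fixed VCF columns that precede the per-sample columns.
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

# Illustrative records only; every value is kept as text.
records = pd.DataFrame(
    [
        ("1", "100", "rs1", "A", "G", "50", "PASS", "DP=10;GENE=BRCA1", "GT", "0/1", "1/1"),
        ("1", "200", "rs2", "C", "T,G", "60", "PASS", "DP=12;GENE=TP53", "GT", "0/2", "./."),
        ("2", "300", ".", "G", "A", "20", "q10", "DP=3;GENE=BRCA1", "GT", "0|0", "1|0"),
    ],
    columns=FIXED + ["S1", "S2"],
)


def keyword_rows(chunk, keyword):
    """Hypothetical: keep rows in which any field contains `keyword` as a substring."""
    hits = chunk.apply(lambda col: col.str.contains(keyword, regex=False, na=False))
    return chunk[hits.any(axis=1)]


def extract_samples(chunk, samples):
    """Hypothetical: keep the fixed columns plus the requested sample columns."""
    return chunk[FIXED + list(samples)]


def gt_to_alleles(ref, alt, sample_field):
    """Hypothetical: translate allele indices into letters, assuming GT is the first subfield."""
    alleles = [ref] + alt.split(",")
    genotype = sample_field.split(":")[0]
    sep = "|" if "|" in genotype else "/"
    return sep.join("." if i == "." else alleles[int(i)] for i in genotype.split(sep))


brca1_positions = list(keyword_rows(records, "BRCA1")["POS"])
s2_only = extract_samples(records, ["S2"])
converted = [
    gt_to_alleles(ref, alt, field)
    for ref, alt, field in zip(records["REF"], records["ALT"], records["S1"])
]
```

Applied to one chunk at a time, functions of this kind would fit the workflow above: the filtered rows from each chunk are written out before the next chunk is read.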
Conclusion: Exploring and analyzing VCF files generated by the bioinformatics processing of
sequencing data is an important step in the analysis and meta-analysis
of genotype/phenotype associations. We have developed and introduced an easy-to-use
bioinformatics tool, re-Searcher, with several unique features for mining large VCF files, implemented with
a simple graphical user interface that makes it accessible to clinicians and researchers without
computational skills. The software is publicly available in the GitHub repository
(https://github.com/LabBandSB/re-Searcher).