4 research outputs found

    Introducing deep learning-based methods into the variant calling analysis pipeline

    Biological interpretation of genetic variation enhances our understanding of normal and pathological phenotypes, and may lead to the development of new therapeutics. However, it depends heavily on genomic data analysis, which may be inaccurate because of sequencing errors and the inconsistencies they cause. Modern analysis pipelines already use heuristic and statistical techniques, but the rate of falsely identified mutations remains high and varies with the particular sequencing technology, settings, and variant type. Recently, several tools based on deep neural networks have been published. The neural networks are supposed to find motifs in the data that were not previously seen. The performance of these novel tools is assessed in terms of precision and recall, as well as computational efficiency. Following the established best practices in both variant detection and benchmarking, the discussed tools demonstrate accuracy metrics and computational efficiency that spur further discussion.
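    The precision and recall mentioned in this abstract can be illustrated with a minimal sketch. It assumes variants are compared by exact match on a (chromosome, position, ref, alt) tuple; the function name and the toy call sets are hypothetical and are not taken from the paper, and real benchmarking tools also normalize variant representation before comparing.

```python
def precision_recall(called, truth):
    """Compare a call set with a truth set by exact tuple match.

    Each variant is a (chrom, pos, ref, alt) tuple. This is a simplification:
    no left-alignment or representation normalization is performed.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # variants called and present in the truth set
    fp = len(called - truth)   # variants called but absent from the truth set
    fn = len(truth - called)   # true variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "T")}
called = {("chr1", 100, "A", "G"), ("chr2", 50, "C", "T"), ("chr2", 75, "G", "A")}

p, r = precision_recall(called, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

    In this toy case two of the three calls are correct and two of the three true variants are recovered, so both metrics come out at two thirds.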

    Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data

    Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.
    Author summary: The development of next generation sequencing (NGS) technologies and computational algorithms has enabled the large-scale, simultaneous detection of a wide range of genetic variants, such as single nucleotide variants as well as insertions and deletions (indels), which may be clinically significant. Recently, many studies have been conducted to evaluate variant calling tools for indel calling. However, the optimal indel size range for different variant calling tools remains unclear. A good benchmarking dataset for indel calling evaluation should contain biologically representative, high-confidence indels with a wide size range and preferably come from various sequencing settings. In this article, we created a semi-simulated whole genome sequencing dataset in which the sequencing data were computationally generated. The indels in the semi-simulated genome were incorporated from a real human sample to represent biologically realistic indels and to avoid the inclusion of variants due to potential technical sequencing errors. Furthermore, we used three real-world NGS datasets generated by whole genome or targeted sequencing to further evaluate our candidate tools. Our results demonstrated that variant calling tools vary greatly in calling different sizes of indels. Deletion calling and insertion calling also showed differences among the tools. The sequencing settings in coverage and read length also had a great impact on indel calling. Our results suggest that the accuracy of indel calling depends on the combination of variant calling tool, indel size range, and sequencing settings.
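    The size dependence described in this abstract can be made concrete by computing recall separately for each indel size bin. The sketch below is only an illustration: the bin boundaries, function names, and toy variants are assumptions, not the study's actual binning or data, and matching is by exact (chrom, pos, ref, alt) tuple.

```python
from collections import defaultdict

# Hypothetical size bins in base pairs (inclusive); not the bins used in the study.
BINS = [(1, 5), (6, 20), (21, 50)]


def indel_length(ref, alt):
    """Length of an indel as the difference between allele lengths."""
    return abs(len(alt) - len(ref))


def size_bin(length):
    """Return a label for the bin containing `length` (length must be >= 1)."""
    for lo, hi in BINS:
        if lo <= length <= hi:
            return f"{lo}-{hi}"
    return f">{BINS[-1][1]}"


def recall_by_size(called, truth):
    """Recall per size bin, using exact match on (chrom, pos, ref, alt)."""
    called = set(called)
    found, total = defaultdict(int), defaultdict(int)
    for variant in truth:
        _, _, ref, alt = variant
        length = indel_length(ref, alt)
        if length == 0:
            continue  # equal-length alleles are not indels; skip them
        label = size_bin(length)
        total[label] += 1
        if variant in called:
            found[label] += 1
    return {label: found[label] / total[label] for label in total}


# Toy data, invented for illustration only.
truth = [
    ("chr1", 10, "A", "AT"),              # 1 bp insertion
    ("chr1", 20, "ATTTT", "A"),           # 4 bp deletion
    ("chr1", 30, "A", "A" + "C" * 10),    # 10 bp insertion
]
called = [truth[0], truth[2]]

per_bin = recall_by_size(called, truth)
print(per_bin)
```

    Splitting truth variants into insertions and deletions before binning would extend the same idea to the insertion versus deletion comparison the abstract mentions.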

    Studying the genetic diversity of the varicella-zoster virus in selected regions of the Russian Federation using high-throughput sequencing

    Introduction. Varicella-zoster virus (VZV), the causative agent of varicella (chickenpox) and herpes zoster, is phylogenetically divided into 8 clades, whose distribution is tied to particular geographic regions of the world. For most countries, the VZV clades circulating in their territories have been identified; however, such information is almost unavailable for Russia. The purpose of the study is to develop an effective method for VZV typing using high-throughput sequencing technologies in order to determine the prevalence of the various VZV clades in Moscow, Moscow Region, and Stavropol Territory. Materials and methods. To genotype VZV, it is enough to examine 7 nucleotide positions. Their unique combinations can be used to assign the virus to one of the clades. Short sections of the nucleotide sequences of open reading frames were obtained using a purpose-developed set of primers. Results. A VZV genotyping technique has been developed and optimized. Using this technique, primary data on the distribution of VZV clades in the studied regions have been obtained. It has been established that in Moscow and a number of other regions, VZV clades 1, 3, and 5 predominate. Conclusion. The developed technique, which includes a primer panel and a genotyping algorithm, allows VZV typing in a short time while reducing specimen preparation costs and increasing the number of specimens per sequencing cycle. The results obtained using this assay suggest that in Moscow, Moscow Region, and Stavropol Territory, VZV clades 1, 3, and 5 are the most represented. To confirm this hypothesis, a larger number of clinical specimens, including specimens from other regions of the country, must be included in subsequent studies.
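    The idea of assigning a clade from the bases observed at 7 positions can be sketched as a table lookup. Everything specific below is a placeholder: the position numbers, the base combinations, and the clade labels are invented for illustration and are not the positions or signatures used in the study.

```python
# Hypothetical genome positions; NOT the 7 positions used in the study.
POSITIONS = [101, 202, 303, 404, 505, 606, 707]

# Hypothetical signature table mapping the bases at POSITIONS to a label.
# These combinations are placeholders, not real VZV clade signatures.
SIGNATURES = {
    "ACGTACG": "clade_X",
    "ACGTACA": "clade_Y",
}


def assign_clade(observed):
    """Return the label for the observed bases, or None if undetermined.

    `observed` maps a position to the base called at that position.
    None is returned when a position is missing or the combination
    does not appear in the signature table.
    """
    if any(pos not in observed for pos in POSITIONS):
        return None
    key = "".join(observed[pos].upper() for pos in POSITIONS)
    return SIGNATURES.get(key)


# Toy sample, invented for illustration only.
sample = dict(zip(POSITIONS, "ACGTACG"))
print(assign_clade(sample))
```

    Returning None for incomplete or unknown combinations keeps ambiguous specimens out of the clade counts instead of forcing them into the nearest match.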

    A prototype of the movie archive for research and publishing in structural biology

    The aim of this work was to pilot technologies that could lead to the foundation of an archive of animations visualizing dynamics and functional changes in molecular structures. The scientific context is important because it implies specific premises and requirements that had to be satisfied. The thesis involved the development of a web-based user interface for displaying and managing videos and annotating them with links to PDB and EMDB. This was done using technologies such as the Django framework, the Popcorn.JS library, and various APIs. The methods included an iterative methodology based on Agile SCRUM, user experience testing, and version control with SVN. Step by step, the project grew into a functional prototype. The essential details, the process, and the challenges are described. User testing revealed usability issues and general expectations. In conclusion, this work demonstrated the feasibility of the discussed movie archive, which can be developed further.