25 research outputs found

The First Kazakh Whole Genomes: The First Report of NGS Data

Author: Akilzhanova Ainur
Kairov Ulykbek
Kim Jong-Il
Molkenov Askhat
Rakhimova Saule
Rhie Arang
Seo Jeong-Sun
Zhumadilov Zhaxybay
Publication venue: University Library System, University of Pittsburgh
Publication date: 12/12/2014

Introduction: The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference to all genetic information. Extraordinary technological advances and decreases in the cost of DNA sequencing have made whole genome sequencing (WGS) feasible as a highly accessible test for numerous indications. The international project “Genetic architecture of Kazakh population” is well underway to determine the complete DNA. Next generation sequencing is a powerful tool for genetic analysis, which will enable us to uncover associations between specific loci in the genome and disease. The aim of this study was to introduce the first data on WGS of 6 Kazakh individuals. Methods: This pilot study is among the first WGS performed on 6 healthy Kazakh individuals, using the Illumina HiSeq2000 next generation sequencing platform according to the manufacturer’s protocols. All generated *.bcl files were simultaneously converted and demultiplexed using the bcl2fasta application. Sequence reads were aligned using bwa-mem against the human hg19 reference genome. Sorting, removal of intermediate files, assembly of *.bam files, and marking of duplicates were performed using the PicardTools package. The GATK HaplotypeCaller tool was used for variant calling. The ClinVar, SNPedia, and Cosmic databases were processed to identify clinical genomic variants in the 6 Kazakh whole genomes. Java Runtime Environment and R/Bioconductor packages were installed to perform raw data processing and run program scripts. Results: Sequence alignment and mapping against the hg19 reference genome were completed for each of the 6 healthy Kazakh individuals. Between 87,308,581,400 and 107,526,741,301 total base pairs were sequenced, with an average coverage of 29.85×. Between 98.85% and 99.58% of base pairs were mapped, and on average 96.07% were properly paired. Het/Hom and Ti/Tv ratios for each whole genome ranged from 1.35 to 1.52 and from 2.07 to 2.08, respectively. We compared each genome against the existing clinical databases ClinVar, SNPedia, and Cosmic and found from 20 to 25, from 269 to 288, and from 7 to 12 SNP records, respectively. The availability of reference Kazakh genome sequences provides the basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms. Conclusion: The first whole genome sequencing of Kazakhs was performed. In this pilot study, we identified SNPs associated with different conditions. Further WGS studies of the Kazakh population are needed to identify possible unique genetic variants in Kazakhs.
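
The Ti/Tv (transition/transversion) and Het/Hom (heterozygous/homozygous) ratios reported in the Results are standard per-genome quality summaries. The following is a minimal sketch of how such ratios can be computed from a list of SNP calls. It is not the authors' pipeline: the `(ref, alt, genotype)` tuple layout, the function names, and the toy data are all assumptions made for illustration.

```python
# Minimal sketch: Ti/Tv and Het/Hom ratios from a list of biallelic SNP calls.
# The (ref, alt, genotype) layout and the toy data are illustrative assumptions.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition stays within purines (A<->G) or within pyrimidines (C<->T)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize_snps(snps):
    """Return (Ti/Tv, Het/Hom) for an iterable of (ref, alt, genotype) tuples.

    Genotypes use VCF-style strings: "0/1" is heterozygous and
    "1/1" is homozygous for the alternate allele.
    """
    ti = tv = het = hom = 0
    for ref, alt, genotype in snps:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
        if genotype == "0/1":
            het += 1
        elif genotype == "1/1":
            hom += 1
    # Raises ZeroDivisionError if there are no transversions or no homozygous calls.
    return ti / tv, het / hom


toy_snps = [
    ("A", "G", "0/1"),  # transition, heterozygous
    ("C", "T", "0/1"),  # transition, heterozygous
    ("G", "A", "1/1"),  # transition, homozygous alternate
    ("T", "C", "0/1"),  # transition, heterozygous
    ("A", "C", "1/1"),  # transversion, homozygous alternate
    ("G", "T", "0/1"),  # transversion, heterozygous
]
ti_tv, het_hom = summarize_snps(toy_snps)
print(f"Ti/Tv = {ti_tv:.2f}, Het/Hom = {het_hom:.2f}")
```

In practice these counts would come from the variant calls of a whole genome rather than a hand-written list, and multiallelic sites and indels would need separate handling.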

Central Asian Journal of Global Health

Genetic Diversity of IFγ, IL1β, TLR2, and TLR8 Loci in Pulmonary Tuberculosis in Kazakhstan

Author: Abilmazhinova Aliya
Akhmetova Alma
Akilzhanova Ainur
Askapuli Ayken
Kairov Ulykbek
Molkenov Askhat
Nurkina Zhannur
Rakhimova Saule
Yerezhepov Dauren
Zhabagin Maxat
Publication venue: University Library System, University of Pittsburgh
Publication date: 12/12/2014

Introduction. Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis (MTB), and according to the WHO, up to 30% of the world population is infected with latent TB. The pathogenesis of TB is multifactorial, and its development depends on environmental, social, microbial, and genetic factors of both the bacterium and the host. The number of TB cases in Kazakhstan has decreased in the past decade, but multidrug-resistant (MDR) TB cases are dramatically increasing. Polymorphisms in genes responsible for the immune response have been associated with TB susceptibility. The objective of this study was to investigate the risk of developing pulmonary TB (PTB) associated with polymorphisms in several inflammatory pathway genes in the Kazakhstani population. Methods. A total of 703 participants from 3 regions of Kazakhstan were recruited for a case-control study: 251 participants had pulmonary TB (PTB), and 452 were healthy controls (HC). Males and females represented 42.39% and 57.61%, respectively. Of all participants, 67.4% were Kazakhs, 22.8% Russians, 3.4% Ukrainians, and 6.4% were of other origins. Clinical and epidemiological data were collected from medical records, interviews, and questionnaires. DNA samples were genotyped using TaqMan assays for 4 polymorphisms: IFNγ (rs2430561), IL1β (rs16944), TLR2 (rs5743708), and TLR8 (rs3764880). Statistical analysis was performed using SPSS 19. Results. Genotyping of IFNγ, IL1β, and TLR2 showed no significant association with PTB susceptibility (p > 0.05). TLR8 genotype A/G was significantly more frequent in females (F/M: 41.5%/1.3%) and G/G in males (M/F: 49%/20.7%) (χ² = 161.43, p < 0.001). A significantly increased risk of PTB development was observed for TLR8 A/G with an adjusted OR of 1.48 (95% CI: 0.96-2.28), and a protective effect was revealed for the TLR8 G/G genotype (OR: 0.81, 95% CI: 0.56-1.16, p = 0.024). Additional grouping by gender revealed that TLR8 G/G acts as a protective genotype (OR: 1.83, 95% CI: 1.18-2.83, p = 0.036) in males of the control group. Conclusion. Results indicate that the heterozygous genotype A/G of TLR8 increases the risk of PTB development, while the G/G genotype may serve as a protection mechanism. The A/A genotype is strongly associated with susceptibility to PTB. To clarify the role of other polymorphisms in susceptibility to PTB in the Kazakhstani population, further investigations are needed.
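
The odds ratios and 95% confidence intervals quoted in the Results follow the usual case-control logic. As a rough illustration only, the sketch below computes an unadjusted odds ratio with a Wald (log-scale) confidence interval from a 2x2 table. The abstract reports adjusted ORs, which would normally come from a regression model in a package such as SPSS, so this is a simplified stand-in; the function name and all counts are hypothetical and not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    2x2 table layout (all counts must be non-zero):
        a = cases with the genotype      b = cases without it
        c = controls with the genotype   d = controls without it
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration; these are NOT the study's data.
or_value, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=20, d=80)

# An interval that contains 1 is conventionally read as
# "no significant association" at the corresponding level.
significant = not (ci_low <= 1.0 <= ci_high)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}, significant: {significant}")
```

With these toy counts the interval spans 1, so the association would not be called significant at the 5% level even though the point estimate is above 1.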

Central Asian Journal of Global Health

Meta-Analysis of Esophageal Cancer Transcriptomes Using Independent Component Analysis

Author: Aigul Sharip
Ainur Seisenova
Andrei Zinovyev
Askhat Molkenov
Asset Daniyarov
Ulykbek Kairov
Publication venue: Frontiers Media SA
Publication date: 01/10/2021

Independent Component Analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to the analysis of transcriptomic data for blind separation of biological, environmental, and technical factors affecting gene expression. The study aimed to analyze publicly available esophageal cancer (EC) data using ICA for identification and comprehensive analysis of reproducible signaling pathways and molecular signatures involved in this cancer type. In this study, four independent esophageal cancer transcriptomic datasets from the GEO database were used. The bioinformatics tool BiODICA (Independent Component Analysis of Big Omics Data) was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene uncovered the most significantly enriched pathways. Construction and visualization of gene networks and graphs were performed using Cytoscape and the HPRD database. The correlation graph between decompositions into 30 ICs was built from absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The top 1,000 most contributing genes of each IC in the pseudocliques were mapped to the PPI network to construct associated signaling pathways. Some cliques were composed of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to EC. The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in tumor cells, and may be helpful in predicting the early development of a tumor.
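
The correlation graph described above links components from different decompositions whenever their absolute correlation exceeds 0.3. The sketch below shows one simple way such a graph could be built with Pearson correlation over a shared gene order. It is not the BiODICA implementation: the choice of Pearson correlation, the function names, the component labels, and the toy weight vectors are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_graph(components, threshold=0.3):
    """Return edges (label_a, label_b, r) where |r| exceeds the threshold.

    `components` maps a component label to its gene weights, with every
    vector listed over the same genes in the same order.
    """
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(components.items(), 2):
        r = pearson(vec_a, vec_b)
        # The absolute value is used because the sign of an independent
        # component is arbitrary.
        if abs(r) > threshold:
            edges.append((name_a, name_b, round(r, 3)))
    return edges


# Hypothetical gene weights for three components from three datasets.
toy_components = {
    "dataset1_IC1": [2.0, 1.5, 0.1, -0.2, 0.0],
    "dataset2_IC4": [-1.8, -1.6, 0.0, 0.1, 0.2],  # same signal, flipped sign
    "dataset3_IC2": [0.0, 0.0, 1.0, -1.0, 0.0],   # driven by different genes
}
edges = correlation_graph(toy_components)
print(edges)
```

Groups of mutually connected components in such a graph correspond to the clusters that the abstract calls pseudocliques; finding them would require a graph clustering step that is not shown here.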

Directory of Open Access Journals

Draft genome sequences of two clinical isolates of Mycobacterium tuberculosis from sputum of Kazakh patients

Author: Abilmazhinova Aliya
Abilova Zhannur
Akhmetova Ainur
Akilzhanova Ainur
Askapuli Ayken
Bismilda Venera
Chingisova Leila
Kairov Ulykbek E.
Kozhamkulov Ulan
Molkenov Askhat
Rakhimova Saule
Yerezhepov Dauren
Zhabagin Maxat
Zhumadilov Zhaxybay
Publication venue: Genome Announc
Publication date: 01/01/2015

Here, we report the draft genome sequences of two clinical isolates of Mycobacterium tuberculosis (MTB-476 and MTB-489) isolated from sputum of Kazakh patients.

PubMed Central

Nazarbayev University Repository

Draft genome sequence of Lactobacillus rhamnosus CLS17

Author: Issayeva Raushan B.
Kairov Ulykbek E.
Khassenbekova Zhanagul R.
Kozhakhmetov Samat S.
Kushugulova Almagul R.
Molkenov Askhat B.
Nurgozhin Talgat S.
Saduakhasova Saule A.
Shakhabayeva Gulnara S.
Zhumadilov Zhaxybay
Publication venue: Genome Announc
Publication date: 01/01/2015

The human gut microbiome is an organ that provides primary barrier protection against foreign agents. Most of the microorganisms are different strains of commensal bacteria that colonize the gut. Gut flora influence food metabolism and have an antagonistic effect on different pathogens, as well as immunomodulatory properties (1). One of the main species of gut flora is in the genus Lactobacillus... This work was supported by grant 0113PK00783 from the Ministry of Education and Science of the Republic of Kazakhstan.

PubMed Central

Nazarbayev University Repository

Meta-Analysis of Esophageal Cancer Transcriptomes Using Independent Component Analysis

Author: Daniyarov Asset
Kairov Ulykbek
Molkenov Askhat
Seisenova Ainur
Sharip Aigul
Zinovyev Andrei
Publication venue: Frontiers Media SA
Publication date: 01/01/2021

Independent Component Analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to the analysis of transcriptomic data for blind separation of biological, environmental, and technical factors affecting gene expression. The study aimed to analyze publicly available esophageal cancer (EC) data using ICA for identification and comprehensive analysis of reproducible signaling pathways and molecular signatures involved in this cancer type. In this study, four independent esophageal cancer transcriptomic datasets from the GEO database were used. The bioinformatics tool BiODICA (Independent Component Analysis of Big Omics Data) was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene uncovered the most significantly enriched pathways. Construction and visualization of gene networks and graphs were performed using Cytoscape and the HPRD database. The correlation graph between decompositions into 30 ICs was built from absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The top 1,000 most contributing genes of each IC in the pseudocliques were mapped to the PPI network to construct associated signaling pathways. Some cliques were composed of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to EC. The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in tumor cells, and may be helpful in predicting the early development of a tumor.

PubMed Central

HAL Descartes

HAL-MINES ParisTech

Nazarbayev University Repository

Determining the optimal number of independent components for reproducible transcriptomic data analysis

Author: Barillot Emmanuel
Cantini Laura
Czerwinska Urszula
Greco Alessandro
Kairov Ulykbek
Molkenov Askhat
Zinovyev Andrei
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

BACKGROUND: Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data. RESULTS: Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce a ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of qualitative change in the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components with ranks below MSTD are more likely to be reproduced in independent studies than the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets. CONCLUSIONS: We suggest a protocol for applying ICA to transcriptomics data that allows components to be prioritized by their reproducibility, which strengthens the biological interpretation. Computing too few components (much less than MSTD) is not optimal for interpretability of the results. The components ranked within the MSTD range are more likely to be reproduced in independent studies.

Crossref

HAL-Inserm

Directory of Open Access Journals

HAL Descartes

HAL-MINES ParisTech

Nazarbayev University Repository

Induction of Apoptosis in U937 Cells by Using a Combination of Bortezomib and Low-Intensity Ultrasound

Author: Alimbetov Dauren
Begimbetova Dinara
Feril Jr. Loreto B.
Molkenov Askhat
Ogawa Koichi
Saliev Timur
Tachibana Katsuro
Watanabe Akiko
Publication venue: Medical Science Monitor
Publication date: 22/12/2016

Background: We scrutinized the feasibility of apoptosis induction in blood cancer cells by means of low-intensity ultrasound and the proteasome inhibitor bortezomib (Velcade). Material/Methods: Human leukemic monocyte lymphoma U937 cells were subjected to ultrasound in the presence of bortezomib and the echo contrast agent Sonazoid. Two acoustic intensities (0.18 W/cm² and 0.05 W/cm²) were used for the experiments. Treated U937 cells were analyzed for viability and levels of early and late apoptosis. In addition, scanning electron microscopy analysis of treated cells was performed. Results: The percentage of cells that underwent early apoptosis in the group treated with ultrasound and Sonazoid was 8.0±1.31% (intensity 0.18 W/cm²) and 7.0±1.69% (0.05 W/cm²). However, coupling of bortezomib and Sonazoid resulted in an increase in the percentage of cells in the early apoptosis phase, up to 32.50±3.59% (intensity 0.18 W/cm²) and 33.0±4.90% (0.05 W/cm²). The percentage of U937 cells in the late apoptosis stage was not significantly different from that in the group treated with bortezomib only. Conclusions: Our findings indicate the feasibility of apoptosis induction in blood cancer cells by using a combination of bortezomib, ultrasound contrast agents, and low-intensity ultrasound.

PubMed Central

Nazarbayev University Repository

A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES

Author: Daniyarov Asset
Kairov Ulykbek
Karabayev Daniyar
Molkenov Askhat
Seisenova Ainur
Sharip Aigul
Yerulanuly Kaiyrgali
Zhumadilov Zhaxybay
Publication venue: International conference "MODERN PERSPECTIVES FOR BIOMEDICAL SCIENCES: FROM BENCH TO BEDSIDE"; National Laboratory Astana
Publication date: 01/01/2020

Introduction: High-throughput sequencing platforms generate a massive amount of high-dimensional genomic datasets that are available for analysis. Modern and user-friendly bioinformatics tools for analysis and interpretation of genomics data become essential during the analysis of sequencing data. Variant Call Format (VCF) is a standard format containing genomic information and variants of sequenced samples. Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which may be challenging to use for the broader biomedical community interested in genomics data analysis. We present re-Searcher, a new bioinformatics application with a user-friendly GUI developed to simplify genomic data mining from VCF files. Methods: The re-Searcher application was written in Python 3. The Pandas library solves the problem of analyzing large VCF files by not loading the whole file directly into RAM, but instead pre-processing it in chunks. A simple and intuitive GUI was built using the Tkinter library. Results: The generalized workflow of re-Searcher consists of several steps: selecting an input file, setting up the necessary filtering parameters, data processing, and exporting a filtered output VCF file. re-Searcher browses and opens VCF files with the extensions .txt or .vcf before performing the following filtering and extraction options: header extraction, keyword search, sample extraction, and genotype format conversion. Conclusion: Exploring and analyzing VCF files generated after the bioinformatics processing of sequencing data is one of the important steps performed by researchers during analysis and meta-analysis of genotype/phenotype associations. We have developed and introduced an easy-to-use bioinformatics tool, re-Searcher, with several unique features for mining big VCF files and a simple graphical user interface that makes it easily accessible to clinicians and researchers without any computational skills. The software is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
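
The chunked processing idea described in the Methods can be illustrated with a short sketch. The abstract states that re-Searcher relies on the Pandas library for this; the example below deliberately swaps in the Python standard library (`itertools.islice`) so that it runs without third-party packages, and it is not the re-Searcher code. The function names, the chunk size, and the toy VCF records (including the rsIDs) are hypothetical.

```python
import io
from itertools import islice


def read_in_chunks(handle, chunk_size):
    """Yield lists of at most `chunk_size` lines from an open text handle."""
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            return
        yield chunk


def keyword_search(handle, keyword, chunk_size=10_000):
    """Return (header_lines, matching_records) for a VCF-like text stream.

    Only one chunk of lines is read from the input at a time, so the
    whole file never has to be loaded into memory at once.
    """
    header, matches = [], []
    for chunk in read_in_chunks(handle, chunk_size):
        for line in chunk:
            if line.startswith("#"):
                # Meta-information lines and the column header line.
                header.append(line.rstrip("\n"))
            elif keyword in line:
                matches.append(line.rstrip("\n"))
    return header, matches


# Toy in-memory VCF; the records and rsIDs are made up for illustration.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trs0000001\tA\tG\t50\tPASS\t.\n"
    "2\t2000\trs0000002\tC\tT\t60\tPASS\t.\n"
    "3\t3000\trs0000003\tG\tA\t70\tPASS\t.\n"
)
header, matches = keyword_search(toy_vcf, "rs0000002", chunk_size=2)
print(len(header), matches)
```

For a real file, the handle would come from `open(path)` instead of `io.StringIO`, and matching records would typically be written to an output file as they are found rather than collected in a list.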

Nazarbayev University Repository

The First Kazakh Whole Genomes: The First Report of NGS Data

Author: Ainur Akilzhanova
Arang Rhie
Askhat Molkenov
Jeong-Sun Seo
Jong-Il Kim
Saule Rakhimova
Ulykbek Kairov
Zhaxybay Zhumadilov
Publication venue: University Library System, University of Pittsburgh
Publication date: 01/12/2014

Introduction: The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference to all genetic information. Extraordinary technological advances and decreases in the cost of DNA sequencing have made whole genome sequencing (WGS) feasible as a highly accessible test for numerous indications. The international project “Genetic architecture of Kazakh population” is well underway to determine the complete DNA. Next generation sequencing is a powerful tool for genetic analysis, which will enable us to uncover associations between specific loci in the genome and disease. The aim of this study was to introduce the first data on WGS of 6 Kazakh individuals. Methods: This pilot study is among the first WGS performed on 6 healthy Kazakh individuals, using the Illumina HiSeq2000 next generation sequencing platform according to the manufacturer’s protocols. All generated *.bcl files were simultaneously converted and demultiplexed using the bcl2fasta application. Sequence reads were aligned using bwa-mem against the human hg19 reference genome. Sorting, removal of intermediate files, assembly of *.bam files, and marking of duplicates were performed using the PicardTools package. The GATK HaplotypeCaller tool was used for variant calling. The ClinVar, SNPedia, and Cosmic databases were processed to identify clinical genomic variants in the 6 Kazakh whole genomes. Java Runtime Environment and R/Bioconductor packages were installed to perform raw data processing and run program scripts. Results: Sequence alignment and mapping against the hg19 reference genome were completed for each of the 6 healthy Kazakh individuals. Between 87,308,581,400 and 107,526,741,301 total base pairs were sequenced, with an average coverage of 29.85×. Between 98.85% and 99.58% of base pairs were mapped, and on average 96.07% were properly paired. Het/Hom and Ti/Tv ratios for each whole genome ranged from 1.35 to 1.52 and from 2.07 to 2.08, respectively. We compared each genome against the existing clinical databases ClinVar, SNPedia, and Cosmic and found from 20 to 25, from 269 to 288, and from 7 to 12 SNP records, respectively. The availability of reference Kazakh genome sequences provides the basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms. Conclusion: The first whole genome sequencing of Kazakhs was performed. In this pilot study, we identified SNPs associated with different conditions. Further WGS studies of the Kazakh population are needed to identify possible unique genetic variants in Kazakhs.

Directory of Open Access Journals