    The First Kazakh Whole Genomes: The First Report of NGS Data

    Introduction: The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference to all genetic information. Extraordinary technological advances and decreases in the cost of DNA sequencing have made whole genome sequencing (WGS) feasible as a highly accessible test for numerous indications. The international project “Genetic architecture of Kazakh population” is well underway to determine the complete DNA sequence. Next-generation sequencing is a powerful tool for genetic analysis that will enable us to uncover associations between specific genomic loci and disease. The aim of this study was to present the first WGS data from 6 Kazakh individuals. Methods: This pilot study is among the first to perform WGS on 6 healthy Kazakh individuals, using the Illumina HiSeq 2000 next-generation sequencing platform according to the manufacturer’s protocols. All generated *.bcl files were simultaneously converted and demultiplexed using the bcl2fastq application. Sequence reads were aligned against the human hg19 reference genome using bwa-mem. Sorting, removal of intermediate files, assembly of *.bam files, and duplicate marking were performed using the PicardTools package. The GATK HaplotypeCaller tool was used for variant calling. The ClinVar, SNPedia, and COSMIC databases were processed to identify clinical genomic variants in the 6 Kazakh whole genomes. Java Runtime Environment and R Bioconductor packages were installed to perform raw data processing and run program scripts. Results: Sequence alignment and mapping to the hg19 reference genome were completed for each of the 6 healthy Kazakh individuals. Between 87,308,581,400 and 107,526,741,301 total base pairs were sequenced, with an average coverage of 29.85x. Between 98.85% and 99.58% of base pairs were mapped, and on average 96.07% were properly paired. Het/Hom and Ti/Tv ratios for each whole genome ranged from 1.35 to 1.52 and from 2.07 to 2.08, respectively. We compared each genome against the existing clinical databases ClinVar, SNPedia, and COSMIC and found 20 to 25, 269 to 288, and 7 to 12 SNP records, respectively. The availability of reference Kazakh genome sequences provides the basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms. Conclusion: The first whole genome sequencing of Kazakhs was performed. In this pilot study, we identified SNPs associated with different conditions. Further WGS studies of the Kazakh population are needed to identify possible unique genetic variants in Kazakhs.
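
The quality metrics reported in the Results (Ti/Tv ratio, Het/Hom ratio, average coverage) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the toy inputs, and the haploid genome size of 3.1 Gb are illustrative assumptions, and the abstract does not state how coverage was computed.

```python
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T);
# every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """snps: iterable of (ref, alt) single-base pairs. Returns transitions / transversions."""
    snps = list(snps)
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = sum(1 for ref, alt in snps if ref != alt and (ref, alt) not in TRANSITIONS)
    return ti / tv


def het_hom_ratio(genotypes):
    """genotypes: iterable of VCF-style GT strings such as '0/1' or '1|1'.

    Counts heterozygous calls against homozygous non-reference calls;
    homozygous-reference, missing, and non-diploid calls are skipped.
    """
    het = hom = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        a, b = alleles
        if a == b == "0":
            continue  # homozygous reference carries no variant
        if a == b:
            hom += 1
        else:
            het += 1
    return het / hom


def mean_coverage(total_bases, genome_size=3.1e9):
    """Sequenced bases divided by an ASSUMED genome size (3.1 Gb, hypothetical default)."""
    return total_bases / genome_size
```

Under the 3.1 Gb assumption, the smallest and largest base-pair totals quoted in the abstract correspond to roughly 28x and 35x coverage.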

    Genetic Diversity of IFγ, IL1β, TLR2, and TLR8 Loci in Pulmonary Tuberculosis in Kazakhstan

    Introduction. Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis (MTB), and according to the WHO, up to 30% of the world population is infected with latent TB. The pathogenesis of TB is multifactorial, and its development depends on environmental, social, microbial, and genetic factors of both the bacterium and the host. The number of TB cases in Kazakhstan has decreased in the past decade, but multidrug-resistant (MDR) TB cases are dramatically increasing. Polymorphisms in genes responsible for immune response have been associated with TB susceptibility. The objective of this study was to investigate the risk of developing pulmonary TB (PTB) associated with polymorphisms in several inflammatory pathway genes in the Kazakhstani population. Methods. 703 participants from 3 regions of Kazakhstan were recruited for a case-control study. 251 participants had PTB, and 452 were healthy controls (HC). Males and females represented 42.39% and 57.61%, respectively. Of all participants, 67.4% were Kazakhs, 22.8% Russians, 3.4% Ukrainians, and 6.4% were of other origins. Clinical and epidemiological data were collected from medical records, interviews, and questionnaires. DNA samples were genotyped using TaqMan assays for 4 polymorphisms: IFNγ (rs2430561), IL1β (rs16944), TLR2 (rs5743708), and TLR8 (rs3764880). Statistical analysis was performed using SPSS 19. Results. Genotyping of IFγ, IL1β, and TLR2 showed no significant association with PTB susceptibility (p > 0.05). The TLR8 A/G genotype was significantly more frequent in females (F/M: 41.5%/1.3%) and G/G in males (M/F: 49%/20.7%) (χ² = 161.43, p < 0.001). A significantly increased risk of PTB development was observed for TLR8 A/G, with an adjusted OR of 1.48 (95% CI: 0.96-2.28), and a protective effect was revealed for the TLR8 G/G genotype (OR: 0.81, 95% CI: 0.56-1.16, p = 0.024). Additional grouping by gender revealed that TLR8 G/G acts as a protective genotype (OR: 1.83, 95% CI: 1.18-2.83, p = 0.036) in males of the control group. Conclusion. The results indicate that the heterozygous A/G genotype of TLR8 increases the risk of PTB development, while the G/G genotype may serve as a protective mechanism. The A/A genotype is strongly associated with susceptibility to PTB. To clarify the role of other polymorphisms in susceptibility to PTB in the Kazakhstani population, further investigations are needed.
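
For readers unfamiliar with the odds ratios quoted above, the following Python sketch shows how a crude odds ratio and its Wald 95% confidence interval are obtained from a 2x2 case-control table. It is an illustration under stated assumptions only: the abstract reports adjusted ORs, which need a regression model that is not reproduced here, and the function name and the example counts are hypothetical, not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a: cases carrying the genotype      b: cases without it
    c: controls carrying the genotype   d: controls without it
    z: normal quantile (1.96 gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function.
example_or, example_lower, example_upper = odds_ratio_ci(40, 60, 30, 70)
```

An interval that contains 1, as the one for these toy counts does, means the association is not significant at the 5% level.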

    Meta-Analysis of Esophageal Cancer Transcriptomes Using Independent Component Analysis

    Independent Component Analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to transcriptomic data for blind separation of the biological, environmental, and technical factors affecting gene expression. The study aimed to analyze publicly available esophageal cancer (EC) data using ICA to identify and comprehensively analyze reproducible signaling pathways and molecular signatures involved in this cancer type. Four independent esophageal cancer transcriptomic datasets from the GEO database were used. The bioinformatics tool "BiODICA: Independent Component Analysis of Big Omics Data" was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene uncovered the most significantly enriched pathways. Gene networks and graphs were constructed and visualized using Cytoscape and the HPRD database. The correlation graph between decompositions into 30 ICs was built from absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The top 1,000 most contributing genes of each IC in the pseudocliques were mapped to the PPI network to construct associated signaling pathways. Some cliques were composed of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to EC. The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in tumor cells, and may help predict the early development of a tumor.
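
The correlation graph described above can be sketched in a few lines of Python. This is a hedged illustration, not BiODICA's implementation: it assumes Pearson correlation between component vectors defined over a shared gene order, and the function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_graph(decompositions, threshold=0.3):
    """Link components from different datasets when |r| exceeds the threshold.

    decompositions: dict mapping a dataset name to a list of component vectors
    (gene weights in the same gene order for every dataset).
    Returns a list of edges ((dataset_a, i), (dataset_b, j), r).
    """
    edges = []
    for (name_a, comps_a), (name_b, comps_b) in combinations(decompositions.items(), 2):
        for i, comp_a in enumerate(comps_a):
            for j, comp_b in enumerate(comps_b):
                r = pearson(comp_a, comp_b)
                if abs(r) > threshold:
                    edges.append(((name_a, i), (name_b, j), r))
    return edges


# Toy decompositions: two datasets, two components each, four genes.
toy = {
    "dataset_A": [[1, 2, 3, 4], [1, -1, -1, 1]],
    "dataset_B": [[1, 2, 3, 5], [2, -1, -2, 1]],
}
toy_edges = correlation_graph(toy, threshold=0.3)
```

Densely connected groups of nodes in such a graph are what the abstract calls pseudocliques; finding them would require a separate clustering step that is not shown here.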

    Draft genome sequence of Lactobacillus rhamnosus CLS17

    The human gut microbiome is an organ that provides primary barrier protection against foreign agents. Most of the microorganisms are different strains of commensal bacteria that colonize the gut. Gut flora influence food metabolism and have an antagonistic effect on different pathogens as well as immunomodulatory properties (1). One of the main species of gut flora is in the genus Lactobacillus... This work was supported by grant 0113PK00783 from the Ministry of Education and Science of the Republic of Kazakhstan.

    Determining the optimal number of independent components for reproducible transcriptomic data analysis

    BACKGROUND: Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data. RESULTS: Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce a ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of qualitative change in the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components, with ranks below MSTD, are more likely to be reproduced in independent studies than the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets. CONCLUSIONS: We suggest a protocol for applying ICA to transcriptomics data that allows components to be prioritized by their reproducibility, which strengthens the biological interpretation. Computing too few components (far fewer than MSTD) is not optimal for interpretability of the results. The components ranked within the MSTD range are more likely to be reproduced in independent studies.

    Induction of Apoptosis in U937 Cells by Using a Combination of Bortezomib and Low-Intensity Ultrasound

    Background: We examined the feasibility of apoptosis induction in blood cancer cells by means of low-intensity ultrasound and the proteasome inhibitor bortezomib (Velcade). Material/Methods: Human leukemic monocyte lymphoma U937 cells were subjected to ultrasound in the presence of bortezomib and the echo contrast agent Sonazoid. Two acoustic intensities (0.18 W/cm² and 0.05 W/cm²) were used for the experiments. Treated U937 cells were analyzed for viability and levels of early and late apoptosis. In addition, scanning electron microscopy analysis of treated cells was performed. Results: The percentage of cells that underwent early apoptosis in the group treated with ultrasound and Sonazoid was 8.0±1.31% (intensity 0.18 W/cm²) and 7.0±1.69% (0.05 W/cm²). However, coupling of bortezomib and Sonazoid resulted in an increase in the percentage of cells in the early apoptosis phase, up to 32.50±3.59% (intensity 0.18 W/cm²) and 33.0±4.90% (0.05 W/cm²). The percentage of U937 cells in the late apoptosis stage was not significantly different from that in the group treated with bortezomib only. Conclusions: Our findings indicate the feasibility of apoptosis induction in blood cancer cells by using a combination of bortezomib, ultrasound contrast agents, and low-intensity ultrasound.

    A USER-FRIENDLY TOOL FOR SIMPLIFIED GENOMICS DATA MINING FROM LARGE VCF FILES

    Introduction: High-throughput sequencing platforms generate massive, high-dimensional genomic datasets that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis and interpretation of genomics data have become essential when working with sequencing data. Variant Call Format (VCF) is a standard format containing genomic information and variants of sequenced samples. Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which may be challenging for the broader biomedical community interested in genomics data analysis. We present re-Searcher, a new bioinformatics application with a user-friendly GUI developed to simplify genomic data mining from VCF files. Methods: The re-Searcher application was written in Python 3. The Pandas library solves the problem of analyzing large VCF files by pre-processing the file in chunks instead of loading it into RAM all at once. A simple and intuitive GUI was built using the Tkinter library. Results: The generalized workflow of re-Searcher consists of several steps: selecting an input file, setting the necessary filtering parameters, processing the data, and exporting a filtered output VCF file. re-Searcher browses and opens VCF files with the extensions .txt or .vcf and then offers the following filtering and extraction options: header extraction, keyword search, sample extraction, and genotype format conversion. Conclusion: Exploring and analyzing VCF files generated by the bioinformatics processing of sequencing data is one of the important steps performed by researchers during analysis and meta-analysis of genotype/phenotype associations. We have developed and introduced an easy-to-use bioinformatics tool, re-Searcher, with several unique features for mining big VCF files and a simple graphical user interface that makes it easily available to clinicians and researchers without any computational skills. The software is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
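
The chunked-processing idea described in the Methods can be sketched as follows. This is not re-Searcher's code: the abstract says the tool relies on Pandas, whereas this illustration swaps in the standard library's `itertools.islice` to show the same principle of never holding the whole file in memory. The function names, the toy VCF content, and the keyword are hypothetical.

```python
import io
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size lines from an open text handle."""
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            return
        yield chunk


def filter_vcf(handle, out, keyword, chunk_size=10_000):
    """Copy header lines plus data lines containing keyword to out.

    Only one chunk of lines is in memory at a time.
    Returns the number of data records written.
    """
    kept = 0
    for chunk in iter_chunks(handle, chunk_size):
        for line in chunk:
            if line.startswith("#"):
                out.write(line)  # meta-information and column header lines
            elif keyword in line:
                out.write(line)
                kept += 1
    return kept


# Toy VCF content, invented for illustration only.
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t50\tPASS\tGENE=TLR8\n"
    "1\t200\trs2\tC\tT\t50\tPASS\tGENE=TLR2\n"
    "X\t300\trs3\tG\tA\t50\tPASS\tGENE=TLR8\n"
)
demo_out = io.StringIO()
demo_kept = filter_vcf(io.StringIO(TOY_VCF), demo_out, "GENE=TLR8", chunk_size=2)
```

With a real file, `handle` would be an open file object and `out` a file opened for writing; the chunk size trades memory use against the number of read operations.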
