Sweet Sorghum Genotypes Testing in the High Latitude Rainfed Steppes of the Northern Kazakhstan (for Feed and Biofuel)
Twenty-eight sweet sorghum (Sorghum bicolor (L.) Moench) genotypes of different ecological and geographic origins (Kazakhstan, Russia, India, Uzbekistan, and China) were tested under the high-latitude rainfed conditions of northern Kazakhstan. The genotypes demonstrated high biomass production (up to 100 t·ha⁻¹ and more). Genotypes that ripened to fully reproductive seeds were selected for seed production and introduction in northern Kazakhstan. The lactic acid bacteria Lactobacillus plantarum S-1, Streptococcus thermophilus F-1 and Lactococcus lactis F-4 substantially enhance the fermentation process, suppressing undesirable microbiological processes, reducing the loss of nutrient compounds, doubling the speed of silage maturation, and providing a higher-quality feed product.
LAUNCH OF Q-SYMPHONY BIOINFORMATICS COMPUTING SYSTEM: A HIGH-PERFORMANCE CLUSTER FOR ANALYSIS OF LARGE-SCALE GENOMIC DATASETS
Introduction: One whole human genome from a next-generation sequencing platform takes 20 to 50 GB
in raw format. In the course of bioinformatics analysis, the data volume increases to 300-500 GB
per genome, and the occupied volume grows with the number of samples. Analyzing whole genomes at
this scale demands substantial computing power in the form of servers and data storage combined
into clusters. We at the Laboratory of Bioinformatics and Systems Biology have developed and
launched the Q-Symphony (“Qazaq Symphony of Bioinformatics”) bioinformatics computing system for
the analysis of large-scale genomic datasets.
Materials and methods: The Q-Symphony bioinformatics computing system consists of 12 high-performance
HPE servers: 1 control node, 8 compute nodes, 1 fat-memory compute node, and 2 storage nodes.
The system runs on Red Hat Enterprise Linux. The management node controls access to user profiles,
the data warehouse and Moab Workload Manager. The total number of processing cores is 172, the total
amount of RAM is 3072 GB, the total storage capacity is 198 TB, and the peak performance of the
system is 7.3 TFlops. All nodes use high-speed InfiniBand network connections, which allow data
exchange between nodes at 100 Gbps. The computational capabilities of the Q-Symphony system allow us
to distribute resources evenly across tasks, monitor the load on processor and memory resources
in real time, and queue and sequentially execute large lists of tasks.
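As a rough capacity check on the figures above, the sketch below estimates how many fully analyzed genomes would fit on the cluster's storage. It assumes decimal units (1 TB = 1000 GB) and that all 198 TB is usable for genome data; the abstract states neither, and the function name is illustrative.

```python
def genomes_that_fit(storage_tb: float, gb_per_genome: float) -> int:
    """Return how many genomes of a given size fit in the given storage."""
    # Assumption: decimal units, 1 TB = 1000 GB.
    storage_gb = storage_tb * 1000
    return int(storage_gb // gb_per_genome)


TOTAL_STORAGE_TB = 198  # total storage capacity quoted in the abstract

# 300-500 GB is the per-genome footprint during analysis quoted in the abstract.
for size_gb in (300, 500):
    count = genomes_that_fit(TOTAL_STORAGE_TB, size_gb)
    print(f"{size_gb} GB per genome: about {count} genomes")
```

Under these assumptions the storage holds on the order of a few hundred analyzed genomes at once, which is why the per-genome growth from raw to analyzed data matters for planning.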
Results: Benchmark measurements on the Q-Symphony system showed that subtasks executed 15 to 54
times faster than on standard solutions built on similar processors.
Conclusion: Q-Symphony, together with well-established and proven bioinformatics methods, will
make it possible to analyze large-scale human genomic data, determine structural genomic
variants, and carry out complex comparative and population analyses.
META-ANALYSIS OF CANCER TRANSCRIPTOMES USING INDEPENDENT COMPONENT ANALYSIS
Introduction: Independent Component Analysis (ICA) is a matrix factorization method for
dimensionality reduction. ICA has been widely applied to transcriptomic data for blind separation
of the biological, environmental and technical factors affecting gene expression. This study aimed
to analyze cancer data using ICA in order to identify and comprehensively analyze reproducible
signaling pathways and molecular signatures in cancer.
Materials and Methods: In this study, four independent cancer transcriptomic datasets, GSE26886,
GSE69925, GSE32701 and GSE21293 (Affymetrix), from the GEO database were used. R Bioconductor and
Matlab were used for normalization. The bioinformatics tool «BiODICA - Independent Component
Analysis of Big Omics Data» was applied to compute independent components (ICs). Gene Set Enrichment
Analysis (GSEA) and ToppGene uncovered the most significantly enriched pathways. Construction
and visualization of gene networks and graphs were performed using the OFTEN method, Cytoscape and
the HPRD database.
Results: A correlation graph between the decompositions into 30 ICs was built from component pairs
with absolute correlation values exceeding 0.15. Clusters of components (pseudocliques) were
observed in the structure of the correlation graph. The top 500 most contributing genes of each IC
in the pseudocliques were mapped to the PPI network to construct signaling pathways of gene
interactions. Some cliques were composed of densely interconnected nodes and included components
common to most cancer types, while others were common to only some of them.
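The graph construction described above can be sketched as follows. This is a minimal illustration, not the BiODICA implementation: it assumes each decomposition is available as a genes-by-components weight matrix over the same genes, uses Pearson correlation, and runs on synthetic data. The function names and the toy matrices are hypothetical; only the 0.15 threshold and the top-500 cutoff come from the abstract.

```python
import numpy as np


def correlation_edges(a, b, threshold=0.15):
    """Return (i, j, r) for component pairs with |Pearson r| > threshold.

    a, b: arrays of shape (n_genes, n_components) defined over the same genes.
    """
    # Standardize each column so that a scaled dot product is the Pearson correlation.
    az = (a - a.mean(axis=0)) / a.std(axis=0)
    bz = (b - b.mean(axis=0)) / b.std(axis=0)
    corr = az.T @ bz / a.shape[0]
    rows, cols = np.where(np.abs(corr) > threshold)
    return [(int(i), int(j), float(corr[i, j])) for i, j in zip(rows, cols)]


def top_genes(weights, n=500):
    """Indices of the n genes with the largest absolute weight in one component."""
    return np.argsort(-np.abs(weights))[:n]


# Synthetic stand-in for two decompositions (hypothetical data, not from the study).
rng = np.random.default_rng(0)
ics_a = rng.normal(size=(2000, 3))
# Dataset B carries the same three signals in a different order, plus noise.
ics_b = ics_a[:, [2, 0, 1]] + 0.1 * rng.normal(size=(2000, 3))

edges = correlation_edges(ics_a, ics_b)
for i, j, r in edges:
    print(f"A{i} <-> B{j}: r = {r:.2f}")
```

Each returned edge links a component from one dataset to a component from another; groups of mutually linked components across several datasets are what the abstract calls pseudocliques.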
Conclusion: The results of this investigation may reveal potential biomarkers of carcinogenesis and
functional subsystems in tumor cells, and may help in predicting the early development of a tumor.
IDENTIFICATION OF KAZAKH SPECIFIC GENOMIC VARIANTS USING COMPARATIVE GENOMICS ANALYSIS
Introduction: The development of high-performance genomic technologies opens up new
possibilities for studying the human genome. Large-scale genomic research generates huge amounts of
data, and the active development of bioinformatics, with its modern methods and approaches to
analysis, makes it possible to create detailed databases and study genomic data comprehensively. One
contemporary task is to identify population-specific genomic variants by comparing complete genome
and complete exome data in detail with open large-scale population datasets.
Materials and methods: The study material comprises 14 complete genomes and 125 complete exomes
of Kazakhstani individuals. Our dataset was supplemented with data from large whole-genome population
datasets (SGDP, PRJEB26349, HGDP and 1000 Genomes) for comparative population genomics and for the
identification of specific genomic variants. The raw data were mapped and aligned to a
single reference genome, hg19; genomic variants were then called, and an individual map of the
variants found was produced for each dataset in VCF format. For the supplementary datasets, a
combined map of all variants was formed; these variants were then excluded from the variants found
in the Kazakh samples to isolate specific genomic variants. The filtered variants were then
annotated and interpreted.
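The exclusion step above amounts to a set difference, sketched here under simplifying assumptions: variants are compared as exact (chromosome, position, reference allele, alternate allele) tuples, and the toy records are hypothetical. The abstract does not say which tools or matching rules were used, and real VCF comparisons usually also need variant normalization (left-alignment, splitting of multi-allelic records) before matching.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def population_specific(
    sample: Iterable[Variant],
    references: Iterable[Iterable[Variant]],
) -> Set[Variant]:
    """Return sample variants absent from every reference dataset."""
    # Build the combined map of all variants seen in the reference datasets.
    known: Set[Variant] = set()
    for dataset in references:
        known.update(dataset)
    # Exclude every known variant from the sample's variants.
    return set(sample) - known


# Hypothetical toy records, for illustration only.
kazakh = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "CT", "C")]
dataset_x = [Variant("chr1", 1000, "A", "G")]
dataset_y = [Variant("chr3", 3000, "T", "C")]

specific = population_specific(kazakh, [dataset_x, dataset_y])
print(specific)
```

Only the variant missing from both reference sets survives the filter; in the workflow described above, such survivors are what would then be annotated and interpreted.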
Results: For the Kazakh whole exomes, 9 heterozygous or mutant variants were found that are unique
among the assembled genomic databases: 7 variants are located in intronic regions, 1 in an upstream
region, and the last is a frameshift deletion in an exonic region.
For the Kazakh whole genomes, 4732 heterozygous or mutant variants were found; 517 variants were
present in all Kazakh samples, and 144 variants were completely mutant. Only 8 variants are located
in exonic regions: 4 synonymous SNVs, 3 nonsynonymous SNVs, and 1 frameshift deletion.
Conclusion: We have discovered several unique genomic variants that are, for now, specific to Kazakh
individuals. These results can serve as a basis for the creation of a Kazakh reference genome, for
subsequent research, and for comparative analysis of Kazakh individuals with various populations of
the world.
Grant references: AP05135430; MES RK
Can conservation agriculture increase soil carbon sequestration? A modelling approach
Conservation agriculture (CA) involves complex and interactive processes that ultimately determine soil carbon (C) storage, making it difficult to identify clear patterns. To solve these problems, we used the ARMOSA process-based crop model to simulate the contribution of different CA components (minimum soil disturbance, permanent soil cover with crop residues and/or cover crops, and diversification of plant species) to soil organic carbon stock (SOC) sequestration at 0–30 cm soil depth and to compare it with SOC evolution under conventional agricultural practices. We simulated SOC changes in three sites located in Central Asia (Almalybak, Kazakhstan), Northern Europe (Jokioinen, Finland) and Southern Europe (Lombriasco, Italy), which have contrasting soils, organic carbon contents, climates, crops and management intensity. Simulations were carried out for the current climate conditions (1998–2017) and future climatic scenario (period 2020–2040, scenario Representative Concentration Pathway RCP 6.0). Five cropping systems were simulated: conventional systems under ploughing with monoculture and residues removed (Conv − R) or residues retained (Conv + R); no-tillage (NT); CA and CA with a cover crop, Italian ryegrass (CA + CC). In Conv − R, Conv + R and NT, the simulated monocultures were spring barley in Almalybak and Jokioinen, and maize in Lombriasco. In all sites, conventional systems led to SOC decline of 170–1000 kg ha⁻¹ yr⁻¹, whereas NT can slightly increase the SOC. CA and CA + CC have the potential for a C sequestration rate of 0.4% yr⁻¹ or higher in Almalybak and Jokioinen, and thus, the objective of the “4 per 1000” initiative can be achieved.
Cover crops (in CA + CC) have a potential for a C sequestration rate of 0.36–0.5% yr⁻¹ in Southern Finland and in Southern Kazakhstan under the current climate conditions, and their role will grow in importance in the future. Even if in Lombriasco it was not possible to meet the “4 per 1000”, there was a SOC increase under CA and CA + CC. In conclusion, the simultaneous adoption of all the three CA principles becomes more and more relevant in order to accomplish soil C sequestration as an urgent action to combat climate change and to ensure food security.
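To make the “4 per 1000” comparison concrete, the sketch below converts an absolute SOC change (kg ha⁻¹ yr⁻¹) into a relative annual rate and checks it against the 0.4% yr⁻¹ target. The baseline stock of 50 000 kg ha⁻¹ is a hypothetical value chosen only for illustration; the abstract does not report site stocks, and the function names are illustrative.

```python
FOUR_PER_1000 = 0.4  # target annual SOC increase, in percent per year


def annual_rate_percent(soc_stock_kg_ha: float, change_kg_ha_yr: float) -> float:
    """Relative annual SOC change, as a percentage of the current stock."""
    return 100.0 * change_kg_ha_yr / soc_stock_kg_ha


def meets_target(rate_percent: float, target: float = FOUR_PER_1000) -> bool:
    """True if the relative rate reaches the "4 per 1000" objective."""
    return rate_percent >= target


# Hypothetical baseline stock in the 0-30 cm layer (not taken from the study).
ASSUMED_STOCK_KG_HA = 50_000.0

# -170 and -1000 are the conventional-system declines quoted in the abstract;
# +200 is an illustrative gain.
for change in (-1000.0, -170.0, 200.0):
    rate = annual_rate_percent(ASSUMED_STOCK_KG_HA, change)
    print(f"{change:+.0f} kg/ha/yr -> {rate:+.2f} %/yr, target met: {meets_target(rate)}")
```

Because the target is relative, the same absolute gain counts for more on a carbon-poor soil than on a carbon-rich one, which is one reason sites with contrasting organic carbon contents can differ in whether they reach the objective.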
WHOLE-GENOME SEQUENCING DATA OF KAZAKH INDIVIDUALS
Kazakhstan is a Central Asian crossroads of European and Asian populations situated along the Great Silk Road. The territory of Kazakhstan has historically been inhabited by nomadic tribes, and today it is a multi-ethnic country in which the Kazakh ethnic group is dominant. We sequenced and analyzed the whole genomes of five healthy ethnic Kazakh individuals at high coverage using a next-generation sequencing platform. These whole-genome sequence data from healthy Kazakh individuals can be a valuable reference for biomedical studies investigating disease associations and for population-wide genomic studies of the ethnically diverse Central Asian region...