125 research outputs found

    Application of MATLAB in -Omics and Systems Biology

    Biological data analysis has changed dramatically since the introduction of high-throughput -omics technologies, such as microarrays and next-generation sequencing. The key advantage of obtaining thousands of measurements from a single sample soon became a bottleneck limiting the transformation of generated data into knowledge. It has become apparent that traditional statistical approaches are not suited to solving problems in the new reality of “big biological data.” On the other hand, traditional programming languages such as C/C++ and Java are not flexible enough to allow quick development and testing of new algorithms, while MATLAB provides a powerful computing environment and a variety of sophisticated toolboxes for performing complex bioinformatics calculations.

    Telomere Maintenance Pathway Activity Analysis Enables Tissue- and Gene-Level Inferences

    Telomere maintenance is one of the mechanisms ensuring indefinite divisions of cancer and stem cells. A good understanding of telomere maintenance mechanisms (TMM) is important for studying cancers and designing therapies. However, the molecular factors triggering selective activation of either the telomerase-dependent (TEL) or the alternative lengthening of telomeres (ALT) pathway are poorly understood. In addition, more accurate and easy-to-use methodologies are required for TMM phenotyping. In this study, we have performed a literature-based reconstruction of signaling pathways for the ALT and TEL TMMs. Gene expression data were used for computational assessment of TMM pathway activities, and the results were compared with experimental assays for TEL and ALT. Explicit consideration of pathway topology makes bioinformatics analysis more informative compared to computational methods based on simple summary measures of gene expression. Application to healthy human tissues showed high ALT and TEL pathway activities in testis, and identified genes and pathways that may trigger TMM activation. Our approach offers a novel option for systematic investigation of TMM activation patterns across cancers and healthy tissues for dissecting pathway-based molecular markers with diagnostic impact.

    PSF toolkit: an R package for pathway curation and topology-aware analysis

    Most high-throughput genomic data analysis pipelines currently rely on over-representation or gene set enrichment analysis (ORA/GSEA) approaches for functional analysis. In contrast, topology-based pathway analysis methods, which offer a more biologically informed perspective by incorporating interaction and topology information, have remained underutilized and inaccessible due to various limiting factors. These methods rely heavily on the quality of pathway topologies and often use predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway importation, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit’s usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA-enriched pathways while providing more detailed branch-level information. Moreover, in contrast to the SPIA method, it offers the advantage of being cut-off free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis.
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications.

    Population Levels Assessment of the Distribution of Disease-Associated Variants With Emphasis on Armenians – A Machine Learning Approach

    Background: During the last decades, a number of genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with different complex diseases. However, associations reported in one population are often conflicting and do not replicate when studied in other populations. One of the reasons could be that most GWASs employ a case-control design in one or a limited number of populations, while little attention has been paid to the global distribution of disease-associated alleles across different populations. Moreover, the majority of GWASs have been performed on selected European, African, and Chinese populations, and a considerable number of populations remain understudied. Aim: We have investigated the global distribution of the disease-associated SNPs discovered so far across worldwide populations of different ancestry and geographical regions, with a special focus on the understudied population of Armenians. Data and Methods: We have used genotyping data from the Human Genome Diversity Project and from the Armenian population and combined them with disease-associated SNP data taken from public repositories, leading to a final dataset of 44,234 markers. Their frequency distribution across 1039 individuals from 53 populations was analyzed using self-organizing maps (SOM) machine learning. Our SOM portrayal approach reduces data dimensionality, clusters SNPs with similar frequency profiles, and provides two-dimensional data images which enable visual evaluation of disease-associated SNP landscapes among human populations. Results: We find that populations from Africa, Oceania, and America show specific patterns of minor allele frequencies of disease-associated SNPs, while populations from Europe, the Middle East, Central South Asia, and Armenia mostly share similar patterns.
Importantly, different sets of SNPs are associated with common polygenic diseases, such as cancer, diabetes, and neurodegeneration, in populations from different geographic regions. Armenians are characterized by a set of SNPs that is distinct from those of other populations from the neighboring geographical regions. Conclusion: Genetic associations of diseases vary considerably across populations, which necessitates health-related genotyping efforts, especially for populations that have so far been understudied. SOM portrayal represents a promising novel method in population genetic research, with special strength in visualization-based comparison of SNP data.
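To make the clustering idea in the abstract above concrete, here is a minimal sketch of a self-organizing map that places SNP frequency profiles on a small two-dimensional grid. It is not the authors' SOM portrayal pipeline: the 3 x 3 grid, the learning-rate and radius schedules, the function names `train_som` and `best_matching_unit`, and the frequency values (two synthetic groups of SNPs across five made-up populations) are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))

def train_som(profiles, rows=3, cols=3, epochs=40, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Random initial weights in the minor-allele-frequency range [0, 0.5].
    weights = {(r, c): [rng.uniform(0.0, 0.5) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(profiles)
        rng.shuffle(order)
        for x in order:
            decay = 1.0 - step / total      # falls from 1 towards 0
            lr = 0.5 * decay                # learning rate (assumed schedule)
            radius = 0.5 + 1.5 * decay      # neighbourhood radius (assumed schedule)
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # pull unit towards the profile
            step += 1
    return weights

# Synthetic profiles: rows are SNPs, columns are 5 made-up populations.
data_rng = random.Random(1)

def jitter(profile):
    """Add small noise and clip to the valid frequency range."""
    return [min(0.5, max(0.0, f + data_rng.uniform(-0.02, 0.02))) for f in profile]

group_a = [jitter([0.45, 0.40, 0.05, 0.05, 0.05]) for _ in range(10)]
group_b = [jitter([0.05, 0.05, 0.05, 0.40, 0.45]) for _ in range(10)]
snps = group_a + group_b

som = train_som(snps)
placement = [best_matching_unit(som, s) for s in snps]
print(placement[:10])   # grid units assigned to group A
print(placement[10:])   # grid units assigned to group B
```

On data like this, one would expect SNPs with similar frequency profiles to land on the same or neighbouring grid units, which is the property that makes the resulting map readable as an image.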

    Projection of High-Dimensional Genome-Wide Expression on SOM Transcriptome Landscapes

    Self-organizing map (SOM) portrayal has proven to be a powerful approach for the analysis of transcriptomic, genomic, epigenetic, single-cell, and pathway-level data as well as for “multi-omic” integrative analyses. However, the SOM method has a major disadvantage: it requires retraining on the entire dataset once a new sample is added, which can be resource- and time-demanding. Retraining also shifts the gene landscape, thus complicating the interpretation and comparison of results. To overcome this issue, we have developed two transfer learning approaches that allow the SOM space to be extended with new samples while preserving its intrinsic structure. The extension SOM (exSOM) approach is based on adding secondary data to the existing SOM space by “meta-gene adaptation”, while supervised SOM portrayal (supSOM) adds a support vector machine regression model on top of the original SOM algorithm to “predict” the portrait of a new sample. Both methods have been shown to accurately combine existing and new data. With simulated data, exSOM outperforms supSOM in accuracy, while supSOM significantly reduces the computing time and outperforms exSOM in this respect. Analysis of real datasets demonstrated the validity of the projection methods with independent datasets mapped onto the existing SOM space. Moreover, both methods handle well the projection of samples with new characteristics that were not present in the training datasets.

    Integrated Multi-Omics Maps of Lower-Grade Gliomas

    Multi-omics high-throughput technologies produce data sets which are not restricted to one omics modality but comprise several, often from patient-matched tumour specimens. The integrative analysis of these omics modalities is essential to obtain a holistic view of the otherwise fragmented information hidden in these data. We present an intuitive method enabling the combined analysis of multi-omics data based on self-organizing maps machine learning. It “portrays” the expression, methylation and copy number variation (CNV) landscapes of each tumour using the same gene-centred coordinate system. It enables the visual evaluation and direct comparison of the different omics layers on a personalized basis. We applied this combined molecular portrayal to lower grade gliomas, a heterogeneous brain tumour entity. These tumours classify into a series of molecular subtypes defined by genetic key lesions, which associate with large-scale effects on DNA methylation and gene expression and, in final consequence, drive cell fate decisions towards oligodendroglioma-, astrocytoma- and glioblastoma-like cancer cell lineages with different prognoses. Consensus modes of concerted changes of expression, methylation and CNV are governed by the degree of co-regulation within and between the omics layers. The method is not restricted to the triple-omics data used here. The similarity landscapes reflect partly independent effects of genetic lesions and DNA methylation, with consequences for cancer hallmark characteristics such as proliferation, inflammation and blocked differentiation in a subtype-specific fashion. The method can be extended to integrate other omics features, such as genetic mutation and protein expression data, as well as to extract prognostic markers.

    ‘Looking for the tombs of dragons’: preliminary results of archaeo-geochemical prospecting studies at Tirinkatar - Karmir Sar area, southern slopes of Mt Aragats, Armenia.

    This article reports on interdisciplinary archaeo-geochemical research on vishaps (stone stelae also known as dragon stones) that has been carried out for the first time in Armenia. The survey area is situated in the neighborhood of the Tirinkatar and Karmir Sar volcanoes on the southern slopes of Mt. Aragats. The geochemical prospecting studies were carried out on a high mountain meadow (2850 m asl) with 12 vishaps and numerous circular stone structures known as cromlechs. The five cromlechs excavated so far did not yield any human remains, and the main aim of the geochemical prospection was to check whether other cromlechs detected by archaeological surface survey and by ground-penetrating radar contained burials. The geochemical haloes of some chemical elements indicate their anthropogenic character and a very high probability that some of the cromlechs were tombs.

    Normalizing to GADPH jeopardises correct quantification of gene expression in ovarian tumours – IPO8 and RPL4 are reliable reference genes

    BACKGROUND: To ensure a correct interpretation of results obtained with quantitative real-time reverse transcription-polymerase chain reaction (RT-qPCR), it is critical to normalize to a reference gene with stable mRNA expression in the tissue of interest. GADPH is widely used as a reference gene in ovarian tumour studies, although it lacks tissue-specific stability. The aim of this study was to identify alternative suitable reference genes for RT-qPCR studies on benign, borderline, and malignant ovarian tumours. METHODS: We assayed mRNA levels for 13 potential reference genes – ABL1, ACTB, CDKN1A, GADPH, GUSB, HPRT1, HSP90AB, IPO8, PPIA, RPL30, RPL4, RPLPO, and TBP – with RT-qPCR in 42 primary ovarian tumours, using commercially pre-designed RT-qPCR probes. Expression stability was subsequently analysed with four different statistical programs (GeNorm, NormFinder, BestKeeper, and the Equivalence test). RESULTS: IPO8, RPL4, TBP, RPLPO, and ACTB showed the least variation in expression across the tumour samples according to GeNorm, NormFinder, and BestKeeper. The Equivalence test found variation in expression within a 3-fold expression change between tumour groups for IPO8, RPL4, RPL30, GUSB, TBP, RPLPO, ACTB, ABL1, and CDKN1A. However, only IPO8 satisfied a 2-fold change cut-off. Overall, IPO8 and RPL4 had the highest, whereas GADPH and HPRT1 had the lowest, expression stability. Normalizing to suitable reference genes (IPO8, RPL4) versus unsuitable ones (GADPH, HPRT1) had a divergent influence on the mRNA expression pattern of our target genes, GPER and uPAR. CONCLUSIONS: We found IPO8 and RPL4 to be suitable reference genes for normalization of target gene expression in benign, borderline, and malignant ovarian tumours. Moreover, IPO8 can be recommended as a single reference gene. Neither GADPH nor HPRT1 should be used as a reference gene in studies on ovarian tumour tissue.
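As an illustration of what an expression-stability measure looks like, the sketch below computes a geNorm-style M value: for each candidate gene, the average over all other candidates of the standard deviation, across samples, of the pairwise log2 expression ratio. A lower M indicates more stable expression. This is a simplified sketch, not the GeNorm program used in the study; the function name `stability_m`, the gene labels, and the expression values are invented for the example and do not come from the paper.

```python
import math
import statistics

def stability_m(expr):
    """expr: {gene: [relative quantity per sample]}; returns {gene: M value}."""
    genes = list(expr)
    m = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            # Pairwise variation: spread of that ratio across samples.
            pairwise_sd.append(statistics.stdev(log_ratios))
        # M value: mean pairwise variation against all other candidates.
        m[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m

# Made-up relative quantities for three hypothetical genes in four samples.
expr = {
    "geneA": [1.00, 1.10, 0.95, 1.05],
    "geneB": [2.00, 2.20, 1.90, 2.10],   # tracks geneA at twice the level
    "geneC": [1.00, 3.00, 0.50, 2.00],   # varies independently of the others
}
m_values = stability_m(expr)
ranking = sorted(m_values, key=m_values.get)  # most stable first
print(ranking)
```

Because the measure is built from ratios between candidates, two genes that rise and fall together score as stable even if both vary in absolute terms, which is one reason studies such as this one compare several programs.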

    The Evolving Faces of the SARS-CoV-2 Genome

    Surveillance of the evolving SARS-CoV-2 genome, combined with epidemiological monitoring and emerging vaccination, has become a paramount task for controlling the pandemic, which is rapidly changing in time and space. Genomic surveillance must combine the generation and sharing of sequence data with appropriate bioinformatics monitoring and analysis methods. We applied molecular portrayal using self-organizing maps machine learning (SOM portrayal) to characterize the diversity of the virus genomes, their mutual relatedness and their development since the beginning of the pandemic. The genetic landscape obtained visualizes the relevant mutations in a lineage-specific fashion and provides developmental paths in genetic state space from early lineages towards the variants of concern alpha, beta, gamma and delta. The different genes of the virus have specific footprints in the landscape reflecting their biological impact. SOM portrayal provides a novel option for ‘bioinformatics surveillance’ of the pandemic, with particular strengths in visualization, intuitive perception and ‘personalization’ of the mutational patterns of the virus genomes.