22 research outputs found

    Friends-Enemies: Endogenous Retroviruses Are Major Transcriptional Regulators of Human DNA

    Endogenous retroviruses are mobile genetic elements hardly distinguishable from infectious, or “exogenous,” retroviruses at the time of insertion in the host DNA. Human endogenous retroviruses (HERVs) are not rare. They gave rise to multiple families of closely related mobile elements that occupy ~8% of the human genome. Together, they shape the genomic regulatory landscape by providing at least ~320,000 human transcription factor binding sites (TFBS) located on ~110,000 individual HERV elements. The HERVs host as many as 155,000 mapped DNaseI hypersensitivity sites, which denote loci active in the regulation of gene expression or chromatin structure. The contemporary view of HERV evolutionary dynamics suggests that at the early stages after insertion, a HERV is treated by the host cells as a foreign genetic element and is likely to be suppressed by targeted methylation and mutations. However, at the later stages, when a significant number of mutations have already accumulated and the retroviral genes are broken, the regulatory potential of a HERV may be released and recruited to modify the genomic balance of transcription factor binding sites. This process goes together with further accumulation and selection of mutations, which reshape the regulatory landscape of the human DNA. However, developmental reprogramming, stress, or pathological conditions such as cancer, inflammation and infectious diseases can remove the blocks limiting expression and HERV-mediated host gene regulation. This, in turn, can dramatically alter the gene expression equilibrium and shift it to a new state, thus further amplifying instability and exacerbating the stressful situation.

    FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier

    Here, we propose a heuristic data trimming technique for SVM termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. Its application prevents SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features in the validation dataset that do not have a significant number of neighboring hits in the training dataset are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset, FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients who either responded or did not respond to different types of chemotherapy. We confirmed experimentally by leave-one-out cross-validation that FloWPS significantly increases the quality of a classifier built on the classical SVM in most of the applications, particularly for polynomial kernels.
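
    The two steps described in this abstract (sample-specific feature trimming, then kNN-like selection of proximal training points) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define "neighboring hits", so the rule used here (a feature is kept only if at least `m` training values lie on each side of the validation point along that feature) is an assumption, as are the function names and the parameters `m` and `k`. The SVM fit on the resulting window is omitted.

```python
from math import dist


def trim_features(train, point, m):
    """Return indices of features kept for this validation point.

    Assumed rule: a feature is kept only if at least `m` training values
    lie at or below the point and at least `m` lie at or above it.
    """
    kept = []
    for j, value in enumerate(point):
        below = sum(1 for row in train if row[j] <= value)
        above = sum(1 for row in train if row[j] >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def floating_window(train, point, m, k):
    """Return (kept feature indices, indices of the k nearest training rows).

    Distances are Euclidean and computed only over the kept features.
    """
    kept = trim_features(train, point, m)
    if not kept:
        return kept, []

    def project(row):
        return [row[j] for j in kept]

    target = project(point)
    order = sorted(range(len(train)),
                   key=lambda i: dist(project(train[i]), target))
    return kept, order[:k]


# Toy data: feature 1 of the validation point lies far outside the
# training range, so it is trimmed before neighbors are chosen.
train = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
point = [2.5, 50.0]
kept, window = floating_window(train, point, m=1, k=2)
print(kept, window)
```

    A classifier would then be trained only on the rows in `window`, restricted to the features in `kept`, and applied to that single validation point.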

    In search for geroprotectors: in silico screening and in vitro validation of signalome-level mimetics of young healthy state

    Populations in developed nations throughout the world are rapidly aging, and the search for geroprotectors, or anti-aging interventions, has never been more important. Yet while hundreds of geroprotectors have extended lifespan in animal models, none have yet been approved for widespread use in humans. GeroScope is a computational tool that can aid prediction of novel geroprotectors from existing human gene expression data. GeroScope maps expression differences between samples from young and old subjects to aging-related signaling pathways, then profiles pathway activation strength (PAS) for each condition. Known substances are then screened and ranked for those most likely to target differential pathways and mimic the young signalome. Here we used GeroScope and shortlisted ten substances, all of which have lifespan-extending effects in animal models, and tested six of them for geroprotective effects in senescent human fibroblast cultures. PD-98059, a highly selective MEK1 inhibitor, showed both life-prolonging and rejuvenating effects. Natural compounds like N-acetyl-L-cysteine, Myricetin and Epigallocatechin gallate also improved several senescence-associated properties and were further investigated with pathway analysis. This work not only highlights several potential geroprotectors for further study, but also serves as a proof-of-concept for GeroScope, Oncofinder and other PAS-based methods in streamlining drug prediction, repurposing and personalized medicine.

    Characteristic patterns of microRNA expression in human bladder cancer

    MicroRNAs (miRNAs) are small, noncoding RNAs that post-transcriptionally regulate gene expression. Their altered expression and functional activity have been observed in many human cancers. MiRNAs represent promising diagnostic and prognostic molecular biomarkers, and also serve as novel therapeutic targets. We performed a systematic analysis of scientific reports that link differences in miRNA expression with the pathogenesis of bladder cancer. This literature review is the first comprehensive database of miRNA molecules with biased expression profiles in bladder cancer. Among the 95 differentially expressed miRNAs that we identified from the literature, we classify 48 as upregulated in bladder cancer, 35 as downregulated, and 12 as contradictory (contradictory data were reported in one or more studies on the gene). In addition, we discuss the possible roles of differentially expressed miRNAs in the regulation of intracellular signaling pathways in bladder cancer.

    Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

    (1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs, due to a shortage of case histories with clinical outcomes supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods, including linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments for 21 high-throughput gene expression datasets (41–235 samples per dataset) representing 1778 cancer patients in total with known responses to chemotherapy treatments. FloWPS substantially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP), where the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.

    Shambhala: a platform-agnostic data harmonizer for gene expression data

    Background: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. Results: Unlike previously published methods enabling good quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparisons. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. Conclusion: Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method supporting sample-specific and platform-independent biologically meaningful clustering for the data obtained from multiple experimental platforms.
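
    For context, quantile normalization, one of the comparator methods named in this abstract, can be sketched in a few lines. This is the standard textbook procedure, not Shambhala itself, whose calibration algorithm the abstract does not specify. The function name and toy profiles are illustrative, and ties are handled by simple ordinal ranking for brevity.

```python
def quantile_normalize(samples):
    """Quantile-normalize expression profiles.

    `samples` is a list of equal-length lists, one per profile, with genes
    in the same order in every profile. Each value is replaced by the mean,
    across profiles, of the values sharing its rank.
    """
    n_genes = len(samples[0])
    sorted_profiles = [sorted(profile) for profile in samples]
    # Mean of the r-th smallest value across all profiles.
    rank_means = [
        sum(profile[r] for profile in sorted_profiles) / len(samples)
        for r in range(n_genes)
    ]
    result = []
    for profile in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda g: profile[g])
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = rank_means[rank]
        result.append(normalized)
    return result


profiles = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(profiles)
print(normalized)
```

    After this transformation every profile has an identical value distribution, so only the rank order of genes within each profile is preserved.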

    Profiling of Human Molecular Pathways Affected by Retrotransposons at the Level of Regulation by Transcription Factor Proteins

    Endogenous retroviruses and retrotransposons, also termed retroelements (REs), are mobile genetic elements that were active until recently in human genome evolution. REs regulate gene expression by actively reshaping chromatin structure or by directly providing transcription factor binding sites (TFBSs). We aimed to identify molecular processes most deeply impacted by the REs in human cells at the level of TFBS regulation. By using ENCODE data, we identified ~2 million TFBS overlapping with putatively regulation-competent human REs located in the 5-kb gene promoter neighborhood (~17% of all TFBS in promoter neighborhoods; ~9% of all RE-linked TFBS). Most REs hosting TFBS were highly diverged repeats, and for the evolutionarily young (0–8% diverged) elements we identified only ~7% of all RE-linked TFBS. The gene-specific distributions of RE-linked TFBS generally correlated with the distributions for all TFBS. However, several groups of molecular processes were highly enriched in the RE-linked TFBS regulation. They were strongly connected with immunity and response to pathogens, with the negative regulation of gene transcription, ubiquitination, and protein degradation, extracellular matrix organization, regulation of STAT signaling, fatty acid metabolism, regulation of GTPase activity, protein targeting to Golgi, regulation of cell division and differentiation, and development and functioning of perception organs and the reproductive system. By contrast, the processes most weakly affected by the REs were linked with the conservative aspects of embryo development. We also identified differences in the regulation features by the younger and older fractions of the REs. The regulation by the older fraction of the REs was linked mainly with immunity, cell adhesion, cAMP, IGF1R, Notch, Wnt, and integrin signaling, neuronal development, chondroitin sulfate and heparin metabolism, and endocytosis. The younger REs regulate other aspects of immunity, cell cycle progression and apoptosis, PDGF, TGF beta, EGFR, and p38 signaling, transcriptional repression, structure of nuclear lumen, catabolism of phospholipids and heterocyclic molecules, insulin and AMPK signaling, retrograde Golgi-ER transport, and estrogen signaling. The immunity-linked pathways were highly represented in both categories, but their functional roles were different and did not overlap. Our results point to the most quickly evolving molecular pathways in the recent and ancient evolution of the human genome.

    Retroelement—Linked Transcription Factor Binding Patterns Point to Quickly Developing Molecular Pathways in Human Evolution

    Background: Retroelements (REs) are transposable elements occupying ~40% of the human genome that can regulate genes by providing transcription factor binding sites (TFBS). The RE-linked TFBS profile can serve as a marker of gene transcriptional regulation evolution. This approach allows for interrogating the regulatory evolution of organisms with RE-rich genomes. We aimed to characterize the evolution of transcriptional regulation for human genes and molecular pathways using RE-linked TFBS accumulation as a metric. Methods: We characterized human genes and molecular pathways either enriched or deficient in RE-linked TFBS regulation. We used the ENCODE database with mapped TFBS for 563 transcription factors in 13 human cell lines. For 24,389 genes and 3124 molecular pathways, we calculated the score of RE-linked TFBS regulation reflecting the regulatory evolution rate at the level of individual genes and molecular pathways. Results: The major groups enriched by RE regulation deal with gene regulation by microRNAs, olfaction, color vision, fertilization, cellular immune response, and amino acid and fatty acid metabolism and detoxification. The deficient groups were involved in translation, RNA transcription and processing, chromatin organization, and molecular signaling. Conclusion: We identified genes and molecular processes that have characteristics of especially high or low evolutionary rates at the level of RE-linked TFBS regulation in the human lineage.