
    Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

    Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, which is associated with the second stage of the cancer; this hand-off can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values are analyzed for three cancers: breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma multiforme (GBM). First, a co-expression gene network is generated from highly correlated gene pairs, those with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The biomarker genes discovered from these network modules are validated as essential genes for causing cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet-lab experiments to further elucidate the complex mechanism of cancer.
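The first two steps of this abstract (thresholding Pearson correlations at 0.9, then isolating cliques) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary-of-profiles input format, and the use of the basic Bron-Kerbosch recursion for clique enumeration are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def coexpression_graph(expr, threshold=0.9):
    """Adjacency sets linking genes whose correlation is >= threshold.

    `expr` maps a gene name to its expression profile (hypothetical format).
    """
    adj = {gene: set() for gene in expr}
    for g, h in combinations(expr, 2):
        if pearson(expr[g], expr[h]) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found
```

An isolated gene is reported as a clique of size one, so a caller interested in modules would typically filter by clique size. For genome-scale networks a pivoting variant or a graph library would be a more practical choice than this plain recursion.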

    Multi-run concrete autoencoder to identify prognostic lncRNAs for 12 cancers

    Background: Long non-coding RNAs (lncRNAs) play a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, because of its stochastic nature, CAE does not identify reproducible features across different runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE even though the AE selects latent (pseudo-) features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
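The multi-run idea (keep only features that recur across independent stochastic runs) can be sketched as below. This is an assumption-laden illustration rather than mrCAE itself: `noisy_top_k` is a hypothetical stand-in for a single stochastic selector run, not a concrete autoencoder, and the `min_runs` frequency cut-off is an invented parameter, since the abstract does not state how recurrence is thresholded.

```python
import random
from collections import Counter


def noisy_top_k(scores, k, rng, noise=0.5):
    """Hypothetical stand-in for one stochastic selector run (not a CAE).

    Perturbs each feature score with Gaussian noise and keeps the top k.
    """
    jittered = {f: s + rng.gauss(0.0, noise) for f, s in scores.items()}
    return set(sorted(jittered, key=jittered.get, reverse=True)[:k])


def multi_run_select(run_once, n_runs, min_runs):
    """Keep features chosen in at least `min_runs` of `n_runs` runs.

    `run_once` takes a seeded random.Random and returns a set of features.
    """
    counts = Counter()
    for seed in range(n_runs):
        counts.update(run_once(random.Random(seed)))
    return {f for f, c in counts.items() if c >= min_runs}
```

A call such as `multi_run_select(lambda rng: noisy_top_k(scores, 10, rng), n_runs=20, min_runs=15)` would retain only features that survive most of the noisy runs, which is the stability property the abstract motivates.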

    Network Based Prediction of Protein Localization Using Diffusion Kernel

    With the availability of an overwhelming amount of high-throughput biological data, biologists and medical researchers increasingly depend on computational algorithms for hypothesis generation and prediction. One area of bioinformatics research is the development of algorithms for predicting subcellular localization of both monoplex and multiplex proteins. Most current localization prediction algorithms employ features derived from protein sequence data and external functional annotations such as gene ontology or physicochemical properties. However, no existing method exploits the rich localization information in protein-protein correlation networks, even though correlated proteins tend to be co-localized within the cell. Here we propose NetLoc, a novel algorithm based on a diffusion kernel and logistic regression, for protein localization prediction by exploiting protein correlation networks. NetLoc is applied to yeast protein localization prediction using four types of protein networks: physical protein-protein interaction (PPI) networks, genetic PPI networks, mixed PPI networks, and co-expressed PPI networks. Experiments showed that protein networks can provide rich information for localization prediction, achieving an AUC score of up to 0.93. We also showed that networks with high connectivity and a high percentage of co-localized PPI lead to better prediction performance. Compared with a previous network-feature-based prediction algorithm, which had an AUC score of 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance, with an AUC of 0.74 on the same dataset. We also investigated how the prediction performance of NetLoc is affected by network characteristics such as the ratio of the number of co-localized PPI (coPPI) to the number of non-co-localized PPI (ncPPI) and the density of annotated coPPI in the network. For a given network with a specific number of proteins, NetLoc performance increases with increasing coPPI/ncPPI ratio and increasing density of annotated coPPI. Another limitation of current protein localization algorithms is that they are not capable of predicting multi-location proteins. The NetLoc algorithm addresses this limitation by calculating probabilistic scores for all locations for each query protein. Evaluation on the yeast multi-localization protein dataset showed that the overall success rate of NetLoc is 88%, which is much higher than that of the existing method (73%) tested on the same dataset. Finally, we proposed and evaluated two methods for network-based localization prediction that use multiple protein correlation networks. One constructs a unified protein correlation network; the other uses multiple network kernels. Experiments showed that both methods improve NetLoc performance compared with the original individual networks.
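To make the diffusion-kernel idea concrete, the sketch below computes the standard graph diffusion kernel K = exp(βH), with H = A − D (adjacency minus degree matrix), using a truncated Taylor series. It is not NetLoc: the kernel form, the value of β, the series truncation, and the helper `kernel_feature` are assumptions chosen for illustration, and the logistic regression stage described in the abstract is omitted.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(beta * H), H = A - D, by a truncated Taylor series.

    `adj` is a symmetric 0/1 adjacency matrix with a zero diagonal.
    """
    n = len(adj)
    h = [[adj[i][j] - (sum(adj[i]) if i == j else 0) for j in range(n)]
         for i in range(n)]
    kernel = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in kernel]
    for m in range(1, terms):
        # term_m = term_{m-1} * (beta * H) / m
        term = matmul(term, [[beta * v / m for v in row] for row in h])
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return kernel


def kernel_feature(kernel, i, labels):
    """Kernel-weighted sum of known labels (1 = in the location, 0 = not).

    Hypothetical feature for protein i; a classifier would be fit on top.
    """
    return sum(kernel[i][j] * y for j, y in labels.items() if j != i)
```

On a three-protein path graph, the kernel value between adjacent proteins is larger than between the two end proteins, which is the sense in which diffusion propagates localization evidence preferentially to close network neighbors. For real networks a library matrix exponential would be preferable to this series.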

    MOGAT: A Multi-Omics Integration Framework Using Graph Attention Networks for Cancer Subtype Prediction

    Accurate cancer subtype prediction is crucial for personalized medicine. Integrating multi-omics data is a viable approach to understanding the intricate pathophysiology of complex diseases like cancer. Conventional machine learning techniques are not ideal for analyzing the complex interrelationships among different categories of omics data. Numerous graph-based learning models have been proposed to uncover hidden representations and network structures specific to each omics data type, with the aim of improving cancer prediction and patient profiling, among other applications in disease management. The existing state-of-the-art graph-based multi-omics integration approaches for cancer subtype prediction, MOGONET and SUPREME, use a graph convolutional network (GCN), which fails to consider how important each neighboring node is to a particular node. To address this gap, we hypothesize that attending to each neighbor, that is, weighting neighbors according to their importance, might improve cancer subtype prediction. The natural choice for determining the importance of each neighbor of a node in a graph is the graph attention network (GAT). Here, we propose MOGAT, a novel multi-omics integration approach that leverages GAT models to combine graph-based learning with an attention mechanism. MOGAT uses a multi-head attention mechanism to extract appropriate information for a specific sample by assigning unique attention coefficients to neighboring samples. To our knowledge, our group is the first to explore GAT in multi-omics integration for cancer subtype prediction. To evaluate the performance of MOGAT in predicting cancer subtypes, we explored two sets of breast cancer data, from TCGA and METABRIC. Our proposed approach, MOGAT, outperforms MOGONET by 32% to 46% and SUPREME by 2% to 16% in cancer subtype prediction in different scenarios, supporting our hypothesis. Our results also showed that GAT embeddings provide a better prognosis in differentiating the high-risk group from the low-risk group than raw features.
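The attention coefficients mentioned in this abstract can be illustrated with the standard single-head GAT formulation: each neighbor j of node i receives a score LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), and the scores are normalized with a softmax over the neighborhood. The sketch below is a plain-Python illustration of that formula only. It is not MOGAT's implementation; the function names, the list-based data layout, and the toy weights are assumptions.

```python
from math import exp


def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity applied to a raw attention score."""
    return x if x > 0 else slope * x


def attention_coefficients(h, neighbors, i, w, a):
    """Softmax-normalized attention of node i over its neighbors (one head).

    h: node id -> feature vector; neighbors: node id -> list of node ids;
    w: projection matrix as a list of rows; a: attention vector whose
    length is twice the number of rows in w.
    """
    def project(vec):
        return [sum(row[k] * vec[k] for k in range(len(vec))) for row in w]

    zi = project(h[i])
    scores = {}
    for j in neighbors[i]:
        # Score the concatenation [W h_i || W h_j] against a.
        scores[j] = leaky_relu(sum(p * q for p, q in zip(a, zi + project(h[j]))))
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {j: exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    return {j: v / total for j, v in weights.items()}
```

A GCN would weight neighbors by a fixed function of node degree, whereas here the weights depend on the features of both nodes. A self-loop can be modelled by listing node i among its own neighbors, and a multi-head variant would repeat this computation with separate `w` and `a` per head.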

    Potential Autoimmunity Resulting from Molecular Mimicry between SARS-CoV-2 Spike and Human Proteins

    Molecular mimicry between viral antigens and host proteins can produce cross-reacting antibodies leading to autoimmunity. The coronavirus SARS-CoV-2 causes COVID-19, a disease curiously resulting in varied symptoms and outcomes, ranging from asymptomatic to fatal. Autoimmunity due to cross-reacting antibodies resulting from molecular mimicry between viral antigens and host proteins may provide an explanation. Thus, we computationally investigated molecular mimicry between SARS-CoV-2 Spike and known epitopes. We discovered molecular mimicry hotspots in Spike and highlight two examples with tentative high autoimmune potential and implications for understanding COVID-19 complications. We show that a TQLPP motif in Spike and thrombopoietin share similar antibody binding properties. Antibodies cross-reacting with thrombopoietin may induce thrombocytopenia, a condition observed in COVID-19 patients. Another motif, ELDKY, is shared by multiple human proteins, such as PRKG1, which is involved in platelet activation and calcium regulation, and tropomyosin, which is linked to cardiac disease. Antibodies cross-reacting with PRKG1 and tropomyosin may cause known COVID-19 complications such as blood-clotting disorders and cardiac disease, respectively. Our findings illuminate COVID-19 pathogenesis and highlight the importance of considering autoimmune potential when developing therapeutic interventions to reduce adverse reactions.
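As a rough illustration of the simplest sequence-level screen for shared motifs such as TQLPP, the sketch below lists every exact k-residue stretch of a query sequence that also occurs in a set of target sequences. This is only an assumed first-pass filter: the abstract does not describe the study's method at this level of detail, and exact-match screening does not capture the antibody binding properties the authors examine. The function name and input format are hypothetical.

```python
def shared_motifs(query, targets, k=5):
    """Map each k-mer of `query` found in a target to the matching targets.

    `targets` maps a (hypothetical) sequence name to its amino-acid string.
    Only exact matches are reported.
    """
    hits = {}
    for start in range(len(query) - k + 1):
        motif = query[start:start + k]
        names = sorted(name for name, seq in targets.items() if motif in seq)
        if names:
            hits.setdefault(motif, names)
    return hits
```

With placeholder strings (not real protein sequences), `shared_motifs("AATQLPPGG", {"t1": "MMTQLPPKK"})` reports the pentapeptide `TQLPP` as shared with `t1`.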

    Evidence for structural discordance in the inverted metamorphic sequence of Sikkim Himalaya: Towards resolving the Main Central Thrust controversy

    Inverted metamorphism in the Himalayas is closely associated with the Main Central Thrust (MCT). In the western Himalayas, the Main Central Thrust conventionally separates high-grade metamorphic rocks of the Higher Himalayan Crystalline Sequence (HHCS) from unmetamorphosed rocks of the Inner Sedimentary Belt. In the eastern Himalayas, the Inner Sedimentary Belt is absent, and the HHCS and the meta-sedimentary Lesser Himalayan Sequence (LHS) apparently form a continuous Barrovian metamorphic sequence, leading to confusion about the precise location of the MCT. In this study, it is demonstrated that migmatitic gneisses of the sillimanite zone in the higher structural levels of the HHCS are multiply deformed, with two phases of penetrative fabric formation (S1HHCS and S2HHCS) followed by a third folding event associated with a spaced, NW-SE trending, northeast-dipping foliation (S3HHCS). The underlying LHS schists (kyanite zone and lower) are also multiply deformed, with the bedding S0 being isoclinally folded (F1LHS) and subsequently refolded (F2LHS and F3LHS). The contact zone between the HHCS and LHS is characterized by ductile, top-to-the-southwest shearing and stabilization of a pervasive foliation that is consistently oriented NW-SE and dips northeast. This foliation is parallel to the S3HHCS foliation in the HHCS and the S2LHS foliation in the LHS. Early lineations in the HHCS and LHS also show different dispersions across the contact shear zone, implying that the pre-thrusting orientations of the two units were distinct. The contact shear zone is therefore interpreted to be a plane of structural discordance that shows a shear sense consistent with thrust movement and is associated with mineral growth during Barrovian metamorphism. It may well be considered to represent the MCT in this region.