13 research outputs found

    CapsNetYY1: identifying YY1-mediated chromatin loops based on a capsule network architecture

    No full text
    Background: Previous studies have shown that chromosome structure plays a very important role in gene control. The transcription factor Yin Yang 1 (YY1), a multifunctional DNA-binding protein, can form a dimer to mediate chromatin loops and active enhancer-promoter interactions. Deletion of YY1 or point mutations at YY1 binding sites significantly inhibits enhancer-promoter interactions and affects gene expression. To date, only a few computational methods are available for identifying YY1-mediated chromatin loops. Results: We propose a novel model named CapsNetYY1, based on a capsule network architecture, to identify whether a pair of YY1 motifs can form a chromatin loop. First, the DNA sequence is encoded using one-hot encoding. Second, a multi-scale convolution layer extracts local features of the sequence, and a bidirectional gated recurrent unit learns features across time steps. Finally, capsule networks (a convolution capsule layer and a digital capsule layer) extract higher-level features and recognize YY1-mediated chromatin loops. Compared with DeepYY1, the only existing predictor for YY1-mediated chromatin loops, CapsNetYY1 achieved better performance on the independent datasets (AUC > 0.99). Conclusion: The results indicate that CapsNetYY1 is an excellent method for identifying YY1-mediated chromatin loops. We believe that the CapsNetYY1 method can also be used for predictive classification of other DNA sequences
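
    The one-hot encoding step named in this abstract can be illustrated with a minimal sketch. The helper name `one_hot_encode`, the A/C/G/T channel order, and the all-zero row for unknown bases are assumptions made for illustration; the abstract does not specify them, and the convolution, recurrent, and capsule layers are not shown.

```python
# Hypothetical helper: one-hot encode a DNA sequence.
# The channel order A, C, G, T is an assumption, not taken from the paper.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) map to all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = BASES.find(base)  # -1 when the base is not A, C, G, or T
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

    A sequence of length L becomes an L x 4 matrix, which is the usual input shape for a one-dimensional convolution layer.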

    Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

    No full text
    Hotspot residues are important determinants of protein-protein interactions and often perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use information from solved protein structures. However, the number of solved protein structures is far smaller than the number of known sequences. Moreover, almost all predictors identify hotspots from the interfaces of protein complexes, and seldom from whole protein sequences. Therefore, determining hotspots from whole protein sequences using sequence information alone is an urgent need. To address the issue of hotspot prediction from whole protein sequences, we propose an ensemble system with random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Then, the random projection technique is adopted to project the encoded instances into a reduced space. Next, several of the better-performing random projections are selected by training an IBk classifier on the training dataset, and these are applied to the test dataset. The ensemble of random projection classifiers is thereby obtained. Experimental results show that although the performance of our method is not yet good enough for real applications, it is very promising for the determination of hotspot residues from whole sequences
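
    The random projection step described in this abstract can be sketched as follows. This is a generic Gaussian random projection written under stated assumptions: the function names `make_projection` and `project`, the Gaussian entries, and the 1/sqrt(k) scaling are illustrative choices, not the authors' implementation. Selecting the better projections with an IBk (k-nearest-neighbour) classifier and combining them into an ensemble are not shown.

```python
import math
import random


def make_projection(input_dim, output_dim, seed):
    """Draw a Gaussian random matrix with output_dim rows and input_dim columns.

    Entries are scaled by 1/sqrt(output_dim), a common convention (assumed here).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(output_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(input_dim)]
            for _ in range(output_dim)]


def project(matrix, vector):
    """Map a feature vector into the reduced space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    features = [0.2, 1.5, -0.7, 0.0, 3.1, 0.4]  # made-up 6-dimensional encoding
    # Several candidate projections, one per seed; an ensemble would keep the best ones.
    for seed in range(3):
        reduced = project(make_projection(6, 3, seed), features)
        print(seed, [round(value, 3) for value in reduced])
```

    Each seed yields a different projection, which is what makes it possible to train one classifier per projection and keep only those that perform well on the training data.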

    Personalized Driver Gene Prediction Using Graph Convolutional Networks with Conditional Random Fields

    No full text
    Cancer is a complex and evolutionary disease mainly driven by the accumulation of genetic variations in genes. Identifying cancer driver genes is important. However, most related studies have focused on the population level. Cancer is a disease with high heterogeneity. Thus, the discovery of driver genes at the individual level is becoming more valuable but is a great challenge. Although some computational methods have been proposed to tackle this challenge, few can cover all patient samples well, and there is still room for performance improvement. In this study, to identify individual-level driver genes more efficiently, we propose the PDGCN method. PDGCN integrates multiple types of data features, including mutation, expression, methylation, copy number data, and system-level gene features, along with network structural features extracted using Node2vec, in order to construct a sample–gene interaction network. Prediction is performed using a graph convolutional network model with a conditional random field layer, which is able to better combine the network structural features with biological attribute features. Experiments on the ACC (Adrenocortical Cancer) and KICH (Kidney Chromophobe) datasets from TCGA (The Cancer Genome Atlas) demonstrated that the method performs better than other similar methods. It can identify not only frequently mutated driver genes, but also rare candidate driver genes and novel biomarker genes. The results of the survival and enrichment analyses of these detected genes demonstrate that the method can identify important driver genes at the individual level

    Class Probability Propagation of Supervised Information Based on Sparse Subspace Clustering for Hyperspectral Images

    No full text
    Hyperspectral image (HSI) clustering has drawn increasing attention because the curse of dimensionality makes it a challenging task. In this paper, we propose a novel algorithm for HSI clustering, class probability propagation of supervised information based on sparse subspace clustering (CPPSSC). First, we estimate the class probability of unlabeled samples from partially known supervised information, using sparse representation-based classification (SRC). Then, we incorporate the class probability into the traditional sparse subspace clustering (SSC) model to obtain a more accurate sparse representation coefficient matrix with a clear block-diagonal structure, which is used to build the similarity matrix. Finally, the clustering results are obtained by applying spectral clustering to the similarity matrix. Extensive experiments on a variety of challenging data sets illustrate that our proposed method is effective
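
    The step from a sparse coefficient matrix to a similarity matrix can be sketched with the symmetrization used in standard SSC, W = |C| + |C|^T. Whether CPPSSC builds its similarity matrix in exactly this way is an assumption; the function name and the toy matrix are hypothetical, and the sparse optimization and the spectral clustering step are not shown.

```python
def similarity_from_coefficients(coefficients):
    """Symmetrize a square coefficient matrix C into W = |C| + |C|^T.

    This is the construction used in standard sparse subspace clustering;
    its use here is an assumption about the method in the abstract.
    """
    n = len(coefficients)
    return [[abs(coefficients[i][j]) + abs(coefficients[j][i]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy 4x4 coefficient matrix with two blocks: samples {0, 1} and {2, 3}.
    C = [
        [0.0, 0.5, 0.0, 0.0],
        [-0.25, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.75],
        [0.0, 0.0, 0.5, 0.0],
    ]
    for row in similarity_from_coefficients(C):
        print(row)
```

    With a block-diagonal coefficient matrix, the resulting similarity matrix links samples only within the same block, which is the property spectral clustering then exploits.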

    Divide-and-Attention Network for HE-Stained Pathological Image Classification

    No full text
    Since pathological images have some distinct characteristics that are different from natural images, the direct application of a general convolutional neural network cannot achieve good classification performance, especially for fine-grained classification problems (such as pathological image grading). Inspired by the clinical experience that decomposing a pathological image into different components is beneficial for diagnosis, in this paper, we propose a Divide-and-Attention Network (DANet) for Hematoxylin-and-Eosin (HE)-stained pathological image classification. The DANet utilizes a deep-learning method to decompose a pathological image into nuclei and non-nuclei parts. With such decomposed pathological images, the DANet first performs feature learning independently in each branch, and then focuses on the most important feature representation through the branch selection attention module. In this way, the DANet can learn representative features with respect to different tissue structures and adaptively focus on the most important ones, thereby improving classification performance. In addition, we introduce deep canonical correlation analysis (DCCA) constraints in the feature fusion process of different branches. The DCCA constraints play the role of branch fusion attention, so as to maximize the correlation of different branches and ensure that the fused branches emphasize specific tissue structures. Experimental results on three datasets demonstrate the superiority of the DANet, with an average classification accuracy of 92.5% on breast cancer classification, 95.33% on colorectal cancer grading, and 91.6% on breast cancer grading tasks

    Dissecting Spatiotemporal Structures in Spatial Transcriptomics via Diffusion-Based Adversarial Learning

    No full text
    Recent advancements in spatial transcriptomics (ST) technologies offer unprecedented opportunities to unveil the spatial heterogeneity of gene expression and cell states within tissues. Despite these capabilities of ST data, accurately dissecting spatiotemporal structures (e.g., spatial domains, temporal trajectories, and functional interactions) remains challenging. Here, we introduce a computational framework, PearlST (partial differential equation [PDE]-enhanced adversarial graph autoencoder of ST), for accurate inference of spatiotemporal structures from ST data using a PDE-enhanced adversarial graph autoencoder. PearlST employs contrastive learning to extract histological image features, integrates a PDE-based diffusion model to enhance characterization of spatial features at domain boundaries, and learns the latent low-dimensional embeddings via Wasserstein adversarial regularized graph autoencoders. Comparative analyses across multiple ST datasets with varying resolutions demonstrate that PearlST outperforms existing methods in spatial clustering, trajectory inference, and pseudotime analysis. Furthermore, PearlST elucidates functional regulations of the latent features by linking intercellular ligand-receptor interactions to the most contributing genes of the low-dimensional embeddings, as illustrated in a human breast cancer dataset. Overall, PearlST proves to be a powerful tool for extracting interpretable latent features and dissecting intricate spatiotemporal structures in ST data across various biological contexts

    MDSCMF: Matrix Decomposition and Similarity-Constrained Matrix Factorization for miRNA–Disease Association Prediction

    No full text
    MicroRNAs (miRNAs) are small non-coding RNAs that are related to a number of complicated biological processes, and numerous studies have demonstrated that miRNAs are closely associated with many human diseases. In this study, we present a matrix decomposition and similarity-constrained matrix factorization (MDSCMF) method to predict potential miRNA–disease associations. First, we used a matrix decomposition (MD) algorithm to remove outliers from the miRNA–disease association matrix. Then, miRNA similarity was determined by using similarity kernel fusion (SKF) to integrate miRNA functional similarity and Gaussian interaction profile (GIP) kernel similarity, and disease similarity was determined by using SKF to integrate disease semantic similarity and GIP kernel similarity. Furthermore, we added L2 regularization terms and similarity constraint terms to non-negative matrix factorization to form a similarity-constrained matrix factorization (SCMF) algorithm, which was applied to make predictions. MDSCMF achieved AUC values of 0.9488, 0.9540, and 0.8672 based on fivefold cross-validation (5-CV), global leave-one-out cross-validation (global LOOCV), and local leave-one-out cross-validation (local LOOCV), respectively. Case studies on three common human diseases were also carried out to demonstrate the prediction ability of MDSCMF. All experimental results confirmed that MDSCMF was effective in predicting underlying associations between miRNAs and diseases
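
    The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract can be sketched in its commonly used form: K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma equal to a bandwidth parameter divided by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the default bandwidth of 1.0, and the toy association matrix are assumptions for illustration; the abstract does not state the exact settings used in MDSCMF.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    Each row is one entity's interaction profile (e.g. one miRNA's associations
    with every disease). gamma_prime is a bandwidth parameter (1.0 assumed).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = []
    for i in range(n):
        kernel_row = []
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel_row.append(math.exp(-gamma * sq_dist))
        kernel.append(kernel_row)
    return kernel


if __name__ == "__main__":
    # Toy association matrix: 3 miRNAs (rows) x 3 diseases (columns), made up.
    associations = [
        [1, 0, 1],
        [1, 0, 1],
        [0, 1, 0],
    ]
    for row in gip_kernel(associations):
        print([round(value, 4) for value in row])
```

    Rows with identical profiles get similarity 1, and the similarity decays as the profiles diverge. Applying the same function to the transposed matrix gives the disease-side GIP similarity.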

    Nuclei-Guided Network for Breast Cancer Grading in HE-Stained Pathological Images

    No full text
    Breast cancer grading methods based on hematoxylin-eosin (HE) stained pathological images fall into two categories. The first category directly extracts pathological image features for breast cancer grading. However, unlike the coarse-grained problem of breast cancer classification, breast cancer grading is a fine-grained classification problem, so general methods cannot achieve satisfactory results. The second category applies the three evaluation criteria of the Nottingham Grading System (NGS) separately, and then integrates the results of the three criteria to obtain the final grading result. However, NGS is only a semiquantitative evaluation method, and there may be far more image features related to breast cancer grading. In this paper, we propose a Nuclei-Guided Network (NGNet) for breast invasive ductal carcinoma (IDC) grading in pathological images. The proposed nuclei-guided attention module plays the role of nucleus attention, so as to learn more nuclei-related feature representations for breast IDC grading. In addition, the proposed nuclei-guided fusion module, applied in the fusion process of different branches, further enables the network to focus on learning nuclei-related features. Overall, under the guidance of nuclei-related features, the entire NGNet can learn more fine-grained features for breast IDC grading. The experimental results show that the proposed method performs better than state-of-the-art methods. In addition, we release a well-labeled dataset with 3644 pathological images for breast IDC grading. This dataset is currently the largest publicly available breast IDC grading dataset and can serve as a benchmark to facilitate a broader study of breast IDC grading

    DMA-HPCNet: Dual Multi-Level Attention Hybrid Pyramid Convolution Neural Network for Alzheimer’s Disease Classification

    No full text
    Computer-aided diagnosis (CAD) plays a crucial role in the clinical application of Alzheimer’s disease (AD). In particular, convolutional neural network (CNN)-based methods are highly sensitive to subtle changes caused by brain atrophy in medical images (e.g., magnetic resonance imaging, MRI). Due to computational resource constraints, most CAD methods focus on quantitative features in specific regions, neglecting the holistic nature of the images, which poses a challenge for a comprehensive understanding of pathological changes in AD. To address this issue, we propose a lightweight dual multi-level hybrid pyramid convolutional neural network (DMA-HPCNet) to aid clinical diagnosis of AD. Specifically, we introduced ResNet as the backbone network and modularly extended the hybrid pyramid convolution (HPC) block and the dual multi-level attention (DMA) module. Among them, the HPC block is designed to enhance the acquisition of information at different scales, and the DMA module is proposed to sequentially extract different local and global representations from the channel and spatial domains. Our proposed DMA-HPCNet method was evaluated on baseline MRI slices of 443 subjects from the ADNI dataset. Experimental results show that our proposed DMA-HPCNet model performs efficiently in AD-related classification tasks with low computational cost