891 research outputs found

    Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs

    Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static, but rather evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge through growth. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention of the learning on growing snapshots of a KG without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, with an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines. Comment: Accepted at the 37th AAAI Conference on Artificial Intelligence (AAAI 2023).
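The idea of regularizing embeddings to limit catastrophic forgetting can be sketched as a penalty that grows when embeddings learned on an earlier snapshot drift. This is a generic L2 retention penalty, not necessarily the regularizer used in the paper; the function name `l2_retention_penalty`, the `strength` parameter, and the toy vectors are all assumptions made for illustration.

```python
def l2_retention_penalty(new_emb, old_emb, strength=1.0):
    """Squared L2 distance between current and previous embeddings.

    Only keys present in old_emb are penalized, so entities or relations
    that first appear in the new snapshot can move freely.
    """
    penalty = 0.0
    for key, old_vec in old_emb.items():
        new_vec = new_emb[key]
        penalty += sum((a - b) ** 2 for a, b in zip(new_vec, old_vec))
    return strength * penalty


# Toy example (hypothetical values): "e1" existed before, "e2" is new.
old_embeddings = {"e1": [1.0, 0.0]}
new_embeddings = {"e1": [1.0, 2.0], "e2": [5.0, 5.0]}
penalty = l2_retention_penalty(new_embeddings, old_embeddings, strength=0.5)
```

In training, a term like this would typically be added to the loss on the new snapshot, so that `strength` trades off learning new facts against retaining old ones.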

    Identification of Potential Prognostic Genes for Neuroblastoma

    Background and Objective: Neuroblastoma (NB), the most common pediatric solid tumor apart from brain tumors, is associated with dismal long-term survival. The aim of this study was to identify a gene signature to predict the prognosis of NB patients. Materials and Methods: The GSE49710 dataset was downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were analyzed using the R package “limma” and SPSS software. Gene ontology (GO) and pathway enrichment analyses were performed via the DAVID database. Random forest (RF) and a risk score model were used to select the gene signature for predicting the prognosis of NB patients. The receiver operating characteristic (ROC) and Kaplan-Meier curves were also plotted. The GSE45480 and GSE16476 datasets were employed to validate the robustness of the gene signature. Results: A total of 131 DEGs were identified, which were mainly enriched in cancer-related pathways. Four genes (ERCC6L, AHCY, STK33, and NCAN), all among the top six most important features in the RF model, were selected as a gene signature to predict the prognosis of NB patients. The signature reached an area under the curve (AUC) of 0.86, and Cox regression analysis revealed that the 4-gene signature was an independent prognostic factor of overall survival and event-free survival, as was also the case in GSE16476. Additionally, the 4-gene signature discriminated robustly between different groups in GSE45480 and GSE49710. Conclusion: The present study identified a gene signature for predicting the prognosis of NB, which may provide novel prognostic markers, and some of the genes may serve as treatment targets pending future biological experiments.
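For readers unfamiliar with the AUC figure quoted above, the sketch below computes it directly from its definition: the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the scores and labels are toy values invented for illustration, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy risk scores and outcomes (1 = event, 0 = no event); hypothetical values.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; rank-based formulas give the same value more efficiently.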

    DMfold: A Novel Method to Predict RNA Secondary Structure With Pseudoknots Based on Deep Learning and Improved Base Pair Maximization Principle

    While predicting the secondary structure of RNA is vital for researching its function, determining RNA secondary structure is challenging, especially for structures with pseudoknots. Several excellent computational methods can be used to predict the secondary structure (with or without pseudoknots), but each has its own merits and demerits. These methods can be classified into two categories: the multi-sequence method and the single-sequence method. The main advantage of the multi-sequence method lies in its use of auxiliary sequences to assist in predicting the secondary structure, but it can only predict successfully in the presence of multiple highly homologous sequences. The major merit of the single-sequence method is its ease of operation (it needs only the target sequence to predict the secondary structure), but its folding parameters are features common to diverse RNAs and cannot describe the unique characteristics of a particular RNA, potentially resulting in low prediction accuracy for some RNAs. In this paper, “DMfold,” a method based on Deep Learning and an Improved Base Pair Maximization Principle, is proposed to predict the secondary structure with pseudoknots; it combines the advantages and avoids some disadvantages of those two methods. Notably, DMfold predicts the secondary structure of an RNA by learning from similar RNAs with known structures, using similar RNA sequences instead of the highly homologous sequences required by the multi-sequence method, thereby reducing the requirement for auxiliary sequences. DMfold needs only the target sequence as input to predict the secondary structure. Its folding parameters are extracted automatically by deep learning, which avoids the lack of folding parameters in the single-sequence method. Experiments show that our method is not only simple to operate, but also improves the prediction accuracy compared to multiple excellent prediction methods.
A repository containing our code can be found at https://github.com/linyuwangPHD/RNA-Secondary-Structure-Database

    Efficient Iris Recognition Based on Optimal Subfeature Selection and Weighted Subregion Fusion

    In this paper, we propose three discriminative feature selection strategies and a weighted subregion matching method to improve the performance of iris recognition systems. Firstly, we introduce in detail the process of feature extraction and representation based on the scale-invariant feature transform (SIFT). Secondly, three strategies are described: an orientation probability distribution function (OPDF) based strategy to delete redundant feature keypoints, a magnitude probability distribution function (MPDF) based strategy to reduce the dimensionality of feature elements, and a compound strategy combining OPDF and MPDF to further select the optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted subregion matching fusion. Particle swarm optimization is used to speed up the search for the weights of the different subregions, and the subregions' matching scores are then weighted to generate the final decision. The experimental results on three well-known public iris databases (CASIA-V3 Interval, Lamp, and MMU-V1) demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computational complexity.
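The final fusion step described above amounts to a weighted combination of per-subregion matching scores followed by a threshold decision. A minimal sketch follows, assuming the weights have already been found (the abstract uses particle swarm optimization for that, which is not reproduced here). The names `fuse_scores` and `is_match`, the normalization by the weight sum, and all numeric values are assumptions for illustration.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-subregion matching scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def is_match(scores, weights, threshold):
    """Accept the comparison when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Hypothetical scores for three subregions, with the first weighted double.
fused = fuse_scores([0.9, 0.5, 0.7], [2.0, 1.0, 1.0])
accepted = is_match([0.9, 0.5, 0.7], [2.0, 1.0, 1.0], threshold=0.6)
```

Normalizing by the weight sum keeps the fused score on the same scale as the individual scores, so a single threshold stays meaningful when the weights change.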

    Novel Approaches to Improve Iris Recognition System Performance Based on Local Quality Evaluation and Feature Fusion

    To build a new iris template, this paper proposes a strategy that fuses different portions of the iris, using a machine learning method to evaluate local iris quality. There are three novelties compared to previous work. Firstly, the normalized segmented iris is divided into multiple tracks, and each track is then evaluated individually to analyze the recognition accuracy rate (RAR). Secondly, six local quality evaluation parameters are adopted to analyze the texture information of each track, and particle swarm optimization (PSO) is employed to obtain the weights of these evaluation parameters and the corresponding weighted coefficients of the different tracks. Finally, the information from all tracks is fused according to the weights of the different tracks. The experimental results, based on subsets of three public and one private iris image databases, demonstrate three contributions of this paper. (1) Our experimental results show that a partial iris image cannot completely replace the entire iris image for iris recognition in several respects. (2) The proposed quality evaluation algorithm is self-adaptive and can automatically optimize its parameters according to the characteristics of the iris image samples. (3) Our feature information fusion strategy can effectively improve the performance of iris recognition systems.

    Utilizing Selected Di- and Trinucleotides of siRNA to Predict RNAi Activity

    Small interfering RNAs (siRNAs) induce posttranscriptional gene silencing in various organisms. siRNAs targeted to different positions of the same gene show different effectiveness; hence, predicting siRNA activity is a crucial step. In this paper, we developed and evaluated a powerful tool named “siRNApred” with a new mixed feature set to predict siRNA activity. To improve the prediction accuracy, we proposed 2-3NTs as our new features. A Random Forest siRNA activity prediction model was constructed using the feature set selected by our proposed Binary Search Feature Selection (BSFS) algorithm. Experimental data demonstrated that the binding site of the Argonaute protein correlates with siRNA activity. “siRNApred” is effective for selecting active siRNAs, and the prediction results demonstrate that our method can outperform other current siRNA activity prediction methods in terms of prediction accuracy
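To make the idea of di- and trinucleotide features concrete, the sketch below counts overlapping 2- and 3-nucleotide substrings of a sequence. The abstract does not state how the 2-3NT features are encoded or which ones the BSFS algorithm selects, so the function name `kmer_counts`, the count-based encoding, and the example sequence are assumptions.

```python
from collections import Counter


def kmer_counts(seq, ks=(2, 3)):
    """Count overlapping k-mers of an RNA sequence for each k in ks."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = Counter()
    for k in ks:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical short sequence, used only to show the shape of the output.
features = kmer_counts("AUGGCUAUG")
```

A feature vector for a classifier such as a random forest could then be built by reading these counts in a fixed order over all 16 dinucleotides and 64 trinucleotides.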

    Mining Magnaporthe oryzae sRNAs With Potential Transboundary Regulation of Rice Genes Associated With Growth and Defense Through Expression Profile Analysis of the Pathogen-Infected Rice

    In recent years, studies have shown that phytopathogenic fungi can regulate host plants across kingdoms through small RNAs (sRNAs). Magnaporthe oryzae, the causative agent of rice blast, causes disease by penetrating rice tissues through appressoria. However, little is known about the transboundary regulation of M. oryzae sRNAs during the interaction of the pathogen with its host rice. Therefore, investigating regulation by M. oryzae sRNAs in infected rice plants has important theoretical and practical significance for disease control and production improvement. Based on high-throughput data of M. oryzae sRNAs and the mixed sRNAs during infection, we compared the differential expression of sRNAs in M. oryzae before and during infection and found that the expression levels of 366 M. oryzae sRNAs were significantly upregulated during infection. We trained an SVM model to predict differentially expressed sRNAs, which provides a reference for predicting differentially expressed sRNAs in species homologous to M. oryzae and can facilitate future research on M. oryzae. Furthermore, fifty core targets were selected from the predicted target genes in rice for functional enrichment analysis, which revealed nine biological processes and one KEGG pathway associated with rice growth and disease defense. These functions correspond to thirteen rice genes. A total of fourteen M. oryzae sRNAs targeting these rice genes were identified by data analysis, and their authenticity was verified in the database of M. oryzae sRNAs. These 14 M. oryzae sRNAs may participate in the transboundary regulation process and act as sRNA effectors to manipulate the rice blast process.

    The Impact of Ecological Construction Programs on Grassland Conservation in Inner Mongolia, China

    A series of Ecological Construction Programs have been initiated to protect the condition of grasslands in China during recent decades. However, grassland degradation is still severe, and conditions have not been restored as intended. This paper aims to empirically examine the effectiveness of these programs for protecting the grassland condition in the extensive pastoral areas of China. We focus on one major program that has been implemented widely on the grasslands, the Subsidy and Incentive System for Grassland Conservation (SISGC). The normalized difference vegetation index, measured with remote sensing technology, is used to quantify the grassland condition between 2001 and 2014. With data from 54 counties in the pastoral areas of Inner Mongolia, we estimate the impact of SISGC on the grassland condition. A fixed effects model is employed to control for livestock production, climate, time trends, and time-invariant heterogeneity between counties. The model results provide quantitative evidence that the condition of the grasslands has improved significantly because of SISGC; but that the effectiveness of SISGC was offset to some extent by other socio-economic and climate factors, such as increased producer prices and high temperature. This may explain why the actual grassland degradation has not been prevented as effectively as was expected. In addition, the impact of SISGC was stronger in counties with worse initial grassland condition. Furthermore, the effects of producer prices and climate changes were also more pronounced in these counties
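The fixed effects approach mentioned above can be illustrated with the within transformation: subtracting each county's own mean from its observations removes any time-invariant county-level heterogeneity before the slope is estimated. The sketch below uses a single regressor and invented numbers; the study's actual model includes further controls (livestock production, climate, time trends) that are not reproduced here, and the name `within_estimator` is an assumption.

```python
from collections import defaultdict


def within_estimator(units, x, y):
    """Slope of y on x after demeaning both within each unit."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # unit -> [sum_x, sum_y, count]
    for u, xi, yi in zip(units, x, y):
        entry = sums[u]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1
    num = 0.0
    den = 0.0
    for u, xi, yi in zip(units, x, y):
        sum_x, sum_y, n = sums[u]
        dx = xi - sum_x / n  # deviation from the unit's mean of x
        dy = yi - sum_y / n  # deviation from the unit's mean of y
        num += dx * dy
        den += dx * dx
    if den == 0.0:
        raise ValueError("x has no within-unit variation")
    return num / den


# Toy panel: two counties with very different baselines but the same
# within-county response (0.5) to a 0/1 program indicator.
counties = ["A", "A", "B", "B"]
program = [0.0, 1.0, 0.0, 1.0]
ndvi = [1.0, 1.5, 5.0, 5.5]
slope = within_estimator(counties, program, ndvi)
```

A pooled regression on the same toy data would be unaffected here only because the program indicator is balanced across counties; with unbalanced exposure, the county baselines would bias the pooled slope, which is what demeaning guards against.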

    A New Method of RNA Secondary Structure Prediction Based on Convolutional Neural Network and Dynamic Programming

    In recent years, obtaining RNA secondary structure information has played an important role in RNA and gene function research. Although some RNA secondary structures can be determined experimentally, in most cases efficient and accurate computational methods are still needed to predict RNA secondary structure. Current RNA secondary structure prediction methods are mainly based on the minimum free energy algorithm, which finds the optimal folding state of RNA in vivo using an iterative method to meet the minimum energy or other constraints. However, due to the complexity of the biotic environment, a true RNA structure always maintains a balance of biological potential energy, rather than the optimal folding state that meets the minimum energy. For short RNA sequences, the equilibrium energy state of the folded RNA is close to the minimum free energy state; therefore, the minimum free energy algorithm predicts their secondary structure with higher accuracy. In longer RNA sequences, however, continued folding causes the biopotential energy balance to deviate far from the minimum free energy state. This deviation arises from the complex structure and results in a serious decline in the prediction accuracy of the secondary structure. In this paper, we propose a novel RNA secondary structure prediction algorithm that combines a convolutional neural network model with a dynamic programming method to improve accuracy using large-scale RNA sequence and structure data. We analyze current experimental RNA sequence and structure data to construct a deep convolutional network model, and then extract implicit features for effective classification from large-scale data to predict the pairing probability of each base in an RNA sequence. Given the obtained base pairing probabilities, an enhanced dynamic programming method is applied to obtain the optimal RNA secondary structure. Results indicate that our proposed method is superior to common RNA secondary structure prediction algorithms in predicting three benchmark RNA families. Based on the characteristics of deep learning algorithms, it can be inferred that the method proposed in this paper has a 30% higher prediction success rate compared with other algorithms, which will be needed as the amount of real RNA structure data increases in the future.
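The second stage described above, turning predicted pairing probabilities into a structure by dynamic programming, can be sketched with a Nussinov-style recurrence that selects non-crossing base pairs maximizing the total pairing probability. This is a generic stand-in, not the paper's "enhanced" method; it assumes the network's output is available as a matrix `prob[i][j]` of pair probabilities, and the function name, the `min_loop` parameter, and the toy matrix are all illustrative assumptions.

```python
def max_prob_structure(prob, min_loop=3):
    """Non-crossing base pairs (i, j) that maximize the summed probability.

    prob[i][j] is the assumed probability that bases i and j pair (i < j).
    Paired bases must be separated by at least min_loop unpaired positions.
    """
    n = len(prob)
    best = [[0.0] * n for _ in range(n)]

    def cell(a, b):
        # Empty or out-of-range intervals contribute nothing.
        return best[a][b] if 0 <= a <= b < n else 0.0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = cell(i + 1, j)  # option 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # option 2: i pairs with k
                score = max(score, prob[i][k] + cell(i + 1, k - 1) + cell(k + 1, j))
            best[i][j] = score

    # Trace back through the table to recover the chosen pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == cell(i + 1, j):
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if best[i][j] == prob[i][k] + cell(i + 1, k - 1) + cell(k + 1, j):
                pairs.append((i, k))
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return sorted(pairs)


# Toy 8-base matrix (hypothetical probabilities): a nested two-pair stem
# competes with a weaker alternative pairing for base 0.
n_bases = 8
toy_prob = [[0.0] * n_bases for _ in range(n_bases)]
toy_prob[0][7], toy_prob[1][6], toy_prob[0][4] = 0.9, 0.8, 0.3
pairs = max_prob_structure(toy_prob)
```

Because the recurrence only considers nested pairs, a sketch like this cannot produce pseudoknots; it runs in cubic time in the sequence length.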

    Long Non-coding RNA LINC00941 as a Potential Biomarker Promotes the Proliferation and Metastasis of Gastric Cancer

    Gastric cancer (GC) is a considerable global health burden. Accumulating evidence suggests that long non-coding RNAs (lncRNAs) are aberrantly expressed in many cancers and play important roles in GC. However, only a few lncRNAs have been functionally characterized. In this study, using data from The Cancer Genome Atlas (TCGA), we identified long intergenic non-protein coding RNA 941 (LINC00941) as a potential biomarker for diagnosis and prognosis, and we found that the expression of LINC00941 is associated with tumor depth and distant metastasis in GC. Furthermore, functional enrichment analysis of the LINC00941 co-expression network suggested that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation. To validate our findings, we used loss-of-function analysis to reveal the biological function of LINC00941 in GC cells. This analysis revealed that silencing of LINC00941 inhibits GC cell proliferation, migration, and invasion in vitro and modulates tumor growth in vivo. Our findings confirmed that LINC00941 has an important oncogenic function in GC and may serve as a potential biomarker for the diagnosis and prognosis of GC.