1 research outputs found

    Identifying disease-associated genes based on artificial intelligence

    Get PDF
    Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques, such as linkage analysis, genome-wide association studies (GWAS), have identified a large number of associations, identifying disease genes is still challenging since experimental methods are usually time-consuming and expensive. To solve these issues, computational methods are proposed to predict disease-gene associations. Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. No matter what models are used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms, and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods. The first chapter introduces the background information, objectives of the studies and structure of the thesis. After that, a comprehensive review is provided in the second chapter to discuss the existing algorithms as well as the databases and evaluation methods used in existing studies. Having the objectives and future directions, the thesis then presents five computational methods for predicting disease-gene associations. The first method proposed in Chapter 3 considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network. The selected genes are then used by a network-energy model to improve its performance. The second method proposed in Chapter 4 constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves the prediction accuracy. Chapter 5 presents a generic model which applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method proposed in Chapter 7 is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distance between diseases and genes are used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes. In summary, this thesis has proposed several artificial intelligence-based computational algorithms to address the typical issues existing in computational algorithms. Experimental results have shown that the proposed methods can improve the accuracy of disease-gene prediction
    corecore