8 research outputs found

    Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data

    Cancer markers play a significant role in diagnosing the origin of cancers and in detecting cancers from initial treatments. Identifying them is challenging owing to the heterogeneous nature of cancers. Identification of these markers could help improve the survival rate of cancer patients, since dedicated treatment, or even prevention, can be provided according to the diagnosis. Previous investigations show that pathway topology information can help in detecting cancer markers from gene expression data. Such analysis reduces the complexity from thousands of genes to a few hundred pathways. However, most existing methods group different cancer subtypes into a single class of disease samples and assume that all pathways contribute equally to the analysis. Meanwhile, several other methods ignore the interaction between multiple genes and genes with missing edges, which could lead to poor performance in identifying cancer markers from gene expression. Thus, this research proposes an enhanced directed random walk to identify pathway and gene markers for multiclass cancer gene expression data. Firstly, an improved pathway selection using analysis of variance (ANOVA), which enables the consideration of multiple cancer subtypes, is performed; subsequently, k-means clustering and the average silhouette method are integrated into the directed random walk to account for the interaction of multiple genes. The proposed methods are tested on benchmark gene expression datasets (breast, lung, and skin cancers) and biological pathways. Their performance is then measured and compared in terms of classification accuracy and area under the receiver operating characteristic curve (AUC). The results indicate that the proposed methods are able to identify a list of pathway and gene markers from the datasets with better classification accuracy and AUC.
The proposed methods improved classification performance by between 1% and 35% compared with existing methods. The cell cycle and p53 signaling pathways were found to be significantly associated with breast, lung, and skin cancers, while the cell cycle pathway was highly enriched in squamous cell carcinoma and adenocarcinoma.
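
The ANOVA-based pathway selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes each pathway has already been reduced to one activity score per sample and that samples are grouped by cancer subtype; the pathway names, the scores and the `anova_f` helper are hypothetical and are not taken from the paper.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of floats).

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical pathway activity scores, one inner list per cancer subtype.
pathway_scores = {
    "pathway_A": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "pathway_B": [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [2.0, 3.0, 4.0]],
}

# Pathways whose scores differ most between subtypes come first.
ranked = sorted(pathway_scores,
                key=lambda p: anova_f(pathway_scores[p]),
                reverse=True)
print(ranked)
```

A larger F-statistic means the subtype means are further apart relative to the spread within each subtype, which is why sorting by F gives a simple multiclass ranking of pathways.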

    Multi-stage feature selection in identifying potential biomarkers for cancer classification

    Biomarkers are indicators of a disease state or of the progression of certain health conditions. Identification of biomarkers greatly raises the probability of earlier diagnosis and could be further applied in developing effective treatment for the disease. Besides laboratory analysis, potential biomarkers can also be identified by analysing gene expression data through feature selection and machine learning. Many algorithms have been applied and introduced in this area, yet the high dimensionality of gene expression data remains a challenge, and the resulting noise could negatively affect the analysis outcome. Therefore, this study aims to investigate and develop a better feature selection method to identify potential biomarkers from gene expression data and to construct a deep neural network classification model using the selected features. A multistage feature selection method, namely CIR, is proposed, composed of Chi-square, Information Gain and Recursive Feature Elimination. The dataset used in this study integrates seven ovarian cancer gene expression datasets from the GEO database. The selected genes and the classification model are evaluated through biological context verification and classification performance, respectively. The proposed method shows improvements over the existing methods in terms of accuracy (+2.2294%), precision (+8.1415%), recall (+2.2294%), F1-score (+4.5494%) and AUC score (+0.2302). The proposed CIR method successfully identified eight genes that could be potential biomarkers for ovarian cancer, namely WFDC2, S100A13, PRG4, NRCAM, OGN, B3GALT2, VGLL3, and GATM, which are further verified through the literature.
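
A rough sketch of how the first two stages of such a multistage selection might be chained is given below. It assumes the stages run in the order listed (Chi-square, then Information Gain), that expression values have been discretized, and that each stage simply keeps the top-scoring genes. The Recursive Feature Elimination stage is left out because it requires a trained classifier. The gene names, the data and `staged_filter` are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for fv, fc in f_counts.items():
        for yv, yc in y_counts.items():
            expected = fc * yc / n
            stat += (joint[(fv, yv)] - expected) ** 2 / expected
    return stat


def staged_filter(X, y, keep_chi, keep_ig):
    """Keep the top `keep_chi` genes by chi-square, then the top `keep_ig`
    of those by information gain. X maps gene name -> discrete values."""
    stage1 = sorted(X, key=lambda g: chi_square(X[g], y), reverse=True)[:keep_chi]
    return sorted(stage1, key=lambda g: information_gain(X[g], y),
                  reverse=True)[:keep_ig]


# Hypothetical discretized expression (0 = low, 1 = high) for six samples.
y = [0, 0, 0, 1, 1, 1]
X = {
    "gene_a": [0, 0, 0, 1, 1, 1],
    "gene_b": [0, 0, 1, 1, 1, 1],
    "gene_c": [0, 1, 0, 1, 0, 1],
}
selected = staged_filter(X, y, keep_chi=2, keep_ig=1)
print(selected)
```

Running the cheap filters first shrinks the candidate set before any wrapper method is applied, which is the usual motivation for ordering filter stages ahead of a model-based stage.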

    Comparative analysis of deep learning algorithm for cancer classification using multi-omics feature selection

    Advances in high-throughput technologies for omics studies have produced a large amount of information that enables integrated analysis of complex diseases. Complex diseases such as cancer are often caused by a series of interactions that involve multiple biological mechanisms. Integration of multi-omics data allows more advanced analysis using features from various aspects of biology. However, analysing cancer multi-omics data on a large scale can be challenging due to the high dimensionality of the data. The recent development of advanced computational algorithms, especially deep learning, has sparked numerous efforts to apply these algorithms in multi-omics studies. This study aims to investigate how deep learning algorithms, namely the stacked denoising autoencoder (SDAE) and the variational autoencoder (VAE), can be used in cancer classification using multi-omics data. Moreover, this study also investigates the impact of feature selection in multi-omics analysis through the implementation of an embedded feature selection. The multi-omics data used in this study include genomics, methylomics, transcriptomics and clinical data for a case study of lung squamous cell carcinoma. The classification performance is compared and discussed in terms of the effectiveness of the different models and the impact of feature selection. Results showed that VAE outperforms SDAE with 91.86% accuracy, 22.73% specificity and 0.21% Matthews Correlation Coefficient (MCC).
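
For reference, the three metrics reported in this abstract can be computed from a binary confusion matrix as in the sketch below. The counts are hypothetical and are not the study's results. MCC is a correlation coefficient that lies between -1 and 1.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts for an imbalanced test set.
tp, tn, fp, fn = 90, 2, 7, 1
print(accuracy(tp, tn, fp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn))
```

On imbalanced data a model can reach high accuracy while its specificity and MCC stay low, so the three numbers are best read together.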

    Analysis of web design visual element attention based on user educational background

    The evolution of Internet technology has led to an increase in online users. This study focuses on the pivotal role of visual elements in conveying web content and their impact on user browsing behavior. The use of visual elements in web design based on big data has attracted widespread attention among web designers, who apply visual elements to their work to make web pages more attractive. This study examines the composition and distribution characteristics of key visual elements identified through user behavior data in a big data environment and discusses the use of visual elements in web design in the era of the network economy. In addition, 200 questionnaires were issued to investigate the degree of attention paid to visual elements in web pages by users of different occupations and educational backgrounds. Our survey indicated that visual elements captured the attention of 41% of corporate employees, whereas a mere 1% of social welfare workers focused on web content; 36% of undergraduates pay attention to the visual elements of web pages, compared with only 5% of postgraduates and 4% of those with doctoral degrees and above. Therefore, the visual elements of a web page need to conform to the user's cultural and professional background.

    Improved support vector machine using multiple SVM-RFE for cancer classification

    Support Vector Machine (SVM) is a machine learning method that is widely used in cancer studies, especially with microarray data. A common problem with microarray data is that the number of genes is substantially larger than the number of samples. Although SVM is capable of handling a large number of genes, better classification accuracy can be obtained using a small subset of genes. This research proposes Multiple Support Vector Machine-Recursive Feature Elimination (MSVM-RFE) as a gene selection method to identify a small number of informative genes. This method is implemented in order to improve the performance of SVM during classification. The effectiveness of the proposed method has been tested on two different gene expression datasets, leukemia and lung cancer. To assess its effectiveness, the proposed method is compared with other methods such as Random Forest and C4.5 Decision Tree. The results show that MSVM-RFE is effective in reducing the number of genes in both datasets, thus providing better accuracy for SVM in cancer classification.

    A review of computational methods for clustering genes with similar biological functions

    Clustering techniques can group genes based on similarity in biological functions. However, the drawback of using clustering techniques is the inability to identify an optimal number of potential clusters beforehand. Several existing optimization techniques can address the issue. Besides, clustering validation can predict the possible number of potential clusters and hence increase the chances of identifying biologically informative genes. This paper reviews and provides examples of existing methods for clustering genes, optimization of the objective function, and clustering validation. Clustering techniques can be categorized into partitioning, hierarchical, grid-based, and density-based techniques. We also highlight the advantages and disadvantages of each category. To optimize the objective function, we introduce the swarm intelligence technique and compare the performance of other methods. Moreover, we discuss the differences between internal and external criteria for validating cluster quality. We also investigate the performance of several clustering techniques by applying them to a leukemia dataset. The results show that grid-based clustering techniques provide better classification accuracy; however, partitioning clustering techniques are superior in identifying prognostic markers of leukemia. Therefore, this review suggests combining clustering techniques such as CLIQUE and k-means to yield high-quality gene clusters.
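
As one illustration of an internal validation criterion, the mean silhouette coefficient can be used to compare candidate clusterings without reference labels. The sketch below assumes Euclidean distance, at least two clusters and no duplicate points; the points and both labelings are hypothetical and are not data from the review.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            # By convention a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D points with two candidate labelings.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
compact = [0, 0, 1, 1]   # groups nearby points together
mixed = [0, 1, 0, 1]     # groups distant points together

print(silhouette(points, compact), silhouette(points, mixed))
```

To choose the number of clusters, the same score can be computed for the clustering obtained at each candidate value, keeping the value with the highest mean silhouette.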