Early cancer diagnosis remains a great challenge because it requires the
involvement of several fields. Deoxyribonucleic acid (DNA) microarray
analysis is one of the modern cancer diagnosis techniques used by scientists to measure
changes in gene expression levels. From a computational perspective, algorithms have
been developed to ease the diagnosis process, but they are not yet reliable. Numerous
cancer studies have combined different machine learning techniques to improve the
accuracy of cancer classification.
This study aims to improve the accuracy of cancer classification by introducing
an improved directed random walk (DRW) framework. The improved DRW
framework is proposed to identify risk pathways while correctly predicting the significant
genes. It is named significant directed walk (SDW) because of its ability to identify
significant genes for cancer. In this study, six gene expression datasets are used to
study the effectiveness of the sub-algorithms, directed graph, and classifier in SDW in
terms of cancer prediction and cancer classification. The sub-algorithms of SDW
comprise a data pre-processing phase, specific tuning parameter selection,
weight as an additional variable, and the exclusion of unwanted adjacency matrices. Besides
that, four directed graphs are incorporated into SDW to assess their usability, and
the best of the four is chosen as part of the SDW structure. This directed graph,
named the walker network, combines the Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathway and the protein-protein interaction (PPI) network. The experimental results
show that the combination of SDW with the walker network and linear regression
performs best among all the combinations tested.
SDW achieves an average accuracy of 95.03% across all cancer datasets, which is 8.97%
higher than that of the conventional DRW. This study provides a foundation for
further research on the early diagnosis of cancer with machine learning
techniques. These findings are expected to improve early diagnosis methods that
rely on cancer classification.
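
As an illustration of the kind of walk that DRW-style methods build on, the following is a minimal sketch of a random walk with restart on a directed graph. It is not the SDW implementation: the function name `directed_random_walk`, the restart probability of 0.7, the edge convention (`adjacency[i, j] = 1` meaning an edge from node i to node j), and the toy three-node graph are all assumptions made for this example.

```python
import numpy as np


def directed_random_walk(adjacency, initial_weights, restart=0.7,
                         tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph (illustrative sketch).

    adjacency[i, j] = 1 is assumed to mean a directed edge i -> j.
    initial_weights holds one non-negative starting weight per node.
    """
    A = np.asarray(adjacency, dtype=float)
    w0 = np.asarray(initial_weights, dtype=float)
    w0 = w0 / w0.sum()  # normalise the starting weights to sum to 1

    # Row-normalise so each node splits its weight among its out-neighbours;
    # nodes with no outgoing edges keep an all-zero row.
    out_degree = A.sum(axis=1, keepdims=True)
    M = np.divide(A, out_degree, out=np.zeros_like(A), where=out_degree > 0)

    w = w0.copy()
    for _ in range(max_iter):
        # Move along the edges with probability (1 - restart),
        # otherwise jump back to the starting distribution.
        w_next = (1.0 - restart) * (M.T @ w) + restart * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w


# Toy graph (hypothetical): node 0 -> node 1 -> node 2.
toy_adjacency = np.array([[0, 1, 0],
                          [0, 0, 1],
                          [0, 0, 0]])
toy_scores = directed_random_walk(toy_adjacency, [1.0, 1.0, 1.0])
print(toy_scores)
```

In this toy graph, weight flows along the edge direction, so the downstream node ends up with the largest score. In a gene-network setting the starting weights would typically come from the expression data, but how SDW sets them is not specified here.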