4 research outputs found

    Pattern Recognition of Food Security in Indonesia Using Biclustering Plaid Model

    Biclustering comes in many algorithmic variants, and selecting the most suitable one can be challenging because performance varies significantly with the characteristics of the data. The Plaid model, one of the popular biclustering algorithms, has gained recognition for its efficiency and versatility across applications, including food security. Indonesia faces complex food security challenges, and its geographic and socioeconomic diversity demands region-specific solutions. Identifying province-specific food security patterns is crucial for effective policymaking and resource allocation, and ultimately for promoting food sufficiency and stability at the regional level. This study assesses the performance of the Plaid model in identifying food security patterns at the provincial level in Indonesia. To optimize the biclusters, we explore several parameter tuning scenarios (the choice of model, the number of layers, and the threshold values for row and column release). The selection criteria are the change ratio of the initial matrix's mean square residue to the mean square residue of the Plaid model, the average mean square residue, and the number of biclusters. The constant column model was selected, with a mean square residue change ratio of 0.52 and an average Plaid model mean square residue of 4.81; it generates 6 overlapping biclusters. The results show that each bicluster has unique characteristics. Notably, Bicluster 1, which consists of 2 provinces, exhibits the lowest food security levels, marked by variables X1, X2, X4, and X7. Furthermore, variables X1, X4, and X7 appear consistently across several biclusters, which highlights the importance of prioritizing these three variables to improve the food security status of the regions.
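
The abstract above uses the mean square residue as a selection criterion but does not spell out its formula. The sketch below assumes the commonly used Cheng and Church definition, in which each entry's residue is its value minus its row mean and column mean plus the overall mean of the bicluster. The function name `mean_square_residue` and the toy matrix are illustrative only and are not taken from the study.

```python
def mean_square_residue(matrix, rows, cols):
    """Mean square residue of the submatrix selected by rows and cols.

    Assumes the Cheng and Church definition; the study may use a variant.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row and column effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy data: the first three columns follow an additive pattern,
# the fourth column does not.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
]
additive_block = mean_square_residue(data, [0, 1, 2], [0, 1, 2])
whole_matrix = mean_square_residue(data, [0, 1, 2], [0, 1, 2, 3])
print(additive_block, whole_matrix)
```

A lower value indicates a more coherent bicluster, so comparing the residue of the full matrix with that of the extracted biclusters gives a rough sense of how much structure the biclustering captured.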

    Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data

    Cancer markers play a significant role in diagnosing the origin of cancers and in detecting cancers from initial treatments. Identifying them is challenging owing to the heterogeneous nature of cancers. These markers could help improve the survival rate of cancer patients, since dedicated treatment, or even prevention, can be provided according to the diagnosis. Previous investigations show that pathway topology information can help in detecting cancer markers from gene expression data. Such analysis reduces the complexity from thousands of genes to a few hundred pathways. However, most existing methods group different cancer subtypes together as generic disease samples and assume that all pathways contribute equally to the analysis. Several other methods ignore the interaction between multiple genes and genes with missing edges, which can lead to poor performance in identifying cancer markers from gene expression data. This research therefore proposes an enhanced directed random walk to identify pathway and gene markers in multiclass cancer gene expression data. First, an improved pathway selection step based on analysis of variance (ANOVA) is performed, which allows multiple cancer subtypes to be considered. Subsequently, k-means clustering and the average silhouette method are integrated into the directed random walk so that the interaction of multiple genes is taken into account. The proposed methods are tested on benchmark gene expression datasets (breast, lung, and skin cancers) and biological pathways. Their performance is measured and compared in terms of classification accuracy and area under the receiver operating characteristic curve (AUC). The results indicate that the proposed methods identify a list of pathway and gene markers from the datasets with better classification accuracy and AUC, improving classification performance by between 1% and 35% compared with existing methods. The cell cycle and p53 signaling pathways were found to be significantly associated with breast, lung, and skin cancers, while the cell cycle pathway was highly enriched in squamous cell carcinoma and adenocarcinoma.
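
The ANOVA-based pathway selection mentioned in this abstract is not described in detail. As a rough illustration only, the sketch below computes the standard one-way ANOVA F statistic for a single pathway across several cancer subtypes. The function name `one_way_anova_f` and the activity scores are hypothetical; how the study derives pathway scores or applies thresholds is not stated in the abstract.

```python
def one_way_anova_f(groups):
    """Standard one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of samples around their group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical activity scores of one pathway in three cancer subtypes.
subtype_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_anova_f(subtype_scores)
print(f_stat)
```

A larger F statistic means the pathway's scores differ more between subtypes than within them, which is the sense in which ANOVA can handle more than two classes at once.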