13 research outputs found

Using Biological Networks and Gene-Expression Profiles for the Analysis of Diseases

Author: LIM JUNLIANG KEVIN
Publication date: 28/11/2014

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics

Publication venue: BioMed Central
Publication date: 05/12/2016

Springer - Publisher Connector

Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy.

Author: Goh WWB
Ho SY
Wong L
Publication venue: Elsevier BV
Publication date: 18/12/2020

Class-prediction accuracy provides a quick but superficial way of determining classifier performance. It does not indicate whether the findings are reproducible or whether the selected or constructed features are meaningful and specific. Furthermore, class-prediction accuracy oversummarizes and does not reveal how training and learning were accomplished: two classifiers with the same performance in one validation can disagree in many future validations. It offers no explanation of the decision-making process and is not objective, as its value is also affected by class proportions in the validation set. These issues do not mean that class-prediction accuracy should be omitted. Instead, it needs to be enriched with accompanying evidence and tests that supplement and contextualize the reported accuracy. This additional evidence augments the reported accuracy and can help us perform machine learning better while avoiding naive reliance on oversimplified metrics.
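The dependence of accuracy on class proportions can be illustrated with a small sketch. It assumes a hypothetical classifier whose sensitivity (0.9) and specificity (0.6) are fixed; these numbers and the function names are illustrative and do not come from the article. Balanced accuracy is shown only as one possible accompanying metric, not as the authors' recommendation.

```python
def accuracy(sensitivity, specificity, positive_fraction):
    """Expected accuracy of a fixed classifier on a validation set
    in which `positive_fraction` of the samples are positives."""
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; independent of the class mix."""
    return (sensitivity + specificity) / 2.0


# Hypothetical classifier: the same model, evaluated on different class mixes.
SENS, SPEC = 0.9, 0.6

for frac in (0.1, 0.5, 0.9):
    print(
        f"positives={frac:.0%}  "
        f"accuracy={accuracy(SENS, SPEC, frac):.2f}  "
        f"balanced={balanced_accuracy(SENS, SPEC):.2f}"
    )
```

The classifier itself never changes, yet its accuracy moves with the share of positives in the validation set, while the balanced figure stays the same.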

OPUS - University of Technology Sydney

Incorporating Pathway Information into Feature Selection Towards Better Performed Gene Signatures

Author: Tian Suyan
Wang Bing
Wang Chi
Publication venue: UKnowledge
Publication date: 03/04/2019

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods regarded as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Last, we point out the potential uses of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.

University of Kentucky

Weighted-SAMGSR: Combining Significance Analysis of Microarray-Gene Set Reduction Algorithm with Pathway Topology-Based Weights to Select Relevant Genes

Author: Chang Howard H.
Tian Suyan
Wang Chi
Publication venue: UKnowledge
Publication date: 12/05/2016

Background: It has been demonstrated that a pathway-based feature selection method that incorporates biological information within pathways during the process of feature selection usually outperforms a gene-based feature selection algorithm in terms of predictive accuracy and stability. Significance analysis of microarray-gene set reduction algorithm (SAMGSR), an extension to a gene set analysis method with further reduction of the selected pathways to their respective core subsets, can be regarded as a pathway-based feature selection method. Methods: In SAMGSR, whether a gene is selected is mainly determined by its expression difference between the phenotypes, and partially by the number of pathways to which this gene belongs. It ignores the topology information among pathways. In this study, we propose a weighted version of the SAMGSR algorithm by constructing weights based on the connectivity among genes and then combining these weights with the test statistics. Results: Using both simulated and real-world data, we evaluate the performance of the proposed SAMGSR extension and demonstrate that the weighted version outperforms its original version. Conclusions: To conclude, the additional gene connectivity information does facilitate feature selection.
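The general idea of combining connectivity-based weights with per-gene test statistics can be sketched as follows. The abstract does not give the weighting formula, so the one used here (1 plus the gene's degree divided by the maximum degree, multiplied by the absolute statistic) is purely an assumption for illustration and is not the Weighted-SAMGSR definition. Gene names, edges, and statistics are hypothetical.

```python
from collections import defaultdict


def degree_weights(edges, genes):
    """Assumed weighting: 1 + degree / max_degree, so weights fall in [1, 2].
    Each undirected edge is expected to appear once in `edges`."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_degree = max((degree[g] for g in genes), default=0)
    if max_degree == 0:
        # No connectivity information: fall back to unweighted statistics.
        return {g: 1.0 for g in genes}
    return {g: 1.0 + degree[g] / max_degree for g in genes}


def weighted_scores(stats, weights):
    """Scale the absolute test statistic of each gene by its weight."""
    return {g: abs(s) * weights[g] for g, s in stats.items()}


# Hypothetical toy pathway: gene A is a hub connected to B, C, and D.
edges = [("A", "B"), ("A", "C"), ("A", "D")]
stats = {"A": 2.0, "B": -2.5, "C": 1.0, "D": 0.5}

weights = degree_weights(edges, stats.keys())
scores = weighted_scores(stats, weights)
ranked = sorted(scores, key=lambda g: -scores[g])
print(ranked)
```

In this toy case the hub gene A overtakes B once connectivity is taken into account, although B has the larger unweighted statistic.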

arXiv.org e-Print Archive

PubMed Central

University of Kentucky

GFS: fuzzy preprocessing for effective gene expression analysis

Author: Abha Belorkar
C Cheadle
D Soh
EJ Yeoh
J Luo
JM Raser
JN Haslett
JT Leek
K Lim
L Geistlinger
L Shi
Limsoon Wong
M Pescatori
ME Ross
PJ Rousseeuw
SA Armstrong
TR Golub
WWB Goh
Publication venue: Springer Science and Business Media LLC

Crossref

Identification of Prognostic Genes and Gene Sets for Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods

Author: Chang Howard H.
Sun Jianguo
Tian Suyan
Wang Chi
Publication venue: UKnowledge
Publication date: 07/04/2017

In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selections, a bi-level selection method can be classified into three categories: forward selection, which first selects relevant gene sets and then relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks together, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer (NSCLC), we had previously proposed the Cox-filter method, which examines the association between patients’ survival time after diagnosis and one specific gene, the disease subtypes, and their interaction terms. In this study, we further extend it to carry out forward and backward bi-level selection. Using simulations and an NSCLC application, we demonstrate that the forward selection outperforms the backward selection and other relevant algorithms in our setting. Both proposed methods are readily understandable and interpretable. Therefore, they represent useful tools for researchers who are interested in exploring the prognostic value of gene expression data for specific subtypes or stages of a disease.
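The forward order of bi-level selection (gene sets first, then genes within the chosen sets) can be sketched generically. This is not the Cox-filter extension described in the abstract: the set-level score (mean absolute gene score), the gene-level threshold, and all names and values below are assumptions chosen only to show the two-stage structure.

```python
def forward_bilevel(gene_scores, gene_sets, top_sets=1, gene_threshold=1.0):
    """Stage 1: rank gene sets by an assumed set-level score and keep the best.
    Stage 2: within each kept set, keep genes whose |score| passes a threshold."""

    def set_score(members):
        vals = [abs(gene_scores[g]) for g in members if g in gene_scores]
        return sum(vals) / len(vals) if vals else 0.0

    # Sort by descending set score; ties are broken by set name.
    ranked_sets = sorted(gene_sets, key=lambda name: (-set_score(gene_sets[name]), name))
    chosen = ranked_sets[:top_sets]

    return {
        name: sorted(
            g for g in gene_sets[name]
            if abs(gene_scores.get(g, 0.0)) >= gene_threshold
        )
        for name in chosen
    }


# Hypothetical per-gene scores and gene-set memberships.
gene_scores = {"g1": 2.0, "g2": 0.2, "g3": 1.5, "g4": 0.1, "g5": 0.3}
gene_sets = {"setA": ["g1", "g2", "g3"], "setB": ["g4", "g5"]}

selected = forward_bilevel(gene_scores, gene_sets)
print(selected)
```

A backward variant would reverse the two stages, filtering genes first and then scoring the gene sets on the surviving genes.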

PubMed Central

University of Kentucky

Computational Proteomics Using Network-Based Strategies

Author: Goh Wen
Publication venue: Computing, Imperial College London
Publication date: 01/03/2014

This thesis examines the productive application of networks to proteomics, with a specific biological focus on liver cancer. Contemporary (shotgun) proteomics is plagued by coverage and consistency issues. These can be resolved via network-based approaches. The application of three classes of network-based approaches is examined: a traditional cluster-based approach termed the Proteomics Expansion Pipeline (PEP), a generalization of PEP termed Maxlink, and a feature-based approach termed Proteomics Signature Profiling (PSP). PEP is an improvement on prevailing cluster-based approaches. It uses a state-of-the-art cluster identification algorithm as well as network-cleaning approaches to identify the critical network regions indicated by the liver cancer data set. The top PARP1-associated cluster was identified and independently validated. Maxlink allows identification of undetected proteins based on the number of links to identified differential proteins. It is more sensitive than PEP due to more relaxed requirements. Here, the novel roles of ARRB1/2 and ACTB are identified and discussed in the context of liver cancer. Neither PEP nor Maxlink can deal with consistency issues. PSP is the first method able to deal with both coverage and consistency, and is termed feature-based since the network-based clusters it uses are predicted independently of the data. It is also capable of using real complexes or predicted pathway subnets. By combining pathways and complexes, a novel basis of liver cancer progression implicating nucleotide pool imbalance aggravated by mutations of key DNA repair complexes was identified. Finally, comparative evaluations suggested that pure network-based methods are vastly outperformed by feature-based network methods utilizing real complexes. This indicates that the quality of current networks is insufficient to provide strong biological rigor for data analysis and should be carefully evaluated before further validations.
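The link-counting idea attributed to Maxlink above can be sketched as follows: undetected proteins are ranked by how many network neighbors they have among the identified differential proteins. The function name, the `min_links` cutoff, and the toy network are hypothetical, and the thesis may apply additional filters that the abstract does not describe.

```python
from collections import Counter


def rank_by_seed_links(edges, seeds, min_links=1):
    """Count, for every non-seed protein, its links to seed proteins.
    Each undirected edge is expected to appear once in `edges`."""
    seeds = set(seeds)
    links = Counter()
    for a, b in edges:
        if a in seeds and b not in seeds:
            links[b] += 1
        elif b in seeds and a not in seeds:
            links[a] += 1
    # Most-linked candidates first; ties are broken alphabetically.
    ranked = sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(protein, n) for protein, n in ranked if n >= min_links]


# Hypothetical network: S* are identified differential proteins (seeds),
# U* are proteins that were not detected in the experiment.
edges = [
    ("S1", "U1"), ("S2", "U1"), ("S3", "U1"),
    ("S1", "U2"), ("U2", "U3"), ("S2", "S3"),
]
candidates = rank_by_seed_links(edges, seeds={"S1", "S2", "S3"})
print(candidates)
```

Raising `min_links` makes the selection stricter, trading sensitivity for fewer candidates.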

Spiral - Imperial College Digital Repository