106 research outputs found

Microarray missing data imputation based on a set theoretic framework and biological knowledge

Author: Gan Xiangchao
Liew Alan Wee-Chung
Yan Hong
Publication venue: Oxford University Press
Publication date: 01/01/2006

Gene expression data measured using microarrays usually suffer from missing values. However, many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms perform well in dealing with missing values, they also have limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon, and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
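The core POCS idea (alternately project onto each convex constraint set until the estimate stops moving) can be illustrated with a toy example. This is a minimal sketch, not the authors' algorithm: the two sets used here, a hyperplane standing in for a hypothetical neighbour-based linear constraint and a box standing in for a hypothetical value-range prior, are assumptions chosen only because their projections have closed forms. The paper's actual local-correlation, global-correlation and synchronization-loss sets are not reproduced.

```python
"""Toy POCS (projection onto convex sets) sketch for filling missing values."""


def project_hyperplane(x, w, t):
    """Project x onto the convex set {z : w . z = t} (a hyperplane)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm2 = sum(wi * wi for wi in w)
    step = (dot - t) / norm2
    return [xi - step * wi for xi, wi in zip(x, w)]


def project_box(x, lo, hi):
    """Project x onto the convex box {z : lo_i <= z_i <= hi_i} by clipping."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def pocs(x0, w, t, lo, hi, iters=500, tol=1e-10):
    """Alternate the two projections until successive estimates stop changing.

    If the two sets intersect, the iterates converge to a point in the intersection.
    """
    x = list(x0)
    for _ in range(iters):
        new = project_box(project_hyperplane(x, w, t), lo, hi)
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical example: three missing entries of one gene.
# Prior 1 (hyperplane): a weighted combination of the entries equals 0.9.
# Prior 2 (box): each missing log-ratio lies within [-1, 1].
estimate = pocs(x0=[0.0, 0.0, 0.0], w=[0.5, 0.3, 0.2], t=0.9,
                lo=[-1.0] * 3, hi=[1.0] * 3)
print([round(v, 3) for v in estimate])
```

Each additional piece of prior knowledge would simply add another projection to the loop; the convergence guarantee requires only that every set be convex and that their intersection be non-empty.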

CiteSeerX

Crossref

PubMed Central

Discovering biclusters in gene expression data based on high-dimensional linear geometries

Author: Gan Xiangchao
Liew Alan Wee-Chung
Yan Hong
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Ensemble learning based on classifier prediction confidence and comprehensive learning particle swarm optimisation for medical image segmentation

Author: Dang Truong
Liew Alan Wee-Chung
McCall John
Nguyen Tien Thanh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/12/2022

Segmentation, the process of partitioning an image into multiple segments to locate objects and boundaries, is considered one of the most essential medical imaging processes. In recent years, Deep Neural Networks (DNNs) have achieved many notable successes in medical image analysis, including image segmentation. Because medical imaging applications require robust, reliable results, it is necessary to devise effective DNN models for medical applications. One solution is to combine multiple DNN models in an ensemble system to obtain better results than any single DNN model. Ensemble learning is a popular machine learning technique in which multiple models are combined to improve the final results, and it has been widely used in medical image analysis. In this paper, we propose to measure the confidence in the prediction of each model in the ensemble system and then use an associated threshold to determine whether the confidence is acceptable. A segmentation model is selected based on the comparison between its confidence and its associated threshold. The optimal threshold for each segmentation model is found using Comprehensive Learning Particle Swarm Optimisation (CLPSO), a swarm intelligence algorithm. The Dice coefficient, a popular performance metric for image segmentation, is used as the fitness criterion. The experimental results on three medical image segmentation datasets confirm that our ensemble achieves better results than some well-known segmentation models.
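The selection rule and the fitness criterion described above can be sketched on flat binary masks. This is a minimal sketch under stated assumptions: the confidence score (mean distance of pixel probabilities from 0.5, rescaled to [0, 1]), the averaging of the selected models, and the fall-back to all models are hypothetical choices, not the measure defined in the paper. The thresholds are taken as given; in the paper they are optimised by CLPSO, which is not implemented here, and `fitness` is only the Dice-based objective such an optimiser would maximise.

```python
"""Sketch: confidence-gated ensemble selection for binary segmentation."""


def dice(pred, truth):
    """Dice coefficient 2|A n B| / (|A| + |B|) for flat 0/1 masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total


def confidence(prob_map):
    """Hypothetical confidence: mean |p - 0.5| over pixels, rescaled to [0, 1]."""
    return sum(abs(p - 0.5) for p in prob_map) * 2.0 / len(prob_map)


def ensemble_predict(prob_maps, thresholds):
    """Average the models whose confidence reaches their own threshold.

    Falls back to all models if none qualifies (an assumption of this sketch).
    """
    chosen = [m for m, th in zip(prob_maps, thresholds) if confidence(m) >= th]
    if not chosen:
        chosen = prob_maps
    n = len(chosen)
    avg = [sum(m[i] for m in chosen) / n for i in range(len(chosen[0]))]
    return [1 if p >= 0.5 else 0 for p in avg]


def fitness(thresholds, samples):
    """Mean Dice over (prob_maps, truth) validation pairs; an optimiser
    such as CLPSO would search for thresholds maximising this value."""
    scores = [dice(ensemble_predict(maps, thresholds), truth)
              for maps, truth in samples]
    return sum(scores) / len(scores)
```

For instance, with one model that outputs sharp probabilities and another that hovers around 0.5, a threshold of 0.5 for both keeps only the sharp model, so the ensemble mask equals that model's thresholded output.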

Open Access Institutional Repository at Robert Gordon University

Heterogeneous ensemble selection for evolving data streams

Author: Liew Alan Wee-Chung
Luong Anh Vu
Nguyen Tien Thanh
Wang Shilin
Publication venue: Elsevier BV
Publication date: 02/11/2020

Ensemble learning has been widely applied to both batch data classification and streaming data classification. For the latter setting, most existing ensemble systems are homogeneous, which means they are generated from only one type of learning model. In contrast, by combining several different types of learning models, a heterogeneous ensemble system can achieve greater diversity among its members, which helps to improve its performance. Although heterogeneous ensemble systems have achieved many successes in the batch classification setting, it is not trivial to extend them directly to the data stream setting. In this study, we propose a novel HEterogeneous Ensemble Selection (HEES) method, which dynamically selects an appropriate subset of base classifiers to predict data under the stream setting. We are inspired by the observation that a well-chosen subset of good base classifiers may outperform the whole ensemble system. Here, we define a good candidate as one that exhibits not only high predictive performance but also high confidence in its prediction. Our selection process is thus divided into two sub-processes: accurate-candidate selection and confident-candidate selection. We define an accurate candidate in the stream context as a base classifier with high accuracy over the current concept, and a confident candidate as one with a confidence score higher than a certain threshold. In the first sub-process, we employ the prequential accuracy to estimate the performance of a base classifier at a specific time, while in the second sub-process, we propose a new measure to quantify the predictive confidence and provide a method to learn the threshold incrementally. The final ensemble is formed by taking the intersection of the sets of confident classifiers and accurate classifiers. Experiments on a wide range of data streams show that the proposed method achieves competitive performance with lower running time in comparison to state-of-the-art online ensemble methods.
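The two-part selection (prequential accuracy plus a confidence threshold, then set intersection) can be sketched as follows. This is a minimal sketch with hypothetical choices that are not taken from the paper: prequential accuracy uses a fading factor (test on each example, then update), "accurate" means at least the ensemble's mean accuracy, the confidence scores and threshold are supplied by the caller (the paper's confidence measure and its incremental threshold learning are not reproduced), and an empty intersection falls back to the single most accurate classifier.

```python
"""Sketch of HEES-style selection: intersect accurate and confident base classifiers."""


class PrequentialAccuracy:
    """Fading-factor prequential accuracy: each example is tested, then counted."""

    def __init__(self, fading=0.99):
        self.fading = fading   # 1.0 gives the plain running accuracy
        self.correct = 0.0     # faded count of correct predictions
        self.seen = 0.0        # faded count of examples

    def update(self, was_correct):
        self.correct = self.fading * self.correct + (1.0 if was_correct else 0.0)
        self.seen = self.fading * self.seen + 1.0

    @property
    def value(self):
        return self.correct / self.seen if self.seen else 0.0


def select(accuracies, confidences, conf_threshold):
    """Return indices of classifiers that are both accurate and confident."""
    mean_acc = sum(accuracies) / len(accuracies)
    accurate = {i for i, a in enumerate(accuracies) if a >= mean_acc}
    confident = {i for i, c in enumerate(confidences) if c >= conf_threshold}
    chosen = accurate & confident
    if not chosen:  # hypothetical fall-back: keep the single best classifier
        chosen = {max(range(len(accuracies)), key=lambda i: accuracies[i])}
    return sorted(chosen)
```

With accuracies `[0.9, 0.6, 0.85]`, confidences `[0.7, 0.9, 0.4]` and a threshold of 0.6, classifiers 0 and 2 are accurate, 0 and 1 are confident, and only classifier 0 survives the intersection.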

Open Access Institutional Repository at Robert Gordon University

Spectral estimation in unevenly sampled space of periodically expressed microarray time series data

Author: Liew Alan Wee-Chung
Smith David
Wu Shuanhu
Xian Jun
Yan Hong
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time series data is to identify genes that are periodically expressed. This challenge arises because the observed time series usually exhibit non-idealities, such as noise, short length, and unevenly sampled time points. Most methods used in the literature operate on evenly sampled time series and are not suitable for unevenly sampled time series. RESULTS: For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, since the Lomb-Scargle method assumes that there is a single stationary sinusoid wave with infinite support, it introduces spurious periodic components in the periodogram for data with a finite length. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm is superior to the Lomb-Scargle algorithm and the classical Fourier periodogram based method in detecting periodically expressed genes. We have applied our algorithm to the Plasmodium falciparum and yeast gene expression data, and the results show that the algorithm is able to detect biologically meaningful periodically expressed genes. CONCLUSION: We have proposed an effective method for identifying periodic genes in unevenly sampled space of microarray time series gene expression data. The method can also be used as an effective tool for gene expression time series interpolation or resampling.
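For reference, the Lomb-Scargle baseline that the abstract compares against can be written directly from its classical definition: for each angular frequency w, a time offset tau is chosen so that the sine and cosine terms decouple, and the power sums the squared projections of the centred data onto cos(w(t - tau)) and sin(w(t - tau)), normalised by twice the sample variance. This is a sketch of that baseline only; the paper's B-spline, shift-invariant-space estimator is not reproduced here. The sampling times and test frequency in the usage note below are made-up illustrative values.

```python
import math


def lomb_scargle(t, y, freqs):
    """Classical (normalised) Lomb-Scargle periodogram for unevenly sampled (t, y).

    freqs are ordinary frequencies (cycles per time unit), not angular ones.
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    yc = [v - mean for v in y]  # centred observations
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # tau makes the sine and cosine basis terms orthogonal at these sample times.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append((yc_c ** 2 / sum(x * x for x in c) +
                      yc_s ** 2 / sum(x * x for x in s)) / (2 * var))
    return power
```

As a usage example, sampling `sin(2*pi*0.1*t)` at jittered times `t_i = i + 0.3*sin(1.7*i)` for `i` in `range(60)` and scanning frequencies 0.01 to 0.45 places the largest power at 0.1. The abstract's criticism concerns what happens with short, finite records, where this estimator can also show spurious side peaks.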

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Aggregation of classifiers: a justifiable information granularity approach

Author: Liew Alan Wee-Chung
Nguyen Tien Thanh
Pedrycz Witold
Pham Xuan Cuong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/03/2017

In this paper, we introduce a new approach to combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we construct interval membership values for each class prediction from the meta-data of observations by using the concept of information granules. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by the interval-based information granules. The decision model is then generated by considering both the bounds and the length of the intervals. Extensive experimentation using the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms, including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace.
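The interval construction can be illustrated with the justifiable granularity principle: starting from the median of the base classifiers' support values for a class, each bound is pushed outward only while the gain in coverage (fraction of values inside) outweighs the loss of specificity. This is a minimal sketch under stated assumptions. The specificity term `exp(-alpha * |b - median|)`, the parameter `alpha`, and the hypothetical decision rule in `combine` (interval midpoint minus `beta` times its length) are illustrative choices, not the exact formulation or decision model used in the paper.

```python
import math
import statistics


def justifiable_interval(values, alpha=1.0):
    """Interval [a, b] around the median trading coverage against specificity.

    Upper bound b maximises coverage([median, b]) * exp(-alpha * |b - median|);
    the lower bound a is chosen symmetrically on the other side.
    """
    xs = sorted(values)
    n = len(xs)
    med = statistics.median(xs)

    def best_bound(candidates, inside):
        best, best_v = med, -1.0
        for b in candidates:
            coverage = sum(1 for x in xs if inside(x, b)) / n
            v = coverage * math.exp(-alpha * abs(b - med))
            if v > best_v:
                best, best_v = b, v
        return best

    upper = best_bound([x for x in xs if x >= med], lambda x, b: med <= x <= b)
    lower = best_bound([x for x in xs if x <= med], lambda x, b: b <= x <= med)
    return lower, upper


def combine(supports, alpha=1.0, beta=0.5):
    """supports[c] lists the base classifiers' support values for class c.

    Hypothetical rule: favour high intervals and penalise wide (uncertain) ones.
    """
    scores = []
    for vals in supports:
        lo, hi = justifiable_interval(vals, alpha)
        scores.append((lo + hi) / 2 - beta * (hi - lo))
    return max(range(len(scores)), key=lambda c: scores[c])
```

The interval length is what carries the "uncertainty (diversity)" signal: base classifiers that disagree produce a wide granule, which the scoring rule above penalises.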

arXiv.org e-Print Archive

Crossref

Open Access Institutional Repository at Robert Gordon University

A weighted multiple classifier framework based on random projection

Author: Bezdek James C.
Dang Manh Truong
Liew Alan Wee-Chung
Nguyen Tien Thanh
Publication venue: Elsevier BV
Publication date: 26/03/2019

In this paper, we propose a weighted multiple classifier framework based on random projections. Similar to the mechanism of other homogeneous ensemble methods, the base classifiers in our approach are obtained by a learning algorithm on different training sets generated by projecting the original up-space training set to lower-dimensional down-spaces. We then apply a Least Squares-based method to weigh the outputs of the base classifiers so that each classifier contributes differently to the final combined prediction. We choose Decision Tree as the learning algorithm in the proposed framework and conduct experiments on a number of real and synthetic datasets. The experimental results indicate that our framework is better than many of the benchmark algorithms, including three homogeneous ensemble methods (Bagging, RotBoost, and Random Subspace), several well-known algorithms (Decision Tree, Random Neural Network, Linear Discriminant Analysis, K Nearest Neighbor, L2-loss Linear Support Vector Machine, and Discriminative Restricted Boltzmann Machine), and random projection-based ensembles with fixed combining rules, with regard to both classification error rates and F1 scores.
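The overall pipeline (project the training set with several random matrices, train one base learner per projection, then fit combining weights by least squares against one-hot targets) can be sketched in pure Python. This is one plausible formulation under stated assumptions, not the paper's exact method: a nearest-centroid learner is swapped in for the Decision Trees used in the paper, each base classifier gets a single scalar weight, and a tiny ridge term (`ridge`) is added so the normal equations stay solvable when base classifiers produce near-identical outputs.

```python
import math
import random


def random_projection(dim_in, dim_out, rng):
    """Gaussian random projection matrix (dim_out x dim_in), scaled by 1/sqrt(dim_out)."""
    s = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * s for _ in range(dim_in)] for _ in range(dim_out)]


def project(R, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


class NearestCentroid:
    """Stand-in base learner (the paper uses Decision Trees); outputs class scores."""

    def fit(self, X, y, n_classes):
        self.centroids = []
        for c in range(n_classes):
            pts = [x for x, lab in zip(X, y) if lab == c]
            self.centroids.append([sum(col) / len(pts) for col in zip(*pts)])
        return self

    def scores(self, x):
        d = [math.dist(x, m) for m in self.centroids]
        e = [math.exp(-(di - min(d))) for di in d]  # softmax of negative distances
        return [v / sum(e) for v in e]


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small system M w = v."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (A[i][n] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w


def fit_ensemble(X, y, n_classes, n_members=5, dim_out=5, ridge=1e-6, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        R = random_projection(len(X[0]), dim_out, rng)
        clf = NearestCentroid().fit([project(R, x) for x in X], y, n_classes)
        members.append((R, clf))
    # Least squares over (sample, class) rows: sum_k w_k * score_k(x, c) ~ onehot(x, c).
    K = n_members
    AtA = [[ridge if i == j else 0.0 for j in range(K)] for i in range(K)]
    Atb = [0.0] * K
    for x, lab in zip(X, y):
        outs = [clf.scores(project(R, x)) for R, clf in members]
        for c in range(n_classes):
            row = [o[c] for o in outs]
            target = 1.0 if c == lab else 0.0
            for i in range(K):
                Atb[i] += row[i] * target
                for j in range(K):
                    AtA[i][j] += row[i] * row[j]
    return members, solve(AtA, Atb)


def predict(model, x):
    members, w = model
    outs = [clf.scores(project(R, x)) for R, clf in members]
    combined = [sum(wk * o[c] for wk, o in zip(w, outs)) for c in range(len(outs[0]))]
    return max(range(len(combined)), key=lambda c: combined[c])
```

Solving the K-by-K normal equations keeps the sketch dependency-free. With a numerical library available, a dedicated least-squares solver would be the more robust choice.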

Open Access Institutional Repository at Robert Gordon University