An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia
diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant
measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust
segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed
between-cluster evaluation is formulated based on the trade-off of several between-cluster measures
of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic
Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty
features consisting of shape, texture, and colour information of the nucleus and cytoplasm subimages
are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM)
and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated
with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of
Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear
Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation.
The overall system achieves recognition accuracies of 96.72% and 96.67% using bootstrapping with
the Dempster-Shafer ensemble and 10-fold cross-validation with SVM, respectively. The results
also compare favourably with those reported in the literature, indicating the usefulness of the
proposed SDM-based clustering method.
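
The within- and between-cluster scatter quantities that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' SDM measure or their Genetic Algorithm; it only computes the two classical scatter sums (the traces of the within- and between-cluster scatter matrices) for a given labelling, and the function names `mean`, `sq_dist`, and `scatter` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(points: List[Point], labels: List[int]) -> Tuple[float, float]:
    """Return (within, between) cluster scatter for a labelling.

    within  = sum of squared distances from each point to its cluster centroid
    between = sum over clusters of size * squared distance from the cluster
              centroid to the overall mean
    """
    clusters: Dict[int, List[Point]] = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    overall = mean(list(points))
    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = mean(members)
        within += sum(sq_dist(p, centroid) for p in members)
        between += len(members) * sq_dist(centroid, overall)
    return within, between


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(scatter(pts, [0, 0, 1, 1]))  # compact, well-separated labelling
    print(scatter(pts, [0, 1, 0, 1]))  # poor labelling of the same points
```

A criterion that looks only at the within-cluster term, as Fuzzy C-means does, ignores how far apart the clusters are; using both terms rewards labellings that are compact and well separated at the same time.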
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF, the ensemble starts with position weight matrices (PWMs) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification, this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the five
component algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross-species database
(Tompa et al., 2005), we match the best performer without much human
intervention. The method also improved performance on mammalian TFs.
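
For readers unfamiliar with the motif scanning baseline mentioned above, a minimal sketch of PWM-based scanning follows. It is a generic illustration, not the ensemble method of the paper: the pseudocount, the uniform background of 0.25, and the function names `pwm_log_odds`, `scan`, and `best_hit` are assumptions made for this example, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def pwm_log_odds(
    counts: List[Dict[str, float]],
    pseudocount: float = 1.0,
    background: float = 0.25,
) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {
                b: math.log2(((col.get(b, 0.0) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of the sequence; returns (start index, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence: str, pwm: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return the highest-scoring window as (start index, score)."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy count matrix whose consensus is TATA (8 observations per position).
    toy_counts = [{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}]
    toy_pwm = pwm_log_odds(toy_counts)
    print(best_hit("GGCTATAGGC", toy_pwm))
```

Calling a promoter a target whenever its best window exceeds a threshold is the simplest form of motif scanning; the abstract reports that the ensemble improves on this kind of baseline for target gene identification.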
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (Problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information.
Comment: 33 pages