Mining learning preferences in web-based instruction: Holists vs. Serialists
Web-based instruction programs are used by learners with diverse knowledge, skills and needs. These differences determine their preferences for the design of Web-based instruction programs and ultimately influence learners' success in using them. Cognitive style has been found to significantly affect learners' preferences for Web-based instruction programs. However, the majority of previous studies focus on Field Dependence/Independence. Pask's Holist/Serialist dimension has conceptual links with Field Dependence/Independence but has been left mostly unstudied. Therefore, this study focuses on identifying how this dimension of cognitive style affects learner preferences for Web-based instruction programs. A data mining approach is used to illustrate the difference in preferences between Holists and Serialists. The findings show that there are clear differences with regard to content presentation and navigation support. A set of design features was then produced to help designers incorporate cognitive styles into the development of Web-based instruction programs, so that the programs can accommodate learners' different preferences. This work is partially funded by National Science Council, Taiwan, ROC (NSC 98-2511-S-008-012-MY3; NSC 99-2511-S-008-003-MY2; NSC 99-2631-S-008-001)
Performance Evaluation of Exponential Discriminant Analysis with Feature Selection for Steganalysis
The performance of supervised learning-based steganalysis depends on the choice of both the classifier and the features that represent the image. Features extracted from images may include irrelevant and redundant ones, which makes them inefficient for machine learning. Relevant features not only decrease the processing time needed to train a classifier but also provide better generalisation. The linear discriminant classifier, which is commonly used for classification, may not classify non-linearly separable data well. Recently, exponential discriminant analysis (EDA), a variant of linear discriminant analysis (LDA), was proposed; it transforms the scatter matrices to a new space by distance diffusion mapping. This gives EDA much more discriminant power on non-linearly separable data and helps improve classification accuracy in comparison to LDA. In this paper, the performance of EDA in conjunction with feature selection methods has been investigated. For feature selection, Kullback divergence, Chernoff distance measures and linear regression measures are used to determine relevant features from higher-order statistics of images. The performance is evaluated in terms of classification error and computation time. Experimental results show that exponential discriminant analysis in conjunction with linear regression performs significantly better in terms of both classification error and computation time for training the classifier. Defence Science Journal, 2012, 62(1), pp.19-24, DOI:http://dx.doi.org/10.14429/dsj.62.143
Feature selection for microarray gene expression data using simulated annealing guided by the multivariate joint entropy
In this work a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic evaluation of gene relevance in a Microarray Gene Expression data context. Its low complexity is based on the reuse of previous computations to calculate current feature relevance. The mu-TAFS algorithm --named as such to differentiate it from previous TAFS algorithms-- implements a simulated annealing technique specially designed for feature subset selection. The algorithm is applied to the maximization of gene subset relevance in several public-domain microarray data sets. The experimental results show notably high classification performance and small subsets formed by biologically meaningful genes. Postprint (published version)
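The general idea of simulated annealing over feature subsets can be sketched as follows. This is a minimal illustration, not the mu-TAFS algorithm itself: the function names, the geometric cooling schedule, the single-bit-flip move and the toy scoring function are all assumptions made for the example, and the multivariate joint entropy measure from the abstract is replaced by a placeholder `score` callable.

```python
import math
import random

def selected(mask):
    """Indices of the features currently switched on."""
    return tuple(i for i, on in enumerate(mask) if on)

def anneal_subset(n_features, score, n_iter=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize score(subset) by simulated annealing over boolean masks.

    `score` is a hypothetical relevance measure taking a tuple of indices.
    """
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    current = score(selected(mask))
    best_mask, best = mask[:], current
    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(1, n_iter - 1))
        i = rng.randrange(n_features)
        mask[i] = not mask[i]              # propose: flip one feature in or out
        candidate = score(selected(mask))
        delta = candidate - current
        # Always accept improvements; accept worse moves with prob exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = candidate
            if current > best:
                best_mask, best = mask[:], current
        else:
            mask[i] = not mask[i]          # reject: undo the flip
    return selected(best_mask), best

# Toy objective (hypothetical): features 1, 4 and 7 are "relevant",
# and every selected feature costs a small size penalty.
RELEVANT = {1, 4, 7}

def toy_score(subset):
    return sum(1 for i in subset if i in RELEVANT) - 0.1 * len(subset)
```

Calling `anneal_subset(10, toy_score)` returns the best subset visited together with its score. In a real setting the placeholder objective would be an information-theoretic or classifier-based relevance measure, and reusing earlier computations (as the abstract describes) would matter far more than it does in this toy.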
Clustering in Conjunction with Wrapper Approach to Select Discriminatory Genes for Microarray Dataset Classification
With the advent of microarray technology, it is possible to measure the expression levels of thousands of genes simultaneously. This helps us diagnose and classify some particular cancers directly using DNA microarrays. The high dimensionality and small sample size of microarray datasets have made the task of classification difficult. These datasets contain a large number of redundant and irrelevant genes. For efficient classification of samples there is a need to select a smaller set of relevant and non-redundant genes. In this paper, we propose a two-stage algorithm for finding a set of discriminatory genes responsible for classification of high-dimensional microarray datasets. In the first stage, redundancy is reduced by grouping correlated genes into clusters and selecting a representative gene from each cluster. The maximal information compression index is used to measure similarity between genes. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of discriminatory genes for a given classifier. We have investigated three different clustering techniques and four classifiers in our experiments. The proposed algorithm is tested on six well-known publicly available datasets. Comparison with other state-of-the-art methods shows that our proposed algorithm is able to achieve better classification accuracy with fewer genes.
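A rough sketch of the two stages, under stated assumptions: the maximal information compression index of two variables is computed as the smallest eigenvalue of their 2x2 covariance matrix (zero when the two are linearly dependent). The greedy grouping in `pick_representatives` is a simplified stand-in for the clustering techniques the paper investigates, and `evaluate` in `forward_select` is a hypothetical callable standing in for a classifier's validation accuracy. All function names are illustrative.

```python
import math
from statistics import fmean

def mici(x, y):
    """Maximal information compression index: the smallest eigenvalue of the
    2x2 covariance matrix of x and y. Assumes profiles on comparable scales."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    disc = math.sqrt((vx - vy) ** 2 + 4.0 * cxy ** 2)
    return max(0.0, 0.5 * (vx + vy - disc))  # clamp tiny negative round-off

def pick_representatives(genes, threshold):
    """Stage 1 (simplified): keep a gene only if it is not nearly linearly
    dependent on any representative kept so far."""
    reps = []
    for name, profile in genes.items():
        if all(mici(profile, genes[r]) > threshold for r in reps):
            reps.append(name)
    return reps

def forward_select(candidates, evaluate, max_features):
    """Stage 2: wrapper-style forward selection. `evaluate` scores a list of
    gene names (higher is better); stop when no addition improves the score."""
    chosen, best = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(chosen) < max_features:
        top_score, top_gene = max((evaluate(chosen + [g]), g) for g in remaining)
        if top_score <= best:
            break
        chosen.append(top_gene)
        remaining.remove(top_gene)
        best = top_score
    return chosen, best
```

The design point this illustrates is the division of labour: the cheap, classifier-independent first stage shrinks the candidate pool so that the expensive wrapper stage, which retrains a classifier for every candidate, runs over far fewer genes.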
An incremental clustering of gene expression data
This paper presents an incremental clustering algorithm based on DGC, a density-based algorithm we developed earlier [1]. We experimented with real-life datasets and both methods perform satisfactorily. The methods have been compared with some well-known clustering algorithms and they perform well in terms of the z-score cluster validity measure.
Classification of Microarray Gene Data from Various Cancer Tissues Using Metric Learning
Owing to the heterogeneous nature of cancerous tissues, many cancers have subtypes, and unless these subtypes are identified, cancer treatment cannot reach its target. With advances in microarray gene technology and data technology, it has become common in recent years to identify cancer subtypes with machine learning, using microarray gene expression data from cancerous tissues. The main problem, however, is that each gene corresponds to one feature in the data set, which gives rise to the high-dimensionality problem. In this study, three different metric learning methods (LMNN, ITML and NCA) were each used separately to map microarray gene data sets of various cancer types into reduced-dimensional spaces. In this way, unlike classical dimensionality reduction methods such as PCA, samples of the same class (cancer subtype) are brought closer together in the reduced space, while samples of different classes are pushed apart. The reduced-dimensional spaces were visualized with the t-SNE method, confirming that the classes separate from one another. In addition, to show that classification algorithms perform better in these new spaces, instance-based classification algorithms such as k-NN, nearest centroid and LVQ were run, and their performance in identifying cancer types was observed to improve by up to about 30% compared with their performance in the original space.
An Efficient High-Dimensional Gene Selection Approach based on Binary Horse Herd Optimization Algorithm for Biological Data Classification
The Horse Herd Optimization Algorithm (HOA) is a new meta-heuristic algorithm
based on the behaviors of horses at different ages. The HOA was introduced
recently to solve complex and high-dimensional problems. This paper proposes a
binary version of the Horse Herd Optimization Algorithm (BHOA) in order to
solve discrete problems and select prominent feature subsets. Moreover, this
study provides a novel hybrid feature selection framework based on the BHOA and
a minimum Redundancy Maximum Relevance (MRMR) filter method. This hybrid
feature selection, which is more computationally efficient, produces a
beneficial subset of relevant and informative features. Since feature selection
is a binary problem, we have applied a new Transfer Function (TF), called
X-shape TF, which transforms continuous problems into binary search spaces.
Furthermore, the Support Vector Machine (SVM) is utilized to examine the
efficiency of the proposed method on ten microarray datasets, namely Lymphoma,
Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL. In
comparison to other state-of-the-art methods, such as the Gray Wolf (GW), Particle
Swarm Optimization (PSO), and Genetic Algorithm (GA), the proposed hybrid
method (MRMR-BHOA) demonstrates superior performance in terms of accuracy and
minimum selected features. Also, experimental results prove that the X-Shaped
BHOA approach outperforms other methods.
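To illustrate what a transfer function does in a binary metaheuristic, the sketch below maps a continuous position vector to a 0/1 feature mask. It uses the common S-shaped (sigmoid) transfer function as a stand-in; the X-shape TF named in the abstract is not reproduced here, and the function names are assumptions made for the example.

```python
import math
import random

def s_shaped(x):
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binarize(position, rng):
    """Turn a continuous position into a feature mask: feature j is selected
    with probability s_shaped(position[j])."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

# Example: a strongly positive coordinate is almost surely selected,
# a strongly negative one almost surely dropped.
mask = binarize([50.0, -50.0, 0.0], random.Random(0))
```

In a wrapper setting, each mask produced this way would be scored by training a classifier (an SVM in the abstract) on the selected features only.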