81 research outputs found

    Rule Acquisition for Cognitive Agents by Using Estimation of Distribution Algorithms

    Cognitive agents must be able to decide their actions based on their recognized states. In general, such agents are equipped with learning mechanisms in order to realize intelligent behaviors. In this paper, we propose a new Estimation of Distribution Algorithm (EDA) that can acquire effective rules for cognitive agents. The basic calculation procedure of EDAs is to 1) select better individuals, 2) estimate a probabilistic model, and 3) sample new individuals. In the proposed method, instead of individuals, input-output records in episodes are directly used for estimating the probabilistic model by Conditional Random Fields. The estimated probabilistic model can therefore be regarded as a policy, so that new input-output records are generated by the interaction between the policy and environments. Computer simulations on Probabilistic Transition Problems show the effectiveness of the proposed method.
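
The three generic EDA steps named in this abstract (select, estimate, sample) can be illustrated with a toy example. The sketch below is not the paper's method: it swaps the Conditional Random Field for the simplest possible model, independent per-bit probabilities, and optimizes the OneMax toy problem (count of 1-bits). The function name `umda_onemax`, all parameter values, and the clamping bounds are assumptions made only for illustration.

```python
import random


def umda_onemax(n_bits=20, pop_size=60, n_select=20, generations=40, seed=0):
    """Toy univariate EDA maximizing the number of 1-bits (OneMax)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: each bit is 1 with probability 0.5
    best = None
    for _ in range(generations):
        # Step 3 (sample): draw a population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Step 1 (select): keep the individuals with the highest fitness.
        population.sort(key=sum, reverse=True)
        selected = population[:n_select]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # Step 2 (estimate): refit the per-bit marginal probabilities,
        # clamped so that no bit value becomes impossible to sample.
        probs = [
            min(max(sum(ind[i] for ind in selected) / n_select, 0.05), 0.95)
            for i in range(n_bits)
        ]
    return best, probs


best, probs = umda_onemax()
print(sum(best), "of", len(best), "bits set")
```

In the setting the abstract describes, the sampled "individuals" would instead be input-output records gathered while the current policy interacts with the environment, and the model fit in step 2 would be a Conditional Random Field rather than independent marginals.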

    Feature Selection for Predicting Tumor Metastases in Microarray Experiments using Paired Design

    Among the major issues in gene expression profile classification, feature selection is an important and necessary step in creating good classification rules, given the high dimensionality of microarray data. Although different feature selection methods have been reported, no method has been proposed specifically for paired microarray experiments. In this paper, we introduce a simple procedure based on a modified t-statistic for feature selection in microarray experiments that use the popular matched case-control design, and apply it to our recent study on tumor metastasis in a low-malignant group of breast cancer patients to select the genes that best predict metastases. Gene or feature selection is optimized by thresholding in a leave-one-pair-out cross-validation. Model comparison through empirical application has shown that our method manifests improved efficiency with high sensitivity and specificity.
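
The abstract does not say how the t-statistic is modified, so the sketch below uses the ordinary paired t-statistic as a stand-in and ranks genes by its absolute value. The function names, the data layout (gene mapped to a tuple of matched case and control values), and the expression values are all made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(case, control):
    """Paired t-statistic for one gene; case[i] and control[i] are a matched pair."""
    diffs = [a - b for a, b in zip(case, control)]
    # Assumes at least two pairs and differences that are not all identical
    # (otherwise the standard deviation is zero and the statistic is undefined).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rank_genes(expression):
    """Sort genes by |t|, largest first. `expression` maps gene -> (case, control)."""
    scores = {
        gene: paired_t(case, control)
        for gene, (case, control) in expression.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up expression values for three hypothetical genes in four matched pairs.
toy = {
    "g1": ([5.0, 6.0, 7.0, 8.0], [1.0, 2.5, 3.0, 4.5]),
    "g2": ([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]),
    "g3": ([2.0, 3.0, 2.0, 3.0], [3.0, 5.0, 3.5, 4.0]),
}
for gene, t in rank_genes(toy):
    print(gene, round(t, 2))
```

A threshold on |t| would then decide how many top-ranked genes to keep. In a leave-one-pair-out scheme that threshold would be chosen by holding out one matched pair at a time, which this sketch does not attempt.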

    FiGS: a filter-based gene selection workbench for microarray data

    Background: The selection of genes that discriminate disease classes from microarray data is widely used for the identification of diagnostic biomarkers. Although various gene selection methods are currently available and some of them have shown excellent performance, no single method can retain the best performance for all types of microarray datasets. It is desirable to use a comparative approach to find the best gene selection result after rigorous testing of different methodological strategies for a given microarray dataset. Results: FiGS is a web-based workbench that automatically compares various gene selection procedures and provides the optimal gene selection result for an input microarray dataset. FiGS builds up diverse gene selection procedures by combining different feature selection techniques and classifiers. In addition to the highly reputed techniques, FiGS diversifies the gene selection procedures by incorporating gene clustering options in the feature selection step and different data pre-processing options in the classifier training step. All candidate gene selection procedures are evaluated by their .632+ bootstrap errors and listed with their classification accuracies and selected gene sets. FiGS runs on parallelized computing nodes that can handle heavy computations. FiGS is freely accessible at http://gexp.kaist.ac.kr/figs. Conclusion: FiGS is a web-based application that automates an extensive search for the optimized gene selection analysis for a microarray dataset in a parallel computing environment. FiGS will provide both an efficient and comprehensive means of acquiring optimal gene sets that discriminate disease states from microarray datasets.
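
For readers unfamiliar with the .632+ bootstrap error mentioned above, the sketch below shows how the estimate is commonly assembled from three ingredients, following the form usually given in textbook presentations of Efron and Tibshirani's estimator. It is not taken from FiGS, whose implementation the abstract does not describe. The function and argument names are hypothetical, and details such as clipping differ between presentations.

```python
def bootstrap_632_plus(resub_err, loo_boot_err, no_info_rate):
    """Combine three error rates into a .632+ style estimate.

    resub_err    : error on the training data itself (resubstitution)
    loo_boot_err : leave-one-out bootstrap error (out-of-bag samples only)
    no_info_rate : expected error if features and labels were independent
    """
    clipped = min(loo_boot_err, no_info_rate)
    if clipped > resub_err and no_info_rate > resub_err:
        # Relative overfitting rate, between 0 (none) and 1 (no better than chance).
        overfit = (clipped - resub_err) / (no_info_rate - resub_err)
    else:
        overfit = 0.0
    # The weight grows from 0.632 toward 1 as overfitting increases.
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * resub_err + weight * loo_boot_err


print(round(bootstrap_632_plus(0.0, 0.2, 0.5), 4))
```

With no overfitting the weight stays at 0.632 and the result reduces to the plain .632 estimator. With severe overfitting the weight approaches 1 and the estimate relies almost entirely on the out-of-bag error.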

    Discriminative Gene Selection Employing Linear Regression Model

    Microarray datasets enable the analysis of the expression of thousands of genes across hundreds of samples. Classifiers usually do not perform well with a large number of features (genes), as is the case with microarray datasets. A small number of informative and discriminative features is therefore desirable for efficient classification. Many existing feature selection approaches attempt sample classification based on the analysis of gene expression values. In this paper, a linear regression based feature selection algorithm for two-class microarray datasets is developed, which divides the training dataset into two subtypes based on the class information. Using one of the classes as the base condition, a linear regression based model is built. Using this regression model, the divergence of each gene across the two classes is calculated, and genes with higher divergence values are selected as important features from the second subtype of the training data. The classification performance of the proposed approach is evaluated with SVM, Random Forest and AdaBoost classifiers. Results show that the proposed approach provides better accuracy than other existing approaches, namely ReliefF, CFS, a decision tree based attribute selector and attribute selection using correlation analysis.

    Supervised Clustering of Genes for Multi-Class Phenotype Classification

    The paper presents a new approach to supervised gene selection by means of gene clustering for microarray data that belong to more than two phenotypes (classes). The main distinction from previous approaches, which are based on splitting the multi-class task into several binary ones, is the application of the HUM (hypervolume under the manifold) score, which guides the search for the most discriminative gene clusters that simultaneously differentiate all the classes. The results of a comparative analysis with other methods show the advantages of our approach both in the classification rate of new samples and in the lower number of gene clusters. The application of our approach to randomly permuted data shows that the identified structure is more than just a noise artifact.

    Identification of disease-causing genes using microarray data mining and gene ontology

    Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.
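
The filtering stage of this framework relies on the Fisher method. As a minimal sketch of that stage alone, the code below computes one common two-class definition of the Fisher score and keeps the top-scoring genes. The SVMRFE, redundancy reduction and Gene Ontology stages are not reproduced. The function names, data layout and expression values are hypothetical, and the paper may use a different variant of the score.

```python
from statistics import mean, variance


def fisher_score(class_a, class_b):
    """Two-class Fisher score: squared mean difference over summed variances."""
    # Assumes at least two samples per class and a nonzero total variance.
    return (mean(class_a) - mean(class_b)) ** 2 / (
        variance(class_a) + variance(class_b)
    )


def top_genes(expression, k):
    """Return the k gene names with the highest Fisher score."""
    scores = {gene: fisher_score(a, b) for gene, (a, b) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up expression values: gene -> (class A samples, class B samples).
toy = {
    "geneA": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "geneB": ([1.0, 5.0, 9.0], [2.0, 6.0, 10.0]),
    "geneC": ([2.0, 4.0, 6.0], [5.0, 7.0, 9.0]),
}
print(top_genes(toy, 2))
```

A filter like this scores each gene independently, which is why it cannot detect redundant genes on its own and why the abstract adds a separate redundancy reduction stage.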

    Utility of Adaptive Strategy and Adaptive Design for Biomarker-facilitated Patient Selection in Pharmacogenomic or Pharmacogenetic Clinical Development Program

    In the early to late phases of conventional clinical trials, improvement of disease status at study baseline is the anchor of an effective treatment measured by therapeutic response. These population-based clinical trials do not formally account for disease-associated marker genotype or genome-associated therapeutic response. We discuss alternative study designs in pharmacogenomic or pharmacogenetic clinical trials for genomic or genetic biomarker development, and for formally assessing the clinical utility of genomic or genetic (composite) biomarkers. A two-stage adaptive strategy from completed, ongoing or prospectively planned pharmacogenomic or pharmacogenetic clinical trials is described for development of a genomic or genetic biomarker. We present two types of adaptive design: (1) the genomic biomarker is developed external to the clinical trial, which is designed for treatment effect inference; and (2) first-stage data are used to explore a genomic biomarker, but statistical inference of treatment effect in the genomically or genetically defined biomarker subset is only performed at the second stage of the same trial. When the null hypotheses of no treatment effect in all randomized patients and in the genomic patient subset are prospectively specified, we compare the statistical power of fixed and adaptive designs. We also compare the two types of adaptive design. Results from simulation studies showed that adaptive design is more powerful than fixed design for those genomic or genetic biomarkers whose clinical utility is predictive of treatment effect. An adaptive design gains at least 20% to more than 30% in power for the genomic patient subset when the genomic biomarker status is readily usable at study initiation, compared with when it is explored using the first-stage data of the same clinical trial. In exploratory studies, adaptive strategy provides wide flexibility in the process of genomic or genetic biomarker development. In contrast, an adaptive design trial that employs limited flexibility, and is an adequate and well-controlled investigation, has a greater power gain than a fixed design trial, in which the genomic biomarker is capable of predicting treatment effects that pertain only to the prespecified genomic or genetic patient subset.

    Classification and biomarker identification using gene network modules and support vector machines

    Background: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE). Results: Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. Linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed while retaining the remainder for the next classification step. This process is repeated until an optimal classification is obtained. Conclusion: More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from http://web.macam.ac.il/~myousef