    Simple and Effective Visual Models for Gene Expression Cancer Diagnostics

    In the paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), may be effectively separated with simple two-dimensional plots such as scatterplots and radviz graphs. The principal innovation proposed in the paper is a method called VizRank, which is able to score and identify the best among possibly millions of candidate projections for visualizations. Compared to techniques that have recently been widely applied in cancer genomics, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that can be easily examined and interpreted by domain experts. Our experiments on a number of gene expression data sets show that VizRank was always able to find data visualizations with a small number of genes (two to seven) and excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions and point to outliers and potential misclassifications in cancer data sets.

    VizRank: Data Visualization Guided by Machine Learning

    Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections showing different attribute subsets that must be evaluated by the data analyst. In this paper, we introduce a method called VizRank, which is applied to class-labeled data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on the data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections agrees closely with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
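
    The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scatterplots (each projection is a pair of attributes), exhaustive enumeration of pairs, leave-one-out evaluation, and k = 3, none of which the abstract specifies. The function names `knn_loo_accuracy` and `rank_scatterplots` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on projected 2-D points."""
    correct = 0
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        neighbors = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )[:k]
        votes = Counter(label for _, label in neighbors)
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every attribute pair by k-NN accuracy; best projection first."""
    scored = []
    for a, b in combinations(range(len(rows[0])), 2):
        points = [(r[a], r[b]) for r in rows]  # the 2-D projection
        scored.append((knn_loo_accuracy(points, labels, k), (a, b)))
    return sorted(scored, reverse=True)


# Toy data (assumed): attributes 0 and 1 separate the classes,
# attributes 2 and 3 carry no class information.
rows = [
    (0.0, 0.1, 0, 0), (0.1, 0.0, 0, 10), (0.2, 0.1, 10, 0), (0.1, 0.2, 10, 10),
    (1.0, 0.9, 0, 0), (0.9, 1.0, 0, 10), (0.8, 0.9, 10, 0), (0.9, 0.8, 10, 10),
]
labels = ["A"] * 4 + ["B"] * 4

ranking = rank_scatterplots(rows, labels)
for score, pair in ranking:
    print(pair, round(score, 2))
```

    On this toy data the pair (0, 1) ranks first, because it is the only projection in which every point's nearest neighbors share its class. Exhaustive enumeration is only practical for a handful of attributes; how candidate projections are selected for large data sets is not described here.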

    Attribute Interactions in Medical Data Analysis

    There is much empirical evidence about the success of naive Bayesian classification (NBC) in medical applications of attribute-based machine learning. NBC assumes conditional independence between attributes. In classification, such classifiers sum up the pieces of class-related evidence from individual attributes, independently of other attributes. The performance, however, deteriorates significantly when the “interactions” between attributes become critical. We propose an approach to handling attribute interactions within the framework of “voting” classifiers, such as NBC. We propose an operational test for detecting interactions in learning data and a procedure that takes the detected interactions into account while learning. This approach induces a structuring of the domain of attributes, may improve the classifier’s performance, and may provide useful novel information for the domain expert when interpreting the results of learning. We report on its application in data analysis and model construction for the prediction of clinical outcome in hip arthroplasty.
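
    Why interactions hurt a classifier that treats attributes independently can be seen in a small sketch. The XOR data set, the function name `naive_bayes_posterior`, and the remedy of joining two attributes into one are assumptions made for illustration; the abstract does not say which test or procedure the authors use. The sketch applies no smoothing, so it is only meant for this toy data.

```python
from collections import Counter


def naive_bayes_posterior(data, x):
    """P(class | x) under the conditional-independence assumption."""
    class_counts = Counter(cls for _, cls in data)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(data)  # class prior
        for i, value in enumerate(x):
            # Evidence from attribute i alone, ignoring all other attributes.
            matches = sum(
                1 for attrs, c in data if c == cls and attrs[i] == value
            )
            score *= matches / n_cls
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# XOR: the class depends only on the combination of the two attributes.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Each attribute alone is uninformative, so the posterior is uniform.
independent = naive_bayes_posterior(xor_data, (0, 1))

# Assumed remedy: treat the interacting pair as one joint attribute.
joined_data = [((attrs,), cls) for attrs, cls in xor_data]
joined = naive_bayes_posterior(joined_data, ((0, 1),))

print(independent)
print(joined)
```

    With independent attributes both classes receive probability 0.5, while the joined attribute identifies class 1 with certainty.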

    GenePath: a System for Automated Construction of Genetic Networks from Mutant Data

    Motivation: Genetic pathways are often used in the analysis of biological phenomena. In classical genetics, they are constructed manually from experimental data on mutants. The field lacks a formalism to guide such analysis, and accounting for all the data becomes complicated when large amounts of data are considered. Results: We have developed GenePath, an intelligent assistant that mimics expert geneticists in the analysis of genetic data. GenePath employs expert-defined patterns to uncover gene relations from the data, and uses these relations as constraints that guide the search for a plausible genetic network. GenePath provides a formalism for genetic data analysis, facilitates the consideration of all the available data in a consistent and systematic manner, and aids in the examination of the large number of possible consequences of a planned experiment. It also provides an explanation mechanism that traces back every finding to the pertinent data. GenePath was successfully tested on several genetic problems. Availability: GenePath can be accessed at http://genepath.org. Supplementary information: Supplementary material is available at http://genepath.org/bi-supp.

    Web-enabled knowledge-based analysis of genetic data

    We present a web-based implementation of GenePath, an intelligent assistant tool for data analysis in functional genomics. GenePath considers mutant data and uses expert-defined patterns to find gene-to-gene or gene-to-outcome relations. It presents the results of analysis as genetic networks, in which genes exert various influences on one another and on a biological outcome. In the paper, we particularly focus on its web-based interface and explanation mechanisms.

    VizRank: Finding Informative Data Projections in Functional Genomics by Machine Learning

    VizRank is a tool that finds interesting two-dimensional projections of class-labeled data. When applied to multi-dimensional functional genomics data sets, VizRank can systematically find relevant biological patterns.

    Fossil Vertebrates and Paleomagnetism Update of One of the Earlier Stages of Cave Evolution in the Classical Karst, Slovenia: Pliocene of Črnotiče II Site and Račiška Pečina Cave

    For the first time in the Classical Karst, paleontological data made it possible to match the magnetostratigraphic record precisely with the geomagnetic polarity timescale at two studied sites: (i) a series of speleothems alternating with red clays in Račiška pečina Cave (Matarsko podolje), and (ii) an unroofed paleocave of the Črnotiče II site (Podgorski kras Plateau) completely filled by fluvial clastic sediments covered by speleothems. The later sites are also characterized by a rich appearance of fossil tubes of autochthonous stygobiont serpulid Marifugia cavatica. The vertebrate record is composed mostly of enamel fragments of rodents and soricomorphs. The absence of rootless arvicolids, as well as the taxonomic composition of the mammalian fauna, suggests a Pliocene age for both sites. For (i) Račiška pečina (with Apodemus, cf. Borsodia) the age was estimated as middle to late MN17 (ca 1.8–2.4 Ma), while (ii) the assemblage from Črnotiče II (with Deinsdorfia sp., Beremedia fissidens, Apodemus cf. atavus, Rhagapodemus cf. frequens, Glirulus sp., Cseria sp.) is clearly considerably older: MN15–MN16 (ca 3.0–4.1 Ma). Given the congruence of biostratigraphic and paleomagnetic data and the reliable sedimentary setting of the samples, we propose to apply the respective datum also as the time of one ancient speleogenetic phase in the Classical Karst.