
    Identification of Interaction Patterns and Classification with Applications to Microarray Data

    Emerging patterns represent a class of interaction structures that has recently been proposed as a tool in data mining. In this paper, a new and more general definition referring to underlying probabilities is proposed. The defined interaction patterns carry information about the relevance of combinations of variables for distinguishing between classes. Since they are formally quite similar to the leaves of a classification tree, we propose a fast and simple method, based on the CART algorithm, to find the corresponding empirical patterns in data sets. Simulations show that the method is quite effective in identifying patterns. In addition, the detected patterns can be used to define new variables for classification. Thus, we propose a simple scheme that uses the patterns to improve the performance of classification procedures. The method may also be seen as a scheme to improve the performance of CARTs with respect to both the identification of interaction patterns and the accuracy of prediction.

    Ontology-based knowledge representation of experiment metadata in biological data mining

    According to the PubMed resource from the U.S. National Library of Medicine, over 750,000 scientific articles were published in the ~5000 biomedical journals worldwide in 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information being generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to assimilate and mine data from related investigations for purposes of meta-analysis. While computers have the potential to assist investigators in the extraction, management and analysis of these data, the information contained in the traditional journal publication still consists largely of unstructured, free-text descriptions of study design, experimental application and results interpretation, making it difficult for computers to gain access to the content of what is being conveyed without significant manual intervention. In order to circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related standards in knowledge representation are being developed, proposed and adopted in the biomedical community. In this chapter, we will explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchical structures, and extensible relational data models that link the information components together in a machine-readable and human-usable framework for data mining purposes.

    Rank discriminants for predicting phenotypes from RNA expression

    Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
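
The two-gene comparison rule and the "voting among several pairs" variant described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_tsp_pairs` and `predict` are hypothetical, and the pair score used here (the absolute difference between the two classes in how often gene i is expressed below gene j) is an assumption based on how pair-comparison classifiers are commonly described, not something stated in the abstract.

```python
from itertools import combinations


def fit_tsp_pairs(X, y, k=1):
    """Pick the k highest-scoring gene pairs (illustrative sketch only).

    X: list of samples, each a list of expression values.
    y: list of 0/1 class labels, one per sample.
    Returns tuples (score, i, j, class_if_less), where class_if_less is the
    class predicted when gene i is expressed below gene j.
    """
    n_genes = len(X[0])
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    scored = []
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i < gene j.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        # Assumed score: how differently the ordering behaves in the two classes.
        scored.append((abs(p0 - p1), i, j, 1 if p1 > p0 else 0))
    # Stable sort: among tied scores, the earliest pair is kept first.
    scored.sort(key=lambda t: -t[0])
    return scored[:k]


def predict(pairs, x):
    """Majority vote over the selected pairs; ties go to class 0."""
    votes_for_1 = 0
    for _, i, j, class_if_less in pairs:
        votes_for_1 += class_if_less if x[i] < x[j] else 1 - class_if_less
    return 1 if 2 * votes_for_1 > len(pairs) else 0


# Toy data (made up for illustration): 4 samples, 3 genes.
X = [[1, 2, 5], [2, 3, 4], [3, 1, 0], [5, 2, 1]]
y = [0, 0, 1, 1]
single_pair = fit_tsp_pairs(X, y, k=1)   # one comparison
three_pairs = fit_tsp_pairs(X, y, k=3)   # voting among several pairs
```

With `k=1` the rule reduces to a single comparison; with larger `k` each pair casts one vote. Because only the ordering of values within a sample is used, the prediction is unchanged by any monotone rescaling of that sample's expression values.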

    Current advances in systems and integrative biology

    Systems biology has gained a tremendous amount of interest in the last few years. This is partly due to the realization that traditional approaches focusing only on a few molecules at a time cannot describe the impact of aberrant or modulated molecular environments across a whole system. Furthermore, a hypothesis-driven study aims to prove or disprove its postulations, whereas a hypothesis-free systems approach can yield an unbiased and novel testable hypothesis as an end result. This latter approach forgoes assumptions that predict how a biological system should react to an altered microenvironment within a cellular context, across a tissue or in distant organs. Additionally, re-use of existing data by systematic data mining and re-stratification, one of the cornerstones of integrative systems biology, is also gaining attention. While tremendous efforts using a systems methodology have already yielded excellent results, it is apparent that a lack of suitable analytic tools and purpose-built databases poses a major bottleneck in applying a systematic workflow. This review addresses the current approaches used in systems analysis, along with obstacles often encountered in large-scale data analysis and integration that tend to go unnoticed but have a direct impact on the final outcome of a systems approach. Its wide applicability, ranging from basic research, disease descriptors and pharmacological studies to personalized medicine, makes this emerging approach well suited to address biological and medical questions where conventional methods are not ideal.

    Transcriptomic effects of the non-steroidal anti-inflammatory drug Ibuprofen in the marine bivalve Mytilus galloprovincialis Lam

    The transcriptomic effects of Ibuprofen (IBU) in the digestive gland tissue of Mytilus galloprovincialis Lam. specimens exposed at low environmental concentrations (250 ng L⁻¹) are presented. Using a 1.7K-feature cDNA microarray along with linear models and empirical Bayes statistical methods, 225 differentially expressed genes were identified in mussels treated with IBU across a 15-day period. Transcriptional dynamics were typical of an adaptive response, with a peak of gene expression change at day 7 (177 features, representing about 11% of sequences available for analysis) and an almost full recovery at the end of the exposure period. Functional genomics by means of Gene Ontology term analysis revealed typical mussel stress responses, i.e. aminoglycan (chitin) metabolic processes, but also more specific effects such as the regulation of NF-kappa B transcription factor activity. © 2016 Elsevier Ltd. All rights reserved.

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    The unseen world: environmental microbial sequencing and identification methods for ecologists

    Archaea, bacteria, microeukaryotes, and the viruses that infect them (collectively “microorganisms”) are foundational components of all ecosystems, inhabiting almost every imaginable environment and comprising the majority of the planet’s organismal and evolutionary diversity. Microorganisms play integral roles in ecosystem functioning; are important in the biogeochemical cycling of carbon (C), nitrogen (N), sulfur (S), phosphorus (P), and various metals (e.g. Barnard et al. 2005); and may be vital to ecosystem responses to large-scale climatic change (Mackelprang et al. 2011). Rarely found alone, microorganisms often form complex communities that are dynamic in space and time (Martiny et al. 2006). For these and other reasons, ecologists and environmental scientists have become increasingly interested in understanding microbial dynamics in ecosystems. Ecological studies of microbes in the environment generally focus on determining which organisms are present and what functional roles they are playing or could play. Rapid advances in molecular and bioinformatic approaches over the past decade have dramatically reduced the difficulty and cost of addressing such questions (Figure 1; WebTable 1). Yet the range of methodologies currently in use and the rapid pace of their ongoing development can be daunting for researchers unaccustomed to these technologies.