Identification of Interaction Patterns and Classification with Applications to Microarray Data
Emerging patterns represent a class of interaction structures that has recently been proposed as a tool in data mining. In this paper, a new and more general definition referring to underlying probabilities is proposed. The defined interaction patterns carry information about the relevance of combinations of variables for distinguishing between classes. Since they are formally quite similar to the leaves of a classification tree, we propose a fast and simple method, based on the CART algorithm, to find the corresponding empirical patterns in data sets. Simulations show that the method is quite effective in identifying patterns. In addition, the detected patterns can be used to define new variables for classification. Thus, we propose a simple scheme that uses the patterns to improve the performance of classification procedures. The method may also be seen as a scheme to improve the performance of CARTs with respect to both the identification of interaction patterns and the accuracy of prediction.
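To make the link between tree leaves and interaction patterns concrete, the sketch below represents a pattern as the conjunction of split conditions along one root-to-leaf path, computes its support in each class, and turns patterns into 0/1 indicator variables for a downstream classifier. It is a minimal illustration under stated assumptions: the ratio of class supports used here is the classical emerging-pattern growth rate, not the more general probability-based definition proposed in the paper, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

# A condition (j, t, upper) means x[j] <= t if upper is True, else x[j] > t.
# A pattern is a conjunction of conditions, i.e. one root-to-leaf path of a tree.
Condition = Tuple[int, float, bool]
Pattern = List[Condition]


def matches(x: Sequence[float], pattern: Pattern) -> bool:
    """Return True if sample x satisfies every split condition of the pattern."""
    for j, t, upper in pattern:
        if upper and not x[j] <= t:
            return False
        if not upper and not x[j] > t:
            return False
    return True


def support(samples: Sequence[Sequence[float]], pattern: Pattern) -> float:
    """Fraction of samples that fall into the pattern (the tree leaf)."""
    return sum(matches(x, pattern) for x in samples) / len(samples)


def growth_rate(class_a, class_b, pattern: Pattern) -> float:
    """Classical emerging-pattern growth rate: support in class a / support in class b."""
    s_a, s_b = support(class_a, pattern), support(class_b, pattern)
    if s_b == 0:
        return float("inf") if s_a > 0 else 0.0
    return s_a / s_b


def add_pattern_features(samples, patterns: List[Pattern]) -> List[List[float]]:
    """Append one 0/1 indicator per pattern to each sample's original variables."""
    return [list(x) + [1.0 if matches(x, p) else 0.0 for p in patterns] for x in samples]


# Toy data with two variables per sample (hypothetical values).
class_a = [[0.2, 5.0], [0.1, 6.0], [0.9, 5.5]]
class_b = [[0.8, 1.0], [0.4, 7.0], [0.3, 1.0]]
# Pattern: x[0] <= 0.5 AND x[1] > 4.0, an interaction of both variables.
pattern = [(0, 0.5, True), (1, 4.0, False)]
print(growth_rate(class_a, class_b, pattern))
print(add_pattern_features(class_a, [pattern]))
```

In the setting described above, the conditions would be read off the leaves of a fitted CART tree rather than written by hand; the indicator columns can then be appended to the original variables before training any classifier.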
Ontology-based knowledge representation of experiment metadata in biological data mining
According to the PubMed resource from the U.S. National Library of Medicine, over 750,000 scientific articles were published in the ~5000 biomedical journals worldwide in 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to assimilate and mine data from related investigations for meta-analysis. While computers have the potential to assist investigators in the extraction, management and analysis of these data, the information in a traditional journal publication still consists largely of unstructured, free-text descriptions of study design, experimental application and interpretation of results, making it difficult for computers to access the content without significant manual intervention. To circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related knowledge-representation standards are being developed, proposed and adopted in the biomedical community. In this chapter, we explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchies, and extensible relational data models that link the information components together in a machine-readable and human-usable framework for data mining purposes.
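Subsumption hierarchies are what let a query for a general term also retrieve experiments annotated with more specific terms. The sketch below assumes a toy vocabulary of is_a edges and hypothetical experiment identifiers (none of these terms or IDs come from a real ontology release); it walks the hierarchy breadth-first to collect a term's ancestors and uses that to answer term-based queries.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy vocabulary: each term maps to its direct parents (is_a edges).
IS_A: Dict[str, List[str]] = {
    "material entity": [],
    "cell": ["material entity"],
    "tissue": ["material entity"],
    "hepatocyte": ["cell"],
    "liver tissue": ["tissue"],
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Collect every term that subsumes `term`, following is_a edges breadth-first."""
    seen: Set[str] = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def is_subsumed_by(term: str, query: str, is_a: Dict[str, List[str]]) -> bool:
    """A term matches a query if it is the query itself or one of its descendants."""
    return term == query or query in ancestors(term, is_a)


def find_experiments(annotations: Dict[str, str], query: str,
                     is_a: Dict[str, List[str]]) -> List[str]:
    """Return the (hypothetical) experiment IDs whose annotation falls under `query`."""
    return sorted(exp for exp, term in annotations.items()
                  if is_subsumed_by(term, query, is_a))


annotations = {"EXP-1": "liver tissue", "EXP-2": "hepatocyte", "EXP-3": "tissue"}
print(find_experiments(annotations, "tissue", IS_A))
```

In practice the is_a edges would be loaded from an ontology file (for example in OBO or OWL format) rather than written by hand, and the ancestor sets would typically be cached.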
Rank discriminants for predicting phenotypes from RNA expression
Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, thereby increasing accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
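The two-gene comparisons at the core of these rules are simple to express in code. The sketch below scores each gene pair by the usual TSP criterion, |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| estimated from frequencies, lets the k top-scoring pairs vote, and adds a median comparison between two gene groups. It is a simplified illustration, not the paper's implementation: k and the gene groups are fixed inputs here rather than chosen by the data-driven "context" selection, ties in the pair scores are broken by enumeration order, and the function names and data are hypothetical.

```python
from itertools import combinations
from statistics import median
from typing import List, Sequence, Tuple

Sample = Sequence[float]
Rule = Tuple[int, int, int]  # (gene i, gene j, class predicted when x[i] < x[j])


def prob_less(samples: Sequence[Sample], i: int, j: int) -> float:
    """Empirical P(x[i] < x[j]) within one class."""
    return sum(x[i] < x[j] for x in samples) / len(samples)


def pair_score(X0, X1, i: int, j: int) -> float:
    """TSP score: difference in the frequency of the ordering x[i] < x[j] between classes."""
    return abs(prob_less(X0, i, j) - prob_less(X1, i, j))


def fit_pairs(X0, X1, k: int = 1) -> List[Rule]:
    """Keep the k highest-scoring pairs; k=1 gives the plain TSP classifier."""
    n_genes = len(X0[0])
    pairs = sorted(combinations(range(n_genes), 2),
                   key=lambda ij: pair_score(X0, X1, *ij), reverse=True)[:k]
    rules = []
    for i, j in pairs:
        # Predict the class in which x[i] < x[j] was observed more often.
        cls = 0 if prob_less(X0, i, j) > prob_less(X1, i, j) else 1
        rules.append((i, j, cls))
    return rules


def predict_vote(rules: List[Rule], x: Sample) -> int:
    """Majority vote over pairs; an exact tie goes to class 0 (use odd k to avoid ties)."""
    votes = [cls if x[i] < x[j] else 1 - cls for i, j, cls in rules]
    return 1 if 2 * sum(votes) > len(votes) else 0


def predict_median(x: Sample, group_a: List[int], group_b: List[int],
                   cls_if_a_lower: int) -> int:
    """Compare median expression of two gene groups and map the outcome to a class."""
    a_lower = median(x[g] for g in group_a) < median(x[g] for g in group_b)
    return cls_if_a_lower if a_lower else 1 - cls_if_a_lower


# Toy expression profiles over three genes (hypothetical values).
X0 = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.0], [0.5, 4.0, 2.0]]
X1 = [[5.0, 1.0, 3.0], [6.0, 2.0, 4.0], [4.0, 0.5, 2.5]]
rules = fit_pairs(X0, X1, k=3)
print(rules, predict_vote(rules, [9.0, 1.0, 5.0]))
```

Because only orderings enter the pairwise rules, they are unchanged by any strictly increasing transformation of the expression values, such as a log transform, which is part of what makes such rules easy to transfer between studies.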
Current advances in systems and integrative biology
Systems biology has gained a tremendous amount of interest in the last few years. This is partly due to the realization that traditional approaches focusing on only a few molecules at a time cannot describe the impact of aberrant or modulated molecular environments across a whole system. Furthermore, whereas a hypothesis-driven study aims to prove or disprove its postulates, a hypothesis-free systems approach can yield an unbiased and novel testable hypothesis as an end result. This latter approach forgoes assumptions that predict how a biological system should react to an altered microenvironment within a cellular context, across a tissue or in distant organs. Additionally, re-use of existing data by systematic data mining and re-stratification, one of the cornerstones of integrative systems biology, is also gaining attention. While tremendous efforts using a systems methodology have already yielded excellent results, a lack of suitable analytic tools and purpose-built databases clearly poses a major bottleneck in applying a systematic workflow. This review addresses the current approaches used in systems analysis, as well as obstacles often encountered in large-scale data analysis and integration that tend to go unnoticed but directly affect the final outcome of a systems approach. Its wide applicability, ranging from basic research, disease descriptors and pharmacological studies to personalized medicine, makes this emerging approach well suited to biological and medical questions where conventional methods are not ideal.
Transcriptomic effects of the non-steroidal anti-inflammatory drug Ibuprofen in the marine bivalve Mytilus galloprovincialis Lam.
The transcriptomic effects of Ibuprofen (IBU) in the digestive gland tissue of Mytilus galloprovincialis Lam. specimens exposed to a low environmental concentration (250 ng L⁻¹) are presented. Using a 1.7K-feature cDNA microarray together with linear models and empirical Bayes statistical methods, 225 differentially expressed genes were identified in mussels treated with IBU across a 15-day period. Transcriptional dynamics were typical of an adaptive response, with a peak of gene expression change at day 7 (177 features, representing about 11% of the sequences available for analysis) and an almost full recovery at the end of the exposure period. Functional genomics by means of Gene Ontology term analysis revealed typical mussel stress responses, such as aminoglycan (chitin) metabolic processes, but also more specific effects, such as the regulation of NF-kappa B transcription factor activity. © 2016 Elsevier Ltd. All rights reserved.
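The abstract does not spell out the statistics, but one widely used form of the linear-model/empirical-Bayes approach is the moderated t-statistic implemented in the Bioconductor package limma. It shrinks each feature's residual variance s² toward a prior value s0² with prior degrees of freedom d0, giving the posterior variance (d0·s0² + d·s²)/(d0 + d), where d is the residual degrees of freedom. The sketch below applies this to a single feature in a two-group comparison; d0 and s0² are passed in as assumed values (limma estimates them from all features), the data are hypothetical, and this is not presented as the authors' exact pipeline.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def moderated_t(control: Sequence[float], treated: Sequence[float],
                d0: float, s0_sq: float) -> Tuple[float, float]:
    """Two-group moderated t-statistic for one feature.

    d0 and s0_sq are the prior degrees of freedom and prior variance; here they
    are assumed inputs rather than estimated across all features.
    Returns (t, degrees of freedom).
    """
    n_c, n_t = len(control), len(treated)
    diff = mean(treated) - mean(control)
    d = n_c + n_t - 2                                  # residual degrees of freedom
    s_sq = ((n_c - 1) * variance(control)              # pooled residual variance
            + (n_t - 1) * variance(treated)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)    # shrunken (posterior) variance
    t = diff / (sqrt(s_tilde_sq) * sqrt(1 / n_c + 1 / n_t))
    return t, d0 + d


# Hypothetical log-expression values for one feature.
control, treated = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(moderated_t(control, treated, d0=0.0, s0_sq=1.0))  # d0 = 0: ordinary pooled t
print(moderated_t(control, treated, d0=4.0, s0_sq=9.0))  # large prior variance shrinks t
```

With d0 = 0 the statistic reduces to the ordinary pooled two-sample t; a positive d0 borrows strength across features, which stabilizes the variance estimate when each feature has only a few replicates.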
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choosing a suitable method for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.
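As one concrete instance of the kind of method such a review covers, the sketch below performs agglomerative hierarchical clustering with average linkage, using 1 minus the Pearson correlation as the distance so that genes with similarly shaped profiles group together regardless of scale. This choice of method and distance is an assumption for illustration, not the review's recommendation. The implementation is deliberately naive (about O(n³) time), the data are hypothetical, and it assumes no profile is constant, since the correlation is undefined for zero variance.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import List, Sequence


def correlation_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation; 0 for identical shape, 2 for mirror-image profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return 1.0 - cov / (pstdev(a) * pstdev(b))


def average_linkage(profiles: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    n = len(profiles)
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}

    def d(i: int, j: int) -> float:
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = mean(d(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles of four genes across four arrays.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # same shape as gene 0, twice the scale
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.0],   # same shape as gene 2
]
print(average_linkage(profiles, n_clusters=2))
```

Swapping the distance (for example, Euclidean on standardized profiles) or the linkage rule changes which genes group together, which is one reason the choice of method for a given dataset is not straightforward.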
The unseen world: environmental microbial sequencing and identification methods for ecologists
Archaea, bacteria, microeukaryotes, and the viruses that infect them (collectively “microorganisms”) are foundational components of all ecosystems, inhabiting almost every imaginable environment and comprising the majority of the planet’s organismal and evolutionary diversity. Microorganisms play integral roles in ecosystem functioning; are important in the biogeochemical cycling of carbon (C), nitrogen (N), sulfur (S), phosphorus (P), and various metals (e.g. Barnard et al. 2005); and may be vital to ecosystem responses to large-scale climatic change (Mackelprang et al. 2011). Rarely found alone, microorganisms often form complex communities that are dynamic in space and time (Martiny et al. 2006). For these and other reasons, ecologists and environmental scientists have become increasingly interested in understanding microbial dynamics in ecosystems. Ecological studies of microbes in the environment generally focus on determining which organisms are present and what functional roles they are playing or could play. Rapid advances in molecular and bioinformatic approaches over the past decade have dramatically reduced the difficulty and cost of addressing such questions (Figure 1; WebTable 1). Yet the range of methodologies currently in use and the rapid pace of their ongoing development can be daunting for researchers unaccustomed to these technologies.