6,955 research outputs found

    Gene ranking and biomarker discovery under correlation

    Biomarker discovery and gene ranking is a standard task in genomic high throughput analysis. Typically, the ordering of markers is based on a stabilized variant of the t-score, such as the moderated t or the SAM statistic. However, these procedures ignore gene-gene correlations, which may have a profound impact on the gene orderings and on the power of the subsequent tests. We propose a simple procedure that adjusts gene-wise t-statistics to take account of correlations among genes. The resulting correlation-adjusted t-scores ("cat" scores) are derived from a predictive perspective, i.e. as a score for variable selection to discriminate group membership in two-class linear discriminant analysis. In the absence of correlation the cat score reduces to the standard t-score. Moreover, using the cat score it is straightforward to evaluate groups of features (i.e. gene sets). For computation of the cat score from small sample data we propose a shrinkage procedure. In a comparative study comprising six different synthetic and empirical correlation structures we show that the cat score improves estimation of gene orderings and leads to higher power for fixed true discovery rate, and vice versa. Finally, we also illustrate the cat score by analyzing metabolomic data. The shrinkage cat score is implemented in the R package "st" available from URL http://cran.r-project.org/web/packages/st/
    Comment: 18 pages, 5 figures, 1 table
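
    The abstract does not state the formula for the cat score. As a rough illustration only, the sketch below assumes the score is the vector of t-scores multiplied by the inverse square root of the gene-gene correlation matrix, and works the two-gene case in closed form. The function name and the example values are hypothetical, and this is not the shrinkage estimator implemented in the "st" package.

```python
import math


def cat_scores_two_genes(t1, t2, r):
    """Decorrelate two gene-wise t-scores, given the correlation r between the genes.

    Assumption (not stated in the abstract): cat = P^(-1/2) t, with
    P = [[1, r], [r, 1]] the 2x2 correlation matrix.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # P has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2)
    # and (1, -1)/sqrt(2), so its inverse square root has a closed form.
    a = 1.0 / math.sqrt(1.0 + r)
    b = 1.0 / math.sqrt(1.0 - r)
    diag = 0.5 * (a + b)  # diagonal entries of P^(-1/2)
    off = 0.5 * (a - b)   # off-diagonal entries of P^(-1/2)
    return (diag * t1 + off * t2, off * t1 + diag * t2)


# With r = 0 the matrix P^(-1/2) is the identity, so the scores are unchanged.
uncorrelated = cat_scores_two_genes(2.0, -1.0, 0.0)

# With r = 0.5, two equal t-scores are both shrunk, because the two genes
# carry partly redundant evidence.
correlated = cat_scores_two_genes(2.0, 2.0, 0.5)
```

    Under this assumption, setting r to zero returns the t-scores unchanged, which matches the abstract's remark that the cat score reduces to the standard t-score in the absence of correlation.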

    An Algorithm for Cellular Reprogramming

    The day we understand the time evolution of subcellular elements at a level of detail comparable to physical systems governed by Newton's laws of motion seems far away. Even so, quantitative approaches to cellular dynamics add to our understanding of cell biology, providing data-guided frameworks that allow us to develop better predictions about and methods for control over specific biological processes and system-wide cell behavior. In this paper we describe an approach to optimizing the use of transcription factors in the context of cellular reprogramming. We construct an approximate model for the natural evolution of a synchronized population of fibroblasts, based on data obtained by sampling the expression of some 22,083 genes at several times along the cell cycle. (These data are based on a colony of cells that have been cell cycle synchronized.) In order to arrive at a model of moderate complexity, we cluster gene expression based on the division of the genome into topologically associating domains (TADs) and then model the dynamics of the expression levels of the TADs. Based on this dynamical model and known bioinformatics, we develop a methodology for identifying the transcription factors that are the most likely to be effective toward a specific cellular reprogramming task. The approach used is based on a device commonly used in optimal control. From this data-guided methodology, we identify a number of validated transcription factors used in reprogramming and/or natural differentiation. Our findings highlight the immense potential of dynamical models, mathematics, and data-guided methodologies for improving methods for control over biological processes.

    Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies

    Motivated by examples from genetic association studies, this paper considers the model selection problem in a general complex linear model system and in a Bayesian framework. We discuss formulating model selection problems and incorporating context-dependent a priori information through different levels of prior specifications. We also derive analytic Bayes factors and their approximations to facilitate model selection and discuss their theoretical and computational properties. We demonstrate our Bayesian approach based on an implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real data application of mapping tissue-specific eQTLs. Our novel results on Bayes factors provide a general framework to perform efficient model comparisons in complex linear model systems.

    A graph-based representation of Gene Expression profiles in DNA microarrays

    This paper proposes a new and very flexible data model, called gene expression graph (GEG), for gene expression analysis and classification. Three features differentiate GEGs from other available microarray data representation structures: (i) the memory occupation of a GEG is independent of the number of samples used to build it; (ii) a GEG more clearly expresses relationships among expressed and non-expressed genes in experiments on both healthy and diseased tissues; (iii) GEGs make it easy to implement very efficient classifiers. The paper also presents a simple classifier for sample-based classification to show the flexibility and user-friendliness of the proposed data structure.

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
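
    The abstract does not name any specific kernel. As one hedged illustration of how a kernel can process non-vectorial data, the sketch below computes a k-mer spectrum kernel between two sequences, a well-known string kernel chosen here as an example rather than taken from the chapter. The function names, the choice of k, and the sample sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of sequences x and y.

    The sequences are never converted to fixed-length vectors explicitly;
    only the k-mers that occur in x are visited.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for k-mers it has not seen.
    return sum(n * cy[kmer] for kmer, n in cx.items())


# Two short DNA-like strings that share some 3-mers.
similarity = spectrum_kernel("ACGTACGT", "TACGTTAC", k=3)
```

    Because the result is an inner product of count vectors, it is symmetric in its two arguments and can be supplied to any kernel method, such as a support vector machine, that accepts a precomputed kernel matrix.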

    Applications of Biological Cell Models in Robotics

    In this paper I present some of the most representative biological models applied to robotics. In particular, this work surveys models inspired by, or making use of concepts from, gene regulatory networks (GRNs): these networks describe the complex interactions that affect gene expression and, consequently, cell behaviour.