
Automated Discovery of Functional Generality of Human Gene Expression Programs

Author: Arend Sidow
David K Gifford
Georg K Gerber
Robin D Dowell
Tommi S Jaakkola
Publication venue: Public Library of Science
Publication date: 01/01/2007

An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose and can be applied readily to novel compendia of biological data.
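The abstract credits Hierarchical Dirichlet Processes with handling uncertainty in the number of programs. As a minimal sketch of that idea only (a single-level Dirichlet process prior, not GeneProgram's hierarchical model or its inference code), the Chinese restaurant process below lets the number of clusters grow with the data instead of being fixed in advance. The function names and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters is not fixed in advance.
    """
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        weights = counts + [alpha]  # final slot stands for "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def expected_num_clusters(n_items, alpha):
    """Prior mean number of clusters: the sum of the new-cluster probabilities."""
    return sum(alpha / (alpha + i) for i in range(n_items))


labels, counts = crp_partition(100, alpha=2.0, rng=random.Random(0))
print(len(counts), "clusters; prior mean", round(expected_num_clusters(100, 2.0), 2))
```

A larger `alpha` makes new clusters more likely; the prior mean number of clusters grows roughly logarithmically with the number of items, which is how such a prior trades flexibility against model complexity.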

CiteSeerX

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

The computer revolution in science: steps towards the realization of computer-supported discovery environments

Author: Jong Hidde de
Rip Arie
Publication venue: Elsevier
Publication date: 01/01/1997

The tools that scientists use in their search processes together form so-called discovery environments. The promise of artificial intelligence and other branches of computer science is to radically transform conventional discovery environments by equipping scientists with a range of powerful computer tools, including large-scale, shared knowledge bases and discovery programs. We will describe the future computer-supported discovery environments that may result, and illustrate by means of a realistic scenario how scientists come to new discoveries in these environments. To make the step from the current generation of discovery tools to computer-supported discovery environments like the one presented in the scenario, developers should realize that such environments are large-scale sociotechnical systems. They should not focus only on isolated computer programs, but also pay attention to the question of how these programs will be used and maintained by scientists in research practices. To help developers of discovery programs integrate their tools into discovery environments, we will formulate a set of guidelines that developers could follow.

Elsevier - Publisher Connector

University of Twente Research Information

Hierarchical Dirichlet Process-Based Models For Discovery of Cross-species Mammalian Gene Expression

Author: Dowell Robin D.
Gerber Georg K.
Gifford David K.
Jaakkola Tommi S.
Publication date: 06/07/2007

An important research problem in computational biology is the identification of expression programs, sets of co-activated genes orchestrating physiological processes, and the characterization of the functional breadth of these programs. The use of mammalian expression data compendia for discovery of such programs presents several challenges, including: 1) cellular inhomogeneity within samples, 2) genetic and environmental variation across samples, and 3) uncertainty in the numbers of programs and sample populations. We developed GeneProgram, a new unsupervised computational framework that uses expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of inter-species expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Our method addresses each of the above challenges by using a probabilistic model that: 1) allocates mRNA to different expression programs that may be shared across tissues, 2) is hierarchical, treating each tissue as a sample from a population of related tissues, and 3) uses Dirichlet Processes, a non-parametric Bayesian method that provides prior distributions over numbers of sets while penalizing model complexity. Using real gene expression data, we show that GeneProgram outperforms several popular expression analysis methods in recovering biologically interpretable gene sets. From a large compendium of mouse and human expression data, GeneProgram discovers 19 tissue groups and 100 expression programs active in mammalian tissues. Our method automatically constructs a comprehensive, body-wide map of expression programs and characterizes their functional generality. This map can be used for guiding future biological experiments, such as discovery of genes for new drug targets that exhibit minimal "cross-talk" with unintended organs, or genes that maintain general physiological responses that go awry in disease states. 
Further, our method is general and can be applied readily to novel compendia of biological data.
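The hierarchical part of the model (each tissue treated as a sample from a population of related tissues, with programs shared across tissues) can be illustrated with a truncated stick-breaking approximation of a two-level Dirichlet process. This is a minimal sketch under stated assumptions, not GeneProgram's implementation: the published model is richer (for example, it also learns tissue groups), and the truncation level `K`, the concentrations `gamma` and `alpha`, and all function names are hypothetical.

```python
import random


def truncated_stick_breaking(gamma, K, rng):
    """Global program weights beta ~ GEM(gamma), truncated to K programs."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last program takes whatever stick is left
    return weights


def dirichlet(concentrations, rng):
    """Dirichlet draw via normalized Gamma variates (tiny floor guards against underflow)."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sample_tissue_weights(n_tissues, K, gamma, alpha, seed=0):
    rng = random.Random(seed)
    beta = truncated_stick_breaking(gamma, K, rng)
    # Finite approximation of pi_j ~ DP(alpha, beta): every tissue re-weights
    # the SAME K global programs, which is what lets programs be shared.
    tissues = [dirichlet([alpha * b for b in beta], rng) for _ in range(n_tissues)]
    return beta, tissues


beta, tissues = sample_tissue_weights(n_tissues=4, K=10, gamma=2.0, alpha=5.0)
for j, pi in enumerate(tissues):
    print(j, [round(w, 2) for w in pi])
```

Because all tissues draw their weights around the same global vector `beta`, a program that is heavy globally tends to appear in many tissues, while `alpha` controls how far an individual tissue can deviate from that shared profile.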

DSpace@MIT

Discovering transcriptional modules by Bayesian data integration

Author: Richard S. Savage
Zoubin Ghahramani
Jim E. Griffin
Bernard J. de la Cruz
David L. Wild
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2010

Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent, and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.
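The gene-by-gene fusion idea can be made concrete with a small post-processing sketch. Assuming (hypothetically) that posterior samples give each gene a cluster label in the expression clustering and in the ChIP-chip clustering, drawn from one shared label space, a gene's fusion probability can be estimated as the fraction of samples in which the two labels coincide. The data layout, threshold, and function names are illustrative assumptions, not the authors' code.

```python
def fusion_probabilities(expr_samples, chip_samples):
    """Per-gene fraction of posterior samples in which the gene's cluster label
    is identical in the expression and the ChIP-chip clustering.

    expr_samples[s][g] and chip_samples[s][g] are hypothetical MCMC outputs:
    the label of gene g in sample s, assuming both datasets share one label space.
    """
    n_samples = len(expr_samples)
    n_genes = len(expr_samples[0])
    return [
        sum(expr_samples[s][g] == chip_samples[s][g] for s in range(n_samples)) / n_samples
        for g in range(n_genes)
    ]


def fused_genes(probs, threshold=0.5):
    """Indices of genes whose estimated fusion probability exceeds the threshold."""
    return [g for g, p in enumerate(probs) if p > threshold]


expr = [[0, 1, 2], [0, 1, 1], [0, 2, 2]]  # 3 samples x 3 genes (toy data)
chip = [[0, 1, 0], [0, 1, 0], [0, 1, 2]]
probs = fusion_probabilities(expr, chip)  # [1.0, 0.666..., 0.333...]
print(fused_genes(probs))                 # [0, 1]
```

Genes that clear the threshold are the candidates whose co-expression and co-regulation agree, which is the subset the abstract describes as most likely to represent underlying TMs.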

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

CUED - Cambridge University Engineering Department

What Can Artificial Intelligence Do for Scientific Realism?

Author: Spelda Petr
Stritecky Vit
Publication date: 01/01/2020

The paper proposes a synthesis between human scientists and artificial representation learning models as a way of augmenting epistemic warrants of realist theories against various anti-realist attempts. Towards this end, the paper fleshes out unconceived alternatives not as a critique of scientific realism but rather as a reinforcement, as it rejects the retrospective interpretations of scientific progress, which brought about the problem of alternatives in the first place. By utilising adversarial machine learning, the synthesis explores possibility spaces of available evidence for unconceived alternatives, providing modal knowledge of what is possible therein. As a result, the epistemic warrant of synthesised realist theories should emerge bolstered as the underdetermination by available evidence gets reduced. While shifting the realist commitment away from theoretical artefacts towards modalities of the possibility spaces, the synthesis comes out as a kind of perspectival modelling.

PhilPapers

Computational discovery of gene modules, regulatory networks and expression programs

Author: Gerber Georg Kurt, 1970-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2007. Includes bibliographical references (p. 163-181). High-throughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseases. However, novel mathematical methods are needed for extracting biological knowledge from high-dimensional, complex and noisy data sources. In this thesis, I develop and apply three novel computational approaches for this task. The common theme of these approaches is that they seek to discover meaningful groups of genes, which confer robustness to noise and compress complex information into interpretable models. I first present the GRAM algorithm, which fuses information from genome-wide expression and in vivo transcription factor-DNA binding data to discover regulatory networks of gene modules. I use the GRAM algorithm to discover regulatory networks in Saccharomyces cerevisiae, including rich media, rapamycin, and cell-cycle module networks. I use functional annotation databases, independent biological experiments and DNA-motif information to validate the discovered networks, and to show that they yield new biological insights. Second, I present GeneProgram, a framework based on Hierarchical Dirichlet Processes, which uses large compendia of mammalian expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of expression programs. I demonstrate that GeneProgram outperforms several popular analysis methods, and using mouse and human expression data, show that it automatically constructs a comprehensive, body-wide map of inter-species expression programs. Finally, I present an extension of GeneProgram that models temporal dynamics. 
I apply the algorithm to a compendium of short time-series gene expression experiments in which human cells were exposed to various infectious agents. I show that discovered expression programs exhibit temporal pattern usage differences corresponding to classes of host cells and infectious agents, and describe several programs that implicate surprising signaling pathways and receptor types in human responses to infection.

CiteSeerX

DSpace@MIT


Inferring Dynamic Signatures of Microbes in Complex Host Ecosystems

Author: Bry Lynn
Gerber Georg Kurt
Onderdonk Andrew Bruce
Publication venue: Public Library of Science (PLoS)
Publication date: 02/08/2012

The human gut microbiota comprise a complex and dynamic ecosystem that profoundly affects host development and physiology. Standard approaches for analyzing time-series data of the microbiota involve computation of measures of ecological community diversity at each time-point, or measures of dissimilarity between pairs of time-points. Although these approaches, which treat data as static snapshots of microbial communities, can identify shifts in overall community structure, they fail to capture the dynamic properties of individual members of the microbiota and their contributions to the underlying time-varying behavior of host ecosystems. To address the limitations of current methods, we present a computational framework that uses continuous-time dynamical models coupled with Bayesian dimensionality adaptation methods to identify time-dependent signatures of individual microbial taxa within a host as well as across multiple hosts. We apply our framework to a publicly available dataset of 16S rRNA gene sequences from stool samples collected over ten months from multiple human subjects, each of whom received repeated courses of oral antibiotics. Using new diversity measures enabled by our framework, we discover groups of both phylogenetically close and distant bacterial taxa that exhibit consensus responses to antibiotic exposure across multiple human subjects. These consensus responses reveal a timeline for equilibration of sub-communities of micro-organisms with distinct physiologies, yielding insights into the successive changes that occur in microbial populations in the human gut after antibiotic treatments. Additionally, our framework leverages microbial signatures shared among human subjects to automatically design optimal experiments to interrogate dynamic properties of the microbiota in new studies. 
Overall, our approach provides a powerful, general-purpose framework for understanding the dynamic behaviors of complex microbial ecosystems, which we believe will prove instrumental for future studies in this field.
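The abstract contrasts its framework with standard snapshot analyses: a diversity measure computed at each time point and a dissimilarity computed between pairs of time points. As a baseline illustration only (not the authors' dynamical-model framework), the sketch below uses two common choices, the Shannon diversity index and Bray-Curtis dissimilarity, on hypothetical per-taxon read counts; the specific measures and function names are assumptions.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    Assumes at least one nonzero count.
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = disjoint)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def snapshot_summary(series):
    """Diversity at each time point and dissimilarity between consecutive time points."""
    diversity = [shannon_diversity(c) for c in series]
    turnover = [bray_curtis(a, b) for a, b in zip(series, series[1:])]
    return diversity, turnover


# Toy series: 3 time points x 2 taxa (hypothetical counts).
diversity, turnover = snapshot_summary([[10, 10], [20, 0], [20, 0]])
print(diversity, turnover)
```

Both numbers summarize whole communities, so a shift like the one at the first transition is detected, but nothing says which taxa drove it or how individual taxa behave over time, which is the limitation the abstract points out.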

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease

Author: Jay Jeremy J.
Publication venue: DigitalCommons@UMaine
Publication date: 01/12/2013

Large numbers of biomedical associations are readily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well due to the plethora of interfaces and varying biomedical concept identifiers used. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical. This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After applying the method to a small test data set, a large-scale integrated association network is constructed for application of a network propagation technique that helps uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest. The combined contextual search and relevance methods power a tool which makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than currently available databases. Increasing the accessibility of current information is an important component to understanding high-throughput experimental results and surviving the data deluge.
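The abstract does not name its network propagation technique. Random walk with restart is one widely used choice, so the sketch below swaps it in as an assumption to show how scores spread from a disease node to more distant, indirectly connected genes. The toy graph, the restart probability, and the function name are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate relevance from seed nodes over an undirected network.

    adj maps each node to its list of neighbours; every node is assumed to
    have at least one neighbour (isolated nodes would leak probability mass).
    Iterates p <- restart * p0 + (1 - restart) * W p, where W spreads a node's
    score evenly over its neighbours, until the L1 change drops below tol.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy association network: disease D - gene G1 - gene G2 - gene G3.
graph = {"D": ["G1"], "G1": ["D", "G2"], "G2": ["G1", "G3"], "G3": ["G2"]}
scores = random_walk_with_restart(graph, seeds={"D"})
print(sorted(scores, key=scores.get, reverse=True))  # ['D', 'G1', 'G2', 'G3']
```

G3 is never directly linked to the disease yet still receives a nonzero score, which is the sense in which propagation surfaces "more distant latent relationships"; raising `restart` keeps scores closer to the seeds.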

University of Maine