Texas A&M Repository

Redescription Mining and Applications in Bioinformatics

Author: Mohammed J. Zaki
Naren Ramakrishnan
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2009

Our ability to interrogate the cell and computationally assimilate its answers is improving at a dramatic pace. For instance, the study of even a focused aspect of cellular activity, such as gene action, now benefits from multiple high-throughput data acquisition technologies such as microarrays, genome-wide deletion screens, and RNAi assays. A critical need is the development of algorithms that can bridge, relate, and unify diverse categories of data descriptors. Redescription mining is such an approach. Given a set of biological objects (e.g., genes, proteins) and a collection of descriptors defined over this set, the goal of redescription mining is to use the given descriptors as a vocabulary and find subsets of data that afford multiple definitions. The premise of redescription mining is that subsets that afford multiple definitions are likely to exhibit concerted behavior and are, hence, interesting. We present algorithms for redescription mining based on formal concept analysis and applications of redescription mining to multiple biological datasets. We demonstrate how redescriptions identify conceptual clusters of data using mutually reinforcing features, without explicit training information.
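The core idea of this abstract, finding subsets of objects that admit more than one definition, can be illustrated with a small sketch. It is not the formal-concept-analysis algorithm the authors present. It only compares the object sets selected by single descriptors and reports pairs whose Jaccard coefficient exceeds a threshold. The descriptor names, gene identifiers, and the `min_jaccard` parameter are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient between two sets (1.0 means the sets are identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_redescriptions(descriptors, min_jaccard=0.8):
    """Return (name_a, name_b, score) for descriptor pairs selecting nearly the same objects."""
    results = []
    # Sort by descriptor name so the output order is deterministic.
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(descriptors.items()), 2):
        score = jaccard(set_a, set_b)
        if score >= min_jaccard:
            results.append((name_a, name_b, score))
    return results


# Hypothetical toy data: each descriptor maps to the set of genes it selects.
descriptors = {
    "upregulated_heat_shock": {"g1", "g2", "g3", "g4"},
    "slow_growth_on_deletion": {"g2", "g3", "g4", "g5"},
    "annotated_stress_response": {"g1", "g2", "g3", "g4"},
}
redescriptions = find_redescriptions(descriptors, min_jaccard=0.9)
```

Here only the heat-shock and stress-response descriptors select the same four genes, so they form the single reported pair. A fuller treatment would also combine descriptors with set operations (intersection, union, negation) before comparing them.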

CiteSeerX

A linear programming approach for estimating the structure of a sparse linear genetic network from transcript profiling data

Author: Bhadra Sahely
Bhattacharyya Chiranjib
Chandra Nagasuma R
Mian I Saira
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks (graphs with many thousands of nodes, where an undirected edge between two nodes does not indicate the direction of influence) and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem, which is then posed as a LASSO (l1-constrained fitting) problem and finally solved by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
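The step from a sparse regression to a Linear Program can be made concrete with a short derivation. The abstract does not state the loss function, so this sketch assumes an absolute-error loss; with squared error the problem would be a quadratic program, not an LP. The symbols $y$ (expression of one gene across $n$ samples), $X$ (expression of the remaining $p$ genes), $w$ (edge weights) and $\lambda$ (sparsity trade-off) are notation introduced here for illustration.

```latex
% Assumed l1-loss, l1-penalty regression for a single target gene:
\min_{w \in \mathbb{R}^{p}} \; \lVert y - Xw \rVert_{1} + \lambda \lVert w \rVert_{1}

% Split the weights and residuals into non-negative parts,
%   w = u - v, \qquad y - Xw = \xi^{+} - \xi^{-},
% which gives an equivalent Linear Program:
\min_{u,\, v,\, \xi^{+},\, \xi^{-}} \;
  \mathbf{1}^{\top}(\xi^{+} + \xi^{-}) + \lambda\, \mathbf{1}^{\top}(u + v)
\quad \text{s.t.} \quad
  X(u - v) + \xi^{+} - \xi^{-} = y, \qquad
  u,\, v,\, \xi^{+},\, \xi^{-} \ge 0
```

At an optimum, at most one of each pair $(u_j, v_j)$ and $(\xi^{+}_i, \xi^{-}_i)$ is non-zero, so the objective equals the original one. Under this reading, non-zero entries of $w$ would correspond to edges incident to the target gene.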

Springer - Publisher Connector

Open Access Repository of IISc Research Publications

UCL Discovery

Modeling Genetic Networks from Clonal Analysis

Author: Aubin Jane E.
Nagarajan Radhakrishnan
Peterson Charlotte A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

In this report a systematic approach is used to determine the approximate genetic network and robust dependencies underlying differentiation. The data considered are in the form of a binary matrix that represents the expression of nine genes across ninety-nine colonies. The report is divided into two parts. The first part identifies significant pair-wise dependencies from the given binary matrix using linear correlation and mutual information. A new method is proposed to determine statistically significant dependencies estimated using the mutual information measure. In the second part, a Bayesian approach is used to obtain an approximate description (equivalence class) of network structures. The robustness of linear correlation, mutual information and the equivalence class of networks is investigated with perturbation and decreasing colony number. Perturbation of the data was achieved by generating bootstrap realizations. The results are refined with biological knowledge. It was found that certain dependencies in the network are immune to perturbation and decreasing colony number and may represent robust features inherent in the differentiation program of osteoblast progenitor cells. The methods discussed are generic in nature and not restricted to the experimental paradigm addressed in this study. Comment: 59 pages, 11 figures, 3 tables
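The pairwise step described here, scoring the dependency between two binary expression vectors with mutual information and asking whether the score is significant, might look roughly like the following. The abstract does not describe the authors' new significance method, so a plain permutation test is substituted as an assumption. The function names and toy vectors are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions.
        mi += p_ab * math.log2(count * n / (px[a] * py[b]))
    return mi


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Estimate how often shuffling y gives mutual information at least as large as observed."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: expression (1) or absence (0) of two genes in 20 colonies.
gene_a = [0, 1] * 10
gene_b = list(gene_a)  # perfectly dependent on gene_a
p_value = permutation_p_value(gene_a, gene_b, n_perm=200)
```

Shuffling one vector breaks any real association while preserving each gene's marginal frequency, which is why the shuffled scores serve as a null distribution. A bootstrap over colonies, as mentioned in the abstract, would address robustness and is a separate step.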

arXiv.org e-Print Archive

University of Kentucky

Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

Author: Du Lei
Guo Zheng
Jiang Wei
Li Chuanxing
Li Jing
Li Li
Li Xia
Rao Shaoqi
Wang Lihong
Wang Qing K
Xiao Yun
Zhang Qingpu
Zhang Tianwen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: One of the ultimate goals of modern biological research is to fully elucidate the interplay and regulation of the molecular determinants that propel and characterize the progression of diverse life phenomena, such as cell cycling, development, aging, and the progressive and recurrent pathogenesis of complex diseases. Large-scale, genome-wide time-resolved data are becoming increasingly available, which provides an opportunity to tackle the challenging reverse-engineering problem of time-delayed gene regulatory networks. RESULTS: This methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulation of genes that can span any number of time intervals. This bioinformatics toolbox provides a unified approach to uncovering time trends of gene regulation through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with current knowledge of the phase characteristics of the cell cycle. CONCLUSION: We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulation for cell cycling, this unified approach can also be used to study the complex gene regulation related to the development, aging and progressive pathogenesis of a complex disease, where potential dependences between different experimental units might occur.
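The notion of a delayed correlation, comparing one gene's expression with another's shifted forward in time, can be sketched as follows. This is not the TdGRN toolbox or its decision analysis. It only scans a range of shifts and keeps the one with the strongest Pearson correlation. The function names, the `max_delay` parameter and the toy time series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_delay(regulator, target, max_delay):
    """Return (delay, r) maximizing |corr(regulator[t], target[t + delay])|."""
    best = (0, 0.0)
    for delay in range(max_delay + 1):
        n = len(regulator) - delay
        if n < 3:  # too few overlapping time points to be meaningful
            break
        r = pearson(regulator[:n], target[delay:])
        if abs(r) > abs(best[1]):
            best = (delay, r)
    return best


# Hypothetical toy series: the target repeats the regulator two time points later.
regulator = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6, 0.5, 1.0]
target = [0.0, 0.0] + regulator[:-2]
delay, corr = best_delay(regulator, target, max_delay=3)
```

In this toy case the scan recovers a delay of two time points with a correlation of essentially 1. With periodic data such as cell-cycle profiles, several shifts can score highly, so the overlap length and the range of delays scanned need care.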

Springer - Publisher Connector

University of Regensburg Publication Server

Inferring cellular networks – a review

Author: Florian Markowetz
Rainer Spang
Publication venue: BioMed Central
Publication date: 01/01/2007

In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models, including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.

Springer - Publisher Connector