Defining a robust biological prior from Pathway Analysis to drive Network Inference

Author: Ambroise Christophe
Guedj Mickael
Jeanmougin Marine
Publication date: 01/01/2011

Inferring genetic networks from gene expression data is one of the most challenging tasks in the post-genomic era, partly due to the vast space of possible networks and the relatively small amount of data available. In this field, the Gaussian Graphical Model (GGM) provides a convenient framework for the discovery of biological networks. In this paper, we propose an original approach for inferring gene regulation networks using a robust biological prior on their structure in order to limit the set of candidate networks. Pathways, which represent biological knowledge of the regulatory networks, will be used as informative prior knowledge to drive Network Inference. This approach is based on the selection of a relevant set of genes, called the "molecular signature", associated with a condition of interest (for instance, the genes involved in disease development). In this context, differential expression analysis is a well-established strategy. However, the resulting signatures are often inconsistent and show little overlap between studies. Thus, we dedicate the first part of our work to improving the standard process of biomarker identification to guarantee the robustness and reproducibility of the molecular signature. Our approach makes it possible to compare the networks inferred for two conditions of interest (for instance, case and control networks) and aids the biological interpretation of results. It thus allows the identification of differential regulations that occur in these conditions. We illustrate the proposed approach by applying our method to a study of breast cancer's response to treatment.
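
As a rough illustration of the GGM idea mentioned in this abstract, the sketch below derives partial correlations from a covariance matrix: in a Gaussian Graphical Model, two genes are connected only if their partial correlation (read off the inverse covariance, or precision, matrix) is non-zero. This is a minimal textbook sketch, not the authors' estimator; it uses no pathway prior, and the helper names `invert` and `partial_correlations` and the toy covariance are assumptions made for illustration.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """Partial correlation of i and j given all other variables: -P_ij / sqrt(P_ii * P_jj)."""
    prec = invert(cov)  # precision matrix P
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Toy covariance for a hypothetical chain gene1 - gene2 - gene3.
cov = [[0.75, 0.50, 0.25],
       [0.50, 1.00, 0.50],
       [0.25, 0.50, 0.75]]
pc = partial_correlations(cov)
```

In this toy example gene1 and gene3 have a non-zero covariance (0.25), yet their partial correlation vanishes once gene2 is accounted for, so a GGM would place no direct edge between them.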

arXiv.org e-Print Archive

Numérisation de Documents Anciens Mathématiques

Bayesian Inference for Genomic Data Analysis

Author: Ogundijo Oyetunji Enoch
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

High-throughput genomic data contain a vast amount of information that is influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data. Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of the copy-neutrality assumption. Under copy-neutrality, it is assumed that the genome contains mutational variability and that the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters. Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may be missing all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inferences about the underlying GRN. To account for the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type-specific expressions and the cell-type proportions in the heterogeneous samples.

Columbia University Academic Commons

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Understand Biology Using Single Cell RNA-Sequencing

Author: Ding Hongxu
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

This dissertation summarizes the development of experimental and analytical tools for single cell RNA sequencing (scRNA-Seq), including 1) scPLATE-Seq, a FACS- and plate-based scRNA-Seq platform, which is accurate, robust, fully automated and cost-efficient; 2) metaVIPER, an algorithm for transcriptional regulator activity inference based on scRNA-Seq profiles; and 3) iterClust, a statistical framework for iterative clustering analysis, especially suitable for dissecting the hierarchy of heterogeneity among single cells. Further, this dissertation summarizes biological questions answered by combining these tools, including 1) understanding inter- and intra-tumor heterogeneity of human glioblastoma; 2) elucidating regulators of β-cell de-differentiation in type-2 diabetes; and 3) developing novel therapeutics targeting cell-state regulators of breast cancer stem cells.

Columbia University Academic Commons

Interactome comparison of human embryonic stem cell lines with the inner cell mass and trophectoderm

Author: Bates Nicola
Brison Daniel R.
Garner Terence
Kimber Susan J.
Minogue Ben
Oldershaw Rachel
Shaw Lisa
Smith Helen
Sneddon Sharon
Stevens Adam
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 07/09/2018

Networks of interacting co-regulated genes distinguish the inner cell mass (ICM) from the differentiated trophectoderm (TE) in the preimplantation blastocyst, in a species-specific manner. In mouse, the ground state pluripotency of the ICM appears to be maintained in murine embryonic stem cells (ESCs) derived from the ICM. This is not the case for human ESCs. In order to gain insight into this phenomenon, we have used quantitative network analysis to identify how similar human (h)ESCs are to the human ICM. Using the hESC lines MAN1, HUES3 and HUES7, we have shown that all have only a limited overlap with ICM-specific gene expression, but that this overlap is enriched for network properties that correspond to key aspects of function, including transcription factor activity and the hierarchy of network modules. These analyses provide an important framework which highlights the developmental origins of hESCs.

Enlighten

Influence of Statistical Estimators of Mutual Information and Data Heterogeneity on the Inference of Gene Regulatory Networks

Author: A Almudevar
A Butte
A Kent
A Kraskov
A Margolin
B Bolstad
B Palsson
B Xing
C Olsen
C Shannon
C Steinhoff
D Husmeier
DJ Sheskin
EN Gilbert
F Emmert-Streib
F Harrell
Frank Emmert-Streib
G Altay
G Csardi
G Miller
G Stolovitzky
I Nemenman
J Hausser
J Pearl
J Schäfer
J Watkinson
JJ Faith
K Liang
L Paninski
L von Bertalanffy
M Vidal
N Friedman
P Erdös
P Meyer
P Spirtes
R Leclerc
RA Irizarry
Ricardo de Matos Simoes
S Bulashevska
S Dudoit
S Khan
S Liang
S Stouffer
T Cover
T Fawcett
T Schurmann
T Van den Bulcke
T Verma
U Alon
Vladimir Brusic
W Li
W Luo
Y Yang
Publication venue: Public Library of Science
Publication date: 01/01/2011

The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of assuming a distribution and sampling from it. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, it is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
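
To make the best-performing combination named in this abstract more concrete, the sketch below computes a Miller-Madow estimate of mutual information after equal-width discretization. It is only an illustrative sketch using the standard definitions (plug-in entropy plus the correction (m - 1) / (2n), with m the number of non-empty bins); it is not the C3NET implementation, and the function names, the default of three bins, and the per-variable binning are assumptions not taken from the paper.

```python
import math
from collections import Counter

def equal_width_bins(values, n_bins):
    """Discretize values into n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy_miller_madow(labels):
    """Plug-in (empirical) entropy in nats plus the Miller-Madow bias correction."""
    n = len(labels)
    counts = Counter(labels)
    h_emp = -sum((c / n) * math.log(c / n) for c in counts.values())
    m = len(counts)  # number of non-empty bins
    return h_emp + (m - 1) / (2 * n)

def mutual_information_mm(x, y, n_bins=3):
    """MI(X; Y) = H(X) + H(Y) - H(X, Y), each entropy estimated with Miller-Madow."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    return (entropy_miller_madow(bx) + entropy_miller_madow(by)
            - entropy_miller_madow(list(zip(bx, by))))

# Hypothetical expression profiles of two genes across six samples.
gene_a = [0.1, 0.9, 2.2, 3.1, 3.8, 5.0]
gene_b = [0.3, 1.1, 2.0, 2.9, 4.1, 4.9]
mi = mutual_information_mm(gene_a, gene_b)
```

Replacing `entropy_miller_madow` with the uncorrected plug-in entropy would give the "Empirical" estimator listed in the abstract, which makes the two easy to compare on the same discretized data.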

Queen's University Belfast Research Portal

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Data based identification and prediction of nonlinear and complex dynamical systems

Author: Grebogi Celso
Lai Ying-Cheng
Wang Wen-Xu
Publication venue: 'Elsevier BV'
Publication date: 27/04/2017

We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme. Peer reviewed. Postprint.

arXiv.org e-Print Archive

Aberdeen University Research

Detection of regulator genes and eQTLs in gene networks

Author: A Butte
A Chatr-Aryamontri
A Clauset
A Joshi
A Kundaje
AA Shabalin
AJ Enright
AJ Walhout
AS Dimas
B Schwanhausser
B Zhang
C Cenik
CO Daub
D Koller
DA Cusanovich
DM Greenawalt
E Bonnet
E Ravasz
E Segal
EC Neto
EE Schadt
EJ Foss
F Grubert
F Yue
FA Cubillos
FW Albert
G Hemani
G Nicholson
GD Smith
GH Golub
H Foroughi Asl
H Talukdar
HN Kadarmideen
J Millstein
J Qi
J Zhu
JE Aten
JF Ayroles
JJ Faith
JL Björkegren
JS Liu
K Basso
K Qu
KG Ardlie
L Wu
LA Hindorff
LH Hartwell
LS Chen
M Ashburner
M Civelek
M Georges
M Gerstein
M Medvedovic
M Schmidt
M Scutari
MA Schaub
MB Eisen
MD Ritchie
ME Goddard
MEJ Newman
MV Rockman
N Friedman
N Laird
O Stegle
P Langfelder
P Lu
R Sharan
RB Brem
RW Williams
S Lee
S Roy
S Tavazoie
SI Lee
SM Waszak
SS Rao
T Lappalainen
T Michoel
TA Manolio
TF Mackay
The ENCODE
TS Furey
VG Cheung
W Cookson
W Zhang
Y Chen
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016

Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico. Comment: minor revision with typos corrected; review article; 24 pages, 2 figures.

arXiv.org e-Print Archive

Crossref