Network estimation in State Space Model with L1-regularization constraint
Biological networks have arisen as an attractive paradigm of genomic science
ever since the introduction of large scale genomic technologies which carried
the promise of elucidating the relationship in functional genomics. Microarray
technologies coupled with appropriate mathematical or statistical models have
made it possible to identify dynamic regulatory networks or to measure the
time course of the expression levels of many genes simultaneously. However, one
limitation is the high-dimensional nature of such data, coupled with the fact
that gene expression data are known to include some hidden process. In that
regard, we are concerned with deriving a method for inferring a sparse dynamic
network in a high-dimensional data setting. We assume that the
observations are noisy measurements of gene expression in the form of mRNAs,
whose dynamics can be described by some unknown or hidden process. We build an
input-dependent linear state space model from these hidden states and
demonstrate how an incorporated regularization constraint in an
Expectation-Maximization (EM) algorithm can be used to reverse engineer
transcriptional networks from gene expression profiling data. This corresponds
to estimating the model interaction parameters. The proposed method is
illustrated on time-course microarray data obtained from a well-established
T-cell dataset. At the optimal tuning parameters, we found genes TRAF5, JUND,
CDK4, CASP4, CD69, and C3X1 to have a higher number of inward-directed
connections and FYB, CCNA2, AKT1, and CASP8 to have a higher number of
outward-directed connections. We recommend these genes as candidates for
further investigation. Caspase 4 is also found to activate the expression of
JunD, which in turn represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
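As a rough illustration of how an L1 penalty produces a sparse network, the sketch below applies soft-thresholding (the proximal operator of the L1 norm) to a small interaction matrix. The abstract does not say which solver is used inside the EM algorithm, so this is an assumed, generic technique; the function names, the matrix values, and the penalty `lam` are all hypothetical.

```python
def soft_threshold(value, lam):
    """Proximal operator of lam * |value|: shrink toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparsify(matrix, lam):
    """Apply soft-thresholding entrywise to a list-of-lists matrix."""
    return [[soft_threshold(entry, lam) for entry in row] for row in matrix]


# Hypothetical dense estimate of a 3-gene interaction matrix (made-up numbers).
dense_estimate = [
    [0.90, 0.05, -0.40],
    [0.02, 0.70, 0.03],
    [-0.01, 0.60, 0.80],
]

# Entries with magnitude at most lam are set exactly to zero (no edge).
sparse_estimate = sparsify(dense_estimate, lam=0.1)
n_edges = sum(1 for row in sparse_estimate for entry in row if entry != 0.0)
```

Larger values of `lam` remove more entries, which is why the choice of tuning parameter determines how many connections each gene retains.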
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely
occur in genetics and genomics, for example as genotype data and genotype-phenotype
data. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from the classical multiple testing approach. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file
Iterative reconstruction of high-dimensional Gaussian Graphical Models based on a new method to estimate partial correlations under constraints.
In the context of Gaussian Graphical Models (GGMs) with high-dimensional, small-sample data, we present a simple procedure, called PACOSE (standing for PArtial COrrelation SElection), to estimate partial correlations under the constraint that some of them are strictly zero. This method can also be extended to covariance selection. If the goal is to estimate a GGM, our new procedure can be applied to re-estimate the partial correlations after a first graph has been estimated, in the hope of improving the estimation of the non-zero coefficients. This iterated version of PACOSE is called iPACOSE. In a simulation study, we compare PACOSE to existing methods and show that the re-estimated partial correlation coefficients may be closer to the true values in important cases. In addition, we show on simulated and real data that iPACOSE has very interesting properties with regard to sensitivity, positive predictive value and stability.
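To make the link between zero partial correlations and missing edges concrete, here is a minimal sketch using the standard first-order partial correlation formula for three variables. It is not the PACOSE procedure itself, which the abstract does not detail; the function name and the correlation values are made up for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Chain x - z - y: x and y are correlated only through z, so their
# marginal correlation (0.6 * 0.5 = 0.3) is non-zero, but the partial
# correlation given z vanishes, meaning no edge x - y in the GGM.
rho_chain = partial_correlation(0.3, r_xz=0.6, r_yz=0.5)

# With a stronger marginal correlation than z alone explains, the
# partial correlation stays away from zero, meaning an edge x - y.
rho_direct = partial_correlation(0.7, r_xz=0.6, r_yz=0.5)
```

In a GGM, constraining a partial correlation to be exactly zero is the same as declaring the corresponding edge absent, which is the kind of constraint the abstract refers to.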
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks.
Comment: 46 pages, 14 figures, 3 tables
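The generative process usually given for a mixed membership stochastic blockmodel can be sketched as follows: each node draws a membership vector from a Dirichlet distribution, and for every ordered pair the sender and receiver each pick a block from their own membership vector before the edge is drawn from a block-to-block probability. The abstract does not spell this out, so treat the function names, the parameter values, and the seed as assumptions made for illustration only.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point rounding


def sample_mmsb(n_nodes, alpha, block_matrix, seed=0):
    """Sample membership vectors and a directed adjacency matrix."""
    rng = random.Random(seed)
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-loops in this sketch
            sender_block = sample_categorical(memberships[p], rng)
            receiver_block = sample_categorical(memberships[q], rng)
            if rng.random() < block_matrix[sender_block][receiver_block]:
                adjacency[p][q] = 1
    return memberships, adjacency


# Hypothetical setting: 6 nodes, 2 blocks, dense within blocks, sparse across.
memberships, adjacency = sample_mmsb(
    n_nodes=6,
    alpha=[0.5, 0.5],
    block_matrix=[[0.9, 0.05], [0.05, 0.9]],
)
```

Because blocks are redrawn for every pair, a node can act as a member of different blocks in different interactions, which is what "mixed membership" refers to.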
Incomplete graphical model inference via latent tree aggregation
Graphical network inference is used in many fields such as genomics or
ecology to infer the conditional independence structure between variables, from
measurements of gene expression or species abundances for instance. In many
practical cases, not all variables involved in the network have been observed,
and the samples are actually drawn from a distribution where some variables
have been marginalized out. This challenges the sparsity assumption commonly
made in graphical model inference, since marginalization yields locally dense
structures, even when the original network is sparse. We present a procedure
for inferring Gaussian graphical models when some variables are unobserved,
that accounts both for the influence of missing variables and the low density
of the original network. Our model is based on the aggregation of spanning
trees, and the estimation procedure on the Expectation-Maximization algorithm.
We treat the graph structure and the unobserved nodes as missing variables and
compute posterior probabilities of edge appearance. To provide a complete
methodology, we also propose several model selection criteria to estimate the
number of missing nodes. A simulation study and an illustration on flow cytometry
data reveal that our method has favorable edge detection properties compared to
existing graph inference techniques. The methods are implemented in an R
package.
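The claim that marginalization yields locally dense structures can be checked on a toy Gaussian graphical model. The sketch below builds a star-shaped precision matrix (one hub, four leaves), marginalizes out the hub with the Schur complement, and counts edges before and after. The matrix values and helper names are assumptions chosen for illustration; this is not the tree-aggregation method of the paper.

```python
def marginal_precision(precision, hidden):
    """Precision of the observed variables after marginalizing one hidden
    node: K_OO - K_OH * K_HH^{-1} * K_HO (Schur complement)."""
    observed = [i for i in range(len(precision)) if i != hidden]
    k_hh = precision[hidden][hidden]
    return [
        [
            precision[i][j] - precision[i][hidden] * precision[hidden][j] / k_hh
            for j in observed
        ]
        for i in observed
    ]


def count_edges(precision, tol=1e-12):
    """Count non-zero off-diagonal entries in the upper triangle."""
    n = len(precision)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(precision[i][j]) > tol
    )


# Star graph: node 0 is the hub, nodes 1..4 are leaves linked only to the hub.
# The values are made up but keep the matrix positive definite.
full_precision = [
    [1.0, 0.3, 0.3, 0.3, 0.3],
    [0.3, 1.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 1.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 1.0, 0.0],
    [0.3, 0.0, 0.0, 0.0, 1.0],
]

observed_precision = marginal_precision(full_precision, hidden=0)
edges_full = count_edges(full_precision)          # sparse star
edges_observed = count_edges(observed_precision)  # leaves become a clique
```

With the hub unobserved, every pair of leaves becomes conditionally dependent, so the four observed nodes form a complete graph even though the original network is a sparse tree.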