Network estimation in State Space Model with L1-regularization constraint
Biological networks have arisen as an attractive paradigm of genomic science
ever since the introduction of large scale genomic technologies which carried
the promise of elucidating the relationship in functional genomics. Microarray
technologies coupled with appropriate mathematical or statistical models have
made it possible to identify dynamic regulatory networks or to measure time
course of the expression level of many genes simultaneously. However, one of the
few limitations is the high-dimensional nature of such data, coupled with
the fact that these gene expression data are known to include some hidden
process. In that regard, we are concerned with deriving a method for inferring
a sparse dynamic network in a high dimensional data setting. We assume that the
observations are noisy measurements of gene expression in the form of mRNAs,
whose dynamics can be described by some unknown or hidden process. We build an
input-dependent linear state space model from these hidden states and
demonstrate how an incorporated regularization constraint in an
Expectation-Maximization (EM) algorithm can be used to reverse engineer
transcriptional networks from gene expression profiling data. This corresponds
to estimating the model interaction parameters. The proposed method is
illustrated on time-course microarray data obtained from a well-established
T-cell dataset. At the optimum tuning parameters we found genes TRAF5, JUND, CDK4,
CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and
FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed
connections. We recommend these genes as objects for further investigation.
Caspase 4 is also found to activate the expression of JunD which in turn
represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
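For orientation, an input-dependent linear state space model with an L1 penalty can be written generically as below. The notation is an assumption made for illustration and is not taken from the abstract: x_t denotes the hidden states, y_t the observed expression levels, u_t the inputs, A, B, C, D the interaction matrices collected in theta, and lambda the tuning parameter. The paper's exact parameterization, and which matrices it penalizes, may differ.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C x_t + D u_t + v_t, & v_t &\sim \mathcal{N}(0, R),\\
\hat{\theta} &= \arg\max_{\theta}\;
  \log p(y_{1:T} \mid u_{1:T}; \theta) - \lambda \lVert \theta \rVert_1 .
\end{aligned}
```

In an EM scheme of this kind, the E-step would typically compute smoothed moments of the hidden states and the M-step would maximize the penalized expected complete-data log-likelihood, so that larger values of lambda yield sparser estimated networks.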
Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables
In this article, we consider the problem of reconstructing networks for
continuous, binary, count and discrete ordinal variables by estimating a sparse
precision matrix in Gaussian copula graphical models. We propose two
approaches: penalized extended rank likelihood with Monte Carlo
Expectation-Maximization algorithm (copula EM glasso) and copula skeptic with
pair-wise copula estimation for copula Gaussian graphical models. The proposed
approaches help to infer networks arising from nonnormal and mixed variables.
We demonstrate the performance of our methods through simulation studies and
analysis of breast cancer genomic and clinical data and maize genetics data.
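The rank-based idea behind skeptic-type estimators can be sketched as follows. This is a minimal illustration of the nonparanormal skeptic as it is commonly described for continuous data without ties (Kendall's tau mapped through a sine transform), not the authors' implementation; the function names `kendall_tau` and `skeptic_correlation` are hypothetical, and the handling of binary, count and ordinal variables in the paper is not reproduced here.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a for two 1-D arrays (assumes no ties; O(n^2) memory)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # Each unordered pair is counted twice, hence the n * (n - 1) denominator.
    return (dx * dy).sum() / (n * (n - 1))

def skeptic_correlation(data):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    data: array of shape (n_samples, n_variables), continuous columns.
    Uses the relation rho = sin(pi / 2 * tau), which holds for Gaussian copulas.
    """
    p = data.shape[1]
    corr = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau = kendall_tau(data[:, j], data[:, k])
            corr[j, k] = corr[k, j] = np.sin(0.5 * np.pi * tau)
    return corr
```

Because the estimate depends on the data only through ranks, it is unchanged by strictly increasing transformations of each variable. The resulting matrix could then be passed to a sparse precision-matrix estimator such as the graphical lasso.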
Model-based clustering for populations of networks
Until recently, data on populations of networks were rarely available.
However, with the advancement of automatic monitoring devices and the growing
social and scientific interest in networks, such data has become more widely
available. From sociological experiments involving cognitive social structures
to fMRI scans revealing large-scale brain networks of groups of patients, there
is a growing awareness that we urgently need tools to analyse populations of
networks and particularly to model the variation between networks due to
covariates. We propose a model-based clustering method based on mixtures of
generalized linear (mixed) models that can be employed to describe the joint
distribution of a population of networks in a parsimonious manner and to
identify subpopulations of networks that share certain topological properties
of interest (degree distribution, community structure, effect of covariates on
the presence of an edge, etc.). Maximum likelihood estimation for the proposed
model can be efficiently carried out with an implementation of the EM
algorithm. We assess the performance of this method on simulated data and
conclude with an example application on advice networks in a small business.
Comment: The final (published) version of the article can be downloaded for
free (Open Access) from the editor's website (click on the DOI link below
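A heavily simplified sketch of EM-based clustering for a population of networks is given below. It assumes the simplest possible component model, in which every cluster has a single edge probability (an intercept-only logistic model with no covariates or random effects), so it illustrates the mixture and EM mechanics rather than the generalized linear (mixed) models of the paper. The function name `em_network_mixture` and its arguments are hypothetical.

```python
import numpy as np

def em_network_mixture(edges, n_clusters=2, n_iter=100):
    """Cluster networks by edge density with a Bernoulli mixture fitted by EM.

    edges: binary array of shape (n_networks, n_dyads), one row per network.
    Returns mixing weights, per-cluster edge probabilities, responsibilities.
    """
    n, m = edges.shape
    counts = edges.sum(axis=1)                      # edges present per network
    # Deterministic initialization from quantiles of the observed densities.
    p = np.quantile(counts / m, np.linspace(0.1, 0.9, n_clusters))
    p = np.clip(p, 1e-6, 1 - 1e-6)
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed on the log scale.
        log_lik = (counts[:, None] * np.log(p)
                   + (m - counts)[:, None] * np.log1p(-p)
                   + np.log(pi))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0) + 1e-12               # guard against empty clusters
        pi = nk / nk.sum()
        p = np.clip((resp * counts[:, None]).sum(axis=0) / (nk * m),
                    1e-6, 1 - 1e-6)
    return pi, p, resp
```

Replacing the single edge probability per cluster with a regression on dyad-level covariates would move this sketch closer to the mixture of generalized linear models described in the abstract.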
Estimating Network Kinetics of the MAPK/ERK Pathway Using Biochemical Data
The MAPK/ERK pathway is a major signal transduction system that regulates many fundamental cellular processes, including growth control and cell death. As a result of these roles, it is crucially important in cancer as well as in normal developmental processes. It has therefore been studied intensively, resulting in a wealth of knowledge about its activation. It is also well documented that the activation kinetics of the pathway are crucial in determining the nature of the biological response. However, while individual biochemical steps are well characterized, it is still difficult to predict or even understand how the activation kinetics work. The aim of this paper is to estimate the stochastic rate constants of the MAPK/ERK network dynamics. Accordingly, taking a Bayesian approach, we combined underlying qualitative biological knowledge in several competing dynamic models via sets of quasireactions and estimated the stochastic rate constants of these reactions. Comparing the resulting estimates via the BIC and DIC criteria, we chose a biological model which includes EGFR degradation - Raf-MEK-ERK cascade without the involvement of RKIPs.
High dimensional Sparse Gaussian Graphical Mixture Model
This paper considers the problem of network reconstruction from
heterogeneous data using a Gaussian Graphical Mixture Model (GGMM). It is well
known that parameter estimation in this context is challenging due to large
numbers of variables coupled with the degeneracy of the likelihood. We propose
as a solution a penalized maximum likelihood technique by imposing an L1
penalty on the precision matrix. Our approach shrinks the parameters thereby
resulting in better identifiability and variable selection. We use the
Expectation Maximization (EM) algorithm which involves the graphical LASSO to
estimate the mixing coefficients and the precision matrices. We show that under
certain regularity conditions the Penalized Maximum Likelihood (PML) estimates
are consistent. We demonstrate the performance of the PML estimator through
simulations and we show the utility of our method for high dimensional data
analysis in a genomic application.
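A penalized log-likelihood of the kind described above can be written as follows. The notation is assumed for illustration and does not come from the abstract: K mixture components with mixing coefficients pi_k, means mu_k, precision matrices Theta_k, Gaussian density phi, and tuning parameter lambda. The paper may scale or weight the penalty differently.

```latex
\ell_{\lambda}(\pi, \mu, \Theta)
  = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\,
    \phi\!\left(x_i \mid \mu_k, \Theta_k^{-1}\right)
  - \lambda \sum_{k=1}^{K} \lVert \Theta_k \rVert_1
```

With the E-step responsibilities held fixed, maximizing this objective over each Theta_k reduces to a graphical-lasso-type problem on a responsibility-weighted covariance matrix, which is consistent with the role the abstract gives to the graphical LASSO inside EM.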
De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when data contain genotyping
errors and are incomplete. We apply the proposed method to two genotype
datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid and
polyploid species. We have implemented the method in the R package netgwas.
Comment: 25 pages, 7 figure
Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations
Non-linear systems of differential equations have attracted interest in
fields like systems biology, ecology or biochemistry, due to their flexibility
and their ability to describe dynamical systems. Despite the importance of such
models in many branches of science they have not been the focus of systematic
statistical analysis until recently. In this work we propose a general approach
to estimate the parameters of systems of differential equations measured with
noise. Our methodology is based on the maximization of the penalized likelihood
where the system of differential equations is used as a penalty. To do so, we
use a Reproducing Kernel Hilbert Space approach that allows us to formulate the
estimation problem as an unconstrained numeric maximization problem that is easy
to solve. The proposed method is tested with synthetically simulated data and it
is used to estimate the unobserved transcription factor CdaR in Streptomyces
coelicolor using gene expression data of the genes it regulates.
Comment: 16 pages, 6 figure
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018), and reconstructing conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely
occur in genetics and genomics such as genotype data, and genotype-phenotype
data. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from classical multiple testing approaches. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF fil
Identifying overlapping terrorist cells from the Noordin Top actor-event network
Actor-event data are common in sociological settings, whereby one registers
the pattern of attendance of a group of social actors to a number of events. We
focus on 79 members of the Noordin Top terrorist network, who were monitored
attending 45 events. The attendance or non-attendance of the terrorists at
events defines the social fabric, such as group coherence and social
communities. The aim of the analysis of such data is to learn about the
affiliation structure. Actor-event data is often transformed to actor-actor
data in order to be further analysed by network models, such as stochastic
block models. This transformation and such analyses lead to a natural loss of
information, particularly when one is interested in identifying, possibly
overlapping, subgroups or communities of actors on the basis of their
attendances to events. In this paper we propose an actor-event model for
overlapping communities of terrorists, which simplifies interpretation of the
network. We propose a mixture model with overlapping clusters for the analysis
of the binary actor-event network data, called manet, and develop a
Bayesian procedure for inference. After a simulation study, we show how this
analysis of the terrorist network has clear interpretative advantages over the
more traditional approaches of affiliation network analysis.
Comment: 24 pages, 5 figures; related R package (manet) available on CRA