223 research outputs found
De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when data contain genotyping
errors and are incomplete. We apply the proposed method to two genotype
datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid and
polyploid species. We have implemented the method in the R package netgwas.
Comment: 25 pages, 7 figures
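As a rough illustration of how linkage groups and marker order could be read off an inferred conditional independence graph, here is a minimal Python sketch. It assumes the graph has already been estimated and that each linkage group forms a simple path; the function names `linkage_groups` and `order_path` and the marker labels are hypothetical and are not part of netgwas, which the abstract describes as an R package.

```python
from collections import defaultdict


def linkage_groups(markers, edges):
    """Split markers into connected components of an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for m in markers:
        if m in seen:
            continue
        stack, comp = [m], []
        seen.add(m)
        while stack:  # depth-first search from marker m
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(comp)
    return groups, adj


def order_path(group, adj):
    """Order the markers of one group, assuming the group is a simple path."""
    if len(group) == 1:
        return list(group)
    ends = sorted(m for m in group if len(adj[m]) == 1)
    interior_ok = all(len(adj[m]) == 2 for m in group if m not in ends)
    if len(ends) != 2 or not interior_ok:
        raise ValueError("group is not a simple path")
    order, prev, cur = [ends[0]], None, ends[0]
    while len(order) < len(group):
        nxt = [nb for nb in adj[cur] if nb != prev]
        prev, cur = cur, nxt[0]
        order.append(cur)
    return order


# Hypothetical toy graph: two groups of three markers each.
markers = ["m1", "m2", "m3", "m4", "m5", "m6"]
edges = [("m3", "m1"), ("m1", "m2"), ("m5", "m4"), ("m5", "m6")]
groups, adj = linkage_groups(markers, edges)
print(order_path(groups[0], adj))  # ['m2', 'm1', 'm3']
```

The recovered order is only defined up to reversal, and real estimated graphs are rarely perfect paths, so a practical method would need an additional ordering step for noisy graphs.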
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and the reconstruction of conditional independence
networks for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for
example genotype data and genotype-phenotype data, routinely occur in genetics
and genomics. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from classical multiple testing approaches. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file
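To make concrete what adjusting for other variables means when testing a pairwise association, the following Python sketch uses the standard first-order partial correlation formula. The correlation values are hypothetical, and this is only an illustration of the general idea, not the estimator implemented in netgwas.

```python
import math


def partial_corr(r_xz, r_xy, r_yz):
    """First-order partial correlation of X and Z given Y."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


# Hypothetical chain X - Y - Z: X and Z are related only through Y,
# so their marginal correlation is the product of the two links.
r_xy, r_yz = 0.8, 0.7
r_xz = r_xy * r_yz

print(r_xz > 0.5)                       # True: a sizeable marginal association
print(partial_corr(r_xz, r_xy, r_yz))   # 0.0: no association once Y is adjusted for
```

A pairwise test on X and Z alone would flag an association here, whereas conditioning on Y shows that it is explained entirely by the intermediate variable.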
Identifying overlapping terrorist cells from the Noordin Top actor-event network
Actor-event data are common in sociological settings, whereby one registers
the pattern of attendance of a group of social actors at a number of events. We
focus on 79 members of the Noordin Top terrorist network, who were monitored
attending 45 events. The attendance or non-attendance of the terrorists at
events defines the social fabric, such as group coherence and social
communities. The aim of the analysis of such data is to learn about the
affiliation structure. Actor-event data are often transformed to actor-actor
data in order to be further analysed by network models, such as stochastic
block models. This transformation and such analyses lead to a natural loss of
information, particularly when one is interested in identifying, possibly
overlapping, subgroups or communities of actors on the basis of their
attendance at events. In this paper we propose an actor-event model for
overlapping communities of terrorists, which simplifies interpretation of the
network. We propose a mixture model with overlapping clusters for the analysis
of the binary actor-event network data, called manet, and develop a
Bayesian procedure for inference. After a simulation study, we show how this
analysis of the terrorist network has clear interpretative advantages over the
more traditional approaches of affiliation network analysis.
Comment: 24 pages, 5 figures; related R package (manet) available on CRAN
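The loss of information caused by transforming actor-event data into actor-actor data can be seen in a small Python sketch. The two attendance matrices below are hypothetical toy examples (rows are actors, columns are events), and the helper `co_attendance` is an illustrative name, not a function from manet.

```python
def co_attendance(attendance):
    """Binary actor-actor adjacency: 1 if two actors share at least one event."""
    n = len(attendance)
    return [
        [
            int(i != j and any(a and b for a, b in zip(attendance[i], attendance[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Case 1: a single event attended by all three actors.
one_meeting = [[1], [1], [1]]

# Case 2: three events, each attended by a different pair of actors.
three_pairs = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]

print(co_attendance(one_meeting) == co_attendance(three_pairs))  # True
```

Both data sets project to the same triangle of actor-actor ties, although the first describes one joint gathering and the second three separate pairwise meetings. A model fitted directly to the actor-event matrix can still tell them apart.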
A penalized inference approach to stochastic block modelling of community structure in the Italian Parliament
We analyse bill cosponsorship networks in the Italian Chamber of Deputies. In comparison with other parliaments, a distinguishing feature of the Chamber is the large number of political groups. Our analysis aims to infer the pattern of collaborations between these groups from data on bill cosponsorships. We propose an extension of stochastic block models for edge-valued graphs and derive measures of group productivity and of collaboration between political parties. As the proposed model involves a large number of parameters, we pursue a penalized likelihood approach that enables us to infer a sparse reduced graph displaying collaborations between political parties.
Convergence properties of multi-environment causal regularization
Causal regularization was introduced as a stable causal inference strategy in
a two-environment setting in \cite{kania2022causal}. We start with observing
that the causal regularizer can be extended to several shifted environments. We
derive the multi-environment causal regularizer in the population setting. We
propose its plug-in estimator, and study its concentration-of-measure behavior.
Although the variance of the plug-in estimator is not well-defined in general,
we instead study its conditional variance both with respect to a natural
filtration of the empirical as well as conditioning with respect to certain
events. We also study generalizations where we consider conditional
expectations of higher central absolute moments of the estimator. The results
presented here are also new in the prior setting of \cite{kania2022causal} as
well as in \cite{Rot}.
Inferring slowly-changing dynamic gene-regulatory networks
Dynamic gene-regulatory networks are complex since the interaction patterns between their components mean that it is impossible to study parts of the network in isolation. This holistic character of gene-regulatory networks poses a real challenge to any type of modelling. Graphical models are a class of models that connect the network with conditional independence relationships between random variables. By interpreting these random variables as gene activities and the conditional independence relationships as functional non-relatedness, graphical models have been used to describe gene-regulatory networks. Whereas the literature has focused on static networks, most time-course experiments are designed to tease out temporal changes in the underlying network. It is typically reasonable to assume that changes in genomic networks are few, because biological systems tend to be stable. We introduce a new model for estimating slow changes in dynamic gene-regulatory networks, which is suitable for high-dimensional data, e.g. time-course microarray data. Our aim is to estimate a dynamically changing genomic network based on temporal activity measurements of the genes in the network. Our method is based on an l1-norm penalized likelihood that penalizes conditional dependencies between genes as well as differences between conditional independence elements across time points. We also present a heuristic search strategy to find optimal tuning parameters. We rewrite the penalized maximum likelihood problem as a standard convex optimization problem subject to linear equality constraints. We show that our method performs well in simulation studies. Finally, we apply the proposed model to a time-course T-cell dataset.
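One plausible way to write down a penalized likelihood of the kind described above is sketched here. The notation is an assumption introduced for illustration and does not come from the abstract: $\Theta_t$ is the precision matrix at time point $t$, $S_t$ the corresponding sample covariance matrix, a Gaussian likelihood is assumed, and $\lambda_1, \lambda_2 \ge 0$ are the two tuning parameters.

```latex
\[
  \max_{\Theta_1,\dots,\Theta_T \succ 0}\;
  \sum_{t=1}^{T} \Bigl[ \log\det\Theta_t - \operatorname{tr}(S_t \Theta_t) \Bigr]
  \;-\; \lambda_1 \sum_{t=1}^{T} \lVert \Theta_t \rVert_1
  \;-\; \lambda_2 \sum_{t=2}^{T} \lVert \Theta_t - \Theta_{t-1} \rVert_1
\]
```

Under these assumptions the first penalty encourages sparse conditional dependencies within each time point, while the second encourages consecutive networks to differ in only a few entries, which matches the premise that changes in the network are few.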
Efficient implementation of sets and multisets in R using hash tables
The package hset for the R language contains an implementation of an S4 class
for sets and multisets of numbers. The implementation, based on the hash table
data structure from the package hash (Brown, 2019), allows for quick operations
when the set is a dynamic object. An important example is when a set or a
multiset is part of the state of a Markov chain in which in each iteration
various elements are moved in and out of the set.
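The idea of backing a multiset with a hash table can be sketched in a few lines of Python, where the built-in dict plays the role of the hash table. The class `Multiset` and its methods are hypothetical illustrations and do not reproduce the hset API.

```python
class Multiset:
    """Multiset stored as a hash table mapping element -> multiplicity."""

    def __init__(self, items=()):
        self._counts = {}
        for x in items:
            self.add(x)

    def add(self, x, k=1):
        """Insert k copies of x in expected constant time."""
        if k < 1:
            raise ValueError("k must be positive")
        self._counts[x] = self._counts.get(x, 0) + k

    def remove(self, x, k=1):
        """Remove k copies of x; raise KeyError if fewer than k are present."""
        if k < 1:
            raise ValueError("k must be positive")
        current = self._counts.get(x, 0)
        if current < k:
            raise KeyError(x)
        if current == k:
            del self._counts[x]  # drop the key so membership stays exact
        else:
            self._counts[x] = current - k

    def count(self, x):
        return self._counts.get(x, 0)

    def __contains__(self, x):
        return x in self._counts

    def __len__(self):
        return sum(self._counts.values())


# One step of a hypothetical chain: move an element out and another one in.
state = Multiset([1, 2, 2, 3])
state.remove(2)
state.add(5)
print(state.count(2), 5 in state, len(state))  # 1 True 4
```

Because insertion, removal, and membership tests touch a single hash-table entry, each move costs expected constant time regardless of how large the set has grown, which is the property that matters when the set changes at every iteration.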