The large-spin asymptotics of the ferromagnetic XXZ chain
We present new results and give a concise review of earlier results
on the asymptotics for large spin of the low-lying spectrum of the
ferromagnetic XXZ Heisenberg chain with kink boundary conditions. Our main
interest is to gain detailed information on the interface ground states of this
model and the low-lying excitations above them. The new and most detailed
results are obtained using a rigorous version of bosonization, which can be
interpreted as a quantum central limit theorem.
Comment: 30 pages, submitted to the proceedings of the workshop "Low-energy
states in quantum many-body systems", 29 January 2003, Cergy-Pontoise
Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders
Random effect models are popular statistical models for detecting and
correcting spurious sample correlations due to hidden confounders in
genome-wide gene expression data. In applications where some confounding
factors are known, estimating simultaneously the contribution of known and
latent variance components in random effect models is a challenge that has so
far relied on numerical gradient-based optimizers to maximize the likelihood
function. This is unsatisfactory because the resulting solution is poorly
characterized and the efficiency of the method may be suboptimal. Here we prove
analytically that maximum-likelihood latent variables can always be chosen
orthogonal to the known confounding factors, in other words, that
maximum-likelihood latent variables explain sample covariances not already
explained by known factors. Based on this result we propose a restricted
maximum-likelihood method which estimates the latent variables by maximizing
the likelihood on the restricted subspace orthogonal to the known confounding
factors, and show that this reduces to probabilistic PCA on that subspace. The
method then estimates the variance-covariance parameters by maximizing the
remaining terms in the likelihood function given the latent variables, using a
newly derived analytic solution for this problem. Compared to gradient-based
optimizers, our method attains greater or equal likelihood values, can be
computed using standard matrix operations, results in latent factors that don't
overlap with any known factors, and has a runtime reduced by several orders of
magnitude. Hence the restricted maximum-likelihood method facilitates the
application of random effect modelling strategies for learning latent variance
components to much larger gene expression datasets than possible with current
methods.
Comment: 15 pages, 4 figures, 3 supplementary figures, 19 pages supplementary
methods; minor revision with expanded Discussion section
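The restriction step described in this abstract, working on the subspace orthogonal to the known confounding factors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the use of Gram-Schmidt are assumptions made for illustration, and the subsequent probabilistic PCA and variance-parameter estimation are not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def project_out(row, basis):
    """Remove from `row` its component in the span of `basis`."""
    r = list(row)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r

# Hypothetical toy data: 4 samples, 2 known confounders, 2 genes.
known = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
expression = [[2.0, 3.5, 1.0, 4.0], [0.5, 0.1, 0.9, 0.3]]

basis = orthonormal_basis(known)
residual = [project_out(gene, basis) for gene in expression]
```

Every row of `residual` is orthogonal to every known confounder, so any latent factor estimated from `residual` cannot overlap with the known factors.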
Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net
The lasso and elastic net linear regression models impose a
double-exponential prior distribution on the model parameters to achieve
regression shrinkage and variable selection, allowing the inference of robust
models from large data sets. However, there has been limited success in
deriving estimates for the full posterior distribution of regression
coefficients in these models, due to a need to evaluate analytically
intractable partition function integrals. Here, the Fourier transform is used
to express these integrals as complex-valued oscillatory integrals over
"regression frequencies". This results in an analytic expansion and stationary
phase approximation for the partition functions of the Bayesian lasso and
elastic net, where the non-differentiability of the double-exponential prior
has so far eluded such an approach. Use of this approximation leads to highly
accurate numerical estimates for the expectation values and marginal posterior
distributions of the regression coefficients, and allows for Bayesian inference
of much higher dimensional models than previously possible.
Comment: Switched to new NeurIPS style file; 11 pages, 3 figures + appendices
29 pages, 3 supplementary figures
Central limit theorems for the large-spin asymptotics of quantum spins
We use a generalized form of Dyson's spin wave formalism to prove several
central limit theorems for the large-spin asymptotics of quantum spins in a
coherent state.
Comment: 28 pages, uses package amsrefs
Analysis of a Gibbs sampler method for model based clustering of gene expression data
Over the last decade, a large variety of clustering algorithms have been
developed to detect coregulatory relationships among genes from microarray gene
expression data. Model based clustering approaches have emerged as
statistically well grounded methods, but the properties of these algorithms
when applied to large-scale data sets are not always well understood. An
in-depth analysis can reveal important insights about the performance of the
algorithm, the expected quality of the output clusters, and the possibilities
for extracting more relevant information out of a particular data set. We have
extended an existing algorithm for model based clustering of genes to
simultaneously cluster genes and conditions, and used three large compendia of
gene expression data for S. cerevisiae to analyze its properties. The algorithm
uses a Bayesian approach and a Gibbs sampling procedure to iteratively update
the cluster assignment of each gene and condition. For large-scale data sets,
the posterior distribution is strongly peaked on a limited number of
equiprobable clusterings. A GO annotation analysis shows that these local
maxima are all biologically equally significant, and that simultaneously
clustering genes and conditions performs better than only clustering genes and
assuming independent conditions. A collection of distinct equivalent
clusterings can be summarized as a weighted graph on the set of genes, from
which we extract fuzzy, overlapping clusters using a graph spectral method. The
cores of these fuzzy clusters contain tight sets of strongly coexpressed genes,
while the overlaps exhibit relations between genes showing only partial
coexpression.
Comment: 8 pages, 7 figures
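One simple way to summarize a collection of clusterings as a weighted graph on the genes is to weight each gene pair by the fraction of clusterings in which the two genes share a cluster. The sketch below assumes equal weight per clustering and uses hypothetical gene names; the abstract does not specify the weighting, and the graph spectral step that extracts fuzzy clusters is not shown.

```python
from collections import defaultdict
from itertools import combinations

def coclustering_weights(clusterings):
    """Fraction of clusterings in which each pair of genes shares a cluster."""
    counts = defaultdict(int)
    for clustering in clusterings:
        for cluster in clustering:
            # Sort so each pair is always stored as (smaller, larger).
            for a, b in combinations(sorted(cluster), 2):
                counts[(a, b)] += 1
    n = len(clusterings)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical Gibbs sampler output: three clusterings of four genes.
samples = [
    [{"g1", "g2", "g3"}, {"g4"}],
    [{"g1", "g2"}, {"g3", "g4"}],
    [{"g1", "g2"}, {"g3"}, {"g4"}],
]
weights = coclustering_weights(samples)
```

Pairs that never share a cluster, such as `("g1", "g4")` here, are simply absent from `weights`, which keeps the graph sparse.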
A Graph Feature Auto-Encoder for the Prediction of Unobserved Node Features on Biological Networks
Background
Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features.
Results
We studied the representation of transcriptional, protein–protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular multilayer perceptrons. When applied to the problem of imputing missing data in single-cell RNA-seq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer, called FeatGraphConv, outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach.
Conclusion
Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
Controlling false discoveries in Bayesian gene networks with lasso regression p-values
Bayesian networks can represent directed gene regulations and are therefore
favored over co-expression networks. However, hardly any Bayesian network study
addresses the false discovery control (FDC) of network edges, leading to low
accuracies due to systematic biases from inconsistent false discovery levels in
the same study. We design four empirical tests to examine the FDC of Bayesian
networks from three p-value based lasso regression variable selection methods:
two existing and one that we introduce. Our method, lassopv, computes p-values for the
critical regularization strength at which a predictor starts to contribute to
lasso regression. Using null and Geuvadis datasets, we find that lassopv
obtains optimal FDC in Bayesian gene networks, whilst existing methods have
defective p-values. The FDC concept and tests extend to most network inference
scenarios and will guide the design and improvement of new and existing
methods. Our novel variable selection method with lasso regression also allows
FDC on other datasets and questions, even beyond network inference and
computational biology. Lassopv is implemented in R and freely available at
https://github.com/lingfeiwang/lassopv and
https://cran.r-project.org/package=lassopv
Comment: 9 pages, 6 figures, 3 tables. Supplementary info: 2 pages
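The notion of a critical regularization strength can be illustrated with a small sketch. It assumes the objective (1/2n)||y - Xb||^2 + lam*||b||_1 and approximates, on a decreasing grid of lam values, the point at which each predictor first receives a nonzero coefficient. It is not the lassopv package and does not compute p-values; the function names, grid scan, and toy data are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(cols, y, lam, beta, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    `cols` holds the (nonzero) columns of X; `beta` is a warm start.
    """
    n = len(y)
    beta = list(beta)
    resid = [y[i] - sum(c[i] * b for c, b in zip(cols, beta)) for i in range(n)]
    norms = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(col[i] * resid[i] for i in range(n)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam) / norms[j]
            if new != beta[j]:
                d = new - beta[j]
                resid = [resid[i] - d * col[i] for i in range(n)]
                beta[j] = new
    return beta

def critical_lambdas(cols, y, n_grid=100):
    """Largest grid value of lam at which each predictor is nonzero."""
    n = len(y)
    # For lam >= lam_max every coefficient is exactly zero.
    lam_max = max(abs(sum(c * v for c, v in zip(col, y))) / n for col in cols)
    crit = [None] * len(cols)
    beta = [0.0] * len(cols)
    for k in range(1, n_grid + 1):
        lam = lam_max * (1 - k / (n_grid + 1))
        beta = lasso_cd(cols, y, lam, beta)
        for j, b in enumerate(beta):
            if crit[j] is None and b != 0.0:
                crit[j] = lam
    return lam_max, crit

# Hypothetical toy data with orthogonal predictors: y = 2*x1 + 0.5*x2.
x1 = [1.0, 1.0, -1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0]
y = [2.5, 1.5, -1.5, -2.5]
lam_max, crit = critical_lambdas([x1, x2], y)
```

For this orthogonal toy design the exact entry points are 2.0 for `x1` and 0.5 for `x2`, and the grid scan recovers them to within one grid step. A path algorithm such as LARS would give exact values for general designs.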
Alignment and integration of complex networks by hypergraph-based spectral clustering
Complex networks possess a rich, multi-scale structure reflecting the
dynamical and functional organization of the systems they model. Often there is
a need to analyze multiple networks simultaneously, to model a system by more
than one type of interaction or to go beyond simple pairwise interactions, but
currently there is a lack of theoretical and computational methods to address
these problems. Here we introduce a framework for clustering and community
detection in such systems using hypergraph representations. Our main result is
a generalization of the Perron-Frobenius theorem from which we derive spectral
clustering algorithms for directed and undirected hypergraphs. We illustrate
our approach with applications for local and global alignment of
protein-protein interaction networks between multiple species, for tripartite
community detection in folksonomies, and for detecting clusters of overlapping
regulatory pathways in directed networks.
Comment: 16 pages, 5 figures; revised version with minor corrections and
figures printed in two-column format for better readability; algorithm
implementation and supplementary information available at Google code at
http://schype.googlecode.co