165 research outputs found

The large-spin asymptotics of the ferromagnetic XXZ chain

Authors: Michoel, Tom; Nachtergaele, Bruno
Publication date: 24/07/2003

We present new results and give a concise review of recent results on the asymptotics for large spin of the low-lying spectrum of the ferromagnetic XXZ Heisenberg chain with kink boundary conditions. Our main interest is to gain detailed information on the interface ground states of this model and the low-lying excitations above them. The new and most detailed results are obtained using a rigorous version of bosonization, which can be interpreted as a quantum central limit theorem.

Comment: 30 pages, submitted to the proceedings of the workshop "Low-energy states in quantum many-body systems", 29 January 2003, Cergy-Pontoise

arXiv.org e-Print Archive

eScholarship - University of California

Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders

Authors: Malik, Muhammad Ammar; Michoel, Tom
Publication date: 04/11/2021

Random effect models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effect models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that don't overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence the restricted maximum-likelihood method facilitates the application of random effect modelling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.

Comment: 15 pages, 4 figures, 3 supplementary figures, 19 pages supplementary methods; minor revision with expanded Discussion section
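
The orthogonality idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it only shows the step "remove the known factors, then take a principal direction in the remaining subspace", using plain PCA via power iteration rather than probabilistic PCA, and it skips the variance parameter estimation entirely. All function names and the toy data layout (one list of sample values per gene) are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis for the span of the known factors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis

def project_out(v, basis):
    """Component of v orthogonal to every vector of an orthonormal basis."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w

def leading_latent_factor(gene_columns, known_factors, iters=500):
    """Leading principal direction (over samples) of the data restricted to
    the subspace orthogonal to the known factors, found by power iteration."""
    basis = orthonormal_basis(known_factors)
    resid = [project_out(col, basis) for col in gene_columns]
    n = len(resid[0])
    # Arbitrary start vector, already placed in the restricted subspace.
    x = project_out([1.0 / (i + 1) for i in range(n)], basis)
    for _ in range(iters):
        # Multiply by the residual sample covariance: sum_g r_g (r_g . x)
        y = [0.0] * n
        for r in resid:
            c = dot(r, x)
            y = [yi + c * ri for yi, ri in zip(y, r)]
        norm = math.sqrt(dot(y, y))
        if norm < 1e-12:
            raise ValueError("no variance left after removing known factors")
        x = [yi / norm for yi in y]
    return x
```

Because every iterate is a combination of residualized columns, the returned factor is orthogonal to the known factors by construction, which mirrors the "latent factors that don't overlap with any known factors" property the abstract describes.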

arXiv.org e-Print Archive

University of Bergen

PubMed Central

NORA - Norwegian Open Research Archives

Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net

Author: Michoel, Tom
Publication date: 25/09/2017

The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to a need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over "regression frequencies". This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible.

Comment: Switched to new NeurIPS style file; 11 pages, 3 figures + appendices 29 pages, 3 supplementary figures

arXiv.org e-Print Archive

Edinburgh Research Explorer

Central limit theorems for the large-spin asymptotics of quantum spins

Authors: Michoel, Tom; Nachtergaele, Bruno
Publication venue: Springer Science and Business Media LLC
Publication date: 15/10/2003

We use a generalized form of Dyson's spin wave formalism to prove several central limit theorems for the large-spin asymptotics of quantum spins in a coherent state.

Comment: 28 pages, uses package amsrefs

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Analysis of a Gibbs sampler method for model based clustering of gene expression data

Authors: Joshi, Anagha; Michoel, Tom; Van de Peer, Yves
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2008

Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model based clustering approaches have emerged as statistically well grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. We have extended an existing algorithm for model based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for S. cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.

Comment: 8 pages, 7 figures
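
One way to picture the step "a collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes" is a co-clustering frequency graph. The sketch below is an illustration only: the abstract does not say how the edge weights are defined, so the choice of weight (the fraction of clusterings in which two genes share a cluster) and the function name `coclustering_graph` are assumptions.

```python
from itertools import combinations

def coclustering_graph(clusterings):
    """Summarize several clusterings as a weighted graph on genes.

    clusterings: list of dicts mapping gene name -> cluster label.
    Returns a dict mapping a sorted gene pair to the fraction of
    clusterings in which both genes carry the same label.
    Pairs that are never clustered together are omitted.
    """
    counts = {}
    for assignment in clusterings:
        for a, b in combinations(sorted(assignment), 2):
            if assignment[a] == assignment[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    total = len(clusterings)
    return {pair: count / total for pair, count in counts.items()}

# Three hypothetical clusterings of three genes.
example = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 0},
]
graph = coclustering_graph(example)
```

In this toy input `g1` and `g2` always share a cluster, so their edge gets weight 1, while `g3` joins them in only one of the three clusterings. A spectral method could then be run on such a graph to extract overlapping clusters; that part is not sketched here.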

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

A Graph Feature Auto-Encoder for the Prediction of Unobserved Node Features on Biological Networks

Authors: Hasibi, Ramin; Michoel, Tom
Publication venue: BioMed Central
Publication date: 01/01/2021

Background: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features.
Results: We studied the representation of transcriptional, protein–protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach.
Conclusion: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.

University of Bergen

NORA - Norwegian Open Research Archives

Controlling false discoveries in Bayesian gene networks with lasso regression p-values

Authors: Michoel, Tom; Wang, Lingfei
Publication date: 24/01/2017

Bayesian networks can represent directed gene regulations and therefore are favored over co-expression networks. However, hardly any Bayesian network study concerns the false discovery control (FDC) of network edges, leading to low accuracies due to systematic biases from inconsistent false discovery levels in the same study. We design four empirical tests to examine the FDC of Bayesian networks from three p-value based lasso regression variable selections, two existing and one we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whilst existing methods have defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also allows FDC on other datasets and questions, even beyond network inference and computational biology. Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/package=lassopv

Comment: 9 pages, 6 figures, 3 tables. Supplementary info: 2 pages
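
The quantity that lassopv attaches p-values to, the critical regularization strength at which a predictor starts to contribute, can be illustrated with a rough sketch. This is not the lassopv R package or its API, and it does not compute any p-value. It assumes the common lasso objective (1/2n)·||y − Xb||² + λ·||b||₁, solves it by coordinate descent, and scans a decreasing grid of λ values, so each critical value is only located up to the grid resolution. All function names and parameters are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    columns: list of predictor columns (each a list of n values)."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            scale = sum(v * v for v in x) / n
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = sum(xi * ri for xi, ri in zip(x, residual)) / n + scale * beta[j]
            new = soft_threshold(rho, lam) / scale
            if new != beta[j]:
                delta = new - beta[j]
                residual = [ri - delta * xi for ri, xi in zip(residual, x)]
                beta[j] = new
    return beta

def critical_lambdas(columns, y, n_grid=30, ratio=0.9):
    """For each predictor, the largest grid value of lam at which its
    coefficient is nonzero, or None if it never enters on the grid."""
    n = len(y)
    # Smallest lam for which all coefficients are zero.
    lam_max = max(abs(sum(xi * yi for xi, yi in zip(x, y))) for x in columns) / n
    critical = [None] * len(columns)
    for k in range(n_grid):
        lam = lam_max * ratio ** k
        beta = lasso_coordinate_descent(columns, y, lam)
        for j, b in enumerate(beta):
            if critical[j] is None and b != 0.0:
                critical[j] = lam
    return critical
```

A predictor with a strong effect enters at a large λ and an irrelevant one enters late or not at all. Turning these critical values into p-values under a null model is the part the paper contributes and is not attempted here.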

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Alignment and integration of complex networks by hypergraph-based spectral clustering

Authors: Michoel, Tom; Nachtergaele, Bruno
Publication venue: American Physical Society (APS)
Publication date: 05/11/2012

Complex networks possess a rich, multi-scale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

Comment: 16 pages, 5 figures; revised version with minor corrections and figures printed in two-column format for better readability; algorithm implementation and supplementary information available at Google code at http://schype.googlecode.co

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer