Expectation Propagation for Approximate Inference: Free Probability Framework
We study asymptotic properties of expectation propagation (EP) -- a method
for approximate inference originally developed in the field of machine
learning. Applied to generalized linear models, EP iteratively computes a
multivariate Gaussian approximation to the exact posterior distribution. The
computational complexity of the repeated update of covariance matrices severely
limits the application of EP to large problem sizes. In this study, we present
a rigorous analysis by means of free probability theory that allows us to
overcome this computational bottleneck if specific data matrices in the problem
fulfill certain properties of asymptotic freeness. We demonstrate the relevance
of our approach on the gene selection problem of a microarray dataset. Comment: Both authors are co-first authors. The main body of this paper is
accepted for publication in the proceedings of the 2018 IEEE International
Symposium on Information Theory (ISIT)
Factored expectation propagation for input-output FHMM models in systems biology
We consider the problem of joint modelling of metabolic signals and gene
expression in systems biology applications. We propose an approach based on
input-output factorial hidden Markov models and propose a structured
variational inference approach to infer the structure and states of the model.
We start from the classical free form structured variational mean field
approach and use expectation propagation to approximate the expectations
needed in the variational loop. We show that this corresponds to a factored
expectation constrained approximate inference. We validate our model through
extensive simulations and demonstrate its applicability on a real world
bacterial data set
ProbCD: enrichment analysis accounting for categorization uncertainty
As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R package to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for
the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation
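The idea of building the contingency table from expectations can be illustrated with a minimal sketch. This is not the ProbCD package's code or API: the function name `expected_contingency`, its arguments, and the assumption that each gene's two category memberships are independent Bernoulli variables are all hypothetical choices made for illustration.

```python
def expected_contingency(p_annot, p_select):
    """Expected 2x2 contingency table under per-gene Bernoulli memberships.

    p_annot[i]  : probability that gene i carries the annotation (assumed input)
    p_select[i] : probability that gene i is in the selected set (assumed input)
    The two memberships of a gene are ASSUMED independent in this sketch.
    """
    if len(p_annot) != len(p_select):
        raise ValueError("probability lists must have the same length")
    for p in list(p_annot) + list(p_select):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    pairs = list(zip(p_annot, p_select))
    # Each cell is a sum of expectations of Bernoulli products.
    n11 = sum(a * s for a, s in pairs)              # annotated and selected
    n10 = sum(a * (1 - s) for a, s in pairs)        # annotated, not selected
    n01 = sum((1 - a) * s for a, s in pairs)        # not annotated, selected
    n00 = sum((1 - a) * (1 - s) for a, s in pairs)  # neither
    return [[n11, n10], [n01, n00]]


if __name__ == "__main__":
    # Three hypothetical genes with uncertain annotation and selection.
    table = expected_contingency([0.9, 0.2, 0.5], [0.8, 0.1, 0.5])
    for row in table:
        print(row)
```

With hard 0/1 probabilities the table reduces to ordinary counts, so a static contingency table is the special case of certain categorization.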
Learning a Hybrid Architecture for Sequence Regression and Annotation
When learning a hidden Markov model (HMM), sequential observations can
often be complemented by real-valued summary response variables generated from
the path of hidden states. Such settings arise in numerous domains, including
many applications in biology, like motif discovery and genome annotation.
In this paper, we present a flexible framework for jointly modeling both
latent sequence features and the functional mapping that relates the summary
response variables to the hidden state sequence. The algorithm is compatible
with a rich set of mapping functions. Results show that the availability of
additional continuous response variables can simultaneously improve the
annotation of the sequential observations and yield good prediction
performance in both synthetic data and real-world datasets. Comment: AAAI 201
Large-scale inference and graph theoretical analysis of gene-regulatory networks in B. subtilis
We present the methods and results of a two-stage modeling process that
generates candidate gene-regulatory networks of the bacterium B. subtilis from
experimentally obtained, yet mathematically underdetermined microchip array
data. By employing a computational, linear correlative procedure to generate
these networks, and by analyzing the networks from a graph theoretical
perspective, we are able to verify the biological viability of our inferred
networks, and we demonstrate that our networks' graph theoretical properties
are remarkably similar to those of other biological systems. In addition, by
comparing our inferred networks to those of a previous, noisier implementation
of the linear inference process [17], we are able to identify trends in graph
theoretical behavior that occur both in our networks as well as in their
perturbed counterparts. These commonalities in behavior at multiple levels of
complexity allow us to ascertain the level of complexity to which our process
is robust to noise. Comment: 22 pages, 4 figures, accepted for publication in Physica A (2006)
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a valuable opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to provide useful knowledge of the appropriate clustering technique, one that ensures stability and a high degree of accuracy in the analysis procedure
Hierarchic Bayesian models for kernel learning
The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance when compared to that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning and present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method
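A combination of base kernels can be sketched as a weighted sum of Gram matrices. The example below is only an illustration under stated assumptions: the weights are fixed by hand rather than learned by the hierarchical Bayesian model, all base kernels are evaluated on the same inputs rather than on separate data sources, and the names `rbf_kernel`, `linear_kernel`, and `combined_gram` are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) base kernel on two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_kernel(x, y):
    """Linear base kernel (dot product)."""
    return sum(a * b for a, b in zip(x, y))


def combined_gram(X, kernels, weights):
    """Gram matrix of the convex combination sum_m weights[m] * kernels[m].

    The weights are ASSUMED given here; in kernel learning they would be
    inferred from data. Non-negative weights keep the combination a valid
    (positive semi-definite) kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(X)
    return [
        [sum(w * k(X[i], X[j]) for k, w in zip(kernels, weights))
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    X = [[0.0], [1.0]]  # two hypothetical one-dimensional inputs
    K = combined_gram(X, [linear_kernel, rbf_kernel], [0.5, 0.5])
    for row in K:
        print(row)
```

The resulting matrix can be passed to any kernel method that accepts a precomputed Gram matrix; learning the weights is the part that the hierarchical model addresses and that this sketch leaves out.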