On the Complexity and Approximation of Binary Evidence in Lifted Inference
Lifted inference algorithms exploit symmetries in probabilistic models to
speed up inference. They show impressive performance when calculating
unconditional probabilities in relational models, but often resort to
non-lifted inference when computing conditional probabilities. The reason is
that conditioning on evidence breaks many of the model's symmetries, which can
preempt standard lifting techniques. Recent theoretical results show, for
example, that conditioning on evidence which corresponds to binary relations is
#P-hard, suggesting that no lifting is to be expected in the worst case. In
this paper, we balance this negative result by identifying the Boolean rank of
the evidence as a key parameter for characterizing the complexity of
conditioning in lifted inference. In particular, we show that conditioning on
binary evidence with bounded Boolean rank is efficient. This opens up the
possibility of approximating evidence by a low-rank Boolean matrix
factorization, which we investigate both theoretically and empirically.
Comment: To appear in Advances in Neural Information Processing Systems 26
(NIPS), Lake Tahoe, USA, December 201
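As a rough illustration of the Boolean rank notion in the abstract above, the sketch below builds a small binary evidence matrix as the Boolean product of two thin factors, then measures how much is lost when one factor column is dropped. The matrices and the helper names `boolean_product` and `reconstruction_error` are hypothetical and chosen only for illustration; this is not the paper's algorithm, and it does not perform lifted inference.

```python
import numpy as np


def boolean_product(a, b):
    """Boolean matrix product: entry (i, j) is OR over k of (a[i, k] AND b[k, j])."""
    return (a.astype(int) @ b.astype(int)) > 0


def reconstruction_error(evidence, a, b):
    """Number of entries where the Boolean product of a and b differs from evidence."""
    return int(np.sum(evidence != boolean_product(a, b)))


# Hypothetical factors: 4 objects x 2 components, and 2 components x 4 objects.
a = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [1, 1]], dtype=bool)
b = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=bool)

# Evidence on a binary relation whose Boolean rank is at most 2 by construction.
evidence = boolean_product(a, b)

# Exact factorization with two components, and a cruder one-component approximation.
exact_error = reconstruction_error(evidence, a, b)
rank_one_error = reconstruction_error(evidence, a[:, :1], b[:1, :])

print(evidence.astype(int))
print("rank-2 error:", exact_error)
print("rank-1 error:", rank_one_error)
```

The two-component factorization reproduces the evidence exactly, while the one-component version misreports a few entries. That trade-off between rank and fidelity is what an approximation by low-rank Boolean matrix factorization would have to balance.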
Discovering Coherent Biclusters from Gene Expression Data Using Zero-Suppressed Binary Decision Diagrams
Biclustering can be a very useful analysis tool when some genes have multiple functions and the experimental conditions in gene expression measurement are diverse. This is because biclustering, in contrast to conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagram (ZBDD) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach.
Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression
Cite as: Perrin, Dimitri (2008) Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression. PhD thesis, Dublin City University
Compositional Mining of Multi-Relational Biological Datasets
High-throughput biological screens are yielding ever-growing streams of
information about multiple aspects of cellular activity. As more and more
categories of datasets come online, there is a corresponding multitude of ways
in which inferences can be chained across them, motivating the need for
compositional data mining algorithms. In this paper, we argue that such
compositional data mining can be effectively realized by functionally cascading
redescription mining and biclustering algorithms as primitives. Both these
primitives mirror shifts of vocabulary that can be composed in arbitrary ways
to create rich chains of inferences. Given a relational database and its
schema, we show how the schema can be automatically compiled into a
compositional data mining program, and how different domains in the schema can
be related through logical sequences of biclustering and redescription
invocations. This feature allows us to rapidly prototype new data mining
applications, yielding greater understanding of scientific datasets. We
describe two applications of compositional data mining: (i) matching terms
across categories of the Gene Ontology and (ii) understanding the molecular
mechanisms underlying stress response in human cells.