Structured high-cardinality data arises in many domains, and poses a major
challenge for both modeling and inference. Graphical models are a popular
approach to modeling structured data but they are unsuitable for
high-cardinality variables. The count-min (CM) sketch is a popular approach to
estimating probabilities in high-cardinality data but it does not scale well
beyond a few variables. In this work, we bring together the ideas of graphical
models and count sketches, and propose and analyze several approaches to
estimating probabilities in structured high-cardinality streams of data. The
key idea of our approximations is to use the structure of a graphical model and
estimate its factors by "sketches", which hash high-cardinality
variables using random projections. Our approximations are computationally
efficient and their space complexity is independent of the cardinality of
variables. Our error bounds are multiplicative and significantly improve upon
those of the CM sketch, a state-of-the-art approach to estimating probabilities
in streams. We evaluate our approximations on synthetic and real-world
problems, and report order-of-magnitude improvements over the CM sketch.

Comment: Proceedings of the European Conference on Machine Learning and
Knowledge Discovery in Database
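
To make the key idea above concrete, the following is a minimal Python sketch of one way a graphical model's factors could be estimated from count-min sketches. It is an illustration under stated assumptions, not the method analyzed in the paper: the chain structure x_1 -> x_2 -> ... -> x_K, the class names CountMinSketch and ChainSketch, the salted use of Python's built-in hash, and the clipping of conditional estimates to 1 are all hypothetical choices made for this example. The point it shows is that memory is fixed by width and depth per factor, not by the cardinality of the variables.

```python
import random


class CountMinSketch:
    """Basic count-min sketch: depth hash rows, each with width counters."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row stands in for an independent hash function
        # (an assumption made for brevity, not a pairwise-independent family).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items observed so far

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def add(self, item):
        self.total += 1
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += 1

    def count(self, item):
        # Collisions only add to a counter, so the minimum over rows
        # never underestimates the true count.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

    def prob(self, item):
        return self.count(item) / self.total if self.total else 0.0


class ChainSketch:
    """Hypothetical chain model: P(x) = P(x_1) * prod_k P(x_{k+1} | x_k).

    Each unary and pairwise factor is estimated from its own sketch, so
    the total space is (2K - 1) * width * depth counters.
    """

    def __init__(self, num_vars, width, depth, seed=0):
        self.num_vars = num_vars
        self.unary = [CountMinSketch(width, depth, seed + k)
                      for k in range(num_vars)]
        self.pair = [CountMinSketch(width, depth, seed + num_vars + k)
                     for k in range(num_vars - 1)]

    def add(self, x):
        for k, value in enumerate(x):
            self.unary[k].add(value)
        for k in range(self.num_vars - 1):
            self.pair[k].add((x[k], x[k + 1]))

    def prob(self, x):
        p = self.unary[0].prob(x[0])
        for k in range(self.num_vars - 1):
            denom = self.unary[k].count(x[k])
            if denom == 0:
                return 0.0
            ratio = self.pair[k].count((x[k], x[k + 1])) / denom
            # Collisions can push the ratio above 1; clip it (an assumption).
            p *= min(1.0, ratio)
        return p
```

For example, after `model = ChainSketch(num_vars=3, width=64, depth=4)` and repeated calls to `model.add((1, 2, 3))`, the call `model.prob((1, 2, 3))` returns the estimated joint probability of that assignment. Using one sketch per factor rather than one sketch over the full joint assignment is what keeps each hashed key small in this illustration.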