Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
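As a rough illustration of the log-linear idea in the abstract above, the sketch below fits the simplest such model (independence between two dimensions) to a two-way slice of a cube, then uses Pearson residuals to flag cells the model explains poorly. It is not the authors' method: the function names, the toy counts, and the threshold of 2.0 are all assumptions made for this example, and every row and column total is assumed to be positive.

```python
from math import sqrt


def independence_fit(table):
    """Fit the independence log-linear model to a 2-way count table.

    Under independence, the expected count of cell (i, j) is
    row_total[i] * col_total[j] / grand_total, so the whole table is
    summarized by its margins (a parsimonious model).
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Pearson residual: how far each observed cell is from the model.
    residuals = [
        [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, residuals


def flag_outliers(table, threshold=2.0):
    """Return (row, col, residual) for cells with |residual| > threshold.

    The threshold is an illustrative choice, not a value from the abstract.
    """
    _, residuals = independence_fit(table)
    return [
        (i, j, r)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if abs(r) > threshold
    ]


# Hypothetical 2x2 slice of a cube (e.g. region x product counts).
counts = [[10, 20], [30, 40]]
expected, residuals = independence_fit(counts)
```

The fitted table can stand in for the original when answering aggregate queries approximately: a row or column sum over `expected` reproduces the corresponding observed margin, while individual cells are only approximated.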
From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey
This survey explores the integration of learning and reasoning in two
different fields of artificial intelligence: neurosymbolic and statistical
relational artificial intelligence. Neurosymbolic artificial intelligence
(NeSy) studies the integration of symbolic reasoning and neural networks, while
statistical relational artificial intelligence (StarAI) focuses on integrating
logic with probabilistic graphical models. This survey identifies seven shared
dimensions between these two subfields of AI. These dimensions can be used to
characterize different NeSy and StarAI systems. They are concerned with (1) the
approach to logical inference, whether model or proof-based; (2) the syntax of
the used logical theories; (3) the logical semantics of the systems and their
extensions to facilitate learning; (4) the scope of learning, encompassing
either parameter or structure learning; (5) the presence of symbolic and
subsymbolic representations; (6) the degree to which systems capture the
original logic, probabilistic, and neural paradigms; and (7) the classes of
learning tasks the systems are applied to. By positioning various NeSy and
StarAI systems along these dimensions and pointing out similarities and
differences between them, this survey contributes fundamental concepts for
understanding the integration of learning and reasoning.
Comment: To appear in Artificial Intelligence. Shorter version at IJCAI 2020
survey track, https://www.ijcai.org/proceedings/2020/0688.pd
Computing the decomposable entropy of belief-function graphical models
In 2018, Jiroušek and Shenoy proposed a definition of entropy for Dempster-Shafer (D-S) belief functions called decomposable entropy (d-entropy). This paper provides an algorithm for computing the d-entropy of directed graphical D-S belief function models. We illustrate the algorithm using Almond's Captain's Problem example. For belief function undirected graphical models, assuming that the set of belief functions in the model is non-informative, the belief functions are distinct. We illustrate this using Haenni-Lehmann's Communication Network problem. As the joint belief function for this model is quasi-consonant, it follows from a property of d-entropy that the d-entropy of this model is zero, and no algorithm is required. For a class of undirected graphical models, we provide an algorithm for computing the d-entropy of such models. Finally, the d-entropy coincides with Shannon's entropy for the probability mass function of a single random variable and for a large multi-dimensional probability distribution expressed as a directed acyclic graph model called a Bayesian network. We illustrate this using Lauritzen-Spiegelhalter's Chest Clinic example represented as a belief-function directed graphical model.
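The abstract above notes that d-entropy coincides with Shannon's entropy for a Bayesian network. The sketch below covers only that Shannon side, and is not the paper's d-entropy algorithm: for a hypothetical two-node network A → B, the entropy of the joint distribution equals the sum of the per-node conditional entropies, which is the decomposition property the abstract alludes to. The probabilities are made up for illustration.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a probability mass function."""
    return -sum(p * log2(p) for p in dist if p > 0)


# Hypothetical two-node Bayesian network A -> B (values are assumptions).
p_a = [0.5, 0.5]                          # P(A)
p_b_given_a = [[0.9, 0.1], [0.2, 0.8]]    # P(B | A = a), one row per a

# Joint distribution P(A, B) = P(A) * P(B | A), flattened to a list.
joint = [pa * pb for pa, row in zip(p_a, p_b_given_a) for pb in row]

# Entropy computed directly from the joint table.
h_joint = entropy(joint)

# Entropy computed node by node: H(A) + H(B | A).
h_chain = entropy(p_a) + sum(
    pa * entropy(row) for pa, row in zip(p_a, p_b_given_a)
)
```

Both routes give the same value up to floating-point rounding, so the entropy of a large network can be accumulated from its local conditional tables without building the full joint.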
Entropy of regular timed languages
For timed languages, we define size measures: volume for languages with a fixed finite number of events, and entropy (growth rate) as an asymptotic measure for an unbounded number of events. These measures can be used for quantitative comparison of languages, and the entropy can be viewed as the information content of a timed language. For languages accepted by deterministic timed automata, we give exact formulas for volumes. We show that automata with non-vanishing entropy ("thick") have a normal (non-Zeno, discretizable etc.) behavior for typical runs. Next, we characterize the entropy, using methods of functional analysis, as the logarithm of the leading eigenvalue (spectral radius) of a positive integral operator. We devise a couple of methods to compute the entropy: a symbolic one for so-called "1½-clock" automata, and a numerical one (with a guarantee of convergence).
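The characterization above has a familiar untimed counterpart: for a regular language accepted by a deterministic finite automaton, the growth rate of the number of words is the logarithm of the spectral radius of the transition-count matrix. The sketch below shows only that discrete analogue, not the paper's integral-operator method. It uses plain power iteration, which is assumed to converge because the example matrix is non-negative and primitive; the function name and the example automaton are choices made for this illustration.

```python
from math import log2, sqrt


def spectral_radius(matrix, iterations=200):
    """Estimate the leading eigenvalue of a non-negative square matrix.

    Uses power iteration with max-normalization. Convergence is assumed,
    which holds for primitive (irreducible, aperiodic) matrices.
    """
    n = len(matrix)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)
        if rho == 0:
            return 0.0
        v = [x / rho for x in w]
    return rho


# Automaton for binary words with no two consecutive 1s:
# state 0 = last symbol was 0 (or start), state 1 = last symbol was 1.
adjacency = [[1, 1], [1, 0]]

rho = spectral_radius(adjacency)
entropy_bits = log2(rho)  # growth rate of the language, in bits per symbol
```

For this automaton the spectral radius is the golden ratio, so the number of accepted words of length n grows like a constant times the golden ratio to the power n.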