Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
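In this information-theoretic framing, the degree of satisfaction of an FD X → Y is naturally measured by the conditional entropy H(Y | X), and that of a CI (X ⊥ Y | Z) by the conditional mutual information I(X; Y | Z); both are zero exactly when the constraint holds. A minimal sketch of the CI measure (function and variable names are ours, not the paper's):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (flattened) probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(joint, keep):
    """Marginalize a joint distribution array onto the axes in `keep`."""
    drop = tuple(a for a in range(joint.ndim) if a not in keep)
    return joint.sum(axis=drop)

def cmi(joint):
    """I(X;Y|Z) for a joint array indexed as joint[x, y, z].

    I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z); zero iff X is
    conditionally independent of Y given Z.
    """
    H = lambda keep: entropy(marginal(joint, keep))
    return H((0, 2)) + H((1, 2)) - H((2,)) - H((0, 1, 2))
```

With this measure, an approximate implication is a linear inequality bounding the consequent's conditional mutual information by a multiple of the antecedents' sum.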
Infinite Shannon entropy
Even if a probability distribution is properly normalizable, its associated
Shannon (or von Neumann) entropy can easily be infinite. We carefully analyze
conditions under which this phenomenon can occur. Roughly speaking, this
happens when arbitrarily small amounts of probability are dispersed into an
infinite number of states; we shall quantify this observation and make it
precise. We develop several particularly simple, elementary, and useful bounds,
and also provide some asymptotic estimates, leading to necessary and sufficient
conditions for the occurrence of infinite Shannon entropy. We go to some effort
to keep technical computations as simple and conceptually clear as possible. In
particular, we shall see that large entropies cannot be localized in state
space; large entropies can only be supported on an exponentially large number
of states. We are for the time being interested in single-channel Shannon
entropy in the information theoretic sense, not entropy in a stochastic field
theory or QFT defined over some configuration space, on the grounds that this
simple problem is a necessary precursor to understanding infinite entropy in a
field-theoretic context.
Comment: 13 pages; V2: 4 references added.
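A standard example of this phenomenon (our illustration) is the heavy-tailed distribution p_k ∝ 1/(k ln²k) for k ≥ 2: the normalization sum converges, yet the entropy sum behaves like Σ 1/(k ln k) and diverges. A minimal numerical sketch, normalizing over successively longer truncations and watching the entropy grow without bound:

```python
import math

def truncated_entropy(N):
    """Entropy (nats) of p_k proportional to 1/(k ln^2 k), k = 2..N,
    normalized over the truncation. Diverges (slowly) as N grows."""
    q = [1.0 / (k * math.log(k) ** 2) for k in range(2, N + 1)]
    Z = sum(q)
    return sum(-(w / Z) * math.log(w / Z) for w in q)
```

The growth is roughly doubly logarithmic in N, illustrating the abstract's point that large entropies require an exponentially large number of states to support them.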
Abstraction in decision-makers with limited information processing capabilities
A distinctive property of human and animal intelligence is the ability to
form abstractions by neglecting irrelevant information, which makes it
possible to separate structure from noise. From an information-theoretic
point of view, abstractions
are desirable because they allow for very efficient information processing. In
artificial systems abstractions are often implemented through computationally
costly formations of groups or clusters. In this work we establish the relation
between the free-energy framework for decision making and rate-distortion
theory and demonstrate how the application of rate-distortion for
decision-making leads to the emergence of abstractions. We argue that
abstractions are induced by a limit on information-processing capacity.
Comment: Presented at the NIPS 2013 Workshop on Planning with Information
Constraints.
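The connection the abstract describes can be sketched with a Blahut-Arimoto-type iteration for a bounded-rational policy: p(a|s) ∝ p(a) exp(β U(s, a)), alternated with updating the action marginal p(a). At low capacity (small β), the per-state policies collapse onto a single shared behavior, i.e. the states are abstracted away. The utility matrix and names below are our own toy illustration, not the paper's setup:

```python
import numpy as np

def bounded_rational_policy(U, p_s, beta, iters=200):
    """Rate-distortion-style policy via Blahut-Arimoto-type updates.

    U[s, a] is the utility of action a in state s, p_s the state
    distribution, beta the inverse 'temperature' (information capacity).
    Alternates p(a|s) ~ p(a) * exp(beta * U) with p(a) = sum_s p(s) p(a|s).
    """
    n_s, n_a = U.shape
    p_a = np.full(n_a, 1.0 / n_a)          # initial action marginal
    for _ in range(iters):
        logits = np.log(p_a) + beta * U
        policy = np.exp(logits - logits.max(axis=1, keepdims=True))
        policy /= policy.sum(axis=1, keepdims=True)
        p_a = p_s @ policy                  # update the marginal
    return policy
```

With small β the rows of the returned policy are nearly identical (one abstract behavior for all states); with large β each state gets its own utility-maximizing action.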
Refined Coding Bounds and Code Constructions for Coherent Network Error Correction
Coherent network error correction is the error-control problem in network
coding with the knowledge of the network codes at the source and sink nodes.
With respect to a given set of local encoding kernels defining a linear network
code, we obtain refined versions of the Hamming bound, the Singleton bound and
the Gilbert-Varshamov bound for coherent network error correction. Similar to
its classical counterpart, this refined Singleton bound is tight for linear
network codes. The tightness of this refined bound is shown by two construction
algorithms of linear network codes achieving this bound. These two algorithms
illustrate different design methods: one makes use of existing network coding
algorithms for error-free transmission and the other makes use of classical
error-correcting codes. The implication of the tightness of the refined
Singleton bound is that the sink nodes with higher maximum flow values can have
higher error-correction capabilities.
Comment: 32 pages.
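For orientation, the classical bounds whose refined network-coding analogues the paper derives are, for a q-ary block code of length n, dimension k, and minimum distance d:

```latex
\begin{align*}
  \text{(Singleton)} \quad & d \le n - k + 1,\\[2pt]
  \text{(Hamming)}   \quad & q^{k} \sum_{i=0}^{\lfloor (d-1)/2 \rfloor}
        \binom{n}{i}(q-1)^{i} \le q^{n},\\[2pt]
  \text{(Gilbert--Varshamov)} \quad &
        \text{a code with these parameters exists whenever }
        \sum_{i=0}^{d-2} \binom{n-1}{i}(q-1)^{i} < q^{\,n-k}.
\end{align*}
```

The paper's refined versions replace these worst-case counting arguments with quantities that depend on the given local encoding kernels, which is what allows sinks with larger maximum flow to enjoy stronger error correction.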
Regions, innovation systems, and the North-South divide in Italy
Innovation systems are not bound by administrative or political boundaries. Using information theory, we measure innovation-systemness as synergy among size-classes, postal addresses, and technological classes (NACE-codes) of firm-level data collected by Statistics Italy at different scales. Italy is organized in twenty regions, but there is also a traditional divide between the North and the South of the country. How much innovation-systemness is indicated at each of these levels? The greatest synergy is retrieved by considering the country in terms of Northern and Southern Italy as two sub-systems, with Tuscany included as part of Northern Italy. We suggest that separate innovation strategies could be developed for these two parts of the country. The current focus on regions for innovation policies may to some extent be an artifact of the statistics and EU policies. In terms of sectors, both medium- and high-tech manufacturing (MHTM) and knowledge-intensive services (KIS) are proportionally integrated in the various regions.
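The synergy indicator commonly used in this line of work is the mutual information in three dimensions, T(x, y, z) = H(x) + H(y) + H(z) − H(x,y) − H(x,z) − H(y,z) + H(x,y,z), where more negative values are read as more synergy among the three dimensions. A self-contained sketch on a toy joint distribution (the data handling is our illustration, not the study's pipeline):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (flattened) probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def synergy(joint):
    """Mutual information in three dimensions for joint[x, y, z]:
    T = Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz.
    Zero for independent dimensions; negative values indicate synergy."""
    H = lambda keep: entropy(
        joint.sum(axis=tuple(a for a in range(3) if a not in keep)))
    return (H((0,)) + H((1,)) + H((2,))
            - H((0, 1)) - H((0, 2)) - H((1, 2))
            + H((0, 1, 2)))
```

In the study's setting the three axes would be size-class, geographic address, and NACE class of each firm, with the joint distribution estimated from firm counts at the chosen territorial scale.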