24 research outputs found

    Integrity Constraints Revisited: From Exact to Approximate Implication

    Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
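
    As a point of reference for the information-theoretic "degree of satisfaction", here is a minimal sketch assuming the usual measure for an FD X -> Y, namely the conditional entropy H(Y | X) of the relation's empirical distribution (an exact FD corresponds to H(Y | X) = 0; the MVD case would use a conditional mutual information instead). The toy relation and column names are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the paper's code): how "approximately" an FD
# X -> Y holds in a relation, measured by conditional entropy H(Y | X).
# H(Y | X) = 0 iff the FD holds exactly; larger values mean more violation.
from collections import Counter
from math import log2

def H(rows, cols):
    """Shannon entropy (bits) of the empirical distribution over `cols`."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def cond_entropy(rows, X, Y):
    """H(Y | X) = H(XY) - H(X)."""
    return H(rows, X + Y) - H(rows, X)

# Made-up toy relation: the last tuple violates the FD dept -> mgr.
relation = [
    {"dept": "cs", "mgr": "ann", "bldg": "A"},
    {"dept": "cs", "mgr": "ann", "bldg": "B"},
    {"dept": "ee", "mgr": "bob", "bldg": "A"},
    {"dept": "ee", "mgr": "eve", "bldg": "A"},
]

print(cond_entropy(relation, ["dept"], ["mgr"]))   # > 0: FD holds only approximately
print(cond_entropy(relation, ["dept"], ["bldg"]))  # > 0 as well
```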

    Infinite Shannon entropy

    Even if a probability distribution is properly normalizable, its associated Shannon (or von Neumann) entropy can easily be infinite. We carefully analyze conditions under which this phenomenon can occur. Roughly speaking, this happens when arbitrarily small amounts of probability are dispersed into an infinite number of states; we shall quantify this observation and make it precise. We develop several particularly simple, elementary, and useful bounds, and also provide some asymptotic estimates, leading to necessary and sufficient conditions for the occurrence of infinite Shannon entropy. We go to some effort to keep technical computations as simple and conceptually clear as possible. In particular, we shall see that large entropies cannot be localized in state space; large entropies can only be supported on an exponentially large number of states. We are, for the time being, interested in single-channel Shannon entropy in the information-theoretic sense, not entropy in a stochastic field theory or QFT defined over some configuration space, on the grounds that this simple problem is a necessary precursor to understanding infinite entropy in a field-theoretic context. Comment: 13 pages; V2: 4 references added.
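
    A standard textbook-style worked example of the phenomenon the abstract describes (not taken from the paper itself): a properly normalized distribution whose Shannon entropy is nevertheless infinite, because vanishing amounts of probability are spread over infinitely many states.

```latex
% Normalizable distribution with infinite Shannon entropy.
\[
  p_n = \frac{C}{n(\ln n)^2}, \quad n \ge 2,
  \qquad
  C^{-1} = \sum_{n\ge 2}\frac{1}{n(\ln n)^2} < \infty
  \quad\text{(integral test)},
\]
\[
  H = -\sum_{n\ge 2} p_n \ln p_n
    \;\ge\; \sum_{n\ge N} p_n \ln n
    \;=\; \sum_{n\ge N} \frac{C}{n\ln n}
    \;=\; \infty ,
\]
% since every term of H is nonnegative and, for all sufficiently large n,
% -ln p_n = ln n + 2 ln(ln n) - ln C >= ln n, while the last series diverges.
```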

    Abstraction in decision-makers with limited information processing capabilities

    A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information, which allows structure to be separated from noise. From an information-theoretic point of view, abstractions are desirable because they allow for very efficient information processing. In artificial systems, abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for decision making and rate-distortion theory, and demonstrate how applying rate-distortion theory to decision-making leads to the emergence of abstractions. We argue that abstractions are induced by a limit on information-processing capacity. Comment: Presented at the NIPS 2013 Workshop on Planning with Information Constraints.
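
    A minimal sketch of the kind of information-constrained decision rule the abstract relates to rate-distortion theory: a Blahut-Arimoto-style alternation that trades expected utility against the cost of conditioning actions on states. The utility matrix, parameter names, and the reading of beta as the information budget are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of rate-distortion-style bounded-rational decision making:
# alternate p(a|s) ∝ p(a) exp(beta * U(s,a)) and p(a) = Σ_s p(s) p(a|s).
# Low beta (tight information budget) yields coarse, state-agnostic policies
# (states get lumped together, i.e. abstraction); high beta recovers the
# fully state-specific optimal policy. The utility matrix below is made up.
import numpy as np

def rd_policy(U, p_s, beta, iters=200):
    """U[s, a]: utility; p_s: state distribution; beta: inverse temperature."""
    n_s, n_a = U.shape
    p_a = np.full(n_a, 1.0 / n_a)                # marginal action distribution
    for _ in range(iters):
        logits = np.log(p_a) + beta * U          # unnormalized log p(a|s)
        p_a_s = np.exp(logits - logits.max(axis=1, keepdims=True))
        p_a_s /= p_a_s.sum(axis=1, keepdims=True)
        p_a = p_s @ p_a_s                        # update marginal p(a)
    return p_a_s

U = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p_s = np.ones(3) / 3

print(rd_policy(U, p_s, beta=0.1))   # near-uniform rows: states are lumped together
print(rd_policy(U, p_s, beta=50.0))  # rows peak on each state's best action
```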

    Refined Coding Bounds and Code Constructions for Coherent Network Error Correction

    Coherent network error correction is the error-control problem in network coding with knowledge of the network codes at the source and sink nodes. With respect to a given set of local encoding kernels defining a linear network code, we obtain refined versions of the Hamming bound, the Singleton bound, and the Gilbert-Varshamov bound for coherent network error correction. Similar to its classical counterpart, this refined Singleton bound is tight for linear network codes. The tightness of this refined bound is shown by two algorithms that construct linear network codes achieving the bound. These two algorithms illustrate different design methods: one makes use of existing network coding algorithms for error-free transmission, and the other makes use of classical error-correcting codes. The implication of the tightness of the refined Singleton bound is that sink nodes with higher maximum-flow values can have higher error-correction capabilities. Comment: 32 pages.
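
    For orientation, a Singleton-type bound of the kind the abstract refers to takes roughly the following sink-by-sink shape (our paraphrase, not a quotation from the paper; omega denotes the dimension of the message space and m_t the max-flow from the source to sink t; the precise statement and conditions are in the paper):

```latex
% Classical Singleton bound for an (n, k) block code:  d <= n - k + 1.
% Sketched sink-by-sink refinement (paraphrase of the abstract's claim):
\[
  d_{\min}(t) \;\le\; m_t - \omega + 1
  \qquad\Longrightarrow\qquad
  \Big\lfloor \tfrac{d_{\min}(t) - 1}{2} \Big\rfloor
  \;\le\; \Big\lfloor \tfrac{m_t - \omega}{2} \Big\rfloor
  \ \text{ correctable errors at sink } t,
\]
% consistent with the abstract's remark that sinks with larger max-flow
% can achieve higher error-correction capability.
```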

    Regions, innovation systems, and the North-South divide in Italy

    Innovation systems are not bound by administrative or political boundaries. Using information theory, we measure innovation-systemness as synergy among size-classes, postal addresses, and technological classes (NACE codes) of firm-level data collected by Statistics Italy at different scales. Italy is organized into twenty regions, but there is also a traditional divide between the North and the South of the country. At which levels, and to what extent, is innovation-systemness indicated? The greatest synergy is retrieved by considering the country in terms of Northern and Southern Italy as two sub-systems, with Tuscany included as part of Northern Italy. We suggest that separate innovation strategies could be developed for these two parts of the country. The current focus on regions for innovation policies may to some extent be an artifact of the statistics and EU policies. In terms of sectors, both medium- and high-tech manufacturing (MHTM) and knowledge-intensive services (KIS) are proportionally integrated in the various regions.
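
    The synergy measure referred to in the abstract is, in studies of this kind, commonly the mutual information among three dimensions (interaction information), which turns negative when the three-dimensional configuration reduces uncertainty beyond what the bilateral relations explain. A minimal sketch under that assumption; the firm records here are random placeholders, not the Statistics Italy data.

```python
# Hedged sketch: three-dimensional "synergy" as interaction information
#   T(x, y, z) = H_x + H_y + H_z - H_xy - H_xz - H_yz + H_xyz   (in bits).
# Negative T is read as synergy in Triple-Helix-style studies.
# The firm records below are random placeholders, not the Italian data.
from collections import Counter
from math import log2
import random

def H(records, dims):
    """Shannon entropy (bits) of the empirical distribution over `dims`."""
    counts = Counter(tuple(r[d] for d in dims) for r in records)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def synergy(records):
    dims = ("size", "zip", "tech")
    return (sum(H(records, (d,)) for d in dims)
            - H(records, ("size", "zip")) - H(records, ("size", "tech"))
            - H(records, ("zip", "tech")) + H(records, dims))

random.seed(0)
firms = [{"size": random.choice("SML"),
          "zip": random.choice(["north", "south"]),
          "tech": random.choice(["MHTM", "KIS", "other"])}
         for _ in range(1000)]
print(synergy(firms))  # ~0 for independent dimensions; < 0 indicates synergy
```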