    Integrity Constraints Revisited: From Exact to Approximate Implication

    Integrity constraints such as functional dependencies (FDs) and multivalued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
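
    The information-theoretic view of constraint satisfaction can be sketched as follows. This is an illustration under assumptions not stated in the abstract: the relation is a duplicate-free list of rows, tuples are weighted uniformly, an FD X -> Y is scored by the conditional entropy h(Y|X), and an MVD X ->> Y | Z by the conditional mutual information I(Y;Z|X). A score of 0 means the constraint holds exactly; larger values mean a larger violation. The helper names and the sample relation R are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(rows, attrs):
    """Entropy (bits) of the projection onto attrs, uniform weight per row."""
    n = len(rows)
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def fd_violation(rows, xs, ys):
    """h(Y | X) = h(XY) - h(X); zero iff the FD X -> Y holds in rows."""
    return entropy(rows, xs + ys) - entropy(rows, xs)

def mvd_violation(rows, xs, ys, zs):
    """I(Y; Z | X) = h(XY) + h(XZ) - h(XYZ) - h(X)."""
    return (entropy(rows, xs + ys) + entropy(rows, xs + zs)
            - entropy(rows, xs + ys + zs) - entropy(rows, xs))

# Hypothetical relation: B is determined by A, C is independent of B given A.
R = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]

a_to_b = fd_violation(R, ["A"], ["B"])           # FD A -> B holds
a_to_c = fd_violation(R, ["A"], ["C"])           # FD A -> C is violated
a_mvd = mvd_violation(R, ["A"], ["B"], ["C"])    # MVD A ->> B | C holds
```

    In these terms, an approximate implication bounds the score of the consequent by a constant factor times the sum of the scores of the antecedents.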

    Polynomial Interpretations over the Natural, Rational and Real Numbers Revisited

    Polynomial interpretations are a useful technique for proving termination of term rewrite systems. They come in various flavors: polynomial interpretations with real, rational and integer coefficients. As to their relationship with respect to termination proving power, Lucas managed to prove in 2006 that there are rewrite systems that can be shown polynomially terminating by polynomial interpretations with real (algebraic) coefficients, but cannot be shown polynomially terminating using polynomials with rational coefficients only. He also proved the corresponding statement regarding the use of rational coefficients versus integer coefficients. In this article we extend these results, thereby giving the full picture of the relationship between the aforementioned variants of polynomial interpretations. In particular, we show that polynomial interpretations with real or rational coefficients do not subsume polynomial interpretations with integer coefficients. Our results also hold for incremental termination proofs with polynomial interpretations. Comment: 28 pages; special issue of RTA 201
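
    As a small illustration of the technique itself, the sketch below interprets the function symbols of a toy rewrite system for addition as polynomials over the natural numbers and checks that every rule strictly decreases. The rewrite system, the chosen polynomials and all names are hypothetical examples, not taken from the article, and a check on a finite grid is only a sanity test: the actual argument is symbolic (for the second rule, 2x + y + 2 > 2x + y + 1 for all naturals x, y), together with strict monotonicity of each polynomial in each argument.

```python
from itertools import product

# Hypothetical interpretation with integer coefficients:
#   [0] = 1,  [s](x) = x + 1,  [add](x, y) = 2x + y
def i_zero():
    return 1

def i_s(x):
    return x + 1

def i_add(x, y):
    return 2 * x + y

# Each rule maps to (value of left-hand side, value of right-hand side),
# both as functions of the values assigned to the variables x and y.
RULES = {
    "add(0, y) -> y": (
        lambda x, y: i_add(i_zero(), y),
        lambda x, y: y,
    ),
    "add(s(x), y) -> s(add(x, y))": (
        lambda x, y: i_add(i_s(x), y),
        lambda x, y: i_s(i_add(x, y)),
    ),
}

def decreases_on_grid(lhs, rhs, bound=20):
    """True if lhs > rhs for all x, y in range(bound). Not a proof."""
    return all(lhs(x, y) > rhs(x, y)
               for x, y in product(range(bound), repeat=2))

results = {name: decreases_on_grid(lhs, rhs)
           for name, (lhs, rhs) in RULES.items()}
```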

    Beyond the Cut-Set Bound: Uncertainty Computations in Network Coding with Correlated Sources

    Cut-set bounds on achievable rates for network communication protocols are not in general tight. In this paper we introduce a new technique for proving converses for the problem of transmission of correlated sources in networks, which results in bounds that are tighter than the corresponding cut-set bounds. We also define the concept of "uncertainty region", which might be of independent interest. We provide a full characterization of this region for the case of two correlated random variables. The bounding technique works as follows: on the one hand, we show that if the communication problem is solvable, the uncertainty of certain random variables in the network with respect to imaginary parties that have partial knowledge of the sources must satisfy some constraints that depend on the network architecture. On the other hand, the same uncertainties have to satisfy constraints that depend only on the joint distribution of the sources. Matching these two leads to restrictions on the statistical joint distribution of the sources in communication problems that are solvable over a given network architecture. Comment: 12 pages, A short version appears in ISIT 201

    On the Universality of the Logistic Loss Function

    A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on the divergence of any loss function from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
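
    For one member of this family, the quadratic loss, the bound can be checked numerically. The sketch below assumes a binary label y ~ Bernoulli(p) and a predicted probability q; the excess expected quadratic loss then equals (p - q)^2, and Pinsker's inequality gives (p - q)^2 <= KL(p || q) / 2 with KL measured in nats. The constant 1/2 comes from Pinsker's inequality for this particular loss and is not the normalization constant of the paper, which the abstract does not state. The function names are hypothetical.

```python
from math import log

def kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, with 0 * log(0) taken as 0."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * log(a / b)
    return total

def quadratic_regret(p, q):
    """Excess expected squared loss of predicting q instead of p, y ~ Bernoulli(p)."""
    def risk(r):
        return p * (1 - r) ** 2 + (1 - p) * r ** 2
    return risk(q) - risk(p)

# Largest ratio regret / KL over an interior grid (p != q avoids 0 / 0).
grid = [i / 20 for i in range(1, 20)]
worst_ratio = max(
    quadratic_regret(p, q) / kl_bernoulli(p, q)
    for p in grid for q in grid if p != q
)
```

    On this grid the ratio stays below 1/2, which is consistent with the log-loss divergence dominating the quadratic-loss divergence up to a constant.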

    The complexity of finite-valued CSPs

    We study the computational complexity of exact minimisation of rational-valued discrete functions. Let $\Gamma$ be a set of rational-valued functions on a fixed finite domain; such a set is called a finite-valued constraint language. The valued constraint satisfaction problem, $\operatorname{VCSP}(\Gamma)$, is the problem of minimising a function given as a sum of functions from $\Gamma$. We establish a dichotomy theorem with respect to exact solvability for all finite-valued constraint languages defined on domains of arbitrary finite size. We show that every constraint language $\Gamma$ either admits a binary symmetric fractional polymorphism, in which case the basic linear programming relaxation solves any instance of $\operatorname{VCSP}(\Gamma)$ exactly, or $\Gamma$ satisfies a simple hardness condition that allows for a polynomial-time reduction from Max-Cut to $\operatorname{VCSP}(\Gamma)$.
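
    To make the problem statement concrete, the sketch below encodes a VCSP instance as a list of (cost function, scope) pairs and minimises the total cost by exhaustive search. It then expresses Max-Cut in this form with a single binary cost function on the domain {0, 1} that charges 1 for every uncut edge. The solver is an exponential-time brute force for illustration only, not the linear programming relaxation from the paper, and the graph and all names are hypothetical.

```python
from itertools import product

def solve_vcsp(num_vars, domain, terms):
    """Minimise the sum of f(*values of scope) over all assignments (brute force)."""
    best_cost, best_assignment = None, None
    for assignment in product(domain, repeat=num_vars):
        cost = sum(f(*(assignment[v] for v in scope)) for f, scope in terms)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

def uncut(x, y):
    """Cost 1 if both endpoints are on the same side of the cut, else 0."""
    return 1 if x == y else 0

# Hypothetical graph: a triangle 0-1-2 with a pendant edge 2-3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
terms = [(uncut, edge) for edge in edges]

min_uncut, assignment = solve_vcsp(4, (0, 1), terms)
max_cut = len(edges) - min_uncut
```

    Minimising the number of uncut edges is the same as maximising the cut, so a constraint language containing only this cost function already expresses Max-Cut.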