    The role of assumptions in causal discovery

    This paper examines the conditional independence search approach to causal discovery, proposed by Spirtes et al. and Pearl and Verma, from the point of view of the mechanism-based view of causality in econometrics, explicated by Simon. As demonstrated by Simon, the problem of determining the causal structure from data is severely underconstrained, and the perceived causal structure depends on the a priori assumptions that one is willing to make. I discuss the assumptions made in independence search-based causal discovery and their identifying strength.

    Token-based typology and word order entropy: A study based on universal dependencies

    The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures that manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.
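    As a rough illustration of how word order variation can be expressed as entropy, the sketch below computes the Shannon entropy of observed orderings for a single construction. The function name `order_entropy` and the token counts are hypothetical and are not taken from the paper; the paper's own measures and granularity levels may differ.

```python
import math
from collections import Counter

def order_entropy(orders):
    """Shannon entropy (in bits) of a list of observed word order labels."""
    counts = Counter(orders)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical token counts for verb-object ordering in two invented corpora.
rigid_language = ["VO"] * 100                 # every token shows the same order
flexible_language = ["VO"] * 50 + ["OV"] * 50  # both orders equally frequent

rigid_entropy = order_entropy(rigid_language)        # no variation
flexible_entropy = order_entropy(flexible_language)  # maximal variation for two orders
print(rigid_entropy, flexible_entropy)
```

    With two possible orders the value ranges from 0 bits (one order only) to 1 bit (both orders equally frequent), so a token-based measure separates languages that a type-based label such as "VO language" would treat alike.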

    Integrity Constraints Revisited: From Exact to Approximate Implication

    Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
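    One common information-theoretic way to quantify how far a relation is from satisfying an FD X → Y is the conditional entropy H(Y | X) under the uniform distribution over the relation's tuples, which is zero exactly when the FD holds. The sketch below assumes that measure; the abstract says only that information theory is used, so the choice of measure, the function names `entropy` and `fd_violation`, and the toy relations are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(rows, cols):
    """Entropy (bits) of the projection onto cols, uniform over the tuples in rows."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def fd_violation(rows, lhs, rhs):
    """H(rhs | lhs) = H(lhs, rhs) - H(lhs); zero iff the FD lhs -> rhs holds."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)

# Hypothetical relations over (zip, city); every tuple is distinct.
exact = [
    {"zip": "A", "city": "X"},
    {"zip": "B", "city": "Y"},
]
approximate = [
    {"zip": "A", "city": "X"},
    {"zip": "A", "city": "Z"},  # zip A maps to two cities, so the FD fails
    {"zip": "B", "city": "Y"},
    {"zip": "C", "city": "W"},
]

exact_degree = fd_violation(exact, ["zip"], ["city"])
approx_degree = fd_violation(approximate, ["zip"], ["city"])
print(exact_degree, approx_degree)
```

    Under this assumed measure, an approximate implication with factor 1 would state that the consequent's violation is bounded by the sum of the antecedents' violations.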

    Assessing evidence and testing appropriate hypotheses

    It is crucial to identify the most appropriate hypotheses if one is to apply probabilistic reasoning to evaluate and properly understand the impact of evidence. Subtle changes to the choice of a prosecution hypothesis can result in drastically different posterior probabilities for a defence hypothesis from the same evidence. To illustrate the problem we consider a real case in which probabilistic arguments assumed that the prosecution hypothesis “both babies were murdered” was the appropriate alternative to the defence hypothesis “both babies died of Sudden Infant Death Syndrome (SIDS)”. Since it would have been sufficient for the prosecution to establish just one murder, a more appropriate alternative hypothesis was “at least one baby was murdered”. Based on the same assumptions used by one of the probability experts who examined the case, the prior odds in favour of the defence hypothesis over the double murder hypothesis are 30 to 1. However, the prior odds in favour of the defence hypothesis over the alternative “at least one murder” hypothesis are only 5 to 2. Assuming that the medical and other evidence has a likelihood ratio of 5 in favour of the prosecution hypothesis results in very different conclusions about the posterior probability of the defence hypothesis.
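    The arithmetic behind the figures above can be sketched with the odds form of Bayes' theorem. The sketch assumes that, within each comparison, the defence hypothesis and the stated prosecution hypothesis are the only two alternatives; the function name `posterior_probability` is hypothetical.

```python
from fractions import Fraction

def posterior_probability(prior_odds, likelihood_ratio):
    """Posterior probability of the defence hypothesis.

    prior_odds: odds in favour of the defence over the prosecution hypothesis.
    likelihood_ratio: ratio in favour of the prosecution hypothesis, so it
    divides the defence odds.
    """
    posterior_odds = prior_odds / likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Prior odds of 30 to 1 against the double murder hypothesis.
vs_double_murder = posterior_probability(Fraction(30, 1), Fraction(5))

# Prior odds of 5 to 2 against the "at least one murder" hypothesis.
vs_at_least_one = posterior_probability(Fraction(5, 2), Fraction(5))

print(vs_double_murder, float(vs_double_murder))  # 6/7, about 0.86
print(vs_at_least_one, float(vs_at_least_one))    # 1/3, about 0.33
```

    Under these assumptions, the same evidence leaves the defence hypothesis with a posterior probability of about 0.86 against the double murder hypothesis but only about 0.33 against the "at least one murder" hypothesis.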