The role of assumptions in causal discovery
The paper looks at the conditional independence search approach to causal discovery, proposed by Spirtes et al. and by Pearl and Verma, from the point of view of the mechanism-based view of causality in econometrics, explicated by Simon. As Simon demonstrated, the problem of determining the causal structure from data is severely underconstrained, and the perceived causal structure depends on the a priori assumptions one is willing to make. I discuss the assumptions made in independence-search-based causal discovery and their identifying strength.
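The underdetermination the abstract refers to can be illustrated with a small simulation (a hedged sketch, not from the paper): a chain X → Y → Z and a fork X ← Y → Z imply exactly the same conditional independence, X ⊥ Z | Y, so no conditional independence test can distinguish them without further a priori assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Chain: X -> Y -> Z
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)

# Fork: X <- Y -> Z
y2 = rng.normal(size=n)
x2 = y2 + rng.normal(size=n)
z2 = y2 + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# Both structures yield X dependent on Z marginally...
print(np.corrcoef(x, z)[0, 1])
# ...but X independent of Z given Y, so the data cannot tell them apart.
print(partial_corr(x, z, y))     # ~0 for the chain
print(partial_corr(x2, z2, y2))  # ~0 for the fork
```

In graphical-model terms, the chain and the fork are Markov equivalent; only extra assumptions (background knowledge, time order, mechanisms) break the tie.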
Token-based typology and word order entropy: A study based on universal dependencies
The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.
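The entropy measure mentioned above can be sketched in a few lines (the counts below are purely illustrative, not from the paper's corpora): a language with nearly fixed subject-verb order has entropy close to 0 bits, while a language alternating freely between SV and VS approaches 1 bit.

```python
import math
from collections import Counter

def order_entropy(patterns):
    """Shannon entropy (bits) of the distribution of order patterns."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical corpus counts of subject-verb vs verb-subject orders
rigid = ["SV"] * 98 + ["VS"] * 2   # nearly fixed order -> entropy near 0
free  = ["SV"] * 55 + ["VS"] * 45  # variable order -> entropy near 1 bit

print(order_entropy(rigid))
print(order_entropy(free))
```

The same function applies at different levels of granularity simply by changing what counts as a "pattern" (e.g. dependency-relation order per UD relation type rather than clause-level SV/VS).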
Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD), and multi-valued
dependencies (MVD) are fundamental in database schema design. Likewise,
probabilistic conditional independences (CI) are crucial for reasoning about
multivariate probability distributions. The implication problem studies whether
a set of constraints (antecedents) implies another constraint (consequent), and
has been investigated in both the database and the AI literature, under the
assumption that all constraints hold exactly. However, many applications today
consider constraints that hold only approximately. In this paper we define an
approximate implication as a linear inequality between the degree of
satisfaction of the antecedents and consequent, and we study the relaxation
problem: when does an exact implication relax to an approximate implication? We
use information theory to define the degree of satisfaction, and prove several
results. First, we show that any implication from a set of data dependencies
(MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most
quadratic in the number of variables; when the consequent is an FD, the factor
can be reduced to 1. Second, we prove that there exists an implication between
CIs that does not admit any relaxation; however, we prove that every
implication between CIs relaxes "in the limit". Finally, we show that the
implication problem for differential constraints in market basket analysis also
admits a relaxation with a factor equal to 1. Our results recover, and
sometimes extend, several previously known results about the implication
problem: implication of MVDs can be checked by considering only 2-tuple
relations, and the implication of differential constraints for frequent item
sets can be checked by considering only databases containing a single
transaction.
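The information-theoretic "degree of satisfaction" described above can be sketched as follows (an illustrative reconstruction, not the paper's code): a conditional independence X ⊥ Y | Z holds exactly iff the conditional mutual information I(X;Y|Z) is 0, and a small positive value measures approximate satisfaction.

```python
import math
from collections import defaultdict

def cmi(joint):
    """I(X;Y|Z) in bits, from a dict {(x, y, z): probability}.

    Equals 0 iff the CI X _||_ Y | Z holds exactly; a small positive
    value quantifies how approximately the constraint is satisfied.
    """
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    return sum(p * math.log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)

# Exact CI: X and Y uniform and independent given Z
exact = {(x, y, 0): 0.25 for x in (0, 1) for y in (0, 1)}
# Slightly perturbed distribution: the CI holds only approximately
approx = {(0, 0, 0): 0.27, (0, 1, 0): 0.23,
          (1, 0, 0): 0.23, (1, 1, 0): 0.27}

print(cmi(exact))   # 0: exact satisfaction
print(cmi(approx))  # small positive: approximate satisfaction
```

An approximate implication in the paper's sense is then a linear inequality bounding the consequent's measure by a multiple of the antecedents' measures.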
Assessing evidence and testing appropriate hypotheses
It is crucial to identify the most appropriate hypotheses if one is to apply probabilistic reasoning to evaluate and properly understand the impact of evidence. Subtle changes to the choice of a prosecution hypothesis can result in drastically different posterior probabilities for a defence hypothesis from the same evidence. To illustrate the problem we consider a real case in which probabilistic arguments assumed that the prosecution hypothesis "both babies were murdered" was the appropriate alternative to the defence hypothesis "both babies died of Sudden Infant Death Syndrome (SIDS)". Since it would have been sufficient for the prosecution to establish just one murder, a more appropriate alternative hypothesis was "at least one baby was murdered". Based on the same assumptions used by one of the probability experts who examined the case, the prior odds in favour of the defence hypothesis over the double-murder hypothesis are 30 to 1. However, the prior odds in favour of the defence hypothesis over the alternative "at least one murder" hypothesis are only 5 to 2. Assuming that the medical and other evidence has a likelihood ratio of 5 in favour of the prosecution hypothesis results in very different conclusions about the posterior probability of the defence hypothesis.
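The odds calculation described above can be reproduced directly (a sketch using only the figures quoted in the abstract; odds are expressed as defence : prosecution):

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form. Since the likelihood ratio here favours
    the prosecution hypothesis, the defence's odds are divided by it."""
    return prior_odds / likelihood_ratio

lr = 5  # likelihood ratio of the evidence in favour of the prosecution

# Against "both babies were murdered": prior odds 30 to 1 for the defence
double_murder = posterior_odds(30 / 1, lr)  # posterior odds 6 : 1

# Against "at least one baby was murdered": prior odds only 5 to 2
at_least_one = posterior_odds(5 / 2, lr)    # posterior odds 1 : 2

# Convert odds to posterior probability of the defence hypothesis
for odds in (double_murder, at_least_one):
    print(odds, odds / (1 + odds))
```

With the double-murder framing the defence hypothesis remains highly probable (6/7, about 0.86), whereas against the "at least one murder" hypothesis it drops to 1/3, illustrating how sensitive the conclusion is to the choice of alternative.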